Kernel: Poll and Select

From TechPubs Wiki

This section describes the behavior of the poll(2) and select(2) system calls in the IRIX 6.5 kernel, as inferred from observed kernel behavior and historical design goals.

It intentionally avoids implementation details and code structure, focusing instead on externally visible semantics, internal contracts, and compatibility considerations.

1. Design philosophy

IRIX treats poll(2) as the fundamental readiness-notification mechanism.

select(2) is a compatibility interface layered atop poll.

The IRIX design emphasizes:

  • Precise wakeups over broadcast signaling
  • Predictable latency under high fd counts
  • Scalability on large SMP systems
  • Tolerance of imperfect or legacy drivers
  • Compatibility with historical SVR4 semantics

This leads to observable differences from Solaris and BSD-derived systems.

2. Core readiness model

2.1 Explicit interest registration

When a thread calls poll(2) or select(2), IRIX:

  • Actively registers the thread’s interest in each polled object
  • Associates that interest with the object itself
  • Records the specific events the thread is waiting for

This differs from systems that merely re-scan descriptors after wakeup.

Consequence:

Only threads that expressed interest in a specific event are eligible to be woken.

2.2 Object-centric wakeups

Pollable kernel objects (files, sockets, devices) are responsible for:

  • Advertising readiness
  • Waking interested threads when state changes

Wakeups originate from the object, not from a global selector.

Consequence:

Wakeups are targeted and efficient, especially when many threads are polling unrelated objects.
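The registration and wakeup model from sections 2.1 and 2.2 can be pictured with a small user-space model. This is only a sketch: the class PollableObject, its methods, and the bit values are hypothetical names chosen for illustration and are not IRIX kernel interfaces.

```python
import threading

# Illustrative event bits (assumed values, not taken from IRIX headers).
POLLIN, POLLOUT = 0x1, 0x4


class PollableObject:
    """Hypothetical kernel object that owns its own list of waiters."""

    def __init__(self):
        self._lock = threading.Lock()
        self._waiters = {}  # waiter name -> (interest mask, wake flag)

    def register(self, name, mask):
        """Record that `name` is interested in the events in `mask`."""
        wake = threading.Event()
        with self._lock:
            self._waiters[name] = (mask, wake)
        return wake

    def unregister(self, name):
        with self._lock:
            self._waiters.pop(name, None)

    def notify(self, events):
        """Called by the object itself when its state changes."""
        woken = []
        with self._lock:
            for name, (mask, wake) in self._waiters.items():
                if mask & events:
                    wake.set()
                    woken.append(name)
        return woken


pipe_a, pipe_b = PollableObject(), PollableObject()
reader = pipe_a.register("reader", POLLIN)
writer = pipe_a.register("writer", POLLOUT)
other = pipe_b.register("other", POLLIN)

# Only the waiter registered on pipe_a for POLLIN is woken.
woken = pipe_a.notify(POLLIN)
```

In this model a state change on pipe_a never touches waiters registered on pipe_b, which is the property the text describes as targeted wakeups.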

3. Wakeup precision and filtering

IRIX wakeups are event-filtered:

  • A thread is woken only if:
    • The event matches its registered interest, or
    • The event represents a terminal condition (e.g., hangup or error)
  • Objects may request:
    • Waking all interested threads
    • Waking only a single waiting thread

This behavior reduces unnecessary context switches.
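The filtering rule can be written as a single selection function. The sketch below assumes that hangup and error are the only terminal conditions and that "wake a single thread" means the first eligible waiter in registration order. Both are assumptions for illustration, and the function name and bit values are hypothetical.

```python
# Illustrative event bits (assumed values, not taken from IRIX headers).
POLLIN, POLLOUT, POLLERR, POLLHUP = 0x001, 0x004, 0x008, 0x010
TERMINAL = POLLERR | POLLHUP


def waiters_to_wake(waiters, fired, wake_one=False):
    """Pick which waiters to wake for the events in `fired`.

    `waiters` is a list of (name, interest mask) in registration order.
    A waiter is eligible if the event matches its interest or if the
    event is a terminal condition.
    """
    eligible = [name for name, mask in waiters if fired & (mask | TERMINAL)]
    return eligible[:1] if wake_one else eligible


waiters = [("r1", POLLIN), ("w1", POLLOUT), ("r2", POLLIN)]
readers = waiters_to_wake(waiters, POLLIN)                  # readers only
single = waiters_to_wake(waiters, POLLIN, wake_one=True)    # one reader
everyone = waiters_to_wake(waiters, POLLHUP)                # terminal event
```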

Comparison to Solaris

Solaris follows a similar object-centric model, but tends to wake broader sets of threads and rely more on post-wakeup filtering.

4. Poll ordering and locality (“rotor behavior”)

IRIX introduces a locality optimization that influences how readiness scans resume after sleep:

  • When a thread wakes, IRIX records which descriptor triggered the wakeup
  • Subsequent readiness scans preferentially resume near that descriptor
  • If ambiguity exists (e.g., multiple wakeups or mixed object capabilities), the optimization is disabled

This improves performance for workloads that repeatedly poll the same small subset of descriptors.
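One way to picture the rotor is as a rotation of the scan order. The text does not say what "near" means precisely, so this sketch assumes the scan starts at the hinted descriptor index and wraps around. The function names and the rule for disabling the hint are illustrative assumptions.

```python
def next_hint(wakeup_sources, all_objects_signal=True):
    """Return a scan hint, or None when the hint would be ambiguous.

    The hint is kept only if exactly one descriptor index caused the
    wakeup and every polled object can actively signal readiness.
    """
    if len(wakeup_sources) == 1 and all_objects_signal:
        return wakeup_sources[0]
    return None


def scan_order(nfds, hint=None):
    """Order in which descriptor indices 0..nfds-1 are scanned."""
    if hint is None or not 0 <= hint < nfds:
        return list(range(nfds))  # optimization disabled: plain scan
    return [(hint + i) % nfds for i in range(nfds)]


hot = scan_order(5, next_hint([3]))        # resumes at descriptor 3
plain = scan_order(5, next_hint([1, 3]))   # multiple wakeups: no hint
```

Every descriptor is still visited exactly once per scan, so the rotation changes only the order in which readiness is discovered, not the result.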

Observable effects

  • Reduced latency for hot descriptors
  • Slightly different readiness ordering than Solaris in some cases
  • No effect on correctness or POSIX compliance

5. Handling of partially pollable objects

IRIX supports environments where:

  • Some objects fully support readiness notification
  • Others cannot actively signal readiness

In such cases:

  • The thread may still block
  • Optimizations that rely on precise wakeup hints are conservatively disabled
  • Correctness is preserved at the cost of additional scanning

Comparison to Solaris

Solaris generally expects pollable objects to fully participate in readiness signaling and is less tolerant of partial support.

6. Race avoidance and readiness consistency

IRIX takes explicit measures to avoid lost wakeups:

  • Readiness is checked before sleeping
  • Registration and wakeup are ordered to detect concurrent state changes
  • If readiness changes during registration, the scan is immediately retried

Consequence:

A readiness event that occurs during a poll(2) call is not silently lost.

Solaris employs similar concepts, but IRIX favors immediate in-kernel recovery rather than deferring detection to later scans.
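The check, register, recheck sequence can be sketched with a per-object generation counter. The counter is an assumption made for illustration, since the text does not say how IRIX detects concurrent changes. The names Source and scan_and_register are hypothetical, and the locking a real kernel would need is omitted.

```python
class Source:
    """Hypothetical pollable object with a change counter."""

    def __init__(self):
        self.ready = False
        self.generation = 0  # bumped on every readiness change

    def make_ready(self):
        self.ready = True
        self.generation += 1


def scan_and_register(sources, during_registration=None):
    """Return (ready sources, retries); an empty list means "safe to sleep".

    `during_registration` is a test hook that simulates an event
    arriving while the caller is still registering interest.
    """
    retries = 0
    while True:
        before = [s.generation for s in sources]
        ready = [s for s in sources if s.ready]
        if ready:
            return ready, retries
        if during_registration is not None:
            during_registration()  # registration step would happen here
        after = [s.generation for s in sources]
        if after == before:
            return [], retries     # nothing changed: sleeping is safe
        retries += 1               # state changed: rescan immediately


src = Source()
# The source becomes ready in the middle of the first registration pass.
ready, retries = scan_and_register([src], during_registration=src.make_ready)
```

The event that races with registration is picked up by the immediate rescan instead of being lost until the next wakeup.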

7. Timeout behavior

IRIX timeouts:

  • Are internally rounded to system clock resolution
  • Are conservatively extended to avoid early expiration
  • May therefore exceed the requested timeout by a small margin

This behavior prioritizes correctness over precision.
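The rounding can be illustrated with a little arithmetic. The sketch below assumes a 100 Hz clock tick and models "conservatively extended" as one extra tick to cover the partially elapsed current tick. Both values are assumptions for illustration, and the function name is hypothetical.

```python
def timeout_to_ticks(timeout_ms, hz=100):
    """Convert a poll(2)-style millisecond timeout to clock ticks.

    Negative means block indefinitely (None); zero means do not block.
    Otherwise round up to whole ticks, then add one tick so the sleep
    can never end before the requested time has elapsed.
    """
    if timeout_ms < 0:
        return None
    if timeout_ms == 0:
        return 0
    ticks = (timeout_ms * hz + 999) // 1000  # integer ceiling
    return ticks + 1


ticks = timeout_to_ticks(15)         # 15 ms requested at 100 Hz
actual_ms = ticks * 1000 // 100      # sleep length in ms if all ticks elapse
```

Under these assumptions a 15 ms request becomes 3 ticks, so the caller may wait up to 30 ms but never less than 15 ms.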

Solaris typically offers finer-grained timeout resolution.

8. Interaction with signals

If a polling thread receives a signal:

  • The poll is interrupted
  • All readiness registrations made during the call are cleaned up
  • The system call returns -1 with errno set to EINTR

IRIX guarantees that interrupted polls do not leave stale readiness state behind.
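The cleanup guarantee amounts to an unconditional unwind of everything the call registered. In this user-space sketch, signal delivery is modeled as an InterruptedError raised from the sleep step. The class and function names are hypothetical and do not correspond to IRIX kernel routines.

```python
import errno


class PollableObject:
    """Hypothetical object that tracks which waiters are registered."""

    def __init__(self):
        self.waiters = set()


def poll_wait(waiter, objects, sleep):
    """Register on every object, sleep, and always undo the registrations."""
    registered = []
    try:
        for obj in objects:
            obj.waiters.add(waiter)
            registered.append(obj)
        sleep()  # may raise InterruptedError to model a signal
        return 0
    finally:
        # Runs on normal return and on interruption alike.
        for obj in registered:
            obj.waiters.discard(waiter)


def interrupted_sleep():
    raise InterruptedError(errno.EINTR, "Interrupted system call")


objects = [PollableObject(), PollableObject()]
try:
    poll_wait("thread-1", objects, interrupted_sleep)
    result = 0
except InterruptedError as exc:
    result = exc.errno
```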

9. select(2) compatibility semantics

Although implemented atop poll(2), IRIX preserves traditional select(2) behavior:

  • Invalid descriptors cause select(2) to fail with EBADF
  • The same condition in poll(2) is reported per descriptor (POLLNVAL in revents) rather than failing the whole call
  • Exceptional conditions are reported using historical bitmask semantics

This ensures compatibility with legacy applications.
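The difference in error reporting can be observed from user space on any POSIX system that provides both calls. The following sketch uses Python's standard select module on the host system, so it demonstrates the general select/poll contract rather than IRIX itself.

```python
import errno
import os
import select

# Create a descriptor, register it with a poll object, then close it so
# the descriptor number is no longer valid.
r, w = os.pipe()
poller = select.poll()
poller.register(r, select.POLLIN)
os.close(r)
os.close(w)

# select(2): the whole call fails with EBADF.
try:
    select.select([r], [], [], 0)
    select_errno = None
except OSError as exc:
    select_errno = exc.errno

# poll(2): the call succeeds and flags the bad descriptor individually.
poll_result = poller.poll(0)
```

An emulation layer that implements select on top of poll therefore has to turn a per-descriptor POLLNVAL back into a call-level EBADF.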

10. Driver expectations

Kernel drivers participating in readiness notification are expected to:

  • Advertise readiness consistently
  • Signal state changes when readiness transitions occur
  • Maintain internal state sufficient to avoid missed notifications

IRIX is tolerant of drivers that partially implement these expectations, though reduced performance or warnings may result.

Solaris assumes stricter adherence to the readiness framework.

11. Summary of key differences from Solaris

Area                   IRIX                       Solaris
Readiness model        Object-centric, targeted   Object-centric, broader
Wakeup precision       Highly filtered            Moderately filtered
Poll ordering          Locality-optimized         Fairness-oriented
Partial poll support   Tolerated                  Less common
Race recovery          Immediate retry            Deferred detection
Timeout precision      Tick-based, conservative   Higher resolution
select compatibility   Strong legacy fidelity     Version-dependent

12. Guidance for emulation or re-implementation

To mimic IRIX poll/select behavior:

  • Use object-centric readiness tracking
  • Avoid broadcast wakeups
  • Preserve per-event filtering
  • Support conservative timeout rounding
  • Clean up readiness state on interruption
  • Preserve historical select(2) error behavior

Attempting to directly transplant Solaris semantics without accounting for these differences will result in observable behavioral drift.

13. Final note

IRIX poll/select behavior is best understood not as a variant of Solaris, but as a parallel evolution from the same SVR4 roots, shaped by IRIX’s scalability goals and long hardware lifespan.

This behavior is intentional, consistent, and relied upon by real software.