Kernel: vnode and vfs

IRIX’s VFS (Virtual File System) and vnode architecture is conceptually similar to that of other UNIX-derived systems (BSD, System V, Solaris), but adds IRIX-specific features, notably behavior (BHV) layering and STREAMS integration. The vnode system provides a uniform abstraction for all filesystem objects, enabling the kernel to operate generically on files, directories, FIFOs, and devices without knowing the underlying filesystem implementation.

1. Vnode: Abstract File Object

  • Purpose: Represents an in-memory file or pseudo-file object. Every file, device, pipe, or FIFO that a process can access has a corresponding vnode.
  • Key Responsibilities (see the struct sketch at the end of this section):
    • Maintain reference counts (v_count) for lifecycle management.
    • Track the type of file (v_type), device association (v_rdev), and filesystem association (v_vfsp).
    • Serve as a link to behavior-specific operations (via v_bh).
    • Integrate with streams (v_stream) if the object is a pipe, TTY, or socket.
    • Enable VM caching, delayed writes, and buffer trees for efficient I/O.
  • Comparison to Solaris: IRIX vnodes are conceptually similar to Solaris vnode_t. Both provide:
    • Reference counting
    • Filesystem abstraction
    • A behavior or operations layer
    • VM caching integration
  • Differences: IRIX exposes v_buf/v_dpages directly for buffer/page management; Solaris tends to isolate VM caching behind the vnode/vfs interface via v_cache and its VM layer.
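
The exact layout varies across IRIX releases, but a minimal illustrative sketch of the fields named above could look like the following. Field names follow this article; the real <sys/vnode.h> carries many more members (locks, flags, v_dpages, v_buf, and so on).

    /*
     * Illustrative sketch of an IRIX-style vnode, limited to the
     * fields discussed above. Not the actual kernel layout.
     */
    #include <sys/types.h>

    typedef enum { VNON, VREG, VDIR, VBLK, VCHR, VLNK, VFIFO, VSOCK } vtype_t;

    struct vfs;        /* mounted filesystem, see section 3          */
    struct bhv_desc;   /* behavior descriptor chain, see section 2   */
    struct stdata;     /* STREAMS stream head (pipes, ttys, sockets) */

    typedef struct vnode {
        unsigned int     v_count;   /* reference count; teardown at zero   */
        vtype_t          v_type;    /* VREG, VDIR, VFIFO, VCHR, ...        */
        dev_t            v_rdev;    /* device number for VCHR/VBLK vnodes  */
        struct vfs      *v_vfsp;    /* owning mounted filesystem           */
        struct bhv_desc *v_bh;      /* behavior (BHV) chain for dispatch   */
        struct stdata   *v_stream;  /* non-NULL for STREAMS-backed objects */
    } vnode_t;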

2. Behavior Layer (BHV)

  • Purpose: Allows vnodes to stack filesystem-specific operations on top of a generic vnode.
  • Mechanism (see the dispatch sketch at the end of this section):
    • Each vnode has a behavior descriptor list (v_bh).
    • Filesystems (pipefs, efs, xfs, fifofs) insert their operations at initialization.
    • Kernel vnode calls (VOP_READ, VOP_WRITE, VOP_CLOSE) traverse the behavior chain to find the appropriate function.
  • Comparison to Solaris: IRIX’s BHV is functionally similar to Solaris’ VOP vector and shadow vnode mechanism in layered filesystems, but IRIX’s BHV model explicitly allows multiple behaviors per vnode (stacked operations).
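
A rough model of that dispatch, with deliberately simplified types: the real bhv_desc_t and VOP_* macro definitions are more involved, but the chain-walk structure is the point.

    /*
     * Simplified model of behavior-chain dispatch. In the real kernel
     * the VOP_* macros resolve through the vnode's v_bh chain; field
     * names and signatures here are illustrative only.
     */
    struct vnodeops;

    typedef struct bhv_desc {
        void                  *bd_pdata;  /* this layer's private data    */
        void                  *bd_vobj;   /* backpointer to the vnode     */
        struct bhv_desc       *bd_next;   /* next (lower) behavior        */
        const struct vnodeops *bd_ops;    /* this layer's operation table */
    } bhv_desc_t;

    typedef struct vnodeops {
        int (*vop_read)(bhv_desc_t *bdp, void *uiop);
        int (*vop_write)(bhv_desc_t *bdp, void *uiop);
        /* ... vop_close, vop_getattr, and the rest ...                   */
    } vnodeops_t;

    /* Dispatch a read through the topmost behavior. A stacked
     * filesystem's vop_read can do its own work, then forward the
     * call down the chain via bdp->bd_next. */
    static int
    vop_read_dispatch(bhv_desc_t *chain, void *uiop)
    {
        return chain->bd_ops->vop_read(chain, uiop);
    }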

3. VFS: Virtual Filesystem Layer

  • Purpose: Represents a mounted filesystem in memory, providing a filesystem-wide interface for:
    • File lookup
    • Attribute management
    • Mount/unmount operations
    • Import/export for NFS or network filesystems
  • VFS Operations (vfsops_t):
    • mount / umount: Mount/unmount filesystem
    • sync: Flush dirty data
    • statvfs: Retrieve filesystem statistics
    • vget: Look up a vnode by inode number within the filesystem
    • quotactl: Manage quotas (if supported)
    • import: Handle import of remote filesystem data
    • root: Provide vnode for root of FS
    • reclaim: Clean up filesystem-specific vnode state

Note: In IRIX pipefs, most of these are fs_nosys or dummy functions because pipefs is a pseudo-filesystem without persistent storage.
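
A compile-time sketch of that stubbing pattern follows. fs_nosys is the real stub name in the kernel; the vfsops_t layout and signatures here are simplified for illustration.

    /*
     * Illustrative vfsops table for a pseudo-filesystem, following the
     * pipefs pattern: no backing store, so nearly every slot is a stub
     * that fails with ENOSYS (the role fs_nosys plays in the kernel).
     */
    #include <errno.h>

    struct vfs;
    struct vnode;

    typedef struct vfsops {
        int (*vfs_mount)(struct vfs *vfsp, void *args);
        int (*vfs_sync)(struct vfs *vfsp, int flags);
        int (*vfs_root)(struct vfs *vfsp, struct vnode **vpp);
        int (*vfs_statvfs)(struct vfs *vfsp, void *sbp);
    } vfsops_t;

    /* Stand-ins for fs_nosys, one per signature so the sketch compiles. */
    static int nosys_mount(struct vfs *vfsp, void *args)        { (void)vfsp; (void)args;  return ENOSYS; }
    static int nosys_sync(struct vfs *vfsp, int flags)          { (void)vfsp; (void)flags; return ENOSYS; }
    static int nosys_root(struct vfs *vfsp, struct vnode **vpp) { (void)vfsp; (void)vpp;   return ENOSYS; }
    static int nosys_statvfs(struct vfs *vfsp, void *sbp)       { (void)vfsp; (void)sbp;   return ENOSYS; }

    static const vfsops_t pipefs_vfsops = {
        .vfs_mount   = nosys_mount,
        .vfs_sync    = nosys_sync,
        .vfs_root    = nosys_root,
        .vfs_statvfs = nosys_statvfs,
    };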

  • Comparison to Solaris:
    • Very similar to Solaris vfsops_t.
    • Both systems separate filesystem-specific behavior (vnodeops_t) from filesystem-wide operations (vfsops_t).
    • IRIX adds vfs_insertbhv to bind a VFS to its BHV layer, enabling layered filesystems.

4. Vnode Operations (vnodeops_t)

  • Purpose: Defines the operations you can perform on a vnode, such as read, write, ioctl, getattr, setattr, seek, poll, and more.
  • IRIX Implementation Pattern:
    • Functions take a behavior descriptor (bhv_desc_t) instead of the vnode directly.
    • For pseudo-filesystems like pipefs, the operations implement custom logic (e.g., pipe_read, pipe_write, pipe_poll).
    • Many operations are stubbed to fs_nosys or fs_noerr when not meaningful (e.g., vop_create in pipefs); see the sketch at the end of this section.
  • Categories of operations:
    1. File I/O: read, write, ioctl, fsync, fcntl
    2. Attributes: getattr, setattr, access, pathconf, attr_get/set
    3. Directory/Namespace: lookup, create, remove, mkdir, rmdir, readdir, symlink, readlink
    4. Locking/Mapping: rwlock, rwunlock, map, addmap, delmap, frlock
    5. Streams support: strgetmsg, strputmsg
    6. Polling/Selection: poll, select integration via pollhead
    7. Cleanup: inactive, reclaim, realvp, cover, link_removed
  • Comparison to Solaris:
    • IRIX vnodeops_t maps almost one-to-one to Solaris VOP functions.
    • IRIX uses bhv_desc_t for behavior dispatch; Solaris uses direct vnode_t* and VOP macros.
    • pipefs shows how pseudo-filesystems are inserted into the BHV chain.
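
As a sketch, a pipefs-style vnodeops vector under this pattern might be declared as follows. Signatures are simplified, and the pipe_read/pipe_write bodies stand in for the real pipefs logic.

    /*
     * Illustrative vnodeops vector for a pseudo-filesystem. Note the
     * bhv_desc_t first argument, which is the IRIX dispatch pattern.
     */
    #include <errno.h>

    typedef struct bhv_desc bhv_desc_t;   /* behavior handle, section 2 */
    struct uio;                            /* kernel I/O descriptor      */
    struct cred;                           /* caller credentials         */

    typedef struct vnodeops {
        int (*vop_read)(bhv_desc_t *, struct uio *, struct cred *);
        int (*vop_write)(bhv_desc_t *, struct uio *, struct cred *);
        int (*vop_create)(bhv_desc_t *, char *, struct cred *);
        /* ... getattr, setattr, poll, inactive, reclaim, ...            */
    } vnodeops_t;

    /* Directory-style operations are meaningless on a pipe, so they
     * get a stub in the spirit of fs_nosys. */
    static int
    pipe_nosys(bhv_desc_t *bdp, char *name, struct cred *cr)
    {
        (void)bdp; (void)name; (void)cr;
        return ENOSYS;
    }

    /* The real pipefs moves data through the in-kernel pipe buffer. */
    static int
    pipe_read(bhv_desc_t *bdp, struct uio *uiop, struct cred *cr)
    {
        (void)bdp; (void)uiop; (void)cr;
        return 0;
    }

    static int
    pipe_write(bhv_desc_t *bdp, struct uio *uiop, struct cred *cr)
    {
        (void)bdp; (void)uiop; (void)cr;
        return 0;
    }

    static const vnodeops_t pipe_vnodeops = {
        .vop_read   = pipe_read,
        .vop_write  = pipe_write,
        .vop_create = pipe_nosys,   /* stubbed, as in the real pipefs */
    };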

5. Lifecycle Management

  • Reference counting (v_count) is used by:
    • VFS, FS operations, streams, and user-level file descriptors.
  • Inactive/Teardown:
    • vnode_inactive is called when the last reference is released.
    • Pseudo-filesystems clean up buffers, semaphores, synchronization variables (SVs), and polling structures.
  • Reclaim:
    • vop_reclaim (or fs_noerr) allows the FS to free internal structures before the vnode memory itself is freed.
  • Comparison to Solaris:
    • IRIX inactive/reclaim closely mirrors Solaris, where VN_RELE triggers VOP_INACTIVE and may later call VOP_RECLAIM.
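
In outline, and ignoring locking and BHV dispatch, the hold/release mechanics reduce to something like the sketch below. In the real kernel the count is lock-protected and inactive is dispatched through the behavior chain rather than a direct pointer.

    /*
     * Minimal sketch of the release path: dropping the last reference
     * triggers the filesystem's inactive entry point. Names follow
     * the SVR4/IRIX VN_HOLD/VN_RELE convention.
     */
    typedef struct vnode {
        unsigned int v_count;                    /* active references */
        void (*vop_inactive)(struct vnode *vp);  /* teardown hook     */
    } vnode_t;

    static void VN_HOLD(vnode_t *vp)
    {
        vp->v_count++;                  /* new holder: fd, VFS, stream */
    }

    static void VN_RELE(vnode_t *vp)
    {
        if (--vp->v_count == 0)
            vp->vop_inactive(vp);       /* last reference: tear down   */
    }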

6. Integration with Poll/Select

  • Each vnode backing a STREAMS object or pseudo-filesystem exposes a poll operation (e.g., pipe_poll).
  • The vnode maintains pollheads (pi_rpq, pi_wpq) to manage sleeping readers and writers.
  • Poll/select logic interacts directly with vnodes’ wait queues, using semaphores and SVs to wake sleeping processes.
  • IRIX integrates select/poll tightly with its behavior layer: the vnodeops dispatch ensures the correct filesystem or pseudo-device handles events.
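
The general shape of such a poll entry point is sketched below. The types and the pollwait() helper are hypothetical stand-ins; the real vop_poll signature and pollhead plumbing differ, but the check-readiness-then-park structure is the point.

    #include <stddef.h>

    #define POLLIN  0x0001
    #define POLLOUT 0x0004

    struct pollhead;                  /* kernel wait-queue anchor */

    struct pipe_info {
        size_t           pi_nread;    /* bytes waiting for readers */
        size_t           pi_space;    /* buffer space for writers  */
        struct pollhead *pi_rpq;      /* readers parked in poll    */
        struct pollhead *pi_wpq;      /* writers parked in poll    */
    };

    /* Hypothetical helper: record the caller on a pollhead so a later
     * wakeup (on read or write) can unblock it. */
    extern void pollwait(struct pollhead *php);

    static int
    pipe_poll(struct pipe_info *pi, short events, short *revents)
    {
        *revents = 0;
        if ((events & POLLIN) && pi->pi_nread > 0)
            *revents |= POLLIN;               /* data ready to read */
        if ((events & POLLOUT) && pi->pi_space > 0)
            *revents |= POLLOUT;              /* room to write      */

        /* Nothing ready: park on the relevant queue instead of spinning. */
        if (*revents == 0) {
            if (events & POLLIN)
                pollwait(pi->pi_rpq);
            if (events & POLLOUT)
                pollwait(pi->pi_wpq);
        }
        return 0;
    }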

7. Streams and Pseudo-Filesystems

  • Vnodes can represent STREAMS objects (v_stream), which include pipes, ttys, sockets.
  • Pseudo-filesystems like pipefs, fifofs, or STREAMS-based devices rely on vnodes for identity but do not implement persistent storage.
  • Pipefs shows:
    • Behavior insertion (pipe_bhv)
    • Read/write/poll implemented at the vnode level
    • Attribute, locking, and inactive handling integrated with BHV and semaphores
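
A sketch of that behavior-insertion step, reusing the simplified bhv_desc_t shape from the section 2 sketch (the real pipefs uses the kernel's BHV primitives, whose exact signatures differ):

    struct vnodeops;

    typedef struct bhv_desc {
        void                  *bd_pdata;  /* pipefs per-pipe state */
        void                  *bd_vobj;   /* owning vnode          */
        struct bhv_desc       *bd_next;
        const struct vnodeops *bd_ops;
    } bhv_desc_t;

    typedef struct vnode {
        bhv_desc_t *v_bh;                 /* head of the behavior chain */
    } vnode_t;

    /* Attach pipefs behavior to a freshly created pipe vnode: from now
     * on, VOP_* calls on vp dispatch into pipe_vnodeops with pi as the
     * layer's private data. */
    static void
    pipe_bhv_insert(vnode_t *vp, bhv_desc_t *bdp, void *pi,
                    const struct vnodeops *pipe_vnodeops)
    {
        bdp->bd_pdata = pi;
        bdp->bd_vobj  = vp;
        bdp->bd_ops   = pipe_vnodeops;
        bdp->bd_next  = vp->v_bh;        /* stack above anything present */
        vp->v_bh      = bdp;
    }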

8. Key Semantic Takeaways vs. Solaris/OpenSolaris

  Feature             | IRIX                     | Solaris / OpenSolaris    | Notes
  --------------------|--------------------------|--------------------------|--------------------------------------------------------------------
  Vnode abstraction   | Generic file object      | Generic file object      | Conceptually identical
  Behavior layer      | v_bh, multiple behaviors | VOP vector, shadow vnode | IRIX allows stacking; Solaris layers via shadow vnodes
  VFS interface       | vfsops_t                 | vfsops_t                 | Almost identical; mount/unmount, sync, statvfs, vget
  Vnode ops           | vnodeops_t               | vnodeops_t               | Nearly identical; both separate per-vnode ops from filesystem-wide ops
  STREAMS integration | v_stream                 | v_stream (STREAMS)       | Very similar; IRIX ties select/poll in via SVs/semaphores
  Page/buffer caching | v_dpages, v_buf, v_pc    | v_cache, v_pages         | IRIX exposes more VM/buffer fields directly
  Poll/select         | pollhead, sv_wait_sig    | pollhead + selwait       | Both implement readiness/wakeup; IRIX uses semaphores/SVs directly

9. Summary

The IRIX vnode/VFS system is:

  • Highly modular: behavior layers allow multiple filesystems and pseudo-filesystems to stack operations.
  • Stream-aware: supports pipes, ttys, FIFOs, and sockets in a unified way.
  • Poll/select integrated: readiness queues and SVs are part of each vnode.
  • VM-aware: page cache, dirty pages, buffers, and delwri structures are exposed.
  • Reference-counted: ensures proper lifecycle management via inactive/reclaim.
  • OpenSolaris-like: almost all concepts map directly to the Solaris VFS/vnode system.

See also