Kernel: vnode and vfs
From TechPubs Wiki
IRIX’s VFS (Virtual File System) and vnode architecture is conceptually similar to those of many UNIX-derived systems (BSD, System V, Solaris), but adds IRIX-specific behavior layering and STREAMS integration. The vnode system provides a uniform abstraction for all filesystem objects, so the kernel can operate generically on files, directories, FIFOs, and devices without knowing the underlying filesystem implementation.
1. Vnode: Abstract File Object
- Purpose: Represents an in-memory file or pseudo-file object. Every file, device, pipe, or FIFO that a process can access has a corresponding vnode.
- Key Responsibilities:
  - Maintain reference counts (`v_count`) for lifecycle management.
  - Track the type of file (`v_type`), device association (`v_rdev`), and filesystem association (`v_vfsp`).
  - Serve as a link to behavior-specific operations (via `v_bh`).
  - Integrate with STREAMS (`v_stream`) if the object is a pipe, TTY, or socket.
  - Enable VM caching, delayed writes, and buffer trees for efficient I/O.
- Comparison to Solaris: IRIX vnodes are conceptually similar to the Solaris `vnode_t`. Both provide:
  - Reference counting
  - Filesystem abstraction
  - A behavior or operations layer
  - VM caching integration

  Differences: IRIX exposes `v_buf`/`v_dpages` directly for buffer/page management; Solaris tends to isolate VM caching behind the `vnode`/`vfs` interface via `v_cache` and the VM layers.
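To make the field list concrete, here is a minimal sketch of a vnode-like structure. It is a hypothetical, heavily simplified model: the `demo_` type names, the `DV_` type constants, and the exact field types are assumptions for illustration, and the real IRIX `vnode_t` carries many more fields (including the VM and buffer fields mentioned above).

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>   /* dev_t */

/* Hypothetical, simplified model of the vnode fields named above.
 * The real IRIX vnode_t has more fields and different types. */
typedef enum { DV_NON, DV_REG, DV_DIR, DV_CHR, DV_BLK, DV_FIFO } demo_vtype_t;

struct demo_vfs;       /* mounted filesystem (opaque here) */
struct demo_bhv_desc;  /* behavior descriptor (opaque here) */
struct demo_stdata;    /* STREAMS head (opaque here) */

typedef struct demo_vnode {
    int                   v_count;  /* references held on this vnode */
    demo_vtype_t          v_type;   /* regular file, directory, FIFO, ... */
    dev_t                 v_rdev;   /* device number for DV_CHR/DV_BLK */
    struct demo_vfs      *v_vfsp;   /* filesystem this vnode belongs to */
    struct demo_bhv_desc *v_bh;     /* head of the behavior chain */
    struct demo_stdata   *v_stream; /* non-NULL for STREAMS pipes/TTYs */
} demo_vnode_t;

/* Initialize a freshly allocated vnode holding one reference. */
static void demo_vn_init(demo_vnode_t *vp, demo_vtype_t type,
                         struct demo_vfs *vfsp, dev_t rdev)
{
    vp->v_count  = 1;
    vp->v_type   = type;
    vp->v_rdev   = rdev;
    vp->v_vfsp   = vfsp;
    vp->v_bh     = NULL;  /* a filesystem inserts its behavior later */
    vp->v_stream = NULL;  /* set only for STREAMS-backed objects */
}

/* Generic code can ask whether the object is STREAMS-backed. */
static int demo_vn_is_stream(const demo_vnode_t *vp)
{
    return vp->v_stream != NULL;
}
```

A filesystem would fill in `v_bh` when it inserts its behavior, and a STREAMS-backed object would set `v_stream`; the rest of the kernel can then treat the object generically through these fields.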
2. Behavior Layer (BHV)
- Purpose: Allows vnodes to stack filesystem-specific operations on top of a generic vnode.
- Mechanism:
  - Each vnode has a behavior descriptor list (`v_bh`).
  - Filesystems (pipefs, efs, xfs, fifofs) insert their operations at initialization.
  - Kernel vnode calls (`VOP_READ`, `VOP_WRITE`, `VOP_CLOSE`) traverse the behavior chain to find the appropriate function.
- Comparison to Solaris: IRIX’s BHV is functionally similar to Solaris’ VOP vector and shadow vnode mechanism in layered filesystems, but IRIX’s BHV model explicitly allows multiple behaviors per vnode (stacked operations).
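The stacking idea can be sketched as a linked chain of descriptors, where a call always enters at the head and each layer may forward to the next one down. This is a hypothetical model: `demo_bhv_desc_t`, `demo_bhv_insert`, `demo_vop_read`, and the field names are illustrative and only loosely echo IRIX's `bhv_desc_t`; the real structures, locking, and insertion rules differ.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified behavior chain (not the real IRIX types). */
struct demo_bhv_desc;

typedef struct demo_vnodeops {
    int (*vop_read)(struct demo_bhv_desc *bdp, char *buf, int len);
} demo_vnodeops_t;

typedef struct demo_bhv_desc {
    const demo_vnodeops_t *bd_ops;   /* this layer's operations */
    void                  *bd_pdata; /* this layer's private data */
    struct demo_bhv_desc  *bd_next;  /* next (lower) behavior */
} demo_bhv_desc_t;

typedef struct demo_vnode {
    demo_bhv_desc_t *v_bh;           /* first (topmost) behavior */
} demo_vnode_t;

/* Push a behavior on top of the chain so it is dispatched first. */
static void demo_bhv_insert(demo_vnode_t *vp, demo_bhv_desc_t *bdp)
{
    bdp->bd_next = vp->v_bh;
    vp->v_bh = bdp;
}

/* VOP_READ-style dispatch: always enters at the head of the chain. */
static int demo_vop_read(demo_vnode_t *vp, char *buf, int len)
{
    demo_bhv_desc_t *bdp = vp->v_bh;
    return bdp->bd_ops->vop_read(bdp, buf, len);
}

/* Base filesystem layer: fills the buffer with 'x'. */
static int base_read(demo_bhv_desc_t *bdp, char *buf, int len)
{
    (void)bdp;
    for (int i = 0; i < len; i++)
        buf[i] = 'x';
    return len;
}

/* Stacked layer: counts calls in its private data, then forwards
 * the request to the next behavior down the chain. */
static int counting_read(demo_bhv_desc_t *bdp, char *buf, int len)
{
    int *calls = bdp->bd_pdata;
    (*calls)++;
    return bdp->bd_next->bd_ops->vop_read(bdp->bd_next, buf, len);
}

static const demo_vnodeops_t base_ops     = { base_read };
static const demo_vnodeops_t counting_ops = { counting_read };
```

Because the counting layer keeps its own private data and simply forwards to `bd_next`, it can be pushed on top of any base behavior without the base layer knowing it is there.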
3. VFS: Virtual Filesystem Layer
- Purpose: Represents a mounted filesystem in memory, providing a global interface for:
  - File lookup
  - Attribute management
  - Mount/unmount operations
  - Import/export for NFS or network filesystems
- VFS Operations (`vfsops_t`):
  - `mount`/`umount`: Mount/unmount the filesystem
  - `sync`: Flush dirty data
  - `statvfs`: Retrieve filesystem statistics
  - `vget`: Look up a vnode by inode within the filesystem
  - `quotactl`: Manage quotas (if supported)
  - `import`: Handle import of remote filesystem data
  - `root`: Provide the vnode for the root of the filesystem
  - `reclaim`: Clean up filesystem-specific vnode state

  Note: In IRIX pipefs, most of these are `fs_nosys` or dummy functions because pipefs is a pseudo-filesystem without persistent storage.
- Comparison to Solaris:
  - Very similar to Solaris `vfsops_t`.
  - Both systems separate per-file (vnode) operations (`vnodeops_t`) from filesystem-wide operations (`vfsops_t`).
  - IRIX adds `vfs_insertbhv` to bind a VFS to its BHV layer, enabling layered filesystems.
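A filesystem-wide operations table for a pseudo-filesystem might look roughly like the following, with most entries pointing at stubs. This is a sketch under stated assumptions: `demo_vfsops_t`, its four entries, and the stub names are hypothetical, the real IRIX `vfsops_t` has more entries with different signatures, and the sketch does not claim to reproduce which pipefs entries are stubs.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, trimmed-down types (not the real IRIX definitions). */
struct demo_vnode { int v_count; };
struct demo_vfs;

typedef struct demo_vfsops {
    int (*vfs_mount)(struct demo_vfs *vfsp);
    int (*vfs_unmount)(struct demo_vfs *vfsp);
    int (*vfs_sync)(struct demo_vfs *vfsp);
    int (*vfs_root)(struct demo_vfs *vfsp, struct demo_vnode **vpp);
} demo_vfsops_t;

typedef struct demo_vfs {
    const demo_vfsops_t *vfs_op;      /* filesystem-wide operations */
    struct demo_vnode   *vfs_rootvp;  /* hypothetical cached root vnode */
} demo_vfs_t;

/* Stub in the spirit of fs_nosys: the operation is unsupported. */
static int demo_nosys_vfs(struct demo_vfs *vfsp)
{
    (void)vfsp;
    return ENOSYS;
}

/* Dummy that reports success: there is no dirty data to flush. */
static int demo_sync_noop(struct demo_vfs *vfsp)
{
    (void)vfsp;
    return 0;
}

/* Hand back the root vnode if one has been set up. */
static int demo_root(struct demo_vfs *vfsp, struct demo_vnode **vpp)
{
    if (vfsp->vfs_rootvp == NULL)
        return ENOENT;
    *vpp = vfsp->vfs_rootvp;
    return 0;
}

/* Pseudo-filesystem table: only the entries that make sense are real. */
static const demo_vfsops_t demo_pseudo_vfsops = {
    .vfs_mount   = demo_nosys_vfs,
    .vfs_unmount = demo_nosys_vfs,
    .vfs_sync    = demo_sync_noop,
    .vfs_root    = demo_root,
};
```

Callers dispatch through `vfs_op` without knowing which entries are stubs; an unsupported request simply comes back as `ENOSYS`.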
4. Vnode Operations (vnodeops_t)
- Purpose: Defines the operations you can perform on a vnode, such as read, write, ioctl, getattr, setattr, seek, poll, and more.
- IRIX Implementation Pattern:
  - Functions take a behavior descriptor (`bhv_desc_t`) instead of the vnode directly.
  - For pseudo-filesystems like pipefs, the operations implement custom logic (e.g., `pipe_read`, `pipe_write`, `pipe_poll`).
  - Many operations are stubbed to `fs_nosys` or `fs_noerr` if not meaningful (e.g., `vop_create` in pipefs).
- Categories of operations:
  - File I/O: `read`, `write`, `ioctl`, `fsync`, `fcntl`
  - Attributes: `getattr`, `setattr`, `access`, `pathconf`, `attr_get`/`attr_set`
  - Directory/Namespace: `lookup`, `create`, `remove`, `mkdir`, `rmdir`, `readdir`, `symlink`, `readlink`
  - Locking/Mapping: `rwlock`, `rwunlock`, `map`, `addmap`, `delmap`, `frlock`
  - Streams support: `strgetmsg`, `strputmsg`
  - Polling/Selection: `poll`, with `select` integration via `pollhead`
  - Cleanup: `inactive`, `reclaim`, `realvp`, `cover`, `link_removed`
- Comparison to Solaris:
  - IRIX `vnodeops_t` maps almost one-to-one to Solaris VOP functions.
  - IRIX uses `bhv_desc_t` for behavior dispatch; Solaris passes a `vnode_t *` directly and dispatches through VOP macros.
  - pipefs shows how pseudo-filesystems are inserted into the BHV chain.
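The following sketch shows the IRIX-style pattern of passing a behavior descriptor into each operation and filling unused slots with catch-all stubs. All names are hypothetical (`demo_bhv_desc_t`, `demo_nosys_create`, `demo_noerr_fsync`, and so on); they imitate the roles of `fs_nosys` and `fs_noerr` rather than their actual definitions.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified types (not the real IRIX definitions). */
typedef struct demo_bhv_desc {
    void *bd_pdata;                   /* filesystem-private node */
} demo_bhv_desc_t;

typedef struct demo_vnodeops {
    int (*vop_read)(demo_bhv_desc_t *bdp, char *buf, size_t len, size_t *done);
    int (*vop_create)(demo_bhv_desc_t *bdp, const char *name);
    int (*vop_fsync)(demo_bhv_desc_t *bdp);
} demo_vnodeops_t;

/* Stub in the spirit of fs_nosys: the operation makes no sense here. */
static int demo_nosys_create(demo_bhv_desc_t *bdp, const char *name)
{
    (void)bdp; (void)name;
    return ENOSYS;
}

/* Stub in the spirit of fs_noerr: nothing to do, report success. */
static int demo_noerr_fsync(demo_bhv_desc_t *bdp)
{
    (void)bdp;
    return 0;
}

/* Hypothetical private node of a pseudo-file holding a fixed message. */
typedef struct demo_node {
    const char *msg;
} demo_node_t;

/* The op receives the behavior descriptor, not the vnode, and reaches
 * its own state through bd_pdata. */
static int demo_read(demo_bhv_desc_t *bdp, char *buf, size_t len, size_t *done)
{
    demo_node_t *np = bdp->bd_pdata;
    size_t n = strlen(np->msg);
    if (n > len)
        n = len;
    memcpy(buf, np->msg, n);
    *done = n;
    return 0;
}

static const demo_vnodeops_t demo_pseudo_vnodeops = {
    .vop_read   = demo_read,
    .vop_create = demo_nosys_create,
    .vop_fsync  = demo_noerr_fsync,
};
```

Standard C requires each function pointer to match its slot's type, so this sketch uses one small stub per signature; how the real kernel shares a single `fs_nosys` across many slots is not shown here.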
5. Lifecycle Management
- Reference counting (`v_count`) is used by:
  - VFS, FS operations, STREAMS, and user-level file descriptors.
- Inactive/Teardown:
  - `vnode_inactive` is called when the last reference is released.
  - Pseudo-filesystems clean up buffers, semaphores, SVs, and polling structures.
- Reclaim: `vop_reclaim` (or `fs_noerr`) allows the FS to free internal structures before the vnode memory itself is freed.
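- Comparison to Solaris:
  - IRIX inactive handling closely mirrors Solaris, where `VN_RELE` triggers `VOP_INACTIVE` when the last reference is dropped. Solaris has no separate `VOP_RECLAIM` entry point (that name comes from BSD), so a Solaris filesystem typically frees its private state in its inactive routine instead.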
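The hold/release flow can be sketched as a reference count whose last release triggers the inactive step and then the reclaim step. This is a simplified, single-threaded model under stated assumptions: `demo_vn_hold` and `demo_vn_rele` only echo the roles of hold/release operations, there is no locking or vnode cache, and reclaim runs immediately after inactive, whereas a real kernel may defer reclaim until the vnode is recycled.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified lifecycle model (not the real IRIX code). */
struct demo_vnode;

typedef struct demo_lifeops {
    void (*vop_inactive)(struct demo_vnode *vp); /* last ref dropped */
    void (*vop_reclaim)(struct demo_vnode *vp);  /* free FS-private state */
} demo_lifeops_t;

typedef struct demo_vnode {
    int                   v_count;
    void                 *v_data;   /* filesystem-private state */
    const demo_lifeops_t *v_ops;
} demo_vnode_t;

static int demo_inactive_calls, demo_reclaim_calls;

static void demo_inactive(demo_vnode_t *vp)
{
    (void)vp;
    demo_inactive_calls++;   /* e.g. release buffers, wake pollers */
}

static void demo_reclaim(demo_vnode_t *vp)
{
    free(vp->v_data);        /* free filesystem-private state */
    vp->v_data = NULL;
    demo_reclaim_calls++;
}

static const demo_lifeops_t demo_life_ops = { demo_inactive, demo_reclaim };

static demo_vnode_t *demo_vn_alloc(void)
{
    demo_vnode_t *vp = malloc(sizeof *vp);
    if (vp == NULL)
        return NULL;
    vp->v_count = 1;         /* the creator holds the first reference */
    vp->v_data  = malloc(64);
    vp->v_ops   = &demo_life_ops;
    return vp;
}

static void demo_vn_hold(demo_vnode_t *vp)
{
    vp->v_count++;
}

/* Drop a reference; on the last one run inactive, then reclaim, then
 * free the vnode itself. Returns 1 if the vnode was freed. */
static int demo_vn_rele(demo_vnode_t *vp)
{
    assert(vp->v_count > 0);
    if (--vp->v_count > 0)
        return 0;
    vp->v_ops->vop_inactive(vp);
    vp->v_ops->vop_reclaim(vp);
    free(vp);
    return 1;
}
```

Ordering is the point here: the filesystem gets its inactive and reclaim callbacks while the vnode is still valid, and only afterwards is the vnode memory released.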
6. Integration with Poll/Select
- Each vnode that supports STREAMS or belongs to a pseudo-filesystem exposes a `poll` operation (for example, `pipe_poll`).
- Vnodes maintain pollheads (in pipefs, `pi_rpq` and `pi_wpq`) to manage sleeping readers and writers.
- Poll/select logic interacts directly with the vnodes’ wait queues, using semaphores and synchronization variables (SVs) to wake sleeping processes.
- IRIX integrates select/poll tightly with its behavior layer: the `vnodeops` dispatch ensures that the correct filesystem or pseudo-device handles events.
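The readiness logic for a pipe-like object can be sketched as follows: poll reports which requested events are ready and, if none are, registers the caller on the matching pollhead; a write then wakes the registered readers. This is a hypothetical, single-threaded model: `demo_pollhead_t` only counts waiters and wakeups instead of sleeping on SVs, and the `rpq`/`wpq` fields merely mirror the roles of pipefs's `pi_rpq`/`pi_wpq`. `POLLIN` and `POLLOUT` are the standard POSIX constants from `<poll.h>`.

```c
#include <assert.h>
#include <poll.h>      /* POLLIN, POLLOUT */
#include <stddef.h>

/* Hypothetical pollhead: a real kernel would sleep here and be woken
 * by a wakeup routine; this sketch just counts waiters and wakeups. */
typedef struct demo_pollhead {
    int waiters;    /* pollers currently registered */
    int wakeups;    /* wakeups delivered so far */
} demo_pollhead_t;

typedef struct demo_pipe {
    size_t          count;  /* bytes buffered */
    size_t          size;   /* buffer capacity */
    demo_pollhead_t rpq;    /* readers waiting for data (cf. pi_rpq) */
    demo_pollhead_t wpq;    /* writers waiting for space (cf. pi_wpq) */
} demo_pipe_t;

/* pipe_poll-style check: report which requested events are ready now;
 * if none are, register on the matching pollhead(s). */
static short demo_pipe_poll(demo_pipe_t *pp, short events)
{
    short revents = 0;
    if ((events & POLLIN) && pp->count > 0)
        revents |= POLLIN;
    if ((events & POLLOUT) && pp->count < pp->size)
        revents |= POLLOUT;
    if (revents == 0) {
        if (events & POLLIN)
            pp->rpq.waiters++;
        if (events & POLLOUT)
            pp->wpq.waiters++;
    }
    return revents;
}

/* Wake everyone registered on a pollhead. */
static void demo_pollwakeup(demo_pollhead_t *php)
{
    php->wakeups += php->waiters;
    php->waiters = 0;
}

/* A write makes data available, so registered readers are woken. */
static size_t demo_pipe_write(demo_pipe_t *pp, size_t n)
{
    size_t room = pp->size - pp->count;
    if (n > room)
        n = room;
    pp->count += n;
    if (n > 0)
        demo_pollwakeup(&pp->rpq);
    return n;
}
```

A symmetric read path would drain `count` and wake `wpq`; it is omitted to keep the sketch short.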
7. Streams and Pseudo-Filesystems
- Vnodes can represent STREAMS objects (`v_stream`), which include pipes, TTYs, and sockets.
- Pseudo-filesystems like pipefs, fifofs, or STREAMS-based devices rely on vnodes for identity but do not implement persistent storage.
- Pipefs shows:
  - Behavior insertion (`pipe_bhv`)
  - Read/write/poll implemented at the vnode level
  - Attribute, locking, and inactive handling integrated with BHV and semaphores
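The pipefs pattern (a private node that embeds its behavior descriptor, with data held only in memory) can be sketched with a small ring buffer. The names `demo_pipenode_t`, `pi_bhv`, `demo_pipe_attach`, and the 8-byte buffer are assumptions for illustration; blocking, semaphores, and attribute handling are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_PIPE_SIZE 8   /* hypothetical, tiny capacity */

/* Hypothetical, simplified types (not the real pipefs definitions). */
typedef struct demo_bhv_desc {
    void                 *bd_pdata;   /* back-pointer to the pipe node */
    struct demo_bhv_desc *bd_next;
} demo_bhv_desc_t;

typedef struct demo_vnode {
    demo_bhv_desc_t *v_bh;            /* head of the behavior chain */
} demo_vnode_t;

/* Private pipe node embedding its own behavior descriptor; all data
 * lives in an in-memory ring buffer and nothing reaches disk. */
typedef struct demo_pipenode {
    demo_bhv_desc_t pi_bhv;
    char            buf[DEMO_PIPE_SIZE];
    size_t          head;             /* next byte to read */
    size_t          count;            /* bytes currently buffered */
} demo_pipenode_t;

/* Insert the pipe behavior at the head of a vnode's chain. */
static void demo_pipe_attach(demo_vnode_t *vp, demo_pipenode_t *pn)
{
    memset(pn, 0, sizeof *pn);
    pn->pi_bhv.bd_pdata = pn;
    pn->pi_bhv.bd_next  = vp->v_bh;
    vp->v_bh = &pn->pi_bhv;
}

/* Ops receive the descriptor and recover the pipe node from it. */
static size_t demo_pipefs_write(demo_bhv_desc_t *bdp, const char *src, size_t n)
{
    demo_pipenode_t *pn = bdp->bd_pdata;
    size_t done = 0;
    while (done < n && pn->count < DEMO_PIPE_SIZE) {
        pn->buf[(pn->head + pn->count) % DEMO_PIPE_SIZE] = src[done++];
        pn->count++;
    }
    return done;   /* short write when the buffer fills */
}

static size_t demo_pipefs_read(demo_bhv_desc_t *bdp, char *dst, size_t n)
{
    demo_pipenode_t *pn = bdp->bd_pdata;
    size_t done = 0;
    while (done < n && pn->count > 0) {
        dst[done++] = pn->buf[pn->head];
        pn->head = (pn->head + 1) % DEMO_PIPE_SIZE;
        pn->count--;
    }
    return done;   /* 0 means empty; a real pipe would block or see EOF */
}
```

Because the node is reached through `bd_pdata`, the read and write routines never touch the vnode itself, which matches the descriptor-based calling convention described in section 4.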
8. Key Semantic Takeaways vs. Solaris/OpenSolaris
| Feature | IRIX | Solaris / OpenSolaris | Notes |
|---|---|---|---|
| Vnode abstraction | Generic file object | Generic file object | Conceptually identical |
| Behavior layer | `v_bh`, multiple behaviors | VOP vector, shadow vnodes | IRIX allows stacking; Solaris uses layered/shadow vnodes |
| VFS interface | `vfsops_t` | `vfsops_t` | Almost identical: mount/unmount, sync, statvfs, vget |
| Vnode ops | `vnodeops_t` | `vnodeops_t` | Nearly identical; both separate per-vnode operations from filesystem-wide ops |
| Streams integration | `v_stream` | `v_stream` (STREAMS) | Very similar; IRIX integrates tightly with select/poll via SVs/semaphores |
| Page/buffer caching | `v_dpages`, `v_buf`, `v_pc` | `v_cache`, `v_pages` | IRIX exposes more VM/buffer fields directly |
| Poll/select | `pollhead`, `sv_wait_sig` | `pollhead` + `selwait` | Both implement readiness/wakeup mechanisms; IRIX uses semaphores/SVs directly |
9. Summary
The IRIX vnode/VFS system is:
- Highly modular: behavior layers allow multiple filesystems and pseudo-filesystems to stack operations.
- Stream-aware: supports pipes, ttys, FIFOs, and sockets in a unified way.
- Poll/select integrated: readiness queues and SVs are part of each vnode.
- VM-aware: page cache, dirty pages, buffers, and delwri structures are exposed.
- Reference-counted: ensures proper lifecycle management via `inactive`/`reclaim`.
- OpenSolaris-like: almost all concepts map directly to the Solaris VFS/vnode system.