Libc: doscan
The _doscan function is the core implementation of formatted input in IRIX libc and the internal routine behind the entire scanf family (scanf, fscanf, sscanf, vscanf, etc.). It parses the format string, reads from a FILE * stream (or, for sscanf, from a string wrapped in a dummy stream), matches input against the conversion specifiers, skips whitespace as required, and stores converted values through the pointers supplied in the va_list. This decompiled version shows MIPS-specific handling (e.g., conditional long double support in the 32-bit ABI), full POSIX positional parameter support, wide-character conversions (%C/%S), and careful stream handling through macros such as locgetc/locungetc. It uses a dynamic scratch buffer for numeric parsing, a character class table for %[], and multibyte-to-wide conversion for the wide specifiers.
Key Functions
The implementation features a main parsing loop with helper routines for different conversion types.
Main Entry Point (_doscan)
_doscan validates the stream for readability, initializes positional argument handling if needed, and loops through the format:
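It skips whitespace in both the format and the input, and matches a literal '%' when the format contains %%. For each conversion it parses the assignment-suppression flag (*), the field width, the positional index ($), and the length modifier (h, l, ll, L), then dispatches to the conversion-specific handler. It returns the number of successful assignments, or EOF if an error or end-of-input occurs before anything has been matched.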
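The per-conversion parsing step described above can be sketched as follows. This is a minimal illustration and not the decompiled IRIX code: the struct directive type, the parse_directive() name, and the use of 'q' as an internal marker for "ll" are all assumptions made for the example, and positional indices are left out.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical description of one parsed conversion directive. */
struct directive {
    int suppress;   /* 1 if '*' was present: convert but do not store */
    int width;      /* field width, or 0 if none was given */
    char size;      /* 'h', 'l', 'L', 'q' (stands for "ll"), or 0 */
    char conv;      /* conversion character, e.g. 'd', 's', '[' */
};

/*
 * Parse the text that follows a '%' and return a pointer just past the
 * conversion character. Returns NULL if the format ends before a
 * conversion character is found.
 */
const char *parse_directive(const char *fmt, struct directive *d)
{
    d->suppress = 0;
    d->width = 0;
    d->size = 0;
    d->conv = 0;

    if (*fmt == '*') {                      /* assignment suppression */
        d->suppress = 1;
        fmt++;
    }
    while (isdigit((unsigned char)*fmt)) {  /* field width */
        d->width = d->width * 10 + (*fmt - '0');
        fmt++;
    }
    if (*fmt == 'h' || *fmt == 'L') {       /* length modifier */
        d->size = *fmt++;
    } else if (*fmt == 'l') {
        fmt++;
        if (*fmt == 'l') {
            d->size = 'q';
            fmt++;
        } else {
            d->size = 'l';
        }
    }
    if (*fmt == '\0')
        return NULL;
    d->conv = *fmt++;
    return fmt;
}
```

Given the text "*5lld", this sketch reports suppression, a width of 5, the "ll" size marker, and the conversion character 'd'. A real implementation would also guard the width accumulation against overflow.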
Format Parsing and Positional Parameters
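The format is parsed incrementally through charswitch. POSIX %n$ positional parameters are fully supported: a fast path through _mkarglst pre-scans the format and records the first 30 arguments, and arguments beyond that fall back to skipping through the list with va_arg. The stva_list wrapper is the same one used in _doprnt.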
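The two pieces of positional handling, recognizing a "digits$" prefix and fetching an argument by skipping with va_arg, can be sketched as below. The names parse_position(), nth_pointer_arg(), and pick_arg() are hypothetical; this is not _mkarglst, only an illustration of the slow path. It assumes every argument was passed as a pointer of uniform size (the example passes void * explicitly).

```c
#include <ctype.h>
#include <stdarg.h>
#include <stddef.h>

/*
 * If *fmt (positioned just after '%') begins with "digits$", return the
 * 1-based argument index and advance *fmt past the '$'. Otherwise return
 * 0 and leave *fmt untouched (the digits may be a field width).
 */
int parse_position(const char **fmt)
{
    const char *p = *fmt;
    int n = 0;

    while (isdigit((unsigned char)*p))
        n = n * 10 + (*p++ - '0');
    if (p == *fmt || *p != '$' || n == 0)
        return 0;
    *fmt = p + 1;
    return n;
}

/*
 * Slow path: fetch the n-th (1-based) pointer argument by copying the
 * va_list and stepping over the first n-1 entries. Assumes all
 * arguments are pointers passed as void *.
 */
void *nth_pointer_arg(va_list ap, int n)
{
    va_list copy;
    void *p = NULL;

    va_copy(copy, ap);
    while (n-- > 0)
        p = va_arg(copy, void *);
    va_end(copy);
    return p;
}

/* Hypothetical variadic wrapper, present only to exercise the helper. */
void *pick_arg(int n, ...)
{
    va_list ap;
    void *p;

    va_start(ap, n);
    p = nth_pointer_arg(ap, n);
    va_end(ap);
    return p;
}
```

Because the copy is rewound for every lookup, this path costs time proportional to the index, which is presumably why a pre-scanned table is used for the common case.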
String and Character Handling (c, s, [, C, S)
string() reads characters until whitespace (for %s), until a character outside the set (for %[), or until the field width is exhausted (for %c). It respects the field width in all cases and does not null-terminate the result for %c. wstring() is the wide version and uses nextwc() for multibyte-to-wide conversion. setup() builds the character class table for %[], handling ^ negation and ranges such as a-z.
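A table builder in the spirit of setup() might look like the sketch below. The function name build_scanset() and the table size of 256 are assumptions for the example, and the edge-case rules (a leading ']' is a literal, a trailing '-' is a literal, a reversed range is ignored) are choices made here rather than confirmed IRIX behavior.

```c
#include <stddef.h>
#include <string.h>

#define NCHARS 256   /* assumed table size: one entry per unsigned char */

/*
 * Fill tab[] so that tab[c] != 0 means "c is accepted by the scanset".
 * fmt points just past the '['. Returns a pointer just past the closing
 * ']', or NULL if the set is unterminated.
 */
const char *build_scanset(const char *fmt, unsigned char tab[NCHARS])
{
    int negate = 0;
    unsigned int c;

    if (*fmt == '^') {
        negate = 1;
        fmt++;
    }
    /* Start with everything rejected, or everything accepted if negated. */
    memset(tab, negate, NCHARS);

    /* A ']' immediately after '[' or '[^' is an ordinary member. */
    if (*fmt == ']') {
        tab[(unsigned char)']'] = (unsigned char)!negate;
        fmt++;
    }
    while (*fmt != ']') {
        unsigned char lo, hi;

        if (*fmt == '\0')
            return NULL;
        lo = hi = (unsigned char)*fmt++;
        /* "lo-hi" is a range unless the '-' is the last member. */
        if (fmt[0] == '-' && fmt[1] != ']' && fmt[1] != '\0') {
            hi = (unsigned char)fmt[1];
            fmt += 2;
        }
        for (c = lo; c <= hi; c++)
            tab[c] = (unsigned char)!negate;
    }
    return fmt + 1;
}
```

Once the table is built, matching an input character is a single array lookup, so the cost of parsing the set is paid once per conversion rather than once per character.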
Integer and Pointer Conversion (d, i, o, u, x, p)
These conversions are handled in number(), which collects digits into the dynamic scratch buffer, detects the base (including the automatic 0x prefix for %i and %x), and handles the sign. Accumulation is strtol-like, with care taken to avoid overflow. %p is treated as a hexadecimal long.
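The base detection and digit accumulation can be illustrated with a simplified sketch that reads from a string cursor instead of a stream. The names digit_value() and scan_integer() are hypothetical, the scratch buffer is omitted, and overflow simply wraps in unsigned arithmetic here, so the overflow handling of the real number() is not reproduced.

```c
#include <ctype.h>
#include <limits.h>

/* Numeric value of a digit character in bases up to 16, or -1. */
int digit_value(int c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/*
 * Convert an integer field at *in for conversion character conv
 * ('d', 'i', 'o', 'u', or 'x'). width caps the characters consumed
 * (0 means unlimited). On success stores the value, advances *in, and
 * returns 1. Returns 0 and leaves *in alone if no digits were found.
 */
int scan_integer(const char **in, char conv, int width, long long *out)
{
    const char *p = *in;
    int base = (conv == 'o') ? 8 : (conv == 'x') ? 16 : 10;
    int left = (width > 0) ? width : INT_MAX;
    int neg = 0, ndigits = 0;
    unsigned long long val = 0;

    if (*p == '+' || *p == '-') {
        neg = (*p == '-');
        p++;
        left--;
    }
    if (left > 0 && *p == '0' && (conv == 'i' || conv == 'x')) {
        p++;                /* the leading zero is itself a digit */
        left--;
        ndigits = 1;
        if (left > 1 && (*p == 'x' || *p == 'X') &&
            isxdigit((unsigned char)p[1])) {
            base = 16;      /* "0x" prefix followed by a hex digit */
            p++;
            left--;
        } else if (conv == 'i') {
            base = 8;       /* bare leading zero selects octal for %i */
        }
    }
    while (left > 0) {
        int d = digit_value((unsigned char)*p);
        if (d < 0 || d >= base)
            break;
        val = val * (unsigned)base + (unsigned)d;
        p++;
        left--;
        ndigits++;
    }
    if (ndigits == 0)
        return 0;
    *out = (long long)(neg ? 0ULL - val : val);
    *in = p;
    return 1;
}
```

With conversion 'i', the input "0x1F" is read as hexadecimal and "-017" as octal, while a width of 3 applied to "12345" consumes only "123".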
Floating-Point Conversion (e, E, f, F, g, G)
The full numeric string (sign, digits, decimal point, exponent) is collected into the scratch buffer and then converted with atof or atold. The decimal point character is taken from the locale (_numeric[0]). Long double support is conditional on the ABI.
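The collect-then-convert approach can be sketched as follows. This is an illustration under stated assumptions: scan_double() is a hypothetical name, the input is a string cursor with a caller-supplied fixed buffer rather than a stream with a growing scratch buffer, localeconv() stands in for the IRIX-internal _numeric[0], and strtod() stands in for atof/atold.

```c
#include <ctype.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/*
 * Find the longest prefix of *in that forms a decimal floating-point
 * token, copy it into buf (capacity cap), and convert it with strtod().
 * Returns 1 and advances *in on success. Returns 0 if no digits were
 * seen or the token does not fit in buf.
 */
int scan_double(const char **in, char *buf, size_t cap, double *out)
{
    const char *p = *in;
    const char point = *localeconv()->decimal_point;
    int digits = 0;
    size_t len;

    if (*p == '+' || *p == '-')
        p++;
    while (isdigit((unsigned char)*p)) {
        p++;
        digits++;
    }
    if (*p == point) {
        p++;
        while (isdigit((unsigned char)*p)) {
            p++;
            digits++;
        }
    }
    if (digits == 0)
        return 0;
    if (*p == 'e' || *p == 'E') {
        /* Accept the exponent only if at least one digit follows it. */
        const char *q = p + 1;
        if (*q == '+' || *q == '-')
            q++;
        if (isdigit((unsigned char)*q)) {
            while (isdigit((unsigned char)*q))
                q++;
            p = q;
        }
    }
    len = (size_t)(p - *in);
    if (len >= cap)
        return 0;
    memcpy(buf, *in, len);
    buf[len] = '\0';
    *out = strtod(buf, NULL);
    *in = p;
    return 1;
}
```

Separating token collection from conversion keeps the stream logic independent of the numeric parser, at the cost of needing a buffer large enough for the longest token, which is the role the dynamic scratch buffer plays in the text.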
Character Count (%n)
Stores the number of input characters consumed so far, honoring the h, l, and ll length modifiers.
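Storing the count through a pointer whose type depends on the length modifier can be sketched as below. The function name store_count() and the use of 'q' as an internal marker for "ll" are assumptions for this example.

```c
/*
 * Store the consumed-character count through ptr, using the integer
 * type selected by the length modifier:
 * 'h' = short, 'l' = long, 'q' = long long (hypothetical marker for
 * "ll"), anything else = int.
 */
void store_count(void *ptr, char size, long long count)
{
    switch (size) {
    case 'h':
        *(short *)ptr = (short)count;
        break;
    case 'l':
        *(long *)ptr = (long)count;
        break;
    case 'q':
        *(long long *)ptr = count;
        break;
    default:
        *(int *)ptr = (int)count;
        break;
    }
}
```

The caller is responsible for passing a pointer that really has the type implied by the modifier; a mismatch writes the wrong number of bytes, exactly as with the standard %hn and %ln conversions.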
Auxiliary and Internal Utilities
locgetc/locungetc are stream-aware get and unget macros that also handle the dummy streams used by sscanf. readchar() falls back to a direct read() when the buffer is empty (not used for sscanf). nextwc() performs multibyte-to-wide conversion with mbtowc. The dynamic scratch buffer for numeric strings starts at 128 bytes and grows in 128-byte steps. _mkarglst pre-scans the format to build the array of va_list positions for positional arguments, assuming a uniform pointer size.
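The scratch buffer growth policy (128 bytes initially, then 128 more each time it fills) can be sketched like this. The struct scratch type and the scratch_put()/scratch_free() names are hypothetical and do not come from the IRIX code.

```c
#include <stdlib.h>

#define SCRATCH_STEP 128   /* initial size and growth increment */

/* Hypothetical growable byte buffer; zero-initialize before first use. */
struct scratch {
    char *data;
    size_t len;   /* bytes in use */
    size_t cap;   /* bytes allocated */
};

/*
 * Append one byte, growing the buffer by SCRATCH_STEP when it is full.
 * Returns 0 on success and -1 if the allocation fails, in which case
 * the existing contents are left intact.
 */
int scratch_put(struct scratch *s, char c)
{
    if (s->len == s->cap) {
        char *p = realloc(s->data, s->cap + SCRATCH_STEP);
        if (p == NULL)
            return -1;
        s->data = p;
        s->cap += SCRATCH_STEP;
    }
    s->data[s->len++] = c;
    return 0;
}

/* Release the buffer and return the struct to its empty state. */
void scratch_free(struct scratch *s)
{
    free(s->data);
    s->data = NULL;
    s->len = 0;
    s->cap = 0;
}
```

Linear growth keeps memory use tight for the short tokens that numeric input normally produces; a reimplementation that expects very long tokens might prefer geometric growth instead.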
Undocumented or IRIX-Specific Interfaces and Behaviors
Critical Structures for Compatibility
stva_list is the same wrapper used in _doprnt for positional arguments. Unlike the output side, no special floating-point union is needed.
Extended Format Specifiers
%C and %S read a wide character and a wide string, respectively; they are the uppercase counterparts of the standard %lc and %ls. POSIX positional parameters (%n$) are fully supported.
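The multibyte-to-wide step behind the wide conversions can be illustrated with mbtowc(). In this sketch next_wide() and scan_wide_word() are hypothetical names, the input is a string cursor rather than a stream, and the destination is assumed to have room for max wide characters plus a terminator.

```c
#include <stdlib.h>
#include <wchar.h>
#include <wctype.h>

/*
 * Decode one wide character from the multibyte string at *in.
 * Returns 1 and advances *in on success, 0 at the terminating null,
 * and -1 on an invalid multibyte sequence.
 */
int next_wide(const char **in, wchar_t *wc)
{
    int n = mbtowc(wc, *in, MB_CUR_MAX);

    if (n <= 0)
        return n;
    *in += n;
    return 1;
}

/*
 * Read up to max wide characters into dst, stopping at whitespace in
 * the way a wide string conversion would. The delimiter is left unread.
 * dst must have room for max + 1 elements. Returns the count stored.
 */
size_t scan_wide_word(const char **in, wchar_t *dst, size_t max)
{
    size_t n = 0;

    while (n < max) {
        const char *p = *in;
        wchar_t wc;

        if (next_wide(&p, &wc) != 1 || iswspace((wint_t)wc))
            break;
        dst[n++] = wc;
        *in = p;
    }
    dst[n] = L'\0';
    return n;
}
```

The result depends on the current locale's LC_CTYPE setting; in the default "C" locale each byte of ASCII input maps to one wide character.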
Length Modifier Handling
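In the 32-bit ABI, the 'L' modifier on floating-point conversions is treated as 'l' because there is no quad-precision support. Uppercase specifiers (E, G, X) are converted to lowercase, except for C and S; non-floating-point uppercase specifiers are treated as if the 'l' modifier had been given.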
Stream and Locale Integration
The decimal point for floating-point input is taken from _numeric[0]. Wide conversions are multibyte-aware. EOF is handled robustly, and the count of consumed characters is tracked throughout.
These extend standard scanf with wide support, positional args, and careful sscanf integration while maintaining System V-style internals.
Similarities to illumos and BSD libc Implementations
To aid reimplementation using modified illumos or BSD sources, note the strong System V heritage shared with illumos.
illumos libc (__doscan_u / doscan.c)
This is a very close match (IRIX is likely derived from an earlier SVR4 code base):
The core function is often named _doscan or __doscan_u, and the locgetc/locungetc macros used for sscanf handling are identical. It shares the dynamic scratch buffer for numbers, the character class table (NCHARS) for %[], and positional parameter support with similar fast and slow paths. Wide handling is separate, and floating-point conversion uses the atof/strtod family. The differences are minor: illumos may call strtoll/strtoull directly, and its locale handling has evolved.
For reimplementation: start with illumos doscan.c (or a Solaris-derived fork), add the IRIX-specific wide handling (%C/%S via nextwc), and match the exact positional logic and scratch buffer growth. The ABI conditionals for long double are easy to adapt.
BSD libc (vfscanf.c / __svfscanf)
This is more divergent (BSD heritage):
The core is vfscanf or __svfscanf; there is no _doscan. Parsing is a state machine with explicit conversion types (CT_INT, CT_FLOAT, etc.). Floats use a fixed-size buffer (often 513 bytes) rather than a dynamic scratch buffer. There is no native support for positional parameters (%n$). Locale and xlocale support is more advanced. Floating-point input goes through a custom parsefloat routine (not atof), and wide input uses %lc/%ls (not %C/%S).
For reimplementation: BSD code is portable and modern, but it requires adding positional support (ported from illumos/IRIX), switching to a dynamic buffer, and adapting the wide specifiers. It is a less direct match than illumos.
Overall, the illumos/Solaris-derived _doscan is the closest base for porting; combine it with the IRIX specifics for wide, positional, and ABI handling. Verify against the sources (e.g., illumos-gate doscan.c equivalents, FreeBSD vfscanf.c).