Libc: doprnt

From TechPubs Wiki

Revision as of 18:40, 30 December 2025 by Raion (talk | contribs) (Initial Commit)

Overview

The _doprnt function forms the core of the formatted output routines in IRIX libc, serving as the internal implementation for the entire printf family (printf, fprintf, sprintf, vprintf, etc.). It parses the format string, retrieves variable arguments via va_list, converts values according to specifiers, and writes formatted output to a FILE * stream or buffer (with dummy streams used for string variants). This decompiled implementation reveals MIPS-specific adaptations for 32-bit and 64-bit ABIs (controlled by _MIPS_SIM macros), support for ANSI C specifiers, POSIX extensions (positional parameters, thousands grouping), and IRIX-specific behaviors (human-readable size formats, detailed NaN printing). It coordinates buffer management, locale-aware grouping, floating-point bit manipulation, and wide-character handling while ensuring efficient output through macros like PUT and PAD.

Key Functions

The implementation consists of a main loop with supporting utilities that handle parsing, conversion, and output. Below is a detailed overview of the primary operations and structures.

Main Entry Point (_doprnt)

_doprnt processes the format string in a loop, copying literal text directly and handling '%' specifiers through flag parsing, width/precision extraction, modifier application, and conversion-specific logic. It maintains output count, manages stream buffering via _dowrite, and constructs output components (prefix, padding, value, suffix) before emission. Special handling exists for dummy streams (sprintf family) and flushing on newline or unbuffered/line-buffered modes.
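The overall shape of such a loop can be sketched as below. This is a minimal illustration and not the IRIX code: mini_format and emit are hypothetical names, only %d, %s, and %% are handled, and output goes to a caller-supplied buffer rather than a FILE * stream. Literal runs are copied in one step and each '%' dispatches on the conversion character.

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* PUT-like helper (hypothetical): append len bytes, truncating at capacity. */
static void emit(char *out, size_t cap, size_t *n, const char *p, size_t len)
{
    size_t room = cap - 1 - *n;   /* always leave space for the NUL */
    if (len > room)
        len = room;
    memcpy(out + *n, p, len);
    *n += len;
}

/* Hypothetical mini formatter: supports only %d, %s and %%.
 * Returns the number of characters stored (excluding the NUL). */
int mini_format(char *out, size_t cap, const char *fmt, ...)
{
    va_list ap;
    size_t n = 0;
    char tmp[32];

    if (cap == 0)
        return 0;
    va_start(ap, fmt);
    while (*fmt != '\0') {
        if (*fmt != '%') {
            /* Literal run: copy everything up to the next '%' at once. */
            size_t run = strcspn(fmt, "%");
            emit(out, cap, &n, fmt, run);
            fmt += run;
            continue;
        }
        fmt++;                                /* skip '%' */
        if (*fmt == 'd') {
            int len = snprintf(tmp, sizeof tmp, "%d", va_arg(ap, int));
            emit(out, cap, &n, tmp, (size_t)len);
        } else if (*fmt == 's') {
            const char *s = va_arg(ap, const char *);
            if (s == NULL)
                s = "(null)";
            emit(out, cap, &n, s, strlen(s));
        } else if (*fmt == '%') {
            emit(out, cap, &n, "%", 1);
        } else if (*fmt == '\0') {
            break;                            /* lone '%' at end of format */
        }
        fmt++;                                /* unknown conversions are skipped */
    }
    va_end(ap);
    out[n] = '\0';
    return (int)n;
}
```

A call such as mini_format(buf, sizeof buf, "x=%d, s=%s", 42, "hi") fills buf with "x=42, s=hi". The real routine additionally parses flags, width, and precision between the '%' and the conversion character.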

Format Parsing and Flag Management

Flags, width, precision, and modifiers are parsed incrementally in a loop that switches on each character. The parser supports the standard flags (+, -, space, #, 0), precision (.), width, length modifiers (h, l, ll, L), positional parameters (n$), and dynamic width/precision (*). The apostrophe (') flag enables thousands grouping via __do_group. A bitmask, flagword, tracks state for padding, justification, and suffixes.

Integer Conversion (d, i, u, o, x, X, p)

Signed conversions handle negative values with overflow avoidance, using _lowdigit for the minimum-integer cases. Digit generation uses table-based division (powers of 10) for decimal output or bit shifting for non-decimal bases. Optional thousands grouping applies to decimal output with the ' flag. The # flag adds prefixes (0 for octal, 0x/0X for hex). Pointers (%p) are formatted as hexadecimal, with the long length modifier applied under the 64-bit ABI.
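The overflow problem is that negating the most negative value is undefined in C. One way to avoid it, in the spirit of the _lowdigit description, is to peel off the lowest digit while the value is still negative and negate only the quotient. The sketch below is an assumption about the technique, not the decompiled code; low_digit and ll_to_dec are hypothetical names.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical stand-in for _lowdigit: returns the low decimal digit of a
 * negative value and divides the value by 10 in place. */
static int low_digit(long long *valptr)
{
    int digit = (int)-(*valptr % 10);  /* C99: remainder takes the dividend's sign */
    *valptr /= 10;                     /* C99: division truncates toward zero */
    return digit;
}

/* Writes the decimal form of val into buf (at least 21 bytes).
 * Returns the number of characters written, excluding the NUL. */
size_t ll_to_dec(long long val, char *buf)
{
    char tmp[20];                      /* digits in reverse order */
    size_t n = 0, len = 0;
    unsigned long long mag;

    if (val < 0) {
        buf[len++] = '-';
        tmp[n++] = (char)('0' + low_digit(&val));  /* peel one digit first */
        mag = (unsigned long long)(-val);          /* now safe to negate */
    } else {
        mag = (unsigned long long)val;
        tmp[n++] = (char)('0' + mag % 10);         /* always emit one digit */
        mag /= 10;
    }
    while (mag != 0) {
        tmp[n++] = (char)('0' + mag % 10);
        mag /= 10;
    }
    while (n > 0)                      /* reverse into the output buffer */
        buf[len++] = tmp[--n];
    buf[len] = '\0';
    return len;
}
```

With this approach ll_to_dec(LLONG_MIN, buf) produces the full 19-digit magnitude without ever evaluating -LLONG_MIN.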

Floating-Point Conversion (e, E, f, F, g, G, a/A absent)

Uses ecvt_r/fcvt_r (or quad variants for long double) to obtain mantissa and exponent. Special detection and formatting for Inf ("inf"/"INF") and NaN ("nan0x<hex>"/"NAN0X<hex>") via _dval bit fields. Supports # flag for trailing decimal, precision defaults, and exponent formatting. Grouping applies to integral part in fixed formats.

Human-Readable Size Formats (b, B – IRIX-specific)

Divides floating value repeatedly by 1024 (%b) or 1000 (%B), appending suffix (k,m,g,t,p,e,z,y or uppercase). Falls through to %f logic for the numeric part.
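The scaling step can be illustrated as follows. This is a sketch under assumptions: human_size is a hypothetical helper, lowercase suffixes are paired with the 1024 base and uppercase with the 1000 base purely for illustration (the pairing in IRIX is not stated above), and negative values are not handled. The numeric part is delegated to %f, mirroring the fall-through described above.

```c
#include <stdio.h>

/* Hypothetical helper: scale val by 1024 (binary != 0) or 1000, then
 * print it with prec fractional digits followed by a size suffix.
 * Returns what snprintf returns. */
int human_size(char *out, size_t cap, double val, int binary, int prec)
{
    static const char lc[] = "kmgtpezy";   /* assumed pairing: base 1024 */
    static const char uc[] = "KMGTPEZY";   /* assumed pairing: base 1000 */
    double base = binary ? 1024.0 : 1000.0;
    int idx = -1;                          /* -1 means "no suffix" */

    while (val >= base && idx < 7) {
        val /= base;
        idx++;
    }
    if (idx < 0)
        return snprintf(out, cap, "%.*f", prec, val);
    return snprintf(out, cap, "%.*f%c", prec, val, (binary ? lc : uc)[idx]);
}
```

For example, human_size(buf, sizeof buf, 1536.0, 1, 1) divides once by 1024 and yields "1.5k".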

String and Character Handling (s, S, c, C)

Narrow strings copy with precision limit; NULL renders as "(null)". Wide characters/strings use wctomb conversion. Uppercase specifiers handle wide types.
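A precision-limited copy for %s might look like the sketch below. str_field is a hypothetical name, and whether IRIX applies the precision to the "(null)" substitute is an assumption here. The key point is that the length scan stops at the precision, so an argument that is not NUL-terminated is safe as long as it holds at least prec bytes.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy at most prec bytes of s into out (prec < 0
 * means "no limit"). out must have room for the result plus a NUL.
 * Returns the number of bytes copied. */
size_t str_field(const char *s, int prec, char *out)
{
    size_t n = 0;

    if (s == NULL)
        s = "(null)";
    /* Bounded length scan: never reads past prec bytes. */
    while ((prec < 0 || n < (size_t)prec) && s[n] != '\0')
        n++;
    memcpy(out, s, n);
    out[n] = '\0';
    return n;
}
```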

Auxiliary and Internal Utilities

_dowrite: Flushes buffer when full, handling read-mode quirks.

_lowdigit / _lowdigit_l: Safe digit extraction for negative extremes.

decimal_digits_ll / decimal_digits_l: Fast digit count via binary search on power-of-10 tables.

_mkarglst / _getarg: Enable positional parameters with fast path for up to 30 arguments.

__do_group: External locale-aware thousands separator insertion.
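The digit-counting idea described for decimal_digits_l can be sketched with a binary search over a power-of-10 table. The table contents and the function name decimal_digits_u32 are assumptions for illustration; the decompiled tables may be laid out differently.

```c
#include <stdint.h>

/* Hypothetical table: 10^1 through 10^9 (nine entries, like table10_l[9]). */
static const uint32_t pow10_tab[9] = {
    10u, 100u, 1000u, 10000u, 100000u,
    1000000u, 10000000u, 100000000u, 1000000000u
};

/* Returns the number of decimal digits needed to print v (1 for v == 0). */
int decimal_digits_u32(uint32_t v)
{
    int lo = 0, hi = 9;

    /* Find the first table index whose entry is greater than v. */
    while (lo < hi) {
        int mid = (lo + hi) / 2;
        if (v >= pow10_tab[mid])
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo + 1;
}
```

Knowing the digit count up front lets the caller size padding and write digits directly into their final position instead of building them in reverse.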

Macros

IsZero(x): Checks if a _dval double is zero, ABI-aware.

PUT(p, n): Writes n bytes from p to the buffer/stream, handling overflow via _dowrite.

PAD(s, n): Pads with string s repeated n times, using _dowrite for large n.

SNLEN = 5: Length for NaN strings ("nan0x").

NDIG = 82: Buffer size from ecvt.c.

MAXARGS = 30: Max fast positional arguments.
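The PUT and PAD macros can be pictured as the two small functions below. This is a simplified sketch: put_bytes, pad_bytes, and pad_blanks are hypothetical names, the destination is a plain buffer the caller has sized correctly, and the overflow path through _dowrite is omitted. Padding is emitted in chunks copied from a fixed fill string, which is presumably the role of _blanks and _zeroes.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical fill string, 20 spaces long (analogous to _blanks). */
const char pad_blanks[] = "                    ";

/* PUT-like: append n bytes from p at position pos; returns the new position. */
size_t put_bytes(char *buf, size_t pos, const char *p, size_t n)
{
    memcpy(buf + pos, p, n);
    return pos + n;
}

/* PAD-like: append n fill characters taken from the non-empty string fill,
 * one chunk at a time; returns the new position. */
size_t pad_bytes(char *buf, size_t pos, const char *fill, size_t n)
{
    size_t chunk = strlen(fill);

    while (n > chunk) {
        pos = put_bytes(buf, pos, fill, chunk);
        n -= chunk;
    }
    return put_bytes(buf, pos, fill, n);
}
```

Right-justifying "42" in a 27-column field then amounts to pad_bytes(buf, 0, pad_blanks, 25) followed by put_bytes(buf, pos, "42", 2).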

Constants and Tables

table10_ll[18], table10_l[9]: Powers of 10 for decimal digit counting.

_blanks, _zeroes: Padding strings.

uc_digs, lc_digs: Hex digit tables.

lc_nan = "nan0x", uc_nan = "NAN0X": NaN prefixes (IRIX-specific with "0x" for hex payload).

lc_inf = "inf", uc_inf = "INF": Infinity strings.

wnull_buf: Wide null string for %S NULL handling.

Flags (bit positions)

Defined as bitmasks for format parsing:

LENGTH = 1: 'l' modifier.

FPLUS = 2: '+' flag.

FMINUS = 4: '-' flag.

FBLANK = 8: ' ' flag.

FSHARP = 16: '#' flag.

PADZERO = 32: '0' flag.

DOTSEEN = 64: '.' seen (precision).

SUFFIX = 128: Exponent/suffix needed.

RZERO = 256: Trailing zeros.

LZERO = 512: Leading zeros.

SHORT = 1024: 'h' modifier.

LLONG = 2048: 'll' modifier.

LDOUBLE = 4096: 'L' modifier.

FQUOTE = 0x10000: ' (apostrophe) flag for thousands grouping (IRIX/POSIX extension).

These are stored in flagword and control output formatting.
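Accumulating these bits during parsing can be sketched as follows. The constants reuse the values listed above; parse_flags is a hypothetical function, and the real parser also handles width, precision, and length modifiers in the same loop.

```c
/* Flag values as listed above (subset relevant to flag characters). */
enum {
    FPLUS   = 2,
    FMINUS  = 4,
    FBLANK  = 8,
    FSHARP  = 16,
    PADZERO = 32,
    FQUOTE  = 0x10000
};

/* Hypothetical helper: consume flag characters starting at *pp, advance
 * *pp past them, and return the accumulated flagword. */
int parse_flags(const char **pp)
{
    int flagword = 0;

    for (;; (*pp)++) {
        switch (**pp) {
        case '+':  flagword |= FPLUS;   break;
        case '-':  flagword |= FMINUS;  break;
        case ' ':  flagword |= FBLANK;  break;
        case '#':  flagword |= FSHARP;  break;
        case '0':  flagword |= PADZERO; break;
        case '\'': flagword |= FQUOTE;  break;
        default:
            return flagword;            /* first non-flag character */
        }
    }
}
```

Given the text "-0#12d" after a '%', the function returns FMINUS | PADZERO | FSHARP and leaves the pointer at the '1' that starts the width.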

Structs and Unions

To ensure ABI compatibility, these must match exactly in any replacement:

_dval: Union for bit-level access to double. Critical for NaN/Inf detection and floating-point manipulation.

typedef union {
   struct {
       unsigned sign :1;
       unsigned exp :11;
       unsigned hi :20;
       unsigned lo :32;
   } fparts;
#if (_MIPS_SIM != _MIPS_SIM_ABI32)
   struct {
       unsigned sign :1;
       unsigned long long rest :63;
   } dparts;
#else
   struct {
       unsigned sign :1;
       unsigned hi :31;
       unsigned lo :32;
   } dparts;
#endif
   struct {
       unsigned hi;
       unsigned lo;
   } fwords;
   double d;
} _dval;


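ABI-specific: Under the o32 ABI (_MIPS_SIM_ABI32), dparts splits the double into a sign bit, a 31-bit hi field, and a 32-bit lo word. Under the other ABIs, dparts uses a single 63-bit rest field (exponent and mantissa together) after the sign bit.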

Used with IsNANorINF, IsINF, IsNegNAN, GETNaNPC (from nan.h).
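The checks these macros perform can be illustrated portably without bit-fields, whose layout depends on the compiler and byte order. The sketch below assumes IEEE 754 binary64 doubles and uses hypothetical names (dfields, split_double, is_nan_or_inf, is_inf); it is not the nan.h implementation.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical decoded view of a double (same fields as fparts, with
 * hi and lo merged into one 52-bit fraction). */
typedef struct {
    unsigned sign;      /* 1 bit */
    unsigned exp;       /* 11-bit biased exponent */
    uint64_t frac;      /* 52-bit fraction */
} dfields;

dfields split_double(double d)
{
    uint64_t bits;
    dfields f;

    memcpy(&bits, &d, sizeof bits);     /* well-defined type punning */
    f.sign = (unsigned)(bits >> 63);
    f.exp  = (unsigned)((bits >> 52) & 0x7ff);
    f.frac = bits & 0xfffffffffffffULL;
    return f;
}

/* An all-ones exponent marks both NaN and infinity. */
int is_nan_or_inf(double d)
{
    return split_double(d).exp == 0x7ff;
}

/* Infinity additionally has a zero fraction. */
int is_inf(double d)
{
    dfields f = split_double(d);
    return f.exp == 0x7ff && f.frac == 0;
}
```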


stva_list: Wrapper for va_list to allow assignment (bypasses C array restrictions).

typedef struct stva_list {
   va_list ap;
} stva_list;


Used for positional argument handling in _mkarglst and _getarg.
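The way a wrapped va_list supports positional lookup can be sketched as below. The assumptions are significant: every argument is taken to be an int (the real code must scan the format to learn each type), nth_int_arg and pick_arg are hypothetical names, and the copy uses C99 va_copy for portability where the IRIX code is described as relying on struct assignment.

```c
#include <stdarg.h>

typedef struct stva_list {
    va_list ap;
} stva_list;

/* Hypothetical helper: return the argno-th (1-based) argument, assuming
 * all arguments are ints. The caller's list is left untouched because
 * the walk happens on a private copy. */
static int nth_int_arg(int argno, stva_list start)
{
    stva_list cur;
    int value = 0;
    int i;

    va_copy(cur.ap, start.ap);
    for (i = 0; i < argno; i++)
        value = va_arg(cur.ap, int);
    va_end(cur.ap);
    return value;
}

/* Variadic entry point used to exercise the helper. */
int pick_arg(int argno, ...)
{
    stva_list args;
    int value;

    va_start(args.ap, argno);
    value = nth_int_arg(argno, args);
    va_end(args.ap);
    return value;
}
```

Here pick_arg(2, 10, 20, 30) walks two arguments on the copy and returns 20, which is the essence of resolving a %2$d reference.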

Internal Functions

_lowdigit(long long *valptr), _lowdigit_l(long *valptr): Compute the low-order decimal digit for negative numbers near overflow and divide the value by 10.

_dowrite(const char *p, ssize_t n, FILE *iop, unsigned char **ptrptr): Handles buffer flushing/writing when the buffer is full.

decimal_digits_ll(long long), decimal_digits_l(long): Count decimal digits using binary search on power-of-10 tables.

__do_group(char *str, double val, int prec, int ndigs, int strsize, int maxfsig): Inserts locale-aware thousands separators for the ' flag. (External, not defined here.)

_mkarglst(char *fmt, stva_list args, stva_list arglst[]): Initializes an array of va_list entries for the first MAXARGS positional args.

_getarg(char *fmt, stva_list *pargs, int argno): Advances the va_list for args beyond MAXARGS.

Undocumented or IRIX-Specific Interfaces and Behaviors

Critical Structures for ABI Compatibility

_dval: Union providing bit-field access to the double representation. Layout differs by ABI (o32 splits the value into two words; the other ABIs use a 63-bit field holding exponent and mantissa after the sign bit). Essential for NaN/Inf detection and payload extraction.

stva_list: Wrapper around va_list to permit assignment, enabling positional argument array.

Extended Format Specifiers

%b / %B: Human-readable byte sizes with binary (1024) or decimal (1000) scaling and suffixes.

%C / %S: Wide character/string (equivalent to the standard %lc/%ls).

Apostrophe (') flag: Thousands grouping in decimal integer and fixed float output (POSIX, locale-aware).

NaN and Infinity Representation

Infinity: "inf" or "INF".

NaN: "nan0x<hex-payload>" or "NAN0X<hex-payload>". The output includes the significand bits as hex (unique to IRIX/MIPS; most systems omit the payload).
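Output in this style can be approximated with the sketch below. Which bits IRIX prints as the payload, and how it treats the sign, are not specified above, so this version simply prints the full 52-bit fraction and ignores the sign; both choices are assumptions, and format_nonfinite is a hypothetical name.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: if d is infinite or NaN, write its text form to
 * out and return the length; return 0 for finite values. */
int format_nonfinite(char *out, size_t cap, double d, int upper)
{
    uint64_t bits, frac;

    memcpy(&bits, &d, sizeof bits);
    if (((bits >> 52) & 0x7ff) != 0x7ff)
        return 0;                               /* finite: not handled here */
    frac = bits & 0xfffffffffffffULL;           /* assumed payload: 52 bits */
    if (frac == 0)
        return snprintf(out, cap, "%s", upper ? "INF" : "inf");
    if (upper)
        return snprintf(out, cap, "NAN0X%llX", (unsigned long long)frac);
    return snprintf(out, cap, "nan0x%llx", (unsigned long long)frac);
}
```

A replacement that must be byte-compatible should compare its output against a real IRIX system rather than rely on this approximation.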

Positional Parameters (%n$)

Full POSIX support with fast path (pre-scanned array for first 30 arguments) and fallback scanning for higher positions.

Long Double Handling

Conditional on 64-bit ABI; uses separate quad conversion routines and extended precision in grouping.

Buffer and Locale Integration

Uses _numeric[0] for decimal point. Thousands grouping via external __do_group, supporting locale-specific separators. Robust NULL string handling and wide-character conversion.
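Since __do_group is external and not shown, the grouping step can only be sketched under assumptions: group_thousands is a hypothetical function, groups are fixed at three digits, and the separator is passed in rather than read from the locale (real locales may specify other group sizes).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy the digit string into out, inserting sep
 * between groups of three digits counted from the right. out must hold
 * strlen(digits) + (strlen(digits) - 1) / 3 + 1 bytes.
 * Returns the length of the result. */
size_t group_thousands(const char *digits, char sep, char *out)
{
    size_t len = strlen(digits);
    size_t o = 0;
    size_t i;

    for (i = 0; i < len; i++) {
        /* A separator goes before a digit when a multiple of three
         * digits remains, except at the very start. */
        if (i > 0 && (len - i) % 3 == 0)
            out[o++] = sep;
        out[o++] = digits[i];
    }
    out[o] = '\0';
    return o;
}
```

For instance, group_thousands("1234567", ',', out) produces "1,234,567".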

These behaviors extend standard printf semantics with IRIX-specific formatting, hardware-aware floating-point representation, and efficient positional argument handling while maintaining ABI-critical structure layouts. For replacement implementations, exact matching of _dval layout and NaN printing behavior is required for binary compatibility.

Similarities to illumos and BSD libc Implementations

To facilitate reimplementation using modified sources from illumos (OpenSolaris descendant) or BSD libc, note the following shared and divergent aspects based on their printf cores (_doprnt in illumos, vfprintf in BSD).

illumos libc (_doprnt)

As a System V derivative like IRIX, illumos shares significant structural similarities:

Core function name and signature (_doprnt(const char *fmt, va_list ap, FILE *iop)).

Parsing loop with flagword bitmasks for flags (similar positions for FPLUS, FMINUS, etc.).

Macros like PUT/PAD for buffered output, and _lowdigit for overflow-safe digit extraction.

Integer handling with power-of-10 tables for fast digit count and generation.

Floating-point uses ecvt/fcvt (with quad variants for long double), but prints NaN/Inf as plain "NaN"/"Inf" without a hex payload.

Positional parameters (%n$) with similar _mkarglst/_getarg logic and MAXARGS limit.

Apostrophe (') flag for grouping via an external function.

Wide support (%lc/%ls instead of %C/%S).

No %b/%B; lacks the IRIX-specific human-readable sizes.

FP union (_ieee_double) is similar but not MIPS-ABI-specific; there is no direct _dval equivalent.

For reimplementation: Start with illumos _doprnt, port _dval for MIPS ABI, add hex NaN printing using GETNaNPC, implement %b/%B by modifying %f case, match IRIX flags/extensions. Buffer utils (_dowrite) are comparable.

BSD libc (vfprintf)

BSD (FreeBSD/OpenBSD/NetBSD) uses vfprintf as the core, with differences in organization but shared ANSI logic:

Main loop parses the format and handles specifiers in a switch; the flag bits serve the same purpose but are named differently (for example ALT, LADJUST, ZEROPAD in FreeBSD's vfprintf.c).

Buffered output via __sprint or similar, with padding helpers.

Integer conversion uses custom digit-generation loops. The bit-field %b found in BSD kernel printf implementations is unrelated to IRIX %b.

Floating-point via gdtoa or dragon4; Inf and NaN print as "inf"/"nan" (uppercase for uppercase conversions), with no hex payload.

Positional arguments (%n$): support and implementation vary by BSD and do not follow the _mkarglst/_getarg design. FreeBSD, for instance, builds an argument table in a separate pass.

Apostrophe (') flag: grouping support varies by BSD (FreeBSD implements it); locale support goes through LC_NUMERIC.

Wide support (%lc/%ls); some BSDs accept %C as a synonym.

Legacy or variant-specific conversions such as %D (long) and %m (strerror). The ll length modifier (long long) is standard C99 rather than an extension.

Long double via %Lf; FP handling is portable, with no ABI-specific unions like _dval.

For reimplementation: BSD vfprintf is less similar due to its different architecture (no _doprnt), but its floating-point (Inf/NaN) and integer logic can be adapted. Adapt positional-argument handling to the IRIX model where needed, modify NaN output to include the hex payload, and implement the IRIX extensions. Buffer management differs but is portable. Overall, illumos is closer for direct porting due to its System V heritage; combine it with BSD code for modern floating-point precision if needed. Verify against sources (e.g., illumos-gate doprnt.c, FreeBSD vfprintf.c).