Log Grammar and Parsing Rules
Status
- Type: Normative
- Stability: Draft
Problem Statement
TeX logs are not line-oriented data. They are a byte stream containing multiple interleaved constructs:
- file context nesting via parentheses
- diagnostics blocks
- warnings with variable phrasing
- legacy line wrapping that can split paths and messages
The grammar below defines how FerroTeX interprets this stream.
Normalization
Before tokenization:
- Normalize CRLF to LF.
- Preserve original byte offsets by maintaining a mapping table if normalization changes indices.
- Alternatively, operate on raw bytes and interpret newlines as-is.
Token Classes (informal)
LPAREN=(RPAREN=)BANG=!(error)LINEREF=l.<digits>WARNING_PREFIX= matches known warning introducersLaTeX Warning:Package .* Warning:Overfull \hboxUnderfull \hbox
PROMPT=?at start-of-line in interactive contexts
File Path Recognition
A FileEnter event MUST only be emitted when the parser recognizes a plausible path following LPAREN.
Path Heuristics (MUST/SHOULD)
- MUST accept:
./relative/path.texrelative/path.tex- absolute POSIX paths
- Windows-style paths (drive letters) if running on Windows
- SHOULD accept:
.sty,.cls,.bib,.aux,.tocand other auxiliary extensions
- MUST handle spaces conservatively:
- treat spaces as terminators unless the engine clearly emits quoted paths (rare)
Line Wrapping Handling
Classic TeX wraps output around ~79 characters.
FerroTeX uses guarded joining:
- Join line fragments only when both conditions hold:
- the first line ends in a token that is syntactically incomplete (e.g., path missing extension or closing delimiter)
- the next line begins with characters consistent with a continuation (not a diagnostic boundary)
The join algorithm MUST be bounded (e.g., at most N lines joined for a single token).
Error Blocks
A minimal error block:
BANGfollowed by message line- optionally
LINEREFline - optionally context lines
The parser MUST treat these as a cohesive structure, not independent lines.
Recovery
Recovery is required due to ambiguous or malformed logs.
- If
RPARENoccurs with empty stack, emitInfoand continue. - If
LPARENappears but no plausible path follows, treat as text. - If repeated ambiguity is detected, reduce confidence and emit recovery events.
Output
The parser emits:
- an event stream (see
log-event-ir.md) - file stack updates via
FileEnter/FileExit
The reconstruction layer attaches diagnostics using the current stack top.