Datasets

Purpose

Define, curate, and document datasets used to evaluate FerroTeX.

This document is intentionally separate from evaluation-plan.md to keep:

dataset definitions stable
evaluation methodology explicit

Dataset Categories

1) Real-World Projects

Target characteristics:

multi-file structure
diverse package usage
logs that contain common warnings (e.g., overfull boxes, undefined references) and occasional errors

Required metadata per project:

source origin and license
build instructions
TeX engine and distribution versions used (e.g., pdfTeX, XeTeX, or LuaTeX on TeX Live or MiKTeX)
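One way to keep this metadata consistent is a small typed record per project with a completeness check. The sketch below is an assumption, not a fixed FerroTeX format: the class name `ProjectMetadata` and all field names are hypothetical.

```python
from dataclasses import dataclass


# Hypothetical per-project record; field names are illustrative only.
@dataclass(frozen=True)
class ProjectMetadata:
    name: str
    source_origin: str        # e.g. repository URL or archive identifier
    license: str              # e.g. an SPDX identifier such as "MIT"
    build_command: list[str]  # exact command used to produce the log
    engine: str               # e.g. "pdflatex", "xelatex", "lualatex"
    engine_version: str       # version string as reported by the engine
    distribution: str        # e.g. "TeX Live 2024" or "MiKTeX"

    def missing_fields(self) -> list[str]:
        """Return the names of required fields that were left empty."""
        return [field for field, value in vars(self).items() if not value]
```

A project whose `missing_fields()` result is non-empty would be rejected from the dataset until the gaps are filled.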

2) Synthetic Stress Fixtures

Synthetic fixtures should be generated to isolate specific log-parsing failure modes:

deep nesting of \input
parentheses and spaces in filenames
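forced line wraps (TeX breaks log lines at max_print_line characters, 79 by default in TeX Live), including wraps that split a filename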
noise that resembles log tokens
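Several of these failure modes can be combined in one generated log fragment. The sketch below assumes a wrap width of 79 (the usual TeX Live `max_print_line` default; check the target `texmf.cnf`) and builds a synthetic log, not a real engine run: `nested_input_fixture` and `hard_wrap` are hypothetical helpers. Filenames contain spaces and parentheses, and the opening sequence is hard-wrapped so the innermost filename is split across lines.

```python
MAX_PRINT_LINE = 79  # assumed default; verify against the engine's texmf.cnf


def hard_wrap(text: str, width: int = MAX_PRINT_LINE) -> list[str]:
    """Break text every `width` characters, ignoring word boundaries, as TeX does."""
    return [text[i:i + width] for i in range(0, len(text), width)] or [""]


def nested_input_fixture(depth: int, width: int = MAX_PRINT_LINE):
    """Build a synthetic log for `depth` nested \\input files.

    Each file open is written as "(" + path and each close as ")", mimicking
    how TeX reports its file stack. Parentheses inside the filenames will
    mislead a naive parenthesis counter. Returns (log, expected_file,
    expected_line) so the fixture can double as a labeled entry.
    """
    if depth < 1:
        raise ValueError("depth must be at least 1")
    files = [f"./chapter {i} (draft).tex" for i in range(depth)]
    opening = " ".join(f"({path}" for path in files)
    lines = hard_wrap(opening, width)
    # Error reported in the innermost file, followed by closes for every open.
    lines += ["! Undefined control sequence.", "l.3 \\foo", ")" * depth]
    return "\n".join(lines), files[-1], 3
```

With `depth=4` the opening sequence is 99 characters, so the fourth filename starts at column 75 and is split by the wrap, which is exactly the case a parser must reassemble.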

3) Labeled Diagnostic Subset

For correctness scoring, maintain a labeled subset in which each entry records:

log excerpt
expected file
expected line (if applicable)
notes on ambiguity
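A labeled entry and a simple scoring pass might look like the sketch below. The class and function names are hypothetical, and the metric definitions are an assumption rather than the methodology in evaluation-plan.md: file accuracy counts exact file matches, and line accuracy is computed only over entries that have an expected line, counting a hit only when both file and line match.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LabeledDiagnostic:
    log_excerpt: str
    expected_file: str
    expected_line: Optional[int]  # None when the diagnostic has no line
    ambiguity_notes: str = ""


@dataclass(frozen=True)
class Prediction:
    file: Optional[str]
    line: Optional[int]


def score(labels: list[LabeledDiagnostic], preds: list[Prediction]) -> tuple[float, float]:
    """Return (file_accuracy, line_accuracy) over paired labels and predictions."""
    if len(labels) != len(preds):
        raise ValueError("labels and predictions must have the same length")
    pairs = list(zip(labels, preds))
    file_hits = sum(pred.file == lab.expected_file for lab, pred in pairs)
    # Only entries that specify a line contribute to line accuracy.
    with_line = [(lab, pred) for lab, pred in pairs if lab.expected_line is not None]
    line_hits = sum(
        pred.file == lab.expected_file and pred.line == lab.expected_line
        for lab, pred in with_line
    )
    file_acc = file_hits / len(pairs) if pairs else 0.0
    line_acc = line_hits / len(with_line) if with_line else 0.0
    return file_acc, line_acc
```

Requiring the file to match before a line counts as correct avoids rewarding a parser that reports the right line number in the wrong file.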

Redaction Policy

Real-world logs may contain absolute paths and usernames.

Replace absolute paths with workspace-relative placeholders.
Preserve structure needed for parsing (parentheses nesting, line references).
Document any semantic changes introduced by redaction.
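A minimal redaction pass, assuming the absolute prefixes to remove (workspace root, home directory) are known in advance, can replace them with placeholders and record what changed. The function name and the placeholder strings `<WORKSPACE>` and `<HOME>` are hypothetical. Placeholders contain no parentheses, so nesting is preserved; replacing longer prefixes first keeps the more specific placeholder. One semantic change worth recording is that line lengths shift, so wrap points in the redacted log may no longer match max_print_line. This sketch also does not catch a prefix that was itself split by a wrap.

```python
def redact(log: str, roots: dict[str, str]) -> tuple[str, list[str]]:
    """Replace known absolute path prefixes with placeholders.

    `roots` maps an absolute prefix (e.g. "/home/alice/thesis") to a
    placeholder (e.g. "<WORKSPACE>"). Returns the redacted log and notes
    describing each semantic change, for the dataset documentation.
    """
    original = log
    notes = []
    # Longest prefix first, so a workspace inside $HOME gets its own placeholder.
    for prefix in sorted(roots, key=len, reverse=True):
        count = log.count(prefix)
        if count:
            log = log.replace(prefix, roots[prefix])
            notes.append(f"replaced {count} occurrence(s) of an absolute prefix with {roots[prefix]}")
    if [len(x) for x in original.splitlines()] != [len(x) for x in log.splitlines()]:
        notes.append("line lengths changed; original wrap points may no longer match max_print_line")
    return log, notes
```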