Reproducibility

Goal

Ensure that reported results and behaviors are reproducible across environments.

Environment Pinning

FerroTeX evaluation should pin:

  • TeX distribution version
  • engine versions
  • OS image / container base image
  • Rust toolchain version
  • Node.js version (extension)

Artifacts

The following artifacts should be published for each evaluation:

  • dataset manifest
  • fixture logs (or scripts to generate them)
  • labeled ground truth subsets (where permissible)
  • benchmark scripts and exact command lines
  • raw result outputs (JSON/CSV)

CI Expectations (target)

  • run unit and golden tests on every PR
  • run benchmarks on demand or nightly
  • store benchmark history

Privacy and Licensing

For real-world logs:

  • ensure licensing permits redistribution
  • redact sensitive file paths and user information
  • maintain a documented redaction process