Threats to Validity
Purpose
Make evaluation limitations explicit and define mitigations.
Internal Validity
- Mislabeling ground truth
- Dataset bias toward specific engines/distributions
Mitigations:
- labeling guidelines
- double-labeling subset and measuring agreement
External Validity
- Results may not generalize to all package ecosystems or custom build systems.
Mitigations:
- diverse corpora
- report coverage and failure cases
Construct Validity
- “Correct file/line mapping” may not fully capture user-perceived usefulness.
Mitigations:
- optional UX study (time-to-fix)
- qualitative analysis of failure cases
Conclusion Validity
- Performance results sensitive to hardware and OS.
Mitigations:
- publish raw benchmark runs
- include environment metadata