ChatTLA+ Dataset
This is the dataset release that goes with the ChatTLA+ paper I submitted to ICSOFT 2026. The paper is currently under double-blind review, so the repo is up under an anonymized name with author info and the sibling training-code repo withheld until the review window closes. Once the camera-ready lands, I’ll point this page at the real repo.
The dataset has two parts. The first is corpus/diamond_sft.jsonl, which is 209 rows of supervised fine-tuning data — each row is an OpenAI-style chat list (developer/user/assistant) where the assistant turn is a TLA+ spec that TLC actually validated. Every row carries semantic metadata: how many distinct states TLC explored, what fraction of declared actions got exercised, whether the spec caught at least one mutation of its invariant, and how many invariants were checked. The _source: opus_subagent tag means an Opus-family model drafted the seed and TLC verified it — that’s a methodology marker, not an author hint.