Sunday, 24 May 2026

Automating reproducibility: three approaches, one problem

Reproducibility checking is a reviewer burden that scales badly — and three pieces this week attack it from different angles. Riehl, Marin, and colleagues' ARA preprint from ETH Zürich and the European Commission¹ formalizes the task as structured reasoning over a paper's directed workflow graph, hitting 61% accuracy on the largest cross-domain benchmark to date. Zhang et al.'s PaperRepro², aimed specifically at social science, takes a two-stage multi-agent approach — separating code execution from result evaluation — and beats prior baselines by 22% on REPRO-Bench. The contrast is worth sitting with: ARA works from the paper document itself; PaperRepro executes the actual reproduction package. Together they sketch the shape of a coming automated review layer.
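To make the execution/evaluation split concrete, here is a minimal Python sketch of a two-stage check. It assumes a reproduction package that prints its headline results as a JSON object on stdout. The function names `run_package` and `compare_results`, the JSON convention, and the 1% relative tolerance are all hypothetical illustrations. They are not PaperRepro's actual interface or scoring rule.

```python
import json
import math
import subprocess
import sys


def run_package(command: list[str], timeout: float = 600.0) -> dict[str, float]:
    """Stage 1 (execution): run the reproduction command and parse its output.

    Hypothetical convention: the package prints a JSON object mapping
    result names to numbers on stdout. A non-zero exit raises CalledProcessError.
    """
    completed = subprocess.run(
        command, capture_output=True, text=True, timeout=timeout, check=True
    )
    return {name: float(value) for name, value in json.loads(completed.stdout).items()}


def compare_results(
    reported: dict[str, float],
    reproduced: dict[str, float],
    rel_tol: float = 0.01,
) -> dict[str, bool]:
    """Stage 2 (evaluation): compare paper-reported numbers with reproduced ones.

    A result counts as reproduced only if it is present and lies within
    rel_tol (relative tolerance) of the reported value.
    """
    verdicts = {}
    for name, value in reported.items():
        got = reproduced.get(name)
        verdicts[name] = got is not None and math.isclose(value, got, rel_tol=rel_tol)
    return verdicts


if __name__ == "__main__":
    # Stand-in for a real reproduction package: a one-line script that prints results.
    fake_package = [
        sys.executable,
        "-c",
        'import json; print(json.dumps({"effect": 0.412, "n": 1200}))',
    ]
    reported = {"effect": 0.41, "n": 1200, "r_squared": 0.37}
    reproduced = run_package(fake_package)
    print(compare_results(reported, reproduced))
    # {'effect': True, 'n': True, 'r_squared': False}
```

Keeping the stages apart means the evaluator only ever sees parsed numbers, so it can be tested, tuned, or swapped out independently of how the code was executed.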

Nüst and Eglen's CODECHECK paper in F1000Research³ is the human-infrastructure counterpart — a workflow for independent code execution woven into existing review and publication processes, using open tooling. It's the practical plumbing that the agentic systems above eventually need to plug into. Read ARA and PaperRepro for the ML architecture; keep CODECHECK as the reference for what operational integration actually looks like.

Publishing's structural crisis, two registers

Sabel and Larhammar's Stockholm Declaration in Royal Society Open Science¹ is a manifesto: paper mills, AI-fabricated data, and reviewer fatigue have reached a tipping point that requires legislative and institutional intervention. The prose is urgent, and the text carries a coalition of signatories. Bergstrom, Rieger, and Schonfeld's Ithaka S+R report² is the cooler-headed structural analysis, mapping identifiers, discovery infrastructure, and preservation systems, and showing where AI is forcing a second digital transformation of the whole stack. The Declaration names the emergency; the Ithaka report names the plumbing. Skim the Declaration for the diagnosis; keep the Ithaka report as a reference for anyone thinking seriously about scholarly infrastructure.

Rust eating the JavaScript toolchain

Agnel Nieves's essay on the 2026 JavaScript toolchain¹ is the clearest single account of a quiet structural shift: Turbopack, Rolldown, Biome, Lightning CSS, and a Zig-to-Rust rewrite of Bun have collapsed the Node-era build graph into a handful of statically linked binaries. The speed numbers (5–30× faster bundlers, 50–100× faster linters) are real, but Nieves's more durable point is about supply-chain security. Shipping zero npm postinstall scripts dramatically reduces the attack surface, which matters right now given the wave of compromised packages he documents. This is exactly the kind of piece a tooling roundup should surface: not hype, but a structural accounting of where the ecosystem actually landed.
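The postinstall point is easy to check against your own project. Below is a minimal Python sketch, assuming a conventional npm-style `node_modules` layout. It lists every installed package whose `package.json` declares a `preinstall`, `install`, or `postinstall` script, the lifecycle hooks npm runs automatically when a dependency is installed. The function name `find_install_scripts` is hypothetical, and the scan is not exhaustive: other mechanisms can also execute code at install time.

```python
import json
import sys
from pathlib import Path

# npm lifecycle hooks that run automatically when a dependency is installed.
INSTALL_HOOKS = ("preinstall", "install", "postinstall")


def find_install_scripts(node_modules: Path) -> dict[str, dict[str, str]]:
    """Return {package path relative to node_modules: {hook: command}}.

    Only manifests at a package root are considered:
    node_modules/<name>/package.json, node_modules/@scope/<name>/package.json,
    and the same patterns inside nested node_modules directories.
    """
    found: dict[str, dict[str, str]] = {}
    if not node_modules.is_dir():
        return found
    for manifest in node_modules.rglob("package.json"):
        pkg_dir = manifest.parent
        parent = pkg_dir.parent
        at_package_root = parent.name == "node_modules" or (
            parent.name.startswith("@") and parent.parent.name == "node_modules"
        )
        if not at_package_root:
            continue  # e.g. test fixtures or examples shipped inside a package
        try:
            data = json.loads(manifest.read_text(encoding="utf-8"))
        except (OSError, ValueError):
            continue  # unreadable or malformed manifest: skip rather than crash
        scripts = data.get("scripts") if isinstance(data, dict) else None
        if not isinstance(scripts, dict):
            continue
        hooks = {hook: scripts[hook] for hook in INSTALL_HOOKS if hook in scripts}
        if hooks:
            found[pkg_dir.relative_to(node_modules).as_posix()] = hooks
    return found


if __name__ == "__main__":
    root = Path(sys.argv[1]) if len(sys.argv) > 1 else Path("node_modules")
    for pkg, hooks in sorted(find_install_scripts(root).items()):
        for hook, command in hooks.items():
            print(f"{pkg}: {hook} -> {command}")
```

Pass the path to a project's `node_modules` as the first argument; an empty report is the zero-postinstall state the essay describes. npm's `--ignore-scripts` flag skips these hooks entirely, at the cost of breaking packages that genuinely need a native build step.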

Two NBER papers on expectation formation under noisy data

Brave, Crust, Eusepi, Hobijn, and Şahin's NBER WP 35196¹ asks how to read labor market indicators when the underlying data is imperfect, a methodological question with real policy stakes when BLS revisions routinely shift the picture. Bordalo, Gennaioli, Lopez-de-Silanes, Schröder, Shleifer, and van Rooij's NBER WP 35214² runs an RCT on Dutch households to study how people actually form macroeconomic expectations, finding the characteristic signatures of diagnostic expectations (overreaction and extrapolation). The papers don't cite each other but belong in the same afternoon: one asks what the data says, the other what people think the data says.
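For orientation on the term, the reduced-form definition of diagnostic expectations from Bordalo, Gennaioli, and Shleifer's earlier work is sketched below. WP 35214 may use a different or more general specification, so treat this as background rather than the paper's own model.

```latex
\mathbb{E}^{\theta}_{t}\left[x_{t+1}\right]
  = \mathbb{E}_{t}\left[x_{t+1}\right]
  + \theta \left( \mathbb{E}_{t}\left[x_{t+1}\right] - \mathbb{E}_{t-1}\left[x_{t+1}\right] \right),
  \qquad \theta \ge 0
```

Here E_t is the rational forecast given information at time t, and θ is the extra weight placed on news that arrived since t − 1. Setting θ = 0 recovers rational expectations; with θ > 0 the forecast moves further than the rational update in the direction of the news, which is the overreaction pattern the paragraph refers to.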

Craft, type, and the aesthetics of intentionality

Two pieces argue, from different angles, that design is reckoning with what it lost by optimizing for conversion. An essay by an unnamed author on Gotham Variable and typographic address¹ uses Monotype's variable-font release as a hook for a longer argument about letterforms as a medium of attention, invoking Japanese and Arabic calligraphic traditions to make the point that type has always been a declaration before it is a communication. Fabio Haag's essay on the craft comeback² takes the wider design view: hand-sketching, real-glass brand filming, and identity systems built from physical found objects, all framed as reactions to AI-driven sameness. Both pieces are essayistic rather than instructional and are best read as a pair on a slow afternoon.