Research

Methods for LLMs in social science.

Synthetic data generation, opinion prediction, annotator disagreement, and model evaluation. Code and datasets are open source on GitHub and Hugging Face.

§ 01 · Themes

Research themes

What we work on, and why.

Three intertwined questions sit underneath most of what we publish, and most of what we teach.

Synthetic data

Can LLMs generate respondent panels that pass standard quality checks? When they do not, where do they fail, and can we measure it before we use them?

Opinion prediction

How well do open-weights models recover real survey responses across languages, age cohorts, and political contexts, and what drives the gaps?

Evaluation

How do we build eval sets that survive contamination, treat annotator disagreement as a measurement rather than a problem, and report confidence honestly?

§ 02 · Open

Open by default

Code on GitHub.
Weights on Hugging Face.

Replication code ships with the paper, not six months after. Model checkpoints are released under permissive licences whenever we control the training data.