Summer School 2024 | Oxford LLMs

In late September 2024, Nuffield College, supported by the Oxford Van Houten Fund, ran an intensive five-day workshop on AI for social science. Early-career researchers attended lectures on large language models, completed hands-on Python tutorials, and collaborated on research projects with AI experts.

We received just under 200 applications and selected 30 participants.

Workshop Themes and Materials

Our lecture series, developed by Grigory Sapunov (Intento), Tatiana Shavrina (Meta), and Ilya Boytsov (Wayfair), covered LLM foundations, transformer architecture, in-context learning, and emergent reasoning. We introduced advanced techniques such as agent-based systems, fine-tuning methods, and self-hosting considerations.

Applied tutorials demonstrated retrieval-augmented generation for social science data and observability tools for model monitoring. All lecture slides, notebooks, and code samples are collected on the Summer School Materials page and on GitHub.

Established scholars from political science, sociology, and history – Lisa P. Argyle, Chris Barrie, Raymond Duch, Thomas Hegghammer, Neil Ketchley, Allison Koh, and Alexis Palmer – shared examples of research using LLMs from their work. Guest presenters from companies like Qdrant, Ori Cloud, Arize, and Google led practical seminars, with contributions from Atita Arora, John Gilhuly, Christian Silva, and Ciera Fowler.

Collaborative Research Projects

All participants contributed to the main project: developing an approach for survey data preprocessing aimed at training LLMs to predict people’s opinions and self-reported behaviors. Ori Cloud provided GPU support for this project. The models and datasets are published on the Oxford LLMs HuggingFace profile, and the group is polishing the work before submission.

Looking Ahead

Feedback highlighted that the collaborative project was the workshop’s most valuable component and that participants wanted more coding exercises. As a result, the September 2025 workshop will be slightly more technical, with increased focus on coding sessions and research projects.

If you are an early-career researcher eager to master LLM architectures and coding workflows and to take part in interdisciplinary research projects, apply for the 2025 workshop!

Organisers

Humeyra Biricik (Co-Organiser) DPIR Oxford University

Humeyra Biricik is co-organiser of Oxford LLMs 2024 and a doctoral candidate in Politics at Pembroke College. Her research focuses on political speech, populism, and democratic backsliding in Turkey, Hungary, India, and Arabic-speaking Middle Eastern countries, using large language models and text analysis alongside econometric methods. She also coordinates a series of politics talks at Pembroke College on topics ranging from local British elections and the housing crisis to AI regulation and machine learning in political science.

Ilya Boytsov (Lecturer, Co-Organiser) NLP Lead, Wayfair

Ilya is an applied Deep Learning Scientist with a focus on Natural Language Processing (NLP). He is the NLP lead at Wayfair in Berlin and has extensive experience designing machine-learning bootcamps and lectures for diverse audiences. He has spoken at conferences including the World Data Summit and DSC Europe, and co-founded the Street Smart AI community in Berlin. More details are available on his personal website.

Maksim Zubok (Co-Organiser) DPIR Oxford University

Maksim is a doctoral candidate in Politics at Oxford University, Nuffield College. His research explores how large language models can be used for social science, from standard data labelling tasks to using models as snapshots of the internet to study how people structure concepts and beliefs. He has helped organise multiple academic events, including the Oxford Summer Institute for Computational Social Science.

Lecture and Seminar Leaders

Grigory Sapunov (Lecturer) CTO and co-founder of Intento

Grigory Sapunov is CTO and co-founder of Intento. He has over 20 years of software engineering experience and around 15 years of experience in data analysis, AI, and machine learning, and has worked on deep learning since 2011. He is a Google Developer Expert in Machine Learning and holds a Ph.D. in Artificial Intelligence. You can connect with him on LinkedIn.

Tatiana Shavrina (Lecturer) Research Scientist Manager, Meta, Llama team

Tatiana Shavrina works on the Llama team at Meta and has previously worked at Snap and AIRI. She focuses on multilingualism and under-resourced languages in large language models and has contributed to projects such as BLOOM, mGPT, and Russian SuperGLUE. She also works on benchmarking and evaluation methods for LLMs. See her Google Scholar profile for more details.

Atita Arora (Seminar) Solution Architect, Qdrant

Atita Arora is a solution architect and relevance strategist with over 15 years of experience in information retrieval. She has contributed to multiple open-source projects and is currently writing a book on vector databases. Her seminar focuses on retrieval-augmented generation (RAG), covering end-to-end implementation, experimentation, and evaluation, and showing how to identify where in the RAG pipeline improvements can be made.

Ciera Fowler (Seminar) ML Engineering Lead, Ori Cloud

Ciera Fowler is ML Engineering Lead at Ori, an AI-native GPU cloud provider, and an MBA student at London Business School. She works on benchmarking and analysing LLMs and regularly gives talks and tutorials on building LLM-powered agents and applications. At LBS, she is active in student leadership through the Black in Business Club and the Technology & Media club.

John Gilhuly (Seminar) Developer Advocate, Arize AI

John Gilhuly is a developer advocate at Arize AI focused on open-source LLM observability and evaluation tooling. His seminar covers core principles of LLM observability, including tracing and OpenTelemetry, and compares evaluation strategies such as LLM-as-a-judge and assertion-based methods. A hands-on walkthrough illustrates these ideas by detecting bias and misinformation in an LLM-based research agent.

Christian Silva (Seminar) AI/ML Customer Engineer, Google Cloud

Christian Silva is an AI/ML Customer Engineer at Google Cloud with more than 15 years of experience in analytics, data management, and machine learning. Working across sectors such as financial services and healthcare, he focuses on helping organisations apply AI and data for decision-making. His seminar introduces practical approaches to deploying and governing AI solutions in production environments.

Research Talk Speakers

Lisa P. Argyle Assistant Professor of Political Science, Brigham Young University

Lisa P. Argyle is an Assistant Professor of Political Science at Brigham Young University and a Faculty Fellow at the Center for the Study of Elections and Democracy. Her research blends political psychology with computational social science to study political attitudes and participation, with a recent focus on generative AI since the release of GPT-3. She uses surveys, experiments, and AI tools to understand how people talk about politics in their everyday lives. More information is available on her website.

Chris Barrie Assistant Professor in Sociology, New York University

Chris Barrie is Assistant Professor in Sociology at New York University. His research focuses on political sociology, especially conflict, communication, and political attitudes, using NLP and digital trace data. He founded the Social Data Science Hub at the University of Edinburgh. Learn more on his website.

Raymond Duch Director of the Centre for Experimental Social Sciences, Nuffield College, Oxford University

Ray Duch is Director of the Centre for Experimental Social Sciences (CESS) and co-Director of the Candour Project and the REAL Demand Centre. His work uses experiments to study decision making across politics, finance, health, and economics and has appeared in leading journals such as the American Political Science Review, Journal of Politics, and Nature Medicine. More information is available on his website.

Thomas Hegghammer Senior Fellow in Politics, All Souls College, Oxford University

Thomas Hegghammer is Senior Fellow in Politics at All Souls College, Oxford. He is a political scientist and historian specialising in political violence in the Muslim world, especially transnational jihadi groups. His books include The Caravan: Abdallah Azzam and the Rise of Global Jihad and Jihadi Culture: The Art and Social Practices of Militant Islamists. More details are available on his website.

Neil Ketchley Associate Professor in Politics, St Antony's College, Oxford University

Neil Ketchley is Associate Professor in Politics and Fellow of St Antony's College. He is a political scientist of the Arabic-speaking Middle East and North Africa whose work focuses on political sociology and comparative politics. His book Egypt in a Time of Revolution won the Charles Tilly Distinguished Contribution to Scholarship Award. Learn more on his profile.

Allison Koh Research Fellow in NLP, University of Birmingham’s Centre for Artificial Intelligence in Government

Allison Koh is a Research Fellow in Natural Language Processing at the University of Birmingham’s Centre for Artificial Intelligence in Government. Her research lies at the intersection of international relations, political communication, and computational social science, with a focus on geopolitics of emerging technologies and generative AI in conflict research. Read more on her profile.

Alexis Palmer Neukom Fellow, Dartmouth College

Alexis Palmer recently completed her PhD in Politics at New York University and is starting as a Neukom Fellow at Dartmouth College. Her research focuses on institutional trust, storytelling, and text-as-data methods, using large language models and other tools to study how people talk about politics and how narratives shape perceptions. More information is available on her website.