What You Will Learn

Module 1

Foundations

From bag-of-words to tokenization, embeddings, and contextual representations, then language modeling, the Transformer, scaling laws, and how this stack shows up in social science workflows: the conceptual spine for the rest of the course.

Module 2

From Models to Tools

Post-training alignment (SFT, RLHF, DPO), prompting from zero-shot to chain-of-thought, reasoning techniques, and how to evaluate models and choose one that fits your task.

Module 3

Deploying for Research

Fine-tuning (including LoRA-style adaptation and encoder heads), moving from notebooks to APIs and self-hosted inference, and rigorous validation, so that your classifier works as a measurement instrument you can defend in publication.

Module 4

Extraction, Summarization, and RAG

Faithful extraction and summarization and their failure modes (hallucination, omission, and distortion), then retrieval-augmented generation: chunking, embeddings, retrieval, and grounded answers with provenance checks over your own corpora.

Module 5

Agentic Workflows

Building autonomous research agents with tool use, ReAct patterns, and multi-step orchestration: the frontier of what LLMs can do for your research pipeline.

Prerequisites

Beginner-to-intermediate Python. A Google account for Colab. No prior deep learning or NLP experience required; we build from fundamentals.

Preliminary notebooks →

Materials

All exercises are open-source Jupyter notebooks. Clone the repo, open in Colab, and follow along.

GitHub Repository →

Begin the Course

Start with Module 1: the mathematical and conceptual foundations that everything else builds on.

Begin →

2026 Dates & Locations

Upcoming

Oxford

March 23–27, 2026
5-day intensive. DPIR, University of Oxford.

ESSCA Paris

April 7, 2026
1-day workshop.

EUI Florence

April 20–21, 2026
2-day workshop. European University Institute.