1. Introduction

A comprehensive guide to dataset design, annotation, and task formulation for building reliable and responsible language AI systems.

Welcome to the dataset design and annotation playbook!

This playbook will help you plan and develop training and evaluation datasets, define annotation schemas, and design AI tasks across different languages, domains, and modalities. It provides guidance on dataset structuring, labeling strategies, and ethical considerations for language technologies.

Who is this playbook for?

This playbook is designed for:

Researchers working on NLP dataset creation and evaluation
Annotation teams developing labeled datasets
Project managers overseeing data collection and annotation workflows
AI practitioners designing and evaluating language models
Students and academics studying dataset design and annotation
Multilingual communities contributing to language resources

What will you learn?

By the end of this playbook, you will understand:

How to define the purpose and scope of a dataset
The differences between training and evaluation datasets
The trade-offs between scale and quality
How to design label schemas and ontologies
Approaches for multi-label, single-label, and structured outputs
How to handle ambiguity, edge cases, and annotation boundaries
Best practices for multilingual and cross-lingual dataset design
Ethical considerations, risks, and limitations in dataset creation

How to use this playbook

Each section of this playbook contains:

Clear explanations of dataset design principles
Structured guidance for task and schema definition
Examples and edge cases to support annotation decisions
Practical recommendations for dataset creation workflows
Ethical considerations to guide responsible use

Getting Started

Ready to begin? Start with our foundational sections:

Purpose of This Playbook – Understand target users, scope, and intended use
How to Use This Playbook – Learn how to navigate chapters and contribute
Dataset Types and Design Goals – Explore dataset categories and trade-offs
Task and Schema Definition – Define tasks, labels, and annotation structures
Glossary and Terminology – Learn key concepts and definitions

Purpose of This Playbook

Target users and communities
Languages, domains, and modalities covered
Intended use and risks

Dataset Types and Design Goals

Training vs evaluation datasets
General-purpose vs domain-specific datasets
Scale vs quality trade-offs
Monolingual, multilingual, and cross-lingual setups

Task and Schema Definition

Task formulation (classification, generation, alignment, retrieval)
Label schema and ontology design
Multi-label vs single-label vs structured outputs
Ambiguity, edge cases, and annotation boundaries

Glossary and Terminology

A reference section providing clear definitions of key terms used throughout the playbook.
