Language models are unsupervised multitask learners
Natural language processing tasks, such as question answering, machine translation, reading comprehension, and summarization, are typically approached with supervised …
Rephrasing the web: A recipe for compute and data-efficient language modeling
Large language models are trained on massive scrapes of the web, which are often unstructured, noisy, and poorly phrased. Current scaling laws show that learning from such …
Retrieval augmented language model pre-training
Language model pre-training has been shown to capture a surprising amount of world knowledge, crucial for NLP tasks such as question answering. However, this …
How context affects language models' factual predictions
When pre-trained on large unsupervised textual corpora, language models are able to store and retrieve factual knowledge to some extent, making it possible to use them directly for …
Toolformer: Language models can teach themselves to use tools
Language models (LMs) exhibit remarkable abilities to solve new tasks from just a few examples or textual instructions, especially at scale. They also, paradoxically, struggle …
Synthetic data augmentation for zero-shot cross-lingual question answering
Coupled with the availability of large-scale datasets, deep learning architectures have enabled rapid progress on the Question Answering task. However, most of those datasets …
The RefinedWeb dataset for Falcon LLM: Outperforming curated corpora with web data only
Large language models are commonly trained on a mixture of filtered web data and curated "high-quality" corpora, such as social media conversations, books, or technical …
Zero-shot cross-lingual transfer with meta learning
Learning what to share between tasks has been a topic of great importance recently, as strategic sharing of knowledge has been shown to improve downstream task performance …
Exploring the limits of transfer learning with a unified text-to-text transformer
Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language …
English intermediate-task training improves zero-shot cross-lingual transfer too
Intermediate-task training (fine-tuning a pretrained model on an intermediate task before fine-tuning again on the target task) often improves model performance substantially on …