Language models are unsupervised multitask learners

A Radford, J Wu, R Child, D Luan… - OpenAI…, 2019 - insightcivic.s3.us-east-1.amazonaws…
Natural language processing tasks, such as question answering, machine translation,
reading comprehension, and summarization, are typically approached with supervised…

Rephrasing the web: A recipe for compute and data-efficient language modeling

P Maini, S Seto, H Bai, D Grangier, Y Zhang… - arXiv preprint arXiv…, 2024 - arxiv.org
Large language models are trained on massive scrapes of the web, which are often
unstructured, noisy, and poorly phrased. Current scaling laws show that learning from such…

Retrieval augmented language model pre-training

K Guu, K Lee, Z Tung, P Pasupat… - … on machine learning, 2020 - proceedings.mlr.press
Abstract Language model pre-training has been shown to capture a surprising amount of
world knowledge, crucial for NLP tasks such as question answering. However, this…

How context affects language models' factual predictions

F Petroni, P Lewis, A Piktus, T Rocktäschel… - arXiv preprint arXiv…, 2020 - arxiv.org
When pre-trained on large unsupervised textual corpora, language models are able to store
and retrieve factual knowledge to some extent, making it possible to use them directly for…

Toolformer: Language models can teach themselves to use tools

T Schick, J Dwivedi-Yu, R Dessì… - Advances in…, 2024 - proceedings.neurips.cc
Abstract Language models (LMs) exhibit remarkable abilities to solve new tasks from just a
few examples or textual instructions, especially at scale. They also, paradoxically, struggle…

Synthetic data augmentation for zero-shot cross-lingual question answering

A Riabi, T Scialom, R Keraron, B Sagot… - arXiv preprint arXiv…, 2020 - arxiv.org
Coupled with the availability of large scale datasets, deep learning architectures have
enabled rapid progress on the Question Answering task. However, most of those datasets…

The RefinedWeb dataset for Falcon LLM: Outperforming curated corpora with web data only

G Penedo, Q Malartic, D Hesslow… - Advances in…, 2023 - proceedings.neurips.cc
Large language models are commonly trained on a mixture of filtered web data and
curated "high-quality" corpora, such as social media conversations, books, or technical…

Zero-shot cross-lingual transfer with meta learning

F Nooralahzadeh, G Bekoulis, J Bjerva… - arXiv preprint arXiv…, 2020 - arxiv.org
Learning what to share between tasks has been a topic of great importance recently, as
strategic sharing of knowledge has been shown to improve downstream task performance…

Exploring the limits of transfer learning with a unified text-to-text transformer

C Raffel, N Shazeer, A Roberts, K Lee, S Narang… - Journal of machine…, 2020 - jmlr.org
Transfer learning, where a model is first pre-trained on a data-rich task before being fine-
tuned on a downstream task, has emerged as a powerful technique in natural language…

English intermediate-task training improves zero-shot cross-lingual transfer too

J Phang, I Calixto, PM Htut, Y Pruksachatkun… - arXiv preprint arXiv…, 2020 - arxiv.org
Intermediate-task training---fine-tuning a pretrained model on an intermediate task before
fine-tuning again on the target task---often improves model performance substantially on…