Language models are unsupervised multitask learners

A Radford, J Wu, R Child, D Luan… - OpenAI…, 2019 - insightcivic.s3.us-east-1.amazonaws…
Natural language processing tasks, such as question answering, machine translation,
reading comprehension, and summarization, are typically approached with supervised…

Rephrasing the web: A recipe for compute and data-efficient language modeling

P Maini, S Seto, H Bai, D Grangier, Y Zhang… - arXiv preprint arXiv…, 2024 - arxiv.org
Large language models are trained on massive scrapes of the web, which are often
unstructured, noisy, and poorly phrased. Current scaling laws show that learning from such…

Retrieval augmented language model pre-training

K Guu, K Lee, Z Tung, P Pasupat… - … on machine learning, 2020 - proceedings.mlr.press
Abstract Language model pre-training has been shown to capture a surprising amount of
world knowledge, crucial for NLP tasks such as question answering. However, this…

How context affects language models' factual predictions

F Petroni, P Lewis, A Piktus, T Rocktäschel… - arXiv preprint arXiv…, 2020 - arxiv.org
When pre-trained on large unsupervised textual corpora, language models are able to store
and retrieve factual knowledge to some extent, making it possible to use them directly for…

Toolformer: Language models can teach themselves to use tools

T Schick, J Dwivedi-Yu, R Dessì… - Advances in…, 2024 - proceedings.neurips.cc
Abstract Language models (LMs) exhibit remarkable abilities to solve new tasks from just a
few examples or textual instructions, especially at scale. They also, paradoxically, struggle…

Synthetic data augmentation for zero-shot cross-lingual question answering

A Riabi, T Scialom, R Keraron, B Sagot… - arXiv preprint arXiv…, 2020 - arxiv.org
Coupled with the availability of large scale datasets, deep learning architectures have
enabled rapid progress on the Question Answering task. However, most of those datasets…

The RefinedWeb dataset for Falcon LLM: Outperforming curated corpora with web data only

G Penedo, Q Malartic, D Hesslow… - Advances in…, 2023 - proceedings.neurips.cc
Large language models are commonly trained on a mixture of filtered web data and
curated "high-quality" corpora, such as social media conversations, books, or technical…

Zero-shot cross-lingual transfer with meta learning

F Nooralahzadeh, G Bekoulis, J Bjerva… - arXiv preprint arXiv…, 2020 - arxiv.org
Learning what to share between tasks has been a topic of great importance recently, as
strategic sharing of knowledge has been shown to improve downstream task performance…

Exploring the limits of transfer learning with a unified text-to-text transformer

C Raffel, N Shazeer, A Roberts, K Lee, S Narang… - Journal of machine…, 2020 - jmlr.org
Transfer learning, where a model is first pre-trained on a data-rich task before being fine-
tuned on a downstream task, has emerged as a powerful technique in natural language…

English intermediate-task training improves zero-shot cross-lingual transfer too

J Phang, I Calixto, PM Htut, Y Pruksachatkun… - arXiv preprint arXiv…, 2020 - arxiv.org
Intermediate-task training---fine-tuning a pretrained model on an intermediate task before
fine-tuning again on the target task---often improves model performance substantially on…