Content Tagging Models: Prototype two
Closed, Resolved · Public

Description

Goal: prototype two content tagging models. Prototype is an ambiguous term and the two models are at different stages, but the goal is for each to have a fully-working language-agnostic model and a plan for additional improvements / language-specific tweaks.

These will be an article quality model and a geography model.

Event Timeline

Weekly update:

  • Ongoing discussion with MM about the design of a quality model and where it does / doesn't work well for the knowledge gaps metrics use-case.

Weekly update:

  • No progress this week

Weekly update: talked with Miriam; this work will move more slowly for the rest of the quarter while I focus on some other projects, but will pick back up in Q3

Weekly update: will be prioritizing in January

Weekly update:

  • Gathered new groundtruth data from Arabic/French/English Wikipedia for article quality to test / improve the model.
  • Working with Growth on using some of the preliminary geography data for an edit-a-thon.
  • Updated the geographic (and gender) data snapshot for some other projects (and in doing so, verified that it is still working well)

Weekly updates:

  • updated quality model features (added more features and made sure the model could still run simultaneously on all Wikipedia language editions)
  • continued reimplementing the data pipeline for the quality model to support evaluation data from multiple languages

Weekly updates:

  • Made some additional progress that I feel good about, so closing out this task and creating a new Q3-specific task
  • Switched the quality model to using wikitext only rather than the links tables. This will let us apply it easily to historical Wikipedia revisions and probably also speeds up / simplifies the data pipeline, because there is now one source of data that is processed once (wikitext) instead of many different tables that need to be joined. A sketch of what this can look like follows this list.
  • Waiting to hear from Newcomer Pilot folks about geography model
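
For illustration, here is a minimal sketch of wikitext-only feature extraction. The feature set and regexes are my own assumptions for the sketch, not the model's exact implementation:

```
import re

def extract_features(wikitext: str) -> dict:
    """Compute simple quality features from raw wikitext alone,
    with no joins against links tables, so the same code can be
    applied to any historical revision."""
    return {
        "page_length": len(wikitext),
        "refs": len(re.findall(r"<ref[ >/]", wikitext)),
        "wikilinks": len(re.findall(r"\[\[", wikitext)),
        "categories": len(re.findall(r"\[\[\s*Category:", wikitext, flags=re.IGNORECASE)),
        "headings": len(re.findall(r"(?m)^==.*==\s*$", wikitext)),
    }
```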

Hello Isaac, @diego,

I have the following queries about the V2 model:

  1. How are the features (e.g., page length, references, etc.) weighted in the model? Also, have they been computed on the basis of all wikis, or some specific wikis?
  2. How did you set the minimum wiki thresholds? I understand that these are determined empirically, but do they consider all the wikis, or a few selected ones?

-Best,
Paramita

Hey @paramita_das: you can see most of these details in the write-up and attached notebook. Pointers to your specific questions below:

How are the features (e.g., page length, references, etc.) weighted in the model?

The exact weights are in the meta page and the predictQuality function in the notebook. They were derived from a groundtruth dataset of English, French, and Arabic articles. I don't think I have that notebook public anywhere at the moment, but it's based on a sample of articles that were rated for quality in the last month, with a little bit of balancing so English doesn't dominate the sample.
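
For concreteness, the general shape of a predictQuality-style function is a weighted sum of capped, normalized features. A minimal sketch, with placeholder weights rather than the fitted values from the meta page:

```
# Placeholder weights for illustration only; the real values were
# derived from the English/French/Arabic groundtruth sample.
WEIGHTS = {
    "page_length": 0.4,
    "refs": 0.3,
    "wikilinks": 0.1,
    "categories": 0.1,
    "headings": 0.1,
}

def predict_quality(features: dict, min_thresholds: dict) -> float:
    """Normalize each feature by its minimum threshold, cap at 1,
    and combine via the weights, giving a score in [0, 1]."""
    return sum(
        weight * min(1.0, features[name] / min_thresholds[name])
        for name, weight in WEIGHTS.items()
    )
```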

How did you set the minimum wiki thresholds? I understand that these are determined empirically, but do they consider all the wikis, or a few selected ones?

This was based on eyeballing the data for all wikis. You can see some comments on this where the minimum thresholds are set in the notebook; later on, the raw data is under a cell labeled "Data to help in setting min thresholds" if you want to get a sense of the practical impact of these thresholds.
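
To make that practical impact concrete, here is a small worked example of how a minimum threshold caps a feature's contribution. The threshold values are hypothetical, not the ones in the notebook:

```
# Hypothetical thresholds: an article at or above a threshold gets
# full credit for that feature; below it, credit scales linearly.
MIN_THRESHOLDS = {"page_length": 10000, "refs": 10, "wikilinks": 50}

def normalized(value: float, threshold: float) -> float:
    return min(1.0, value / threshold)

# An article with 4 references gets 0.4 credit for the refs feature,
# while one with 25 references is capped at the same 1.0 as one with 10:
assert normalized(4, MIN_THRESHOLDS["refs"]) == 0.4
assert normalized(25, MIN_THRESHOLDS["refs"]) == 1.0
```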