RFC: Parsoid Extension API
Closed, Resolved, Public

Description

  • Affected components: All MediaWiki extensions that use a parser hook or a Parser.php method.
  • Engineer(s) or team for initial implementation: Parsing Team.
  • Code steward: Parsing Team.

Motivation

The Parsing Team is working to make Parsoid the default wikitext engine for MediaWiki. Because Parsoid has an entirely different processing model, implementation, and pipeline, it cannot support the exact parsing API (which in practice is just whatever methods are public in Parser.php) and hooks that the core parser supports.

Given the above, the Parsing Team has been working to define a Parsoid Extension API that extensions can use to hook into Parsoid and support the same functionality that they currently implement with the core parser.

Current Status

At this point, Parsoid has a late-draft proposal for such an API, which is being elaborated in detail at mw:Parsoid/Extension_API. That page has been evolving since March 2020, and as of August 2020 we consider it to be in good enough shape to go through a TechCom RFC process. The extension API has been matched with corresponding implementation updates in Parsoid. All of Parsoid's extensions currently in production on Wikimedia wikis strictly follow this extension API. So this proposal is not just an on-paper API but a real, functioning implementation.

Extensions using this API
  • Gallery, Pre, Nowiki (Core extensions)
  • Cite, Poem
  • ImageMap (almost ready)
  • RawHTML, StyleTag (Parser Tests)

Admittedly, this API doesn't yet capture the full diversity of ways in which extensions interact with the core parser. While we hope to get there eventually, we aim to do that in stages.

Step 1a (DONE) : Extract an extension API out of Parsoid's extension implementations and demonstrate proof of concept. As part of this process, we have extensively refactored Parsoid extension implementations to refine the API to be coherent and consistent and not cheat by virtue of being in the Parsoid codebase.
Step 1b (IN PROGRESS) : Consult as widely as possible to ensure adequate exposure to the upcoming changes and ensure developers have opportunities to provide feedback. See the Exploration section below for a bit more detail.
Step 1c (IN PROGRESS) : Get explicit approval of the core design of Parsoid's Extension API from TechCom. All additional review and feedback is welcome. But this approval ensures we can proceed with expanding the API without having to go back to the drawing board on basic ideas and principles.
Step 2: Ensure the Parsoid Extension API is suitably expanded to capture all the use cases for extensions deployed on the Wikimedia cluster.
Step 3: Ensure all Wikimedia extensions are "Parsoid-compatible" (or have suitable workarounds to continue functioning when Parsoid replaces the core parser on Wikimedia wikis). The Parsing Team will rely on and expect engineering help from other teams and developers in achieving this goal. Resolving this is outside the scope of the RFC; I am stating it here for completeness and to provide a fuller picture.
Step 4: Ensure the Parsoid Extension API is suitably expanded to capture the broader set of use cases outside Wikimedia wikis.

Requirements

The steps above already cover this in some detail. Broadly, any extension API that we develop for Parsoid should initially be able to support the current functionality provided by extensions deployed on the Wikimedia cluster. In the longer term, the requirements expand to extension use cases beyond Wikimedia. For the purposes of this RFC, however, we are restricting the scope to Wikimedia wikis only.

Exploration

The Parsing Team has started the process of consultation in different venues this year:

  • Early request for review from a small set of Wikimedia engineers: Discussion on mw:Talk/Parsoid/Extension_API
  • Early look at Parsoid Extension API presentation at EMWCon 2020 in April: https://www.mediawiki.org/wiki/EMWCon_Spring_2020/Program
  • A WMF-only internal email to tech-all and product-all in July 2020
  • Retargeting extensions for Parsoid: Tech Talk in August 2020; Video, Slides
  • File a TechCom RFC and get additional feedback. Seek explicit approval of core design of the Parsoid Extension API
  • Seek wider feedback via outreach on wikitech-l and mediawiki-l

Event Timeline

Putting this on the TechCom board, because feedback from the committee has explicitly been requested.

It looks like TechCom has some questions and concerns about what engagement we have done around these changes. The Exploration section of the RFC covers some of that.

Since the RFC was filed, I've sent emails to wikitech-l ( https://lists.wikimedia.org/pipermail/wikitech-l/2020-September/093827.html ) and mediawiki-l ( https://lists.wikimedia.org/pipermail/mediawiki-l/2020-September/048473.html ).

In terms of responses and questions around this, here are some anecdotes about how it has been received:

  • At the EMWCon April 2020 talk, the questions I got were around documentation and whether there will be examples as guidance. A couple of folks said they were looking forward to Parsoid's extension API.
  • After the internal email within WMF, Roan K reviewed the proposal and flagged the absence of ParserOutput support (see Talk:Parsoid/Extension_API) and we've flagged that on the Parsoid Extension API page as something we'll support.
  • After the August 12, 2020 tech talk:
    • I got positive feedback ("This was great and the Parsoid Extension API looks very exciting", "Thank you! THe API looks great!").
    • User:tpt dropped by on #mediawiki-parsoid and discussed some follow-up questions from the Tech Talk and, from what I could tell, was satisfied that they could support their Wikisource extension with Parsoid.
    • Lukas Werkmeister left a question on the talk page about Parsoid support for setFunctionHook and I acknowledged that we will be providing support for it.
  • After my wikitech-l email, a couple of devs had questions on the talk page as well, which I believe have been addressed.
  • Within WMF, here is a status update:
    • I filed T261181 for the language team
    • M.Volz filed T262266 for templatedata
    • The Parsing Team will be updating the ImageMap and indicator extensions.
    • That still leaves a number of other extensions; the Parsing Team will figure out a plan for getting all of those updated.

That is the full update I have on engagement around this change. I believe more substantial comments and engagement will come as developers start updating their extensions and grappling with the details.

As far as this RFC and TechCom are concerned, as I mention in the "Extensions using this API" section, I don't think it makes sense for TechCom to spend its time going over the details with a fine-toothed comb. Instead, it might be helpful to look at the high-level design and approve that. As we indicated in all the forums, the details will evolve (the API will likely expand and we'll likely discover hooks we might have to support), and we are committed to not breaking extensions gratuitously.

I hope this is helpful as you evaluate the proposal.

Relatedly, as part of T236809: Refactor Parser.php to allow alternate parser (Parsoid) and its subtasks, we are narrowing the public API of Parser.php (and associated classes) and deprecating and removing hooks that don't need to be supported.

In addition to the above outreach, if TechCom thinks a public IRC discussion or meeting would be beneficial, I am happy to do that as well.

Relatedly: see T261181#6476451, where we are discussing a potential third type of Parsoid extension (annotation extensions). The other two are tag extensions and content-model-handler extensions.

TechCom is proposing to approve this; the Last Call will end on Oct 7.

Last Call ended yesterday; this is approved.

(apologies for the late update)