Parsoid uses a fork of PEG.js that @tstarling worked on. The fork adds features to PEG.js that remove some JS / async related hacks and improve its tokenizer-generation performance.
To port Parsoid to PHP, we need a replacement for this PEG tokenizer.
Here are some options available to us.
- Use phppegjs, a plugin for PEG.js that generates a PHP tokenizer instead of a JS tokenizer. It also allows PHP and JS action code to be co-located in the PEG grammar. However, this requires us to do one of the following:
  - Abandon Tim's fork and adapt Parsoid-PHP to use this tokenizer. This is not a workable solution out of the box.
  - Upstream some of Tim's changes to PEG.js, and then use the phppegjs plugin. This requires us to separate out the necessary features and upstream them, and requires the maintainer to be interested in these changes.
  - Implement the phppegjs plugin on top of Tim's fork.
- Evaluate PHP-PEG and see if our PEG grammar works with it.
- If performance of the tokenizer is a potential concern, evaluate C-PEG and see if our PEG grammar works with it.
This task is to evaluate our options and propose a suitable solution that meets our functional and performance requirements.