PHP Warning: preg_match(): Compilation failed: invalid UTF-8 string at offset 0
Closed, Resolved | Public | 3 Estimated Story Points | PRODUCTION ERROR

Description

Error

MediaWiki version: 1.36.0-wmf.10

message
PHP Warning: preg_match(): Compilation failed: invalid UTF-8 string at offset 0

Details

Request ID
a0e702b3-3c00-43c5-b6b1-1c2f9fa3b90e
Request URL
https://ckb.wikipedia.org/w/index.php?title=%D9%84%DB%8E%D8%AF%D9%88%D8%A7%D9%86%DB%8C_%D8%A8%DB%95%DA%A9%D8%A7%D8%B1%DA%BE%DB%8E%D9%86%DB%95%D8%B1:Haryad_Xasraw&action=submit
Stack Trace
exception.trace
#0 [internal function]: MWExceptionHandler::handleError(integer, string, string, integer, array)
#1 /srv/mediawiki/php-1.36.0-wmf.10/extensions/Echo/includes/DiscussionParser.php(1171): preg_match(string, string)
#2 /srv/mediawiki/php-1.36.0-wmf.10/extensions/Echo/includes/DiscussionParser.php(892): EchoDiscussionParser::getTimestampRegex()
#3 /srv/mediawiki/php-1.36.0-wmf.10/extensions/Echo/includes/DiscussionParser.php(837): EchoDiscussionParser::getTimestampPosition(string)
#4 /srv/mediawiki/php-1.36.0-wmf.10/extensions/Echo/includes/DiscussionParser.php(218): EchoDiscussionParser::stripSignature(string, Title)
#5 /srv/mediawiki/php-1.36.0-wmf.10/extensions/Echo/includes/DiscussionParser.php(63): EchoDiscussionParser::generateMentionEvents(string, array, string, MediaWiki\Revision\RevisionStoreRecord, User)
#6 /srv/mediawiki/php-1.36.0-wmf.10/extensions/Echo/includes/EchoHooks.php(540): EchoDiscussionParser::generateEventsForRevision(MediaWiki\Revision\RevisionStoreRecord, boolean)
#7 /srv/mediawiki/php-1.36.0-wmf.10/includes/deferred/MWCallableUpdate.php(38): EchoHooks::{closure}()
#8 /srv/mediawiki/php-1.36.0-wmf.10/includes/deferred/DeferredUpdates.php(467): MWCallableUpdate->doUpdate()
#9 /srv/mediawiki/php-1.36.0-wmf.10/includes/deferred/DeferredUpdates.php(344): DeferredUpdates::attemptUpdate(MWCallableUpdate, Wikimedia\Rdbms\LBFactoryMulti)
#10 /srv/mediawiki/php-1.36.0-wmf.10/includes/deferred/DeferredUpdates.php(278): DeferredUpdates::run(MWCallableUpdate, Wikimedia\Rdbms\LBFactoryMulti, Monolog\Logger, BufferingStatsdDataFactory, string)
#11 /srv/mediawiki/php-1.36.0-wmf.10/includes/deferred/DeferredUpdates.php(194): DeferredUpdates::handleUpdateQueue(array, string, integer)
#12 /srv/mediawiki/php-1.36.0-wmf.10/includes/MediaWiki.php(1113): DeferredUpdates::doUpdates(string)
#13 /srv/mediawiki/php-1.36.0-wmf.10/includes/MediaWiki.php(849): MediaWiki->restInPeace()
#14 /srv/mediawiki/php-1.36.0-wmf.10/includes/MediaWiki.php(861): MediaWiki->{closure}()
#15 /srv/mediawiki/php-1.36.0-wmf.10/includes/MediaWiki.php(582): MediaWiki->doPostOutputShutdown()
#16 /srv/mediawiki/php-1.36.0-wmf.10/index.php(53): MediaWiki->run()
#17 /srv/mediawiki/php-1.36.0-wmf.10/index.php(46): wfIndexMain()
#18 /srv/mediawiki/w/index.php(3): require(string)
#19 {main}

Event Timeline

Change 636737 had a related patch set uploaded (by Thiemo Kreuz (WMDE); owner: Thiemo Kreuz (WMDE)):
[mediawiki/extensions/Echo@master] Fix DiscussionParser failing in certain languages

https://gerrit.wikimedia.org/r/636737

I wrote a little script that iterates over all known languages and found that this failure happens when the user's interface language is set to one of these: bo, ckb, dz, km, ks, ks-arab, ku-arab, skr. The cause is a regular expression in the DiscussionParser that is not set to Unicode mode via /u. At first glance the modifier does not look necessary, because the pattern contains no non-ASCII characters. But without /u the subject is processed byte by byte, so the initial \h in this pattern can match a single byte of a multi-byte UTF-8 character that happens to be in the right position in one of these languages. In /u mode the \h sequence only matches full Unicode characters.
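To make the byte-level effect concrete, here is a minimal Python sketch. It is not the Echo code. Python's re module has no \h, so both character classes below are hand-written stand-ins for PCRE's \h (byte mode: tab, space, and the byte 0xA0; Unicode mode: a set of horizontal whitespace characters). The sample string, the function names, and the idea that the matched byte gets removed are all assumptions for illustration. The sample uses Arabic-Indic digits, whose zero (U+0660) is encoded as the bytes D9 A0.

```python
import re

# Stand-in for PCRE \h WITHOUT /u: the subject is treated as raw bytes,
# so tab, space and the single byte 0xA0 (Latin-1 NBSP) all match.
BYTE_MODE_H = re.compile(rb"[\t \xa0]+")

# Stand-in for PCRE \h WITH /u: only whole horizontal-whitespace characters match.
UNICODE_MODE_H = re.compile(
    "[\t \u00a0\u1680\u180e\u2000-\u200a\u202f\u205f\u3000]+"
)


def strip_h_bytewise(text: str) -> bytes:
    """Remove byte-mode \\h matches from the UTF-8 encoding of text."""
    return BYTE_MODE_H.sub(b"", text.encode("utf-8"))


def strip_h_unicode(text: str) -> str:
    """Remove Unicode-mode \\h matches from text."""
    return UNICODE_MODE_H.sub("", text)


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# Hypothetical timestamp fragment: "10:30" written with Arabic-Indic digits.
# U+0660 (digit zero) is encoded as D9 A0, so its second byte looks like \h.
sample = "\u0661\u0660:\u0663\u0660"

broken = strip_h_bytewise(sample)   # the 0xA0 continuation bytes are removed
intact = strip_h_unicode(sample)    # nothing matches, string is unchanged

print(sample.encode("utf-8").hex(" "))
print(broken.hex(" "))
print(is_valid_utf8(broken), intact == sample)
```

In the byte-mode variant the lead byte D9 is left without its continuation byte, so the result is no longer valid UTF-8. Feeding such a string into a later pattern that does use /u would be one plausible way to end up with the "invalid UTF-8 string" warning above.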

thiemowmde set the point value for this task to 3.

Change 636737 merged by jenkins-bot:
[mediawiki/extensions/Echo@master] Fix DiscussionParser failing in certain languages

https://gerrit.wikimedia.org/r/636737