santhosh (Santhosh Thottingal)
Principal Software Engineer, Language Engineering.

User Details

User Since
Oct 7 2014, 2:57 AM (508 w, 5 d)
Availability
Available
LDAP User
Santhosh
MediaWiki User
Sthottingal-WMF [ Global Accounts ]

Recent Activity

Tue, Jul 2

santhosh added a comment to T367873: Technical exploration to support topic-based suggestions with the current Recommendation API.

The source code at https://github.com/wikimedia/research-recommendation-api has a lot of legacy code and broken or unmaintained dependencies. The web frontend uses Bower, jQuery and similarly old tooling. Recent updates by the machine learning team got it somewhat functional, to the extent that it is integrated into LiftWing. But adding new features requires more fixups to get a smooth local development experience. We can ignore the web frontend part (AKA gapfinder) for now, as we are interested only in the API.

Tue, Jul 2, 11:47 AM · LPL Hypothesis, ContentTranslation
santhosh added a comment to T367873: Technical exploration to support topic-based suggestions with the current Recommendation API.

My preference is to enhance the "new" recommendation API at https://api.wikimedia.org/wiki/Lift_Wing_API/Reference/Get_content_translation_recommendation so that it can accept a topic (for example: Chemistry, History, Africa, Music) and give recommendations. It should accept more than one topic. We can also look at an intersection of topic and article at a later stage.

Tue, Jul 2, 7:03 AM · LPL Hypothesis, ContentTranslation
santhosh renamed T366339: MinT for Wikipedia Readers MVP: Support customizing the "Review the automatic translation" to use the appropriate tool on each wiki from MinT MVP: Support customizing the "Review the automatic translation" to use the appropriate tool on each wiki to MinT for Wikipedia Readers MVP: Support customizing the "Review the automatic translation" to use the appropriate tool on each wiki.
Tue, Jul 2, 5:07 AM · LPL Essential, MinT
santhosh renamed T366213: MinT for Wikipedia Readers MVP search not finding an existing language version for the "Tomate" article from MinT MVP search not finding an existing language version for the "Tomate" article to MinT for Wikipedia Readers MVP search not finding an existing language version for the "Tomate" article.
Tue, Jul 2, 5:06 AM · Language-Team (Language-2024-April-June), MinT
santhosh renamed T366210: Provide access to more languages in the MinT for Wikipedia Readers MVP Search step from Provide access to more languages in the MinT MVP Search step to Provide access to more languages in the MinT for Wikipedia Readers MVP Search step.
Tue, Jul 2, 5:06 AM · LPL Essential, MinT

Mon, Jul 1

santhosh claimed T364525: Ignore extra spaces form source text in the MinT test instance.
Mon, Jul 1, 10:37 AM · LPL Essential (LPL Essential 2024 Jul-Sep), Patch-For-Review, MinT
santhosh added a comment to T345102: Select the default tab between "Contribute" and "View contributions" to minimize tab switching for users frequently using one of them.

This ticket proposes to adjust the tab that the user navigates to by default by considering the previous selections and the existence of previous contributions by the user:

Mon, Jul 1, 9:36 AM · Patch-For-Review, MediaWiki-Core-Skin-Architecture
santhosh renamed T359829: MinT for Wiki Readers MVP: Translation options from MinT MVP: Translation options to MinT for Wiki Readers MVP: Translation options.
Mon, Jul 1, 5:03 AM · Patch-For-Review, LPL Essential (LPL Essential 2024 Jul-Sep), MW-1.43-notes (1.43.0-wmf.10; 2024-06-18), MinT

Thu, Jun 27

santhosh renamed T359863: MinT for Wiki Readers MVP: Explore languages from MinT MVP: Explore languages to MinT for Wiki Readers MVP: Explore languages.
Thu, Jun 27, 8:09 AM · MW-1.43-notes (1.43.0-wmf.13; 2024-07-09), Patch-For-Review, LPL Essential (LPL Essential 2024 Jul-Sep), MinT
santhosh added a comment to T338432: Prepare the cxserver for usage without RESTbase.

Internally, in CX production and in our developer workflows, we use the cxserver APIs directly and not the RESTBase APIs like https://en.wikipedia.org/api/rest_v1/#/Transforms/doMT.

Thu, Jun 27, 4:53 AM · RESTBase Sunsetting, CX-cxserver
santhosh added a comment to T368437: Mint translating wrong letter in punjabi.

dda + nukta rendering with the same ligature as rra is a common issue in Gurmukhi fonts. For example, Ektype's Mukta has this issue. Giving the nukta form and RRA the same shape is not advised, yet many fonts do it. This is the reason why you see two different shapes as reported above. Common users, unaware of this encoding difference and focusing only on rendering, use them interchangeably. This wrong usage then appears in the corpus. For example, in many Dravidian scripts I have seen people using 0 (zero) in place of ഠ, : (colon) instead of ഃ (visarga) and so on. A neural MT system learns these, and the same issues appear in MT output. I have seen this issue in many other languages too.
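
The encoding difference can be checked directly. Below is a minimal sketch using Python's standard unicodedata module. It assumes the two shapes in the report correspond to the sequence DDA + nukta (U+0A21 U+0A3C) and the single letter RRA (U+0A5C); the helper name describe is made up for illustration.

```python
import unicodedata


def describe(text):
    """Return (code point, Unicode name) pairs for every character in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]


dda_nukta = "\u0A21\u0A3C"  # DDA followed by the nukta sign (assumed input)
rra = "\u0A5C"              # the single letter RRA (assumed input)

print(describe(dda_nukta))
print(describe(rra))

# The two spellings are different strings, and NFC normalization does not
# unify them, so a corpus can hold both for what looks like one letter.
print(dda_nukta == rra)
print(unicodedata.normalize("NFC", dda_nukta) == rra)

# The same kind of confusion in Malayalam: digit zero vs. the letter TTHA.
print(describe("0"), describe("\u0D20"))
```

A check like this only shows that the sequences differ at the encoding level; deciding which spelling is correct for a given word still needs language knowledge.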

Thu, Jun 27, 4:14 AM · MinT, ContentTranslation

Wed, Jun 26

santhosh added a comment to T335491: Provide better long-term storage for translation models.

@elukey Thanks for these details. Currently in our code, models are downloaded by a bootstrap shell script (called via the docker entrypoint mechanism) using simple wget. These models are then mounted to the docker volume. So our server code just assumes the models are present in a configurable file system location. Do you see any issue if we follow this approach? Do the caveats you mentioned complicate this approach?
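
As a rough illustration of the "server assumes the models are present" side of this setup, here is a small Python sketch of a startup check. It is not the MinT code: the MODELS_DIR environment variable, the /models default and the model names are all placeholders chosen for the example.

```python
import os
from pathlib import Path


def missing_models(model_dir, expected_names):
    """Return the expected model entries that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in expected_names if not (root / name).exists()]


# MODELS_DIR and the model names below are hypothetical placeholders.
model_dir = os.environ.get("MODELS_DIR", "/models")
missing = missing_models(model_dir, ["model-a", "model-b"])
if missing:
    print(f"Missing models in {model_dir}: {missing}")
else:
    print("All expected models are present.")
```

With a check like this, the server can report a clear error at startup if the bootstrap download did not complete, instead of failing on the first translation request.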

Wed, Jun 26, 10:26 AM · LPL Essential, SRE-swift-storage, MinT, CX-deployments

Thu, Jun 20

santhosh renamed T363183: MinT for Wiki Readers MVP: Access from the mobile language selector from MinT MVP: Access from the mobile language selector to MinT for Wiki Readers MVP: Access from the mobile language selector.
Thu, Jun 20, 6:45 AM · MW-1.43-notes (1.43.0-wmf.10; 2024-06-18), Language-Team (Language-2024-April-June), MinT
santhosh renamed T363338: MinT for Wiki Readers MVP: Access from the footer of an article from MinT MVP: Access from the footer of an article to MinT for Wiki Readers MVP: Access from the footer of an article.
Thu, Jun 20, 6:45 AM · MW-1.43-notes (1.43.0-wmf.13; 2024-07-09), LPL Essential (LPL Essential 2024 Jul-Sep), MinT

Thu, Jun 13

santhosh added a comment to T352692: Explore possible approaches to support wikitext in MinT.

A round-trip technique like wikitext->html->wikitext is one way to achieve this. However, it has limitations. For example, if the wikitext has a template and one of the template parameters is nested wikitext, we will miss it in the HTML rendering (for example, i18n sentences with plural syntax). So the translation will be incomplete.

Thu, Jun 13, 1:15 PM · MinT

Wed, Jun 12

santhosh claimed T363563: Avoid references losing their data (showing as plain-text "[1]") when added to the translation using MinT.
Wed, Jun 12, 3:41 AM · Wikimedia-Medicine, ContentTranslation, Language-Team (Language-2024-April-June), MinT
santhosh moved T363563: Avoid references losing their data (showing as plain-text "[1]") when added to the translation using MinT from Priority: Translation to In Review on the Language-Team (Language-2024-April-June) board.
Wed, Jun 12, 3:41 AM · Wikimedia-Medicine, ContentTranslation, Language-Team (Language-2024-April-June), MinT

Tue, Jun 11

santhosh added a comment to T363563: Avoid references losing their data (showing as plain-text "[1]") when added to the translation using MinT.

I was able to reproduce this and find the pattern that causes the issue: repeated references. Only the first one gets fixed in MT. From the second one onwards, it appears as plain text. A few months back I had addressed this by keeping a search start in the lookup logic, but it is not catching repetitions outside the sentence. I am exploring potential solutions.
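
To illustrate the pattern, here is a minimal Python sketch of a lookup that keeps a search start so that repeated markers map to successive positions. It is a toy under stated assumptions, not the MinT implementation: the function name locate_markers and the plain-string "[1]" markers are hypothetical.

```python
def locate_markers(text, markers):
    """Find each marker in order, resuming the search after the previous hit.

    Without the moving search start, every repeated "[1]" would resolve to
    the position of the first "[1]" in the text.
    """
    positions = []
    search_start = 0
    for marker in markers:
        index = text.find(marker, search_start)
        if index == -1:
            # Marker not found after the previous hit; record the miss.
            positions.append(None)
            continue
        positions.append(index)
        search_start = index + len(marker)
    return positions


translated = "Water boils at 100 C [1]. Ice melts at 0 C [1]."
print(locate_markers(translated, ["[1]", "[1]"]))
```

In this sketch the search start is carried across the whole text. If it were reset for every sentence, a reference repeated in a later sentence would fall back to the first match, which matches the kind of gap described above.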

Tue, Jun 11, 7:15 AM · Wikimedia-Medicine, ContentTranslation, Language-Team (Language-2024-April-June), MinT

Mar 28 2024

santhosh moved T349487: Improve MinT punctuation support for Japanese from In Review to Needs QA on the Language-Team (Language-2024-January-March) board.
Mar 28 2024, 5:26 AM · Language-Team (Language-2024-April-June), MinT
santhosh moved T355304: Enable Softcatalà models for more language pairs in MinT test instance from In Review to Needs QA on the Language-Team (Language-2024-January-March) board.
Mar 28 2024, 5:26 AM · LPL Essential (LPL Essential 2024 Jul-Sep), MinT
santhosh moved T347930: Odia Language Translation Number not translating from In Review to Needs QA on the Language-Team (Language-2024-January-March) board.
Mar 28 2024, 5:26 AM · Language-Team (Language-2024-April-June), MinT

Mar 26 2024

santhosh moved T347930: Odia Language Translation Number not translating from Quarter Backlog to In Review on the Language-Team (Language-2024-January-March) board.
Mar 26 2024, 9:37 AM · Language-Team (Language-2024-April-June), MinT
santhosh claimed T347930: Odia Language Translation Number not translating.
Mar 26 2024, 9:36 AM · Language-Team (Language-2024-April-June), MinT
santhosh moved T355304: Enable Softcatalà models for more language pairs in MinT test instance from Quarter Backlog to In Review on the Language-Team (Language-2024-January-March) board.
Mar 26 2024, 9:29 AM · LPL Essential (LPL Essential 2024 Jul-Sep), MinT
santhosh claimed T355304: Enable Softcatalà models for more language pairs in MinT test instance.
Mar 26 2024, 9:29 AM · LPL Essential (LPL Essential 2024 Jul-Sep), MinT
santhosh moved T349487: Improve MinT punctuation support for Japanese from Priority: Translation to In Review on the Language-Team (Language-2024-January-March) board.
Mar 26 2024, 5:25 AM · Language-Team (Language-2024-April-June), MinT
santhosh claimed T349487: Improve MinT punctuation support for Japanese.
Mar 26 2024, 5:25 AM · Language-Team (Language-2024-April-June), MinT

Mar 21 2024

santhosh added a project to T358637: Duplicated elements in Universal Language Selector: Language-Team.
Mar 21 2024, 8:26 AM · Language-Team (Language-2024-April-June), WMDE-TechWish-Sprint-2024-04-24, MW-1.43-notes (1.43.0-wmf.3; 2024-04-30), Localization Infrastructure FY2023-24, Unplanned-Sprint-Work, UniversalLanguageSelector
santhosh added a comment to T358637: Duplicated elements in Universal Language Selector.

The CX entrypoint is also duplicated if you click multiple times while language selector is loading:

Mar 21 2024, 5:29 AM · Language-Team (Language-2024-April-June), WMDE-TechWish-Sprint-2024-04-24, MW-1.43-notes (1.43.0-wmf.3; 2024-04-30), Localization Infrastructure FY2023-24, Unplanned-Sprint-Work, UniversalLanguageSelector

Mar 19 2024

santhosh added projects to T352739: cxserver: Cannot read properties of undefined (reading 'pages'): Language-Team (Language-2024-January-March), Unplanned-Sprint-Work.
Mar 19 2024, 5:17 AM · Unplanned-Sprint-Work, Language-Team (Language-2024-January-March), CX-cxserver
santhosh claimed T352739: cxserver: Cannot read properties of undefined (reading 'pages').
Mar 19 2024, 5:16 AM · Unplanned-Sprint-Work, Language-Team (Language-2024-January-March), CX-cxserver

Mar 18 2024

santhosh added a comment to T352739: cxserver: Cannot read properties of undefined (reading 'pages').

After the migration to node fetch, the error is still there:

TypeError: Cannot read properties of undefined (reading 'pages')
    at processResult (/srv/service/lib/mw/BatchedAPIRequest.js:85:23)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
Mar 18 2024, 11:16 AM · Unplanned-Sprint-Work, Language-Team (Language-2024-January-March), CX-cxserver

Mar 14 2024

santhosh closed T359516: cxserver not able to load any page as Resolved.

The issue is resolved, and the root cause of the bad requests from the preq library is also resolved.

Mar 14 2024, 9:43 AM · Unplanned-Sprint-Work, Language-Team (Language-2024-January-March), CX-cxserver

Mar 13 2024

santhosh added a comment to T356532: Consider word-breaks as a way to improve readability in languages with long words.

Browsers natively support hyphenation (breaking the word at the proper position) these days. No need to change the content for this. The following CSS example shows how to do this. I developed the hyphenation system for Indian languages, and that is what Chrome, Firefox, TeX, LibreOffice, InDesign etc. are using these days.

Mar 13 2024, 5:27 AM · Web-Team-Backlog

Mar 8 2024

santhosh claimed T359525: MinT: Translation with MinT/Apertium are failing: fetch failed.
Mar 8 2024, 9:45 AM · Language-Team (Language-2024-January-March), MinT

Mar 7 2024

santhosh added a comment to T359525: MinT: Translation with MinT/Apertium are failing: fetch failed.

Neither MinT nor Apertium uses a proxy. They were working, and then we added MT clients with proxy support. Then the clients without a proxy started showing this issue. It is not consistently reproducible, but it happens very frequently.

Mar 7 2024, 1:56 PM · Language-Team (Language-2024-January-March), MinT
santhosh lowered the priority of T359516: cxserver not able to load any page from Unbreak Now! to High.

The issue is resolved now as the train is rolled back. Not closing, as we need to monitor this when the train is running with the backported patch.

Mar 7 2024, 1:22 PM · Unplanned-Sprint-Work, Language-Team (Language-2024-January-March), CX-cxserver
santhosh added a comment to T359516: cxserver not able to load any page.

It seems the backend issue is T359509: REST API calls suddenly all returning 400 and there is already a patch to be reviewed and merged:

Mar 7 2024, 11:10 AM · Unplanned-Sprint-Work, Language-Team (Language-2024-January-March), CX-cxserver
santhosh triaged T359516: cxserver not able to load any page as Unbreak Now! priority.
Mar 7 2024, 10:31 AM · Unplanned-Sprint-Work, Language-Team (Language-2024-January-March), CX-cxserver
santhosh created T359516: cxserver not able to load any page.
Mar 7 2024, 10:31 AM · Unplanned-Sprint-Work, Language-Team (Language-2024-January-March), CX-cxserver

Mar 5 2024

santhosh updated subscribers of T345340: Setup Wiki Family on CX / SX staging.

mw-cli can help us to create many language wikis in a cloud instance.
So we can have http://en.mediawiki.mwdd.localhost:8080, http://ig.mediawiki.mwdd.localhost:8080 ..

Mar 5 2024, 10:03 AM · LPL Essential, ContentTranslation
santhosh changed the visibility for F42169967: image.png.
Mar 5 2024, 8:52 AM

Mar 4 2024

santhosh added a comment to T358836: Develop format for metrics for the language and internationalization newsletter.

Tangential note: https://ruralindiaonline.org/en/articles/in-2023-paribhasha-builds-a-peoples-archive-in-peoples-languages/ is a bad example because of the broken rendering of the scripts used in the title image. We should never do that.

Mar 4 2024, 4:34 AM · LPL Analytics, LPL Technical Support, Product-Analytics (Kanban), Language-analytics

Feb 28 2024

santhosh updated subscribers of T325790: Special:ContentTranslationStats is slow and getting crowded.

There is a feature in Superset that lets us embed any dashboard in any web page. That seems to be the easiest approach here. https://github.com/apache/superset/tree/master/superset-embedded-sdk

Feb 28 2024, 4:12 AM · LPL Analytics, LPL Essential, MW-1.43-notes (1.43.0-wmf.12; 2024-07-02), ContentTranslation, Language-analytics, Data-Engineering-Icebox, Analytics, Technical-Debt

Feb 27 2024

santhosh added a comment to T340956: Proof-of-concept for showing a machine translated sections of Wikipedia articles.

A screenshot illustrating reference misplacement with current prototype: From https://en.wikipedia.org/wiki/Polar_bear

Feb 27 2024, 6:32 AM · Language-Team (Language-2024-April-June), MinT

Feb 21 2024

santhosh added a comment to T357950: Remove servicerunner dependency for cxserver.

The above patch is a quick run to identify the effort required to migrate from servicerunner. It is not for merge. My proposal is to modernize various parts of cxserver while still using servicerunner as the process manager. Do these migrations in iterations and, at a later stage, when cxserver no longer has a strong dependency on servicerunner other than as a process manager, replace it. Doing everything in one go is too risky, as cxserver is the backbone of our heavily used translation system.

Feb 21 2024, 7:02 AM · Patch-For-Review, CX-cxserver, Technical-Debt

Feb 20 2024

santhosh created T357950: Remove servicerunner dependency for cxserver.
Feb 20 2024, 5:12 AM · Patch-For-Review, CX-cxserver, Technical-Debt

Feb 19 2024

santhosh added a comment to T338608: Support requesting translations from a specific model in MinT.

The list of models for a language pair is provided in the API output of https://translate.wmcloud.org/api/languages
This is linked in the UI: see the bottom links, API Spec.

Feb 19 2024, 8:47 AM · Language-Team (Language-2024-January-March), MinT

Jan 22 2024

santhosh moved T338608: Support requesting translations from a specific model in MinT from In Progress to Needs QA on the Language-Team (Language-2024-January-March) board.
Jan 22 2024, 10:33 AM · Language-Team (Language-2024-January-March), MinT
santhosh added a comment to T347929: In Odia, translation always outputs ଯ଼ instead of ୟ.

Additional information: This issue happens with indictrans2-en-indic model. NLLB-200 gives correct output

Jan 22 2024, 10:29 AM · LPL Essential, MinT
santhosh moved T355303: Adjust multiple model support on MinT test instance from Quarter Backlog to In Review on the Language-Team (Language-2024-January-March) board.
Jan 22 2024, 4:39 AM · Language-Team (Language-2024-January-March), MinT
santhosh claimed T355303: Adjust multiple model support on MinT test instance.
Jan 22 2024, 4:39 AM · Language-Team (Language-2024-January-March), MinT

Jan 11 2024

santhosh changed the visibility for F41665596: ast.png.
Jan 11 2024, 1:00 PM

Dec 19 2023

santhosh added a comment to T351740: Deploy ctranslate2 version of nllb-200.

this will allow the language team to use this model server

Dec 19 2023, 6:34 AM · Machine-Learning-Team

Dec 12 2023

santhosh added a comment to T352690: Evaluate the integration of the new IndicTrans model (IndicTrans2-M2M) into MinT.

However, when inspecting the target language selector you can notice that Santali (sat) is not listed.

Dec 12 2023, 5:23 AM · Language-Team (Language-2024-January-March), MinT
santhosh added a comment to T353185: Rebuild (or upgrade the kernel on) mint.language.eqiad1.wikimedia.cloud .
$ uname -r
6.1.0-15-cloud-amd64
Dec 12 2023, 4:31 AM · Language-Team (Language-2023-October-December), cloud-services-team, Cloud-VPS

Dec 7 2023

santhosh claimed T338608: Support requesting translations from a specific model in MinT.
Dec 7 2023, 10:29 AM · Language-Team (Language-2024-January-March), MinT

Dec 5 2023

santhosh claimed T352690: Evaluate the integration of the new IndicTrans model (IndicTrans2-M2M) into MinT.
Dec 5 2023, 10:47 AM · Language-Team (Language-2024-January-March), MinT
santhosh merged T352741: Support Indic-Indic translation using IndicTrans2 indic-indic model into T352690: Evaluate the integration of the new IndicTrans model (IndicTrans2-M2M) into MinT.
Dec 5 2023, 10:47 AM · Language-Team (Language-2024-January-March), MinT
santhosh merged task T352741: Support Indic-Indic translation using IndicTrans2 indic-indic model into T352690: Evaluate the integration of the new IndicTrans model (IndicTrans2-M2M) into MinT.
Dec 5 2023, 10:46 AM · Language-Team (Language-2023-October-December), MinT
santhosh created T352741: Support Indic-Indic translation using IndicTrans2 indic-indic model.
Dec 5 2023, 8:55 AM · Language-Team (Language-2023-October-December), MinT
santhosh updated the task description for T352733: Provide python3-build-bookworm docker image.
Dec 5 2023, 5:23 AM · serviceops, Language-Team (Language-2023-October-December), MinT

Dec 4 2023

santhosh claimed T352620: Failure to start new translations (item.dispose is not a function).
Dec 4 2023, 7:36 AM · Regression, CX-cxserver, Language-Team (Language-2023-October-December)
santhosh triaged T352620: Failure to start new translations (item.dispose is not a function) as High priority.
Dec 4 2023, 5:36 AM · Regression, CX-cxserver, Language-Team (Language-2023-October-December)
santhosh added a comment to T352620: Failure to start new translations (item.dispose is not a function).

The actual failure can be reproduced by visiting https://cxserver.wikimedia.org/v2/page/sv/nn/Royal_Society_for_the_Protection_of_Birds

Page sv:Royal_Society_for_the_Protection_of_Birds could not be found. TypeError: item.dispose is not a function
Dec 4 2023, 5:36 AM · Regression, CX-cxserver, Language-Team (Language-2023-October-December)
santhosh added a comment to T352620: Failure to start new translations (item.dispose is not a function).

The root cause is a regression from the recent cxserver upgrade. A fix is already in place at https://gerrit.wikimedia.org/r/c/mediawiki/services/cxserver/+/978192 and is waiting for deployment.

Dec 4 2023, 5:33 AM · Regression, CX-cxserver, Language-Team (Language-2023-October-December)

Nov 30 2023

santhosh added a comment to T347272: Simplify the system of limits to make it more predictable.

From our past observations, especially during translation campaigns, many users participate, potentially creating low-quality articles. The review happens much later. Reviewers have also complained that they cannot review all these articles on time. When the review happens, articles get deleted. So the deletion happens weeks after the translation activity. Considering this, it is rare for a new user to already have a deleted translation while making intentional or unintentional low-quality translations.
Hence, the proposed strict limit for users with a deletion in the last 30 days might not have the expected effect. However, I support keeping this in place. But it should be clearly communicated to the user why their translation limits are high.

Nov 30 2023, 10:30 AM · ContentTranslation
santhosh added a comment to T251893: Reevaluate algorithm that measures the percentage of unmodified contents for languages without spaces.

The current logic in CX for the CJK group of languages (including Chinese) is as follows. The tokens are characters instead of words, so 人口 has 2 tokens.
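
A minimal Python sketch of character-based tokenization as described here. This is an illustration only, not the CX code: the function name tokenize and the set of language codes treated as CJK are assumptions.

```python
# Assumed set of language codes handled per character; for illustration only.
CHARACTER_TOKEN_LANGUAGES = {"zh", "ja", "ko"}


def tokenize(text, language):
    """Split text into tokens: characters for CJK languages, words otherwise."""
    if language in CHARACTER_TOKEN_LANGUAGES:
        # Every non-space character counts as one token.
        return [ch for ch in text if not ch.isspace()]
    # Languages that use spaces between words are split on whitespace.
    return text.split()


print(tokenize("人口", "zh"))
print(tokenize("population count", "en"))
```

With this rule, 人口 yields two tokens, while a space-delimited language yields one token per word.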

Nov 30 2023, 9:32 AM · Language-Team (Language-2023-October-December), ContentTranslation
santhosh added a comment to T335491: Provide better long-term storage for translation models.

@elukey, what do you mean by 'reaching out to you by next time'? Regarding the architecture of MinT and why it is not using LiftWing, we had a discussion in the past. I don't think it is useful to repeat it. There is a reason why we put the models in people.wikimedia.org: it was as per the recommendation from SRE, and this ticket was created to make it more reliable. We still need a public location for model downloads, as MinT is not designed for WMF infrastructure alone.

Nov 30 2023, 6:41 AM · LPL Essential, SRE-swift-storage, MinT, CX-deployments

Nov 28 2023

santhosh added a comment to T352136: Increase quota to create large instance for MinT.

We need 2TB scratch volume mounted too.

Nov 28 2023, 10:21 AM · Cloud-VPS (Quota-requests), Language-Team (Language-2023-October-December), MinT

Nov 21 2023

santhosh placed T351690: [MinT] Clearing default MinT text clears source and target langs and also using backspace up for grabs.
Nov 21 2023, 5:17 AM · LPL Essential, MinT

Nov 16 2023

santhosh updated the task description for T351138: Some articles with gallery fail to start for translation .
Nov 16 2023, 10:22 AM · CX-cxserver, Language-Team (Language-2023-October-December), Patch-For-Review, Wikimedia-production-error
santhosh renamed T351138: Some articles with gallery fail to start for translation from Cx-init-critical-error in Serbian Wikipedia to Gallery adaptation fails with updated MW Dom Spec.
Nov 16 2023, 10:21 AM · CX-cxserver, Language-Team (Language-2023-October-December), Patch-For-Review, Wikimedia-production-error
santhosh awarded Blog Post: The golden rule of web performance revisited (Wikipedia edition) a Like token.
Nov 16 2023, 4:23 AM

Nov 8 2023

santhosh changed the status of T350773: Remove preq and use node fetch from Open to In Progress.
Nov 8 2023, 10:58 AM · Language-Team (Language-2024-January-March), Unplanned-Sprint-Work, Technical-Debt, CX-cxserver
santhosh triaged T350773: Remove preq and use node fetch as Medium priority.
Nov 8 2023, 10:58 AM · Language-Team (Language-2024-January-March), Unplanned-Sprint-Work, Technical-Debt, CX-cxserver
santhosh claimed T350773: Remove preq and use node fetch.
Nov 8 2023, 10:51 AM · Language-Team (Language-2024-January-March), Unplanned-Sprint-Work, Technical-Debt, CX-cxserver
santhosh added projects to T350773: Remove preq and use node fetch: Technical-Debt, Language-Team (Language-2023-October-December).
Nov 8 2023, 10:50 AM · Language-Team (Language-2024-January-March), Unplanned-Sprint-Work, Technical-Debt, CX-cxserver
santhosh created T350773: Remove preq and use node fetch.
Nov 8 2023, 10:49 AM · Language-Team (Language-2024-January-March), Unplanned-Sprint-Work, Technical-Debt, CX-cxserver

Nov 7 2023

santhosh added a comment to T344982: Make cxserver call parsoid endpoints on MediaWiki, instead of going through RESTbase.

https://test.wikipedia.org/w/rest.php/coredev/v0/transform/wikitext/to/html/Oxygen looks good. If this can be exposed for all production wikis, we can definitely move to this endpoint.

Nov 7 2023, 4:55 AM · Language-Team (Language-2024-January-March), CX-cxserver, serviceops, RESTBase Sunsetting

Nov 6 2023

santhosh added a comment to T344982: Make cxserver call parsoid endpoints on MediaWiki, instead of going through RESTbase.

It seems we need to continue with RESTBase for the time being, until a stable, well-documented API is known as a replacement, right?

Nov 6 2023, 4:29 AM · Language-Team (Language-2024-January-March), CX-cxserver, serviceops, RESTBase Sunsetting

Nov 2 2023

santhosh added a comment to T344982: Make cxserver call parsoid endpoints on MediaWiki, instead of going through RESTbase.

http://parsoid-external-ci-access.beta.wmflabs.org - Does this use actual production wiki? Or beta.wmflabs.org? If it is beta.wmflabs.org, then we will be limited by content and supported languages right?

Nov 2 2023, 1:12 PM · Language-Team (Language-2024-January-March), CX-cxserver, serviceops, RESTBase Sunsetting
santhosh added a comment to T344982: Make cxserver call parsoid endpoints on MediaWiki, instead of going through RESTbase.

If you need access to pagebundles or the transform endpoints, then we have to figure something out.

Nov 2 2023, 9:57 AM · Language-Team (Language-2024-January-March), CX-cxserver, serviceops, RESTBase Sunsetting
santhosh added a comment to T344982: Make cxserver call parsoid endpoints on MediaWiki, instead of going through RESTbase.

I think we have a serious problem here.
At https://phabricator.wikimedia.org/T350219#9298055, @daniel wrote:

"Parsoid endpoints are not expected to work for external requests. So this is "working" as expected."

Nov 2 2023, 3:59 AM · Language-Team (Language-2024-January-March), CX-cxserver, serviceops, RESTBase Sunsetting

Nov 1 2023

santhosh added a comment to T344982: Make cxserver call parsoid endpoints on MediaWiki, instead of going through RESTbase.

The restbase endpoint is no longer working. What changed? @daniel, @MSantos

Nov 1 2023, 4:49 AM · Language-Team (Language-2024-January-March), CX-cxserver, serviceops, RESTBase Sunsetting

Oct 30 2023

santhosh added a project to T349991: MinT: Exception on /api/translate/nn/ff [POST]: Language-Team (Language-2023-October-December).
Oct 30 2023, 4:10 PM · Language-Team (Language-2023-October-December), MinT
santhosh claimed T349991: MinT: Exception on /api/translate/nn/ff [POST].
Oct 30 2023, 4:10 PM · Language-Team (Language-2023-October-December), MinT
santhosh added a comment to T349991: MinT: Exception on /api/translate/nn/ff [POST].

Fixed in sentencex version 0.5.1

Oct 30 2023, 3:56 PM · Language-Team (Language-2023-October-December), MinT
santhosh added a comment to T348794: TypeScript declaration files for jquery.i18n.

@Sportzpikachu Thanks for the PR. Please note that jquery.i18n has a successor, banana.i18n, which is a framework-agnostic JS library. That is the library we are going to actively maintain. If your use case can use that library, it would be much better.

Oct 30 2023, 2:44 PM · MediaWiki-Internationalization, Language and Product Localization, I18n
santhosh renamed T349893: Not able to restore saved translations from ContentTranslation to Not able to restore saved translations.
Oct 30 2023, 6:32 AM · Language-Team (Language-2023-October-December), ContentTranslation

Oct 25 2023

santhosh added a comment to T349618: Automatic language detection misidentifies language in some cases.

The model expects sentences. That is how it is trained. For example, words like "Moon" can appear in many Latin-based languages as a proper noun or a reference to the title of a book, etc. The prediction quality increases as more words are provided, because the model then knows more about the context of the word.

Oct 25 2023, 4:54 AM · MinT

Oct 19 2023

santhosh changed the header image for post Blog Post: sentencex: Empowering NLP with Multilingual Sentence Extraction.
Oct 19 2023, 6:36 AM

Oct 17 2023

santhosh added a comment to T340507: Create a language detection service in LiftWing.

Thank you @isarantopoulos and @elukey !

Oct 17 2023, 9:26 AM · Lift-Wing, Machine-Learning-Team, Patch-For-Review, I18n, OKR-Work
santhosh moved T99666: Provide a service to detect which language the user is writing on from Quarter Backlog to Done on the Language-Team (Language-2023-October-December) board.

We have the service in production: https://api.wikimedia.org/wiki/Lift_Wing_API/Reference/Get_language_identification_prediction

Oct 17 2023, 9:18 AM · Language-Team (Language-2023-October-December), Patch-For-Review, WMF-General-or-Unknown, I18n, OKR-Work
santhosh added a project to T99666: Provide a service to detect which language the user is writing on: Language-Team (Language-2023-October-December).
Oct 17 2023, 9:15 AM · Language-Team (Language-2023-October-December), Patch-For-Review, WMF-General-or-Unknown, I18n, OKR-Work

Oct 13 2023

andrea.denisse awarded Blog Post: sentencex: Empowering NLP with Multilingual Sentence Extraction a Love token.
Oct 13 2023, 11:30 PM
ppelberg awarded Blog Post: sentencex: Empowering NLP with Multilingual Sentence Extraction a Barnstar token.
Oct 13 2023, 10:48 PM

Oct 12 2023

santhosh added a comment to T340507: Create a language detection service in LiftWing.

@elukey If I understood that documentation correctly, even if the service requires an OAuth token, anonymous users can still use it with the applicable rate limiting. Am I right?
There would be use cases where a non-MediaWiki static webpage uses this API, and this anonymous rate-limited option should be sufficient.

Oct 12 2023, 12:53 PM · Lift-Wing, Machine-Learning-Team, Patch-For-Review, I18n, OKR-Work
santhosh added a comment to T348612: References moved to the end of the sentence and links disappear when translated with MinT.

Yes, references are moved to the end of the sentence. This is also seen in the example below. Positioning references at the correct position in the translation is slightly complicated and needs to be implemented.

Oct 12 2023, 4:51 AM · LPL Technical Support (LPL Technical Support (Current)), Regression, MinT
santhosh added a comment to T340507: Create a language detection service in LiftWing.

@santhosh Thanks for creating the model card!
Is there a client/system that will use this at the moment? If yes, is there an estimate on the amount of traffic we should be expecting? Main reason I am asking is so that we know the scaling requirements (if any) and also can validate via load testing.

Oct 12 2023, 4:20 AM · Lift-Wing, Machine-Learning-Team, Patch-For-Review, I18n, OKR-Work