Archives of very old discussions are available:

Jean-Paul (talkcontribs)

Hello,

I would like to add a new item (an external identifier) to Wikidata, but this is beyond my abilities. It should not be a problem for you, so I am asking for your help.

It would be an "FDb film identifier" and an "FDb person identifier".

The film infobox pulls the "filmový přehled", "čsfd", "kinobox" and "imdb" entries from Wikidata, but not "fdb", because it does not exist in Wikidata as an external identifier.

The same applies to the templates {{Fdb osoba}} and {{Fdb film}}.

May I ask you to add them?

Thank you

~~~~

Matěj Suchánek (talkcontribs)

Hello, a property (so in this case not an "item") has to go through an approval process before it is created: Wikidata:Property proposal.

Reply to "Prosba"
Ыфь77 (talkcontribs)

Online translation: Please re-hide the vandal's edit in Talk:Q5201818 and my discussion page, as well as block 109.81.89.189.

Matěj Suchánek (talkcontribs)

Hidden. I chose protection over block since the IP changed, though range blocking is possible if attacks persist. Please prefer reporting to WD:AN.

Ыфь77 (talkcontribs)

Online translation: Thank you. The main thing is that the result is achieved: no vandalism on 2 pages.

Reply to "Vandalism 2"

Improving a reference in one, not two edits

Epìdosis (talkcontribs)

Hi! In the past I have sometimes bothered you for bot fixes. Three days ago I finally started my first task in PWB and created a bot account, EpidòseosBot. The program I wrote is User:EpidòseosBot/GND P21.py and, according to the first few test edits I made, it works. However, I have not managed to condense the addition and the removal of the reference into a single edit; as of now it takes two edits, which is not optimal. Reading Wikidata:Requests for permissions/Bot/MatSuBot 8, I think you would be able to improve my code so that the edits are condensed into one; could you help me when you have time? Feel free to suggest other improvements: I had never used Python before these last 3 days. Thanks in advance!

Feel free to edit User:EpidòseosBot/GND P21.py directly; I will then copy the edits into my local file ;-)

Matěj Suchánek (talkcontribs)

It's never too late to start! It looks very good for a 3-day-old newbie. I will have a look. From a quick look at the API, calling repo.save_claim could do the trick.
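For readers following along, the single-edit idea can be sketched roughly as below. This is only an illustration, not the code from User:EpidòseosBot/GND P21.py. It assumes that `claim.sources` is a list of dict-like reference blocks keyed by property ID (as Pywikibot's `Claim` exposes them) and that `repo.save_claim(claim, summary=...)` writes the whole claim, references included, in one API call. The helper name `replace_reference` and its parameters are hypothetical.

```python
def replace_reference(repo, claim, old_prop, new_source, summary):
    """Swap references in memory, then save the claim once.

    Assumptions (not taken from the bot's real code):
    - claim.sources is a list of dict-like blocks {property_id: [snaks]}
    - repo.save_claim(claim, summary=...) stores the claim in a single edit
    """
    # Keep every reference block that does not use the unwanted property.
    kept = [src for src in claim.sources if old_prop not in src]
    if len(kept) == len(claim.sources):
        return False  # nothing to replace, so make no edit at all

    # Both changes happen locally; the wiki sees only the final state.
    claim.sources = kept + [new_source]
    repo.save_claim(claim, summary=summary)
    return True
```

With a real Pywikibot `DataSite` passed as `repo`, a call such as `replace_reference(repo, claim, "P143", new_source, "improve reference")` would be expected to produce one edit rather than two; it is worth trying on a sandbox item first.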

Epìdosis (talkcontribs)

Hi! Thanks again for your first fixes, which I applied immediately! Today, thanks to the very useful suggestions of Horcrux (see User talk:Horcrux#Un problema banale con PyWikiBot), I have found a way of saving references in one edit, effectively using "save_claim". If it seems good to you, I can proceed with my request for the flag in the next few days.

If you have any other suggestions to improve the code, of course apply them.

Epìdosis (talkcontribs)

I am waiting for your OK on my code before requesting the flag; I've made about 20 edits and it worked with no issues. Have a nice weekend!

Matěj Suchánek (talkcontribs)

Horcrux has made a really good visualization regarding the structure of references!

Regarding your code, I think it's OK. Hope I didn't break anything.

Epìdosis (talkcontribs)

It works perfectly with your last changes, thanks for them! So I will make the request for the flag.

KonstantinaG07 (talkcontribs)

Hello, this filter appears to have been malfunctioning, catching a large number of edits by a specific bot that appear to be irrelevant to its purpose. I think it was due to missing parentheses, which I have temporarily added; however, please do review it in case this behaviour was intended, or if you want to further modify or clean it up.

Matěj Suchánek (talkcontribs)

Oops! Thanks for your intervention, it indeed seems to be my mistake. I have disabled and (marked as) deleted the filter, so that we can start clean.

Reply to "Re Abuse Filter 259"
Data Consolidation Officer (talkcontribs)

Hi Matěj Suchánek, you probably know this, but in most Wikipedias (including the German one), additions in brackets are used for mere disambiguation and shouldn’t be part of the label.

Your bot doesn’t appear to take this into account, though; here, for example, the label should just have been “Tare” (without the “(Würzsauce)” part). Would it be possible to change the bot so that it strips trailing additions in brackets from sitelinks when creating labels? --Data Consolidation Officer (talk) 10:35, 21 April 2024 (UTC)

Matěj Suchánek (talkcontribs)

Hi. Yes, I have been aware of this ever since I started these imports. Unfortunately, it's complicated, there are also many cases where the trailing brackets should be kept. Most recently: (German Wikipedia). More: .

Therefore, my strategy is to import the label with the disambiguator and then apply some rules to determine whether it's redundant.

In your case, I kept it because there was no German description. The idea is to motivate people to insert a description while also fixing the label manually. If there was a German description, I would remove it if either "Würzsauce" was found in the description or most other languages with labels starting with "Tare" didn't include any disambiguator.
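A rough sketch of how such rules could look, purely as an illustration and not MatSuBot's actual code. The function names, the "more than half" reading of "most other languages", and the simple substring check against the description are all assumptions made for this example.

```python
import re

# A label ending in one bracketed part, e.g. "Tare (Würzsauce)".
TRAILING = re.compile(r"^(?P<base>.+?)\s*\((?P<disambig>[^()]+)\)$")


def split_label(label):
    """Return (base, disambiguator); disambiguator is None if absent."""
    m = TRAILING.match(label)
    if m is None:
        return label, None
    return m.group("base"), m.group("disambig")


def cleaned_label(label, description, other_labels):
    """Apply the rules described above (hypothetical implementation)."""
    base, disambig = split_label(label)
    if disambig is None:
        return label
    if not description:
        # No description yet: keep the brackets as a hint for editors.
        return label
    if disambig.lower() in description.lower():
        # The description already carries the disambiguating word.
        return base
    # Labels in other languages that start with the same base.
    related = [lab for lab in other_labels if lab.startswith(base)]
    if related:
        plain = [lab for lab in related if split_label(lab)[1] is None]
        if len(plain) * 2 > len(related):
            return base  # most other languages use no disambiguator
    return label
```

For the case discussed here, `cleaned_label("Tare (Würzsauce)", None, ["Tare"])` keeps the brackets because the description is missing, while the same call with a description containing "Würzsauce" would return just "Tare".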

Data Consolidation Officer (talkcontribs)

Yes, that seems like a reasonable heuristic. I don’t agree that the bracketed part should have been kept in District 1 (Düsseldorf) (Q551600) (Düsseldorf is already present in the description and that’s sufficient, imho), but if that’s the consistent handling of Düsseldorf’s districts, then it’s OK for now. I’d still say that such cases are outliers rather than being usual, at least for German Wikipedia. Yet, I can’t come up with a better solution for now, at least where descriptions are missing (automatically importing those is obviously not so easy).

Matěj Suchánek (talkcontribs)

Hi again. I made an experiment in which I had my robot just remove the disambiguation part from labels according to the rules (iterating its contributions from newest to oldest). It quickly started removing them from German labels where I believe the preference is to keep them (letting @Themenportale211 know): . So despite "When a page title includes disambiguation, either through commas or parentheses, the disambiguation should not be included in the Wikidata label. Disambiguation information should instead be part of the description.", I apparently cannot apply the rules for German without undesired edit warring.

Data Consolidation Officer (talkcontribs)

Those linked cases are (for lack of a better term) interesting:

  • In Law enforcement in Canada (Q2858778) and Carabineros de Chile (Q2317752), I don’t agree with the label, with or without the bracketed part. The German Wikipedia lemma is unsuitable as a Wikidata item label here; obviously the bot cannot know.
  • Similarly for Georgia at the 2024 Summer Olympics (Q42911757), Turkey at the 2024 Summer Olympics (Q116778664) and others of the same kind; here the lemma structure “Olympische Sommerspiele 2024/Teilnehmer (Georgien)” and “Olympische Sommerspiele 2024/Teilnehmer (Türkei)” seems chosen as if they were subpages of “Olympische Sommerspiele 2024” (although afaik Wikipedia does not make use of the subpage mechanism in Article namespace). Deleting the bracketed part here could easily be avoided by not removing it from labels containing a slash, but they should have a completely different label anyway.
  • I don’t really understand the choice of label in Byzantine Egypt (Q17302295); the German Wikipedia lemma (“Byzantinische Herrschaft in Ägypten”) seems more reasonable.

Maybe the bot could keep some kind of maintenance list of labels imported with potential disambiguation parts, for human double-checking? Or would that be too many? --Data Consolidation Officer (talk) 16:32, 25 May 2024 (UTC)

Reply to "Label additions by MatSuBot"
EncycloPetey (talkcontribs)

Your bot just made a HUGE number of edits like this one that go against agreed-upon standards at WikiProject:Books. For all of the instance of (P31) usages, the value should instead be version, edition or translation (Q3331189).

For example Prometheus Bound (Q24063711) is a translation, but it's also an edition of that translation. Likewise, Śakoontalá; or, The Lost Ring (Q51107450) is both a translated text and is the fourth edition of that translation.

I'm not sure how many of these are the result of incorrect values being added in the past, but version, edition or translation (Q3331189) is now the agreed standard for instance of (P31). --EncycloPetey (talk) 05:43, 13 April 2024 (UTC)

Matěj Suchánek (talkcontribs)
EncycloPetey (talkcontribs)

The merge is correct for the two items, but its use was not. Part of the reason for the merge was its frequent misuse on multiple data items.

Reply to "translated text"
CV213 (talkcontribs)

Special:AbuseFilter/history/110 - you created it. In 2023-12 ISNI format changed to no spaces. Can you adjust the filter and re-enable?

Will it only give a warning when changing from no-space-format to space-format? If yes, can there also be made another filter that warns users not to insert ISNI with spaces?

Matěj Suchánek (talkcontribs)

I flipped that condition. I will have it run without warning for some time. (Remind me if I forget to reinstate that.)

I don't think it's necessary to have another filter for that. We have bots automatically fixing that, or we can also extend that filter.

CV213 (talkcontribs)

Thank you, I tried it, it works!

Can it catch more, any change away from correct regex? See .

Regarding warning when adding it in spaced format:

If people don't get a warning, they may go on forever, causing avoidable edits by bots. On top, they are less likely to notice the creation of duplicates, which can be detected if the ISNI is already on another item. This may result in extended work on an item and work of an uninvolved editor checking duplicates.

The primary source has the ISNI easily available without spaces, people that add ISNI manually should always check the primary source anyway.

There is also unnecessary clutter on DB CV reports (example) interfering with the work of editors checking the diffs and working on removing violations, not necessarily these violations, but any.

CV213 (talkcontribs)

The regex for ISNI is /[0-9]{15}[0-9X]/. In P213 the current regex is /[0]{7}[0-9]{8}[0-9X]/, which is probably OK for some time, as position 8 in new ISNI is currently "5" so requiring 7 zeros shouldn't be an issue for several months. But for the abuse filter which requires an admin to edit, it is likely better to be less restrictive - as long as no abuse is seen.
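To make the difference between the two patterns concrete, here is a small check. It assumes the whole value must match (full-string matching); the name `classify` and the sample values other than the ISNI quoted elsewhere in this thread are made up for illustration.

```python
import re

# General ISNI shape: 15 digits plus a final digit or "X".
ISNI_GENERAL = re.compile(r"[0-9]{15}[0-9X]")
# Stricter pattern currently on P213: the first 7 characters must be zeros.
ISNI_P213 = re.compile(r"[0]{7}[0-9]{8}[0-9X]")


def classify(value):
    """Report which of the two patterns the whole value satisfies."""
    return {
        "general": ISNI_GENERAL.fullmatch(value) is not None,
        "p213": ISNI_P213.fullmatch(value) is not None,
    }


# Compact form: accepted by both patterns.
print(classify("0000000423486330"))
# Spaced form: rejected by both patterns.
print(classify("0000 0004 2348 6330"))
# Made-up value with a non-zero 7th character: general shape only.
print(classify("0000001234567890"))
```

Neither pattern verifies the check character; they only constrain the shape of the string.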

Matěj Suchánek (talkcontribs)

"I will have it run without warning for some time. (Remind me if I forget to reinstate that.)" Oops, I didn't realize the warning was still enabled, only the filter had been turned off. Thanks for testing, I guess the filter is safe.

In my opinion (and experience), the problems with setting filters for individual properties are

  • (As you said) You need an admin to modify the filter, especially in situations when the format changes.
  • They are getting more complex (due to the diff structure) when you want to cover 100% of cases (modification with qualifier addition, etc.).
  • We cannot really cover every property.

But let me see.

CV213 (talkcontribs)

"[0-9]{15}[0-9X]" is defined in an ISO standard, and used in millions of links, I don't expect that format to change soon.

Finally I found mw:Extension:AbuseFilter/Rules_format which contains regex/rlike. Maybe something like:

& string(removed_lines) regex "[0-9]{15}[0-9X]"
&! string(added_lines) regex "[0-9]{15}[0-9X]"

But I don't know if it would prevent removal of a claim. Then one would have to test if string(added_lines) is not empty.
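The logic of that suggestion, including the extra emptiness test, can be simulated outside AbuseFilter. This is a plain Python sketch and not the actual rules of filter 110; it assumes an unanchored search over the joined lines, and `would_trigger` is a hypothetical name.

```python
import re

ISNI = re.compile(r"[0-9]{15}[0-9X]")


def would_trigger(removed_lines, added_lines):
    """True if a well-formed ISNI was replaced by something else.

    A pure removal (nothing added) is deliberately not flagged, which is
    the "added_lines is not empty" test mentioned above.
    """
    removed = "\n".join(removed_lines)
    added = "\n".join(added_lines)
    had_valid = ISNI.search(removed) is not None
    has_valid = ISNI.search(added) is not None
    return had_valid and bool(added.strip()) and not has_valid


# Compact ISNI replaced by the spaced form: flagged.
print(would_trigger(["0000000423486330"], ["0000 0004 2348 6330"]))
# Claim removed entirely: not flagged.
print(would_trigger(["0000000423486330"], []))
```

A real filter would still need to restrict itself to P213 statements, since any 16-character run of digits elsewhere in the diff would match this pattern too.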

Matěj Suchánek (talkcontribs)

Okay, I changed the filter to be more restrictive, yet robust.

"But I don't know if it would prevent removal of a claim." It wouldn't; there is a check for the edit summary.

CV213 (talkcontribs)

Thank you, much better protection against changes away from correct format now. Not sure about "novalue" and "somevalue".

Insertions of format-violating strings are still possible.

Matěj Suchánek (talkcontribs)

"Not sure about 'novalue' and 'somevalue'." The current regex ([0]{7}[0-9]{8}[0-9X]|) matches an empty string; this is how "somevalue/novalue allowed" is indicated.

CV213 (talkcontribs)

I changed the regex to ([0]{7}[0-9]{8}[0-9X]). Better they end up in the CV reports. I have seen some of these claims, but they had no qualifier or reference.

CV213 (talkcontribs)
Matěj Suchánek (talkcontribs)

I am not fully convinced we really need a filter because of that. (Imagine the report was generated after the bot run. Imagine they duplicated the statement right away, without spaces.)

But I gave it a try. What's strange, though, I was able to make it catch this edit, this edit, but not this edit. The filter reads the present, valid ISNI...

CV213 (talkcontribs)

The list of today in the CV report: https://www.wikidata.org/w/index.php?title=Wikidata:Database_reports/Constraint_violations/P213&oldid=2074639676 - I would prefer to give the spaced ISNI inserter at least a warning. I didn't look into the items yet. The http://isni.org/isni/0000000097580195 I have seen already at User:DeltaBot/fixClaims/maintenance/P213format - not sure why DeltaBot lists the spaced ones.

The last item has the spaced ISNI because of mixnmatch, maybe some of the others too https://www.wikidata.org/w/index.php?title=Q124489180&oldid=2074561219. Reported to Magnus: Topic:Xyr5b6zcavxhideq. All other tools I know of are fixed now.

Teslaton (talkcontribs)

Hi Matěj. Regarding Special:AbuseLog/28575880: am I missing something? It seems to be a perfectly valid ISNI (https://isni.org/isni/0000000423486330) and I've tried both the compact (0000000423486330) and the grouped (0000 0004 2348 6330) format, both leading to a filter hit. Any idea?

(edit: ok, so it eventually went away later, although I'm not aware that I changed anything... :D)

Matěj Suchánek (talkcontribs)
Teslaton (talkcontribs)

Yeah, indeed, good point! It can't be seen in the rendered diff (and actually, at first glance, not much even in the dump itself... :D). Thanks.

Arlo Barnes (talkcontribs)

Could the warning link to this thread? It's not clear from the current text what is inappropriate about the with-spaces version.

Matěj Suchánek (talkcontribs)
Reply to "ISNI format abuse filter 110"
Pommée (talkcontribs)

In French please preserve (simple dames), (double dames), (simple messieurs) and (double messieurs). Pommée (overleg) 12:08, 10 February 2024 (UTC)

Reply to "MatSuBot: preserve labels"

Bot is adding Russian labels not written in Cyrillic

Summary by Koavf

Looks like it's conventional to not Cyrillicize Latin names in Russian. Thanks Ymblanter.

Koavf (talkcontribs)
Ymblanter (talkcontribs)

Russian is of course written in Cyrillic but some things including most names of music albums never get translated/transliterated and just appear in Latin. You can check that some articles of the Russian Wikipedia just have Latin names. (On the other hand, books and films usually get translated).