Property talk:P227/Archive 1

This page is an archive. Please do not modify it. Use the current page, even to continue an old discussion.

Duplicate of P107

This property seems to be a duplicate of P107 (GND entity type), or why are there now two different properties? --#Reaper (talk) 13:04, 16 March 2013 (UTC)

Property:P107 lists the main types of item. Is the item a person, organization, event, work, term, place, or disambiguation page? (See Wikidata:Infoboxes task force for use.) It's a kind of basic classification. So Marilyn Monroe (GND 118583549) and Vladimir Putin (GND 122188926) are both "type person", but they have their own GND numbers as identifier. --Kolja21 (talk) 02:08, 17 March 2013 (UTC)
Ah, I haven't seen that this property is from type string, the description reads like if I/you should enter "name", "work" and so on, not the ID of the GND-object. Thx. --#Reaper (talk) 12:34, 17 March 2013 (UTC)

STICKY: Explanation of format constraints

At its launch in April 2012 the GND established all existing identification numbers of its constituent files (PND, GKD, SWD, DMA-EST) as GND identification numbers. Records created since then follow the pattern for the former PND. Caveat: The checksum algorithms differ between the dashed and undashed types. Another caveat: The dash is essential: both 160220440 and 16022044-0 are valid GND numbers, denoting distinct entities.

  1. (1|1[01])\d{7}[0-9X]: (9 digits starting with "1" or 10 digits starting with "10" where the last "digit" may be "X") Former PND numbers, and all numbers for genuinely "GND-born" records (those created after 2012-04, always 10-digit form)
  2. [47]\d{6}-\d: 7 digits starting with "4" or "7", followed by dash and a strictly numerical check digit: Former SWD numbers. Scheme discontinued after 2012-04.
  3. [1-9]\d{0,7}-[0-9X]: one to eight digits not starting with "0", followed by dash and a check "digit" which may be "X": Former GKD numbers. Scheme discontinued after 2012-04.
  4. 3\d{7}[0-9X]: 9 digits starting with "3", last "digit" may be "X": Former DMA-EST numbers. Scheme discontinued after 2012-04.

For the time being there is a certain overlap between patterns 2 and 3, which admits false negatives. And of course the pattern check does not perform a checksum test. -- Gymel (talk) 11:43, 11 May 2013 (UTC)
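
The four patterns above can be tried out with a minimal Python sketch. The dictionary keys and the function name `matching_schemes` are made up for illustration; the expressions themselves are copied from the list. As noted above, this is only a pattern check, not a checksum test.

```python
import re

# The four alternatives from the numbered list, in the same order.
# The labels are illustrative only.
GND_PATTERNS = {
    "PND/GND": r"(1|1[01])\d{7}[0-9X]",
    "SWD": r"[47]\d{6}-\d",
    "GKD": r"[1-9]\d{0,7}-[0-9X]",
    "DMA-EST": r"3\d{7}[0-9X]",
}


def matching_schemes(value):
    """Return the labels of all patterns that match the whole value."""
    return [
        label
        for label, pattern in GND_PATTERNS.items()
        if re.fullmatch(pattern, value)
    ]


if __name__ == "__main__":
    # Identifiers quoted on this page.
    for candidate in ["118583549", "160220440", "16022044-0", "4006552-2"]:
        print(candidate, matching_schemes(candidate))
```

With these expressions 160220440 and 16022044-0 fall under different patterns, while 4006552-2 is matched by both the SWD and the GKD pattern, which is the overlap between 2 and 3 mentioned above.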

STICKY: Uniqueness constraint: List of persons with known conflicts between GND and Wikipediae

Below is a list of persons from the GND database where, in contrast to the Wikipediae, (a) GND does not distinguish between two (possibly) different persons and thus has only one database entry, or (b) GND identifies a pseudonym or fictional author (persona) with its creator, although the persona is more than a simple pen name. These rare exceptions can lead to violations of the uniqueness constraint. Please see de:Benutzer:Gymel/Hartnäckige_PND-Dubletten for further details (in German). -- Make (talk) last update: 12:40, 27 November 2013 (UTC)

1st group [identities historically identified with one person]
  1. http://d-nb.info/gnd/118557513 Jesus Christus
    = historical Jesus (Q51666) (historical Jesus of Nazareth) + Jesus (Q302) (central person of Christianity from/in New Testament)
  2. http://d-nb.info/gnd/118557815 Johannes
    = John the Apostle (Q44015) + John the Evangelist (Q328804)
2nd group [identity is subject of ongoing debate]
  3. NOTYETDISCOVERED but WONTFIX http://d-nb.info/gnd/118720260 Hans von Tübingen
  4. NOTYETDISCOVERED but WONTFIX http://d-nb.info/gnd/118815245 Hans Hirtz
  5. WONTFIX http://d-nb.info/gnd/11936929X Meister von Meßkirch : Q568760 Q1532784
  6. WONTFIX http://d-nb.info/gnd/119457733 Arnold : Q535832 Q694744
  7. http://d-nb.info/gnd/118746871 Irmgard
    = legendary Irmgardis von Süchteln for whom worship as patron saint of town Süchteln (Q314425) is first documented at the end of 15th century (1486, 1498) +(?) Irmgard von Köln + historical persons Irmtrudis, Irmgardis, ... who are documented as donors to the church in 11th century
    = (badly disambiguated) Saint Irmgardis (Q444949) + Saint Irmgardis (Q14540331)
    see www.rheinische-geschichte.lvr.de/persoenlichkeiten/I/Seiten/IrmgardisvonSüchteln.aspx (in German)
3rd group [pseudonyms]
  8. http://d-nb.info/gnd/118677799 Lemony Snicket
    = Daniel Handler (Q1060636) (novelist, born 1970) creator of → Lemony Snicket (Q458346) (fictional person providing his pen name)
  9. http://d-nb.info/gnd/126472009 Bonifatius Kiesewetter
    = Waldemar Dyhrenfurth (Q1307672) (German jurist and author, 1849-1899) creator of → Bonifazius Kiesewetter (Q892566) (fictional person providing his pen name, later use by other authors)
  10. http://d-nb.info/gnd/115646108 Jason Dark
    = Helmut Rellergerd (Q1604049) (German writer, born 1945) creator of → Jason Dark (Q104029) (pen name used by different authors of publisher "Bastei", Rellergerd later was granted exclusive use of the pseudonym)
  11. http://d-nb.info/gnd/123068908
    = Kurt Ostbahn (Q584872) + Willi Resetarits (Q43776)
group ? [unclear what is going on, work in progress]
  12. http://d-nb.info/gnd/118691910 Lucius Annaeus Florus ←→ http://d-nb.info/gnd/100136907 Florus ←→ http://d-nb.info/gnd/119410672 Florus

Leading or trailing space characters in values (resolved)

text separated here into a standalone section for archiving purposes -- Make (talk) 22:58, 26 May 2013 (UTC)

Wikidata:Database reports/Constraint violations/P227: Some of the numbers are correct. A helpful rule would be: "Only numbers starting with 1-9 (not 0) and dashes are allowed." Can someone translate this into format pattern? --Kolja21 (talk) 16:54, 18 May 2013 (UTC)
Examples from the list:
Both GND's are correct. --Kolja21 (talk) 16:58, 18 May 2013 (UTC)
I think I found out what went wrong: the string values contain either leading or trailing space characters. Unfortunately this is not visible on the item page or in the constraint violation report. Only if you look at the wikitext source of the report can you see the mistakes as %20 in URLs. To correct this on an item page, I had to use a 2-step somewhat hacker-like approach (since the erroneous space characters are not visible): click edit (value), add a space character at the start and at the end of the value string, click save, click edit, remove the extra characters just added, click save. But we have to wait for the next report to be sure this really works ... -- 22:41, 18 May 2013 (UTC) User:Make -- minor edits for clarity 08:07, 21 May 2013 (UTC)
And, has it worked? Three examples from the current list (19:35, 20. Mai 2013‎):
All three GND's are correct, but have a trailing space in the list. --Kolja21 (talk) 22:44, 20 May 2013 (UTC)
Looks like it really worked. On May 19th, I removed space characters (with the hacker-technique described above) from value strings for Q2066, Q124696, Q71154, Q70938, Q30917, Q11143, and Q11021. None of these items show up as "format violations" in the current report anymore. -- Make (talk) 08:07, 21 May 2013 (UTC)
I just removed the trailing space from Q1135083 see revision history for change in Bytes. You might want to try fixing some string values yourself to confirm that although the presentation on the item page is identical before and after, from the revision history you can see that indeed a character was removed. – I am not sure what to think of this. At least it is unfortunate that the presentation on the item page omits some content (namely leading/trailing space characters). Maybe there is a software bug with string input/printing behind this. -- Make (talk) 08:22, 21 May 2013 (UTC)
I left a note at Wikidata:Contact the development team#Trailing space. --Kolja21 (talk) 13:07, 21 May 2013 (UTC)
Yes, sorry, that's my fault. The way we added trimming is a bit hacky, which leads exactly to the issue you describe here. It will be improved, but this is probably a month or two down the line. The good news is: this kind of error should be impossible to introduce anew. So there is only some legacy error. I am not sure, it could even be possible that pages that get edited at all lose this kind of legacy error, because the whole content gets changed, but as said, I am not sure. Whatever, in order to help here, I made an analysis, and tried to figure out a list of all places where this problem occurs. It seems to be in 148 values, listed in the following (item, property, value). I hope this helps, and again, sorry for my mistake! I anticipated it, but checked only for linebreaks, and fixed those manually before the patch, but not for simple whitespaces. --Denny (talk) 15:28, 21 May 2013 (UTC)
Thanks@all for helping to resolve this. I just finished fixing all %20 in claims for P227. Hope I didn't miss any. We'll see if all is good when the bot-update scheduled for the early hours of May 24th brings its findings. --- Make (talk) 20:36, 22 May 2013 (UTC)
✓ Done Finally all values with leading/trailing spaces are fixed. -- Make (talk) 22:58, 26 May 2013 (UTC)
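
The kind of check described in this section can be sketched in a few lines of Python. It assumes the claims are already available as (item, property, value) triples, the shape Denny mentions above; the function names and the sample item labels are hypothetical.

```python
from urllib.parse import quote


def find_padded_values(claims):
    """Return the (item, property, value) triples whose value has
    leading or trailing whitespace."""
    return [
        (item, prop, value)
        for item, prop, value in claims
        if value != value.strip()
    ]


def as_report_url_fragment(value):
    """Percent-encode a value the way it would appear inside a URL,
    which makes an otherwise invisible space show up as %20."""
    return quote(value)


if __name__ == "__main__":
    # Made-up sample rows; the item labels are placeholders.
    sample = [
        ("Q_A", "P227", "118583549"),
        ("Q_B", "P227", "118583549 "),
        ("Q_C", "P227", " 122188926"),
    ]
    for item, prop, value in find_padded_values(sample):
        print(item, prop, repr(value), as_report_url_fragment(value))
```

Printing with `repr` or percent-encoding is what makes the stray space visible, much like the %20 in the wikitext source of the report.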

Duplicates

Please use de:WP:GND/F to report duplicates. See de:Hilfe:GND#Personen for the difference between individualized and non-individualized (VIAF: "undifferentiated" = don't use) GNDs. --Kolja21 (talk) 14:15, 11 October 2013 (UTC)

I just merged two items and the target now has two GNDs [1]. Maybe a duplicate report can be generated via SPARQL? MrProperLawAndOrder (talk) 18:13, 11 May 2020 (UTC)
✓ Done Giacomo Rho, see de:Wikipedia:GND/Fehlermeldung/Mai 2020. --Kolja21 (talk) 20:06, 11 May 2020 (UTC)
list of duplicates with type human https://w.wiki/QZf : 993 results. I deprecated one GND for Luiza Gagut [2] and gave as reason "name" (I saw this somewhere else). @Kolja21: we have such limited manpower, could a bot downrank the names if another "real" GND exists, to make this list of duplicates shorter? Or downrank all name-GNDs. MrProperLawAndOrder (talk) 19:14, 11 May 2020 (UTC)
@MrProperLawAndOrder: Placeholders (see Help:P227) have been deleted by bots for five years already, see Wikidata:WikiProject Authority control/Tn. Unfortunately VIAF imported them, but most of the placeholders have since been deleted. GND will delete them as well. They date back to the 90s when only the German National Library issued authority data. --Kolja21 (talk) 20:24, 11 May 2020 (UTC)

Database reports/Constraint violations : GND identifier present but VIAF identifier missing

Hi! This might be a new type of property constraint violations.
Is it possible to list all pages where GND ID (P227) is present but VIAF ID (P214) is missing? Regards לערי ריינהארט (talk) 06:45, 20 October 2013 (UTC)

Thanks for the answers! In order to have fewer results one should limit the query to Wikidata pages that are linked to a specific language:

  1. having an article in yi.Wikipedia
  2. having an article in eo.Wikipedia
  3. having an article in ro.Wikipedia

Thanks for any answer! לערי ריינהארט (talk) 09:03, 21 October 2013 (UTC)

Magnus Manske's tool would not work properly if no English label is present.
How can you query all Wikidata pages having Library of Congress authority ID (P244) without English label?
לערי ריינהארט (talk) 09:13, 21 October 2013 (UTC)
Looks like not all items with GND ID have a VIAF ID. For example I failed to find a VIAF ID for Bieszczady Mountains (Q125529), Rheinbach (Q12547), Age of Enlightenment (Q12539). — Ivan A. Krestinin (talk) 18:57, 26 October 2013 (UTC)
VIAF has problems with hyphens and changed Bieszczady Mountains (Q125529) GND 4006552-2 into VIAF-GND 040065529. But there are also numbers missing. RERO is incomplete and GND numbers of May 2012, when PND changed to GND, were apparently not reported. Example: Samuel Ramos (Q7412445) GND 1022446479 (16-05-12). VIAF 59099151 has today, one and a half years later, still an outdated "undifferentiated" Tn linked to the Mexican writer. --Kolja21 (talk) 21:51, 27 October 2013 (UTC)
The "problems" VIAF has are that it (partially) confuses GND Ids with DNB Ids. Thus GND 4006552-2 cannot be resolved by "sourceID" but if you access VIAF 245618932 it links to the correct GND record, but it displays the wrong ID. -- : Gymel (talk) 00:39, 28 October 2013 (UTC)

Thanks for the answer! לערי ריינהארט (talk)

(GND identifier present AND its value has length 9 AND its value starts with (1 OR 2)) AND VIAF identifier missing

How many are these? Let's restrict to:

GND value matches pattern=[12]\d{7}[0-9X] AND VIAF is (empty OR NIL) 

i.e. no MINUS is present in GND value AND ... . Normally you should be able to identify the correlated VIAF id with a link as [3]. Note: The example contains a trailing X.
Note: https://viaf.org/viaf/search?query=cql.any+all+%22000423580%22+and+local.sources+any+%22dnb%22&sortKeys=holdingscount can identify via normalized GND 000423580 (the "-" is removed from 42358-0, then leading zeros are added to get a string of length nine). This does not work in general. לערי ריינהארט (talk) 04:23, 6 November 2013 (UTC)
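
The restriction and the normalization described in this section can be sketched in Python as follows. The function names are hypothetical, and the code works on plain strings rather than on a live Wikidata or VIAF query.

```python
import re

# Pattern quoted above: nine characters, no dash, starting with 1 or 2.
NO_DASH_9 = re.compile(r"[12]\d{7}[0-9X]")


def needs_viaf(gnd, viaf):
    """True if the GND value matches the restricted pattern and the
    VIAF value is empty or missing."""
    return bool(NO_DASH_9.fullmatch(gnd)) and not viaf


def viaf_normalized(gnd):
    """Remove the dash and left-pad with zeros to nine characters,
    as described for 42358-0 above."""
    return gnd.replace("-", "").zfill(9)


if __name__ == "__main__":
    print(needs_viaf("11936929X", None))
    print(needs_viaf("4006552-2", None))
    print(viaf_normalized("42358-0"))
    print(viaf_normalized("4006552-2"))
```

For 4006552-2 this normalization yields 040065522, whereas the thread above reports 040065529 on the VIAF side, which illustrates why the approach does not work in general.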

add GND identifier format constraint violations/P227 : values of length 9 never ever start with a digit different from 1 or 2

Hi! GND identifier values never start with 0 or 9. If present here this is due to a bug of the AC tool. לערי ריינהארט (talk) 03:23, 1 November 2013 (UTC)

Update: GND identifier values of length 9 never ever start with a digit different from 1 or 2. לערי ריינהארט (talk) 03:31, 6 November 2013 (UTC)
The current format constraint gives this regular expression:
|((1|10)\d{7}[0-9X]|[47]\d{6}-\d|[1-9]\d{0,7}-[0-9X]|3\d{7}[0-9X])
which is in fact redundant (the 3 unnecessary alternatives for the initial 1, [47], or 3 are already part of the alternative for the initial [1-9]) and fully equivalent to:
|(([1-9]|10)\d{7}[0-9X])
(Note: the initial "|" of both regexps indicates that the value may be empty, for meaning "no VIAF identifier currently exists for this topic", or "the VIAF identifier has been obsoleted/deprecated/removed" probably because the topic was ambiguous and did not identify really a single topic; the empty value for this property can be only set in Wikidata, provided you also add a qualifier such as "comment":"no value" and probably a "date" qualifier for this asserted comment; without the necessary qualifier(s) the property would be simply deleted from the item in Wikidata)
But what you are saying is that: if the identifier starts by 1 or 2, then it cannot have length 9 and would have length 10. This would give the following:
|(([3-9]|([12]\d))\d{7}[0-9X])
Is that correct ? Can you point us to a reliable source (the DNB reference page explaining it)? Verdy p (talk) 19:47, 4 March 2016 (UTC)
I also note that GND identifiers of persons (starting by 1, and with 10 digits/letters), do not have any minus-hyphen sign before the last check digit in [0-9X], in order to preserve the maximum length of the full id to at most 11 characters (only IDs starting by 4 or 7 also have variable lengths, and all IDs except those starting by 1 or 2 accept the minus-hyphen as they have a maximum of 9 digits/letters). The actual format would be then more accurately:
|([12]\d{8}|[35689]\d{0,7}-|[47]\d{6}-)[0-9X]
Adding the minus-hyphen in the "long" GND ID of a person (starting with 1) causes the ID to be misinterpreted (with the last check digit discarded due to excessive length) and can sometimes lead to another, unrelated authority record.
Beside that, it seems that the minus-hyphen is now optional in "short" IDs (those starting by [3-9])
|([12]\d{8}|[35689]\d{0,7}-?|[47]\d{6}-?)[0-9X]
Leading zeroes (just after the required leading [3-9] "class digit") for "short" IDs must apparently be discarded. I don't know if this is true for those starting by the required [47] "class digit", but it is strange that they accept less digits than others. If so we would get more simply:
|([12]\d{8}|[3-9][1-9]\d{0,7}-?)[0-9X]
(where the first alternative is for "long" IDs where zeroes after the "class digit" are not discarded and where the minus-hyphen must NOT be specified, the second alternative is for all "short" IDs that may include the minus-hyphen before the "check digit" and may have leading zeroes after the "class digit").
Verdy p (talk) 20:54, 4 March 2016 (UTC)
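
Whether two expressions accept the same values can be tested directly. The sketch below, in Python, compares the format constraint quoted in this thread with the first proposed simplification, using identifiers quoted elsewhere on this page. The leading "|" (empty value) is left out of both expressions, and the names `CURRENT`, `PROPOSED` and `compare` are made up for illustration.

```python
import re

# Expressions as quoted in the thread, without the leading "|".
CURRENT = r"((1|10)\d{7}[0-9X]|[47]\d{6}-\d|[1-9]\d{0,7}-[0-9X]|3\d{7}[0-9X])"
PROPOSED = r"(([1-9]|10)\d{7}[0-9X])"

# Identifiers quoted on this page.
SAMPLES = ["118583549", "1022446479", "160220440", "16022044-0", "4006552-2"]


def compare(samples):
    """Return (value, matches CURRENT, matches PROPOSED) for each sample."""
    return [
        (
            value,
            bool(re.fullmatch(CURRENT, value)),
            bool(re.fullmatch(PROPOSED, value)),
        )
        for value in samples
    ]


if __name__ == "__main__":
    for value, in_current, in_proposed in compare(SAMPLES):
        print(f"{value:12} current={in_current} proposed={in_proposed}")
```

On these samples the two expressions agree for the undashed values but differ for the dashed ones (16022044-0 and 4006552-2), which only the quoted format constraint accepts.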
Further checking in the database with SPARQL, I found that Wikidata internally stores some GND identifiers by terminating them by an additional right-to-left mark (U+200F), even if they are not visible in the editing interface.
Then they don't match the regular expression. I think this is a bug in the Wikidata editing interface (which can insert them automatically when submitting to the database), or in its local implementation of SPARQL...
So I looked for other querying interfaces, I found that the RLM are effectively present... but not displayed in the database.
There are 9 occurrences:
  • Q1032, GND identifier = "1018704-2<RLM>" (strlen=10 instead of 9)
  • Q7054, GND identifier = "4727207-7<RLM>" (strlen=10 instead of 9)
  • Q42108, GND identifier = "7678885-4<RLM>" (strlen=10 instead of 9)
  • Q21165243, GND identifier = "1058485881<RLM>" (strlen=11 instead of 10)
  • Q21638355, GND identifier = "1026070740<RLM>" (strlen=11 instead of 10)
  • Q21823350, GND identifier = "138778485<RLM>" (strlen=10 instead of 9)
  • Q21849794, GND identifier = "135929210<RLM>" (strlen=10 instead of 9)
  • Q22967417, GND identifier = "124887171<RLM>" (strlen=10 instead of 9)
  • Q22920213, GND identifier = "132592746<RLM>" (strlen=10 instead of 9)
Can a wikidata admin look at what is wrong there ? I tried to edit these identifiers, but they are displayed correctly in these pages, and the links also work correctly (none of them show the RLM). Removing them prior to adding them again (typing them manually to make sure there's no RLM in a copy-paste operation) does not change the result. I fear that this could affect many other item properties (or translated labels) in Wikidata.
Verdy p (talk) 23:23, 4 March 2016 (UTC)
Verdy p Sorry to say, but most of what you say is utter nonsense. Why don't you just read #STICKY: Explanation of format constraints above before speculating? -- Gymel (talk) 23:15, 4 March 2016 (UTC)
Non-sense ? Look more precisely... (http://tinyurl.com/hnarckw link to SPARQL query) You'll see I'm correct here. Those RLM are there and returned by SPARQL which does not match the expected regexps (unless I add "\u200F?" in the regexps !). Verdy p (talk) 23:25, 4 March 2016 (UTC)
Obviously at 23:15 I did not comment on your contribution from 23:23 but on what you had been writing above. -- Gymel (talk) 22:28, 5 March 2016 (UTC)
I looked at the history of those items, I saw that RLM were initially added the first time, then removed later (you can see that in diffs). Apparently, SPARQL does not consider the last version of a property, but randomly uses any version found (possibly in a cache, but this cache is extremely long to expire and get purged from its LRU list...) and stops there. In other words, SPARQL does not reflect the current state of the database (even if it indicates that it has all updates since the last one or two hours: these corrections were made long before). This may also explain why we see old data everywhere when navigating in Wikidata.
Something is wrong in the management of internal cache for WDQS... or in how it retrieves the data (probably a filter of items by their most recent version is missing when looking for properties of an item). This not only affects interactive queries, but also the navigation on the website (old data displayed including on the Wikidata website itself), and also all data extractions (export as RDF, Turtle, etc.). Verdy p (talk) 00:00, 5 March 2016 (UTC)
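
Invisible characters such as the right-to-left mark (U+200F) discussed in this thread can be made visible with a short Python sketch. It flags every character in the Unicode "format" category (Cf), which includes U+200F; the function names are hypothetical, and the sample value is the first one from the list above with the mark written as an escape.

```python
import unicodedata


def invisible_chars(value):
    """Return (index, Unicode name) for each format-category (Cf)
    character found in the value."""
    return [
        (index, unicodedata.name(char, "UNKNOWN"))
        for index, char in enumerate(value)
        if unicodedata.category(char) == "Cf"
    ]


def strip_invisible(value):
    """Return the value without any format-category characters."""
    return "".join(
        char for char in value if unicodedata.category(char) != "Cf"
    )


if __name__ == "__main__":
    stored = "1018704-2\u200f"
    print(len(stored), invisible_chars(stored))
    print(repr(strip_invisible(stored)))
```

The length of the sample is 10 rather than 9, matching the strlen figures reported in the list, and stripping the mark gives back a value that the format patterns can match without adding "\u200F?" to them.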

Unique value constraint and gender specific values

Hi! @Gymel , @Kolja21 There are a lot of gender specific pairs:

What are the impacts of this:

a) for Wikidata
b) for the authority control templates using one parameter value only
1) Is there a symmetrical counterpart of field of this occupation (P425) ?
2) Are part of (P361) and has part(s) (P527) to be used at pages as philology (Q40634) ? see: DNB search: Philologie

לערי ריינהארט (talk) 07:41, 12 March 2014 (UTC)

First of all, even when there is a process of identification between wikidata entries, wikipedia articles and GND records, there is no necessity to transport the relations between these objects between the different systems.
Second: Your examples illustrate the issue that it has probably not been very wise to extend the Normdaten/Authority-control templates from persons to concepts: For the latter in many cases it would have been more appropriate to supply wikipedia categories (instead of articles) with the identification with GND concepts
More specific: Since the GND as a 'document language' knows about the female terms of most professions and the German Wikipedia does not (uses redirects to the generic masculine) and other Wikipedias also don't (e.g. because their respective languages don't have female forms for many nouns) and Wikidata does not (yet) even cover wikipedia redirect pages there certainly is an "impact" everywhere.
For reasons not clear to me librarians tend to view everything in terms of part-whole relations. Wikidata objects however are instances or classes and their relations are recorded by distinct properties, cf. Help:Basic membership properties. Specifically philology (Q40634) relates by "subclass of" to sub- and superordinate concepts.
For GND persons (and to a lesser extent for corporate bodies) a "field of activity" has traditionally been of interest. How this was modeled in the data however has undergone several changes. IIRC currently the first profession given should be very broad and give an indication of the "field of activity". However the GND makes no attempt to assign a broad "field" to individual professions (but the underlying DDC-like "GND classification" may serve a similar purpose). -- Gymel (talk) 10:16, 12 March 2014 (UTC)
Thanks for the answer! These days I have seen some (broken) {{PLURAL:foo}} wiki code in the help for "aliases". {{GENDER:foo}} might come also.
Personally I think as a practical "workaround" one should only import / add male forms of GND authority identifiers. Not sure if besides actor (Q33999) there are more gender forms in English (except the girl / boy, sister / brother etc. pages). לערי ריינהארט (talk) 13:00, 12 March 2014 (UTC)
Quoting a recent contribution in a librarian's discussion list [4]: "Actor" is almost a unique case. [...] I’ve met females who act who call themselves actresses, and I’ve met those who see the term as demeaning and prefer actor. Also note "actor / actress" as the english label for actor (Q33999). Thus:
  • items are usually neutral or include all genders
  • since there exists a dedicated property sex or gender (P21), all other properties should be chosen "abstracted from gender", i.e. should take a neutral item as value (because of the previous point there is seldom a choice at all) to achieve "orthogonality"
For non-items or "external items" as in authority control it is probably the best to adapt that strategy: stick to the generic masculine and completely ignore female forms since they usually are not an alternative but a specialization / "narrower term" and therefore as match not as close as the male form. -- Gymel (talk) 22:33, 12 March 2014 (UTC)
Hmm - I agree that it makes sense to use gender-agnostic items in Wikidata. However, GND took another course. They definitely do not use the female form as a specialization, but as an alternative - so on their side no "generic masculine" exists. Ignoring this has negative consequences for applications which use the mapping: Using Wikidata as starting point and searching for persons described by a "male" GND profession, the application will miss the females. Using GND as a starting point, no mapping exists for the female form.
So what exactly are the problems in using an approach which maps a Wikidata item to two alternative GND targets, qualified by gender? (example in social scientist (Q15319501)) Jneubert (talk) 18:30, 13 December 2015 (UTC)
Jneubert Quite an old thread you have been commenting on... I think for GND one has to regard several sub-applications:
  1. "Als Homonymenzusatz bei Personenschlagwörtern zugelassen" means that the female form as a text fragment is permissible in constructing headings
  2. As subject heading when cataloging objects: Appropriate if and only if sex or gender aspects of specifically woman scientists are in the focus of the publication. This is clearly much narrower in scope than the male term (which is also to be applied if sex or gender questions are not involved)
  3. As indication of profession for other GND records (i.e. persons): I just checked, the profession data element indeed uses the "female" profession for female persons. How one will ever be able to select all social scientists with that approach unfortunately eludes me: The reciprocal relation between the two professions is quite unspecific.
So by listing both GND numbers in social scientist (Q15319501) we equate the two for our purposes (which is correct in a sense) since P227 indicates a 1:1 correspondence between Wikidata item and target record. On the other hand we can argue that there is a distinction between the two concepts and Wikidata is just unable (or unwilling) to represent one of them. Or perhaps the Wikidata item represents the union of the two and therefore none of them is appropriate for P227 of our item? -- Gymel (talk) 00:25, 14 December 2015 (UTC)
Note: social scientist (Q15319501) is for male and female like "Sozialwissenschaftler", GND 4140123-2 is used for males and females. Proof: Kristen Kreider, weiblich, Sozialwissenschaftler (GND 1068218916). --Kolja21 (talk) 13:51, 17 December 2015 (UTC)

constraint report relating to Wikimedia disambiguation page (Q4167410)

If instance of (P31) is a Wikimedia disambiguation page (Q4167410) the presence of GND ID (P227) is prohibited

see: Wikimedia disambiguation page (Q4167410) with GND identifier (P227). Usually there are several possibilities:
a) Please identify the non-disambiguation page (WD item) to which the property GND identifier should be moved;
"normally" no other statements should be left at the disambiguation page.
It can happen that a set of properties should be moved to another (a second) WD item, another set to a third WD item etc.
b) (recommended method:) Verify which language version is a disambiguation page and separate it from the rest. Gadget-labelLister.js, which can be activated at preferences#gadgets, can be used to remove all faulty (disambiguation) descriptions after the disambiguation page is separated: Verify the descriptions for the following languages: de, en, fr, es, pt, pt-br, ru, sv, which were added by bots a long time ago, and take a short look at the other language descriptions.
c) If method b) cannot be used because all linked WMF-project language pages are (local) disambiguation pages, please create a new WD item and add a proper description in your native language and in English.

Thanks in advance! gangLeri לערי ריינהארט (talk) 13:29, 27 May 2014 (UTC)

Three items seem to persist at Wikidata:Database reports/Constraint violations/P227#.22Conflicts_with.22_violations and IMHO cannot be resolved here without at least some intervention in individual wikipedias:
  1. treaty of the European Union (Q11122): some Wikipedias seem to understand this as the treaty of Maastricht with its later amendments (treaty of Lisbon, ...), some others however also include the treaty of Rome (with its companion treaties) under this heading and therefore tend to declare themselves as disambiguation pages.
  2. asymmetry (Q752641): disambiguation property declared by french wikipedia by an article already narrowed to "geometrical" aspects. Comparing the extent of the articles in the other wikipedias there is not much common ground anyway.
  3. list of wars involving Israel (Q623900): Although German wikipedia enumerates the individual conflicts I would not consider it a typical list article. Separation from the other wikipedias however would not be an improvement I think. -- Gymel (talk) 08:26, 4 June 2014 (UTC)

Sorry, too late (I didn't read this talk page in the morning), these three cases have already been resolved. list of wars involving Israel (Q623900) = is a list of (P360) Arab–Israeli Wars (Q17126147). The German article looks more like a list but the Arabic WP has a "real" article about this topic. --Kolja21 (talk) 00:59, 5 June 2014 (UTC)

What's type n?

Hi,
I wanted to add a GND identifier to an item (actually I did, see Gunnar Wöbke (Q1554803)), but the description here says "(please don't use type n = name, disambiguation)". This confuses me, what is meant with "type n = name, disambiguation"? From the examples it looks like type n is where it says in the RDF representation: "<rdf:type rdf:resource="http://d-nb.info/standards/elementset/gnd#UndifferentiatedPerson"/>"

For "real" persons it would say "DifferentiatedPerson" there, is that meant with not-type n? --Bthfan (talk) 22:11, 7 July 2014 (UTC)

@Bthfan: Exactly. The record stands for an unknown number of individuals known by that name and therefore cannot be used to identify a single person. -- Gymel (talk) 08:09, 10 July 2014 (UTC)
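
The distinction confirmed above can be checked mechanically. The following Python sketch looks for the `rdf:type` resource quoted in the question inside an RDF/XML string. The function names and the sample document (including its `EXAMPLE` identifier) are made up, and only the `rdf:type` element form quoted above is handled; other RDF serializations of the same record are not covered.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
GND_NS = "http://d-nb.info/standards/elementset/gnd#"


def gnd_types(rdf_xml):
    """Collect the local names of all rdf:type resources in the GND
    element set namespace."""
    root = ET.fromstring(rdf_xml)
    found = set()
    for element in root.iter(f"{{{RDF_NS}}}type"):
        resource = element.get(f"{{{RDF_NS}}}resource", "")
        if resource.startswith(GND_NS):
            found.add(resource[len(GND_NS):])
    return found


def is_type_n(rdf_xml):
    """True if the record is typed as UndifferentiatedPerson."""
    return "UndifferentiatedPerson" in gnd_types(rdf_xml)


# Made-up minimal document for illustration only.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about="http://d-nb.info/gnd/EXAMPLE">
    <rdf:type rdf:resource="http://d-nb.info/standards/elementset/gnd#UndifferentiatedPerson"/>
  </rdf:Description>
</rdf:RDF>"""

if __name__ == "__main__":
    print(gnd_types(SAMPLE), is_type_n(SAMPLE))
```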

Isn't type n appropriate for Wikimedia disambiguation pages?

I quite agree that differentiated GNDs must not be used on "Wikimedia disambiguation page (Q4167410), Wikimedia category page (Q4167836), Wikimedia list article (Q13406463)".

However, undifferentiated GNDs (type=n) quite closely match Wikimedia disambiguation pages. Does the above prohibition mean that we DON'T want such GNDs in wikidata at all? --Vladimir Alexiev (talk) 07:53, 19 December 2014 (UTC)

de:Peter Müller would be an example: The disambiguation page lists some Peter-Paul and Peter Erasmus Müller who perhaps actually never were called "Peter" (the undifferentiated record would never tell, since "Peter Erasmus" is definitely not a form of reference for all Peter Müllers it is or should be prohibited). The set of persons in the disambiguation page may have a nonempty intersection with those meant by an authority record, but usually is neither a subset nor a superset: We may know about some Peter Müllers they don't know about (and perhaps never will) and vice versa. Furthermore, the disambiguation page lists the individuals we know about individually (with the exception of redlinks perhaps), whereas the undifferentiated name record stands for the complementary set of (names of) persons the libraries do not know individually (at the moment), so usually you cannot navigate to (library items already assigned to) individual authority records by accessing the undifferentiated record. Thus I would think both concepts have in common that they somehow deal with a group of people with common formal characteristics (their name), but the purpose and definition for this clustering generally do not have much in common. -- Gymel (talk) 09:46, 19 December 2014 (UTC)
@Vladimir Alexiev: Yes, we don't want such GNDs in Wikidata at all. Tn's are temporary numbers. They are placeholders and will be deleted or (this was common till 2012, now with so many libraries and archives taking part in the project it's rare) individualized and turned into a Tp. --Kolja21 (talk) 11:21, 19 December 2014 (UTC)

P31:Q5 and Type N

Hello everyone,

at the request of Kolja21 my bot fetched a list of (for a start, a few) entries with instance of (P31) human (Q5) and Type N GND ID (P227) information. You can find this list in the bot's user namespace User:KasparBot/GND Type N.

Regards, -- T.seppelt (talk) 06:24, 19 September 2015 (UTC)

SSL

Currently the formatter url uses http and redirects to a https connection. Is there a reason not to link the target directly? --- Jura 11:47, 3 December 2015 (UTC)

"Works only to a limited extent. Firefox error message: 'This connection is untrusted.'" = https causes problems with at least some browsers. It produces error messages. We had a similar discussion @User talk:Pasleim#Property:P1630 and a revert with the reason: "weak certificates". --Kolja21 (talk) 03:04, 5 December 2015 (UTC)
It's different. New formatter URL would be https://portal.dnb.de/opac.htm?method=simpleSearch&cqlMode=true&query=idn%3D$1
It works quite well. --- Jura 06:47, 5 December 2015 (UTC)
I think we should stick to the canonical URLs employing d-nb.info as advertized by "Link auf diesen Datensatz" (link to this record) and not substitute that by some search query for the identifier in the Library catalogue portal.dnb.de, even if that can be performed by https. -- Gymel (talk) 20:21, 6 December 2015 (UTC)
It's not "some search query", but the https-URL preferred by d-nb.info and where it redirects. It just saves users a non-SSL step in the chain. --- Jura 11:23, 7 December 2015 (UTC)
http://d-nb.info/gnd/2072525-5 redirects to https://portal.dnb.de/opac.htm?method=simpleSearch&cqlMode=true&query=idn%3D007223358. So it's a different domain and a different kind of identifier (ILTIS PPN vs. GND Id). Indeed https://portal.dnb.de/opac.htm?method=simpleSearch&cqlMode=true&query=idn%3D2072525-5 also gives the same result but IMHO this may change any time without notice (may I repeat that http://d-nb.info/gnd/2072525-5 is the only form advertised by the record itself). So if you insist on providing https URLs, resorting to letting the identifiers point to https://archive.org or https://google.com seems a safer bet than trying to outsmart data providers. -- Gymel (talk) 16:44, 7 December 2015 (UTC)
The best thing to do is to contact the "d-nb.info" admins and ask them to set up an equivalent HTTPS server in their domain.
They will still continue to redirect us to another HTTPS URL in another domain, which also uses another identifier.
So we would query "https://d-nb.info/gnd/2072525-5" with exactly the same response as "http://d-nb.info/gnd/2072525-5", but this time the resulting redirect would be secured (and not spoofable).
This could also protect us if we create lots of links to d-nb.info in Wikimedia via the GND identifiers: at least the resulting redirect to another domain would be less suspect (imagine that someone spoofs queries to the existing HTTP server or monitors it for unfair actions, by harvesting some routers).
Some well-known ISPs also unfairly redirect some well-known unsecured domains, modify the returned contents to include advertising, or redirect their users to an unrelated website.
With HTTPS we would know that we are effectively querying the actual webserver for the domain "d-nb.info" and not a malicious one.
An unfair ISP will not be able to redirect an HTTPS server: as soon as the SSL session starts being negotiated, the ISP cannot spoof the authentic response (using fake/stolen SSL certificates?), it can only:
  • route the unmodified query to the real site and return unmodified results, or
  • block immediately the connection (returning an HTTPS error status), or
  • reroute the HTTPS query to another HTTP or HTTPS domain of its choice (using another certificate for this domain), such as a "parking page" or a "web search helper" page displaying various ads.
Explain this risk of website spoofing to the d-nb.info admins: maybe they will also set up HTTPS on their existing website, and it will be the preferred URL we use here. All websites with a large audience and contents related to lots of topics should now use SSL (see the campaign "HTTPS everywhere" that Wikimedia also strongly supports, notably since the "Snowden revelations" about NSA activities, including large-scale spoofing/monitoring of various well-known websites). This should be the case for such reputed knowledge authorities (spoofing their website could be profitable and damaging to people, if it is used to get and verify details about someone listed in GND). Verdy p (talk) 20:19, 4 March 2016 (UTC)
The URL mentioned by Gymel would probably work best as an intermediary solution.
--- Jura 06:35, 5 March 2016 (UTC)
Actually I suggested using Google for looking up GND identifiers if using https connections has top priority over all other concerns. -- Gymel (talk) 06:59, 5 March 2016 (UTC)
Sorry, changed it to "mentioned by Gymel".
--- Jura 07:01, 5 March 2016 (UTC)
@Gymel, Jura1: The following may be of interest. As of now, it appears that a URL of the format http://d-nb.info/gnd/$1 generates a 301 Moved Permanently redirect to a URL of the format https://d-nb.info/gnd/$1. The HTTPS URL has the same domain and identifier. For example, http://d-nb.info/gnd/2072525-5 generates a 301 Moved Permanently redirect to https://d-nb.info/gnd/2072525-5 which generates a 303 See Other redirect to https://d-nb.info/gnd/2072525-5/about/html which generates a 302 Found redirect to https://portal.dnb.de/opac.htm?method=simpleSearch&cqlMode=true&query=idn%3D007223358.
In addition, regarding third-party formatter URLs, it appears that a URL of the format http://opacplus.bsb-muenchen.de/search?pnd=$1 generates a 301 redirect to a URL of the format https://opacplus.bsb-muenchen.de/search?pnd=$1. For example, http://opacplus.bsb-muenchen.de/search?pnd=118584596 generates a 301 redirect to https://opacplus.bsb-muenchen.de/search?pnd=118584596 which generates a 302 redirect to https://opacplus.bsb-muenchen.de/metaopac/search?pnd=118584596.
--Elegie (talk) 10:58, 21 November 2017 (UTC)
Indeed, https://www.ssllabs.com/ssltest/analyze.html?d=d-nb.info looks encouraging. And Jura1 already has entered the https formatter URL as being of preferred rank, whereas the catalogue itself still advertises the http URL as _the_ "link to this record" and will probably continue to do so for a very long time (since this URL also happens to coincide with the semantic web URI of the RW entity). -- Gymel (talk) 20:28, 21 November 2017 (UTC)
@Gymel, Jura1: With regard to the third-party formatter URL http://opacplus.bsb-muenchen.de/search?pnd=$1 for the Bavarian State Library, would it be possible to change the URL to use HTTPS instead of HTTP or to add an entry for the HTTPS version of the URL (see my previous comment about the URL redirecting)? Thanks. --Elegie (talk) 10:16, 22 November 2017 (UTC)
Elegie I have no opinion here because I have no idea why, of all the hundreds of websites you can feed selected GND numbers into and get some meaningful result, the BSB catalogue was chosen (it supports only the GND identifiers for persons and usually does not provide information about that person). This property feels to me like an attempt to generate en:Special:BookSources automatically from Property:P212? -- Gymel (talk) 18:33, 23 November 2017 (UTC)
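
The redirect chain reported above (301, then 303, then 302) can be re-checked whenever the formatter URL is discussed again. The following is only a minimal sketch in Python: `fetch_hop` and `trace_redirects` are hypothetical helper names, not part of any DNB or Wikidata tooling, and the sketch assumes the server answers HEAD requests the same way as GET requests, which may not hold.

```python
import http.client
from urllib.parse import urljoin, urlsplit

REDIRECT_CODES = (301, 302, 303, 307, 308)


def fetch_hop(url):
    """Send one HEAD request without following redirects.

    Returns (status_code, Location header or None).
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=10)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=10)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def trace_redirects(url, fetch=fetch_hop, max_hops=10):
    """Follow redirects one hop at a time and record (status, url) pairs."""
    chain = []
    for _ in range(max_hops):
        status, location = fetch(url)
        chain.append((status, url))
        if status not in REDIRECT_CODES or not location:
            break
        # Location may be relative, so resolve it against the current URL.
        url = urljoin(url, location)
    return chain


if __name__ == "__main__":
    # Performs real network requests; results depend on the current server setup.
    for status, hop in trace_redirects("http://d-nb.info/gnd/2072525-5"):
        print(status, hop)
```

The `fetch` parameter is injectable so that the chain logic can be exercised without network access.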

Identical GND ID

New report by KrBot: Wikidata:Database reports/Identical GND ID. --Kolja21 (talk) 17:08, 7 December 2015 (UTC)

portal.dnb.de gives an error

Currently the formatter URL gives:

Leider ist ein Fehler aufgetreten. (Unfortunately an error has occurred.)
Unable to invoke request

@Gymel: Can we fix it? -- Sergey kudryavtsev (talk) 07:20, 15 May 2016 (UTC)

Sergey kudryavtsev No, their catalogue (including other services) seems to have been completely offline since some time last night. Let's just hope they'll notice it before Tuesday (tomorrow is a holiday in Germany). -- Gymel (talk) 08:35, 15 May 2016 (UTC)

@Gymel: portal.dnb.de is working for now! -- Sergey kudryavtsev (talk) 13:24, 15 May 2016 (UTC)

@Sergey kudryavtsev, Gymel: The server is working, but some of the new GNDs added this weekend are missing.
Example: Jamala (Q2662517), GND 1100357025 "404 NOT FOUND" - PICA works fine
BTW: Monday is a national holiday, correct, but that's no reason to work on Tuesday ;) Tuesday is a local holiday called Wäldchestag (Q1605844) (Day of the forest). --Kolja21 (talk) 19:54, 15 May 2016 (UTC)
@Kolja21: Oh, Eurovision contest... ;-) -- Sergey kudryavtsev (talk) 06:42, 17 May 2016 (UTC)
PS: DNB already knows that Jamala (Q2662517) "won the Eurovision Song Contest 2016 in Stockholm for Ukraine". -- Sergey kudryavtsev (talk) 06:50, 17 May 2016 (UTC)
@Sergey kudryavtsev: How come User:Kolja21 could know about the existence of 1100357025 when it was neither queryable nor visible? Hint: de:Portal:Bibliothek, Information, Dokumentation/Normdaten/GND-Kooperation ;-) -- Gymel (talk) 18:35, 17 May 2016 (UTC)

✓ Done Server is up again and running without errors. All new GNDs are online. --Kolja21 (talk) 11:41, 16 May 2016 (UTC)

GND ID count by GND Ontology class

as of 2017 - see below for updates

gndoClass gndCount wdCount percentage
UndifferentiatedPerson 6653723 6244 0.1 %
DifferentiatedPerson 4250785 402239 9.5 %
CorporateBody 1383591 24381 1.8 %
ConferenceOrEvent 655032 489 0.1 %
TerritorialCorporateBodyOrAdministrativeUnit 184343 33180 18.0 %
MusicalWork 162166 1836 1.1 %
Work 139868 5574 4.0 %
SubjectHeadingSensoStricto 135435 17131 12.6 %
SeriesOfConferenceOrEvent 124043 304 0.2 %
OrganOfCorporateBody 110727 1430 1.3 %
BuildingOrMemorial 63958 3713 5.8 %
NomenclatureInBiologyOrChemistry 30665 2036 6.6 %
PlaceOrGeographicName 24996 1134 4.5 %
Family 18924 914 4.8 %
NaturalGeographicUnit 17589 2577 14.7 %
AdministrativeUnit 11826 2317 19.6 %
SubjectHeading 8788 94 1.1 %
SoftwareProduct 7997 376 4.7 %
Language 5703 605 10.6 %
ProductNameOrBrandName 5548 356 6.4 %
HistoricSingleEventOrEra 5183 508 9.8 %
WayBorderOrLine 4583 443 9.7 %
Manuscript 4420 117 2.6 %
EthnographicName 4186 239 5.7 %
RoyalOrMemberOfARoyalHouse 2963 2011 67.9 %
VersionOfAMusicalWork 2950 4 0.1 %
ReligiousTerritory 2847 130 4.6 %
ProvenanceCharacteristic 2024 0 0.0 %
CharactersOrMorphemes 1936 21 1.1 %
NameOfSmallGeographicUnitLyingWithinAnotherGeographicUnit 1821 169 9.3 %
MeansOfTransportWithIndividualName 1372 130 9.5 %
ProjectOrProgram 1350 20 1.5 %
LiteraryOrLegendaryCharacter 1313 491 37.4 %
CollectiveManuscript 1264 3 0.2 %
Collection 1068 3 0.3 %
MemberState 579 297 51.3 %
Gods 546 440 80.6 %
CollectivePseudonym 513 49 9.6 %
GroupOfPersons 344 30 8.7 %
Country 327 238 72.8 %
ExtraterrestrialTerritory 261 128 49.0 %
Spirits 99 7 7.1 %
FictivePlace 28 3 10.7 %
FictiveCorporateBody 18 0 0.0 %
FictiveTerm 1 0 0.0 %

As of 2017-10-08 (WDQS) and 2017-08 (GND dump). Jneubert (talk) 09:15, 8 October 2017 (UTC)
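
The percentage column appears to be wdCount divided by gndCount, rounded to one decimal place. A minimal sketch of that arithmetic, assuming this reading of the columns is right (`coverage_percent` is a hypothetical helper name; the three sample rows are copied from the table above):

```python
def coverage_percent(gnd_count, wd_count):
    """Share of GND records of a class that have a GND ID (P227) in Wikidata, in percent."""
    if gnd_count == 0:
        return 0.0
    return round(100.0 * wd_count / gnd_count, 1)


# (gndoClass, gndCount, wdCount) taken from the 2017 table
rows = [
    ("DifferentiatedPerson", 4250785, 402239),
    ("CorporateBody", 1383591, 24381),
    ("Gods", 546, 440),
]

for gndo_class, gnd_count, wd_count in rows:
    print(f"{gndo_class} {gnd_count} {wd_count} {coverage_percent(gnd_count, wd_count):.1f} %")
```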

Thank you for the list. BTW: UndifferentiatedPerson (Tn) should not be used. These numbers are only placeholders similar to disambiguation pages, see User:KasparBot/GND_Type_N. --Kolja21 (talk) 15:33, 29 May 2018 (UTC)

Update: As of 2018-07-27 (WDQS) and 1806 (GND dump) -- Jneubert (talk) 06:21, 30 July 2018 (UTC)

The results were obtained by this federated SPARQL query on our experimental GND endpoint. Jneubert (talk) 12:23, 18 October 2018 (UTC)

Update: As of 2018-11-21 (WDQS) and 2018-10 (GND dump) -- Jneubert (talk) 12:34, 21 November 2018 (UTC)

Update: As of 2019-11-19 (WDQS) and 2019-10 (GND dump) --Jneubert (talk) 07:02, 20 November 2019 (UTC)

Fixing "single value constraint" violation by indicating time span

There are several entries in Wikidata that have two corresponding entries in GND, e.g. administrative areas that were incorporated by another administrative area have two GND entries – one before and one after incorporation – but in Wikidata they usually only have one entry. Often it is made clear in the Wikidata entry when the status/type of the item changed (which correlates with the minting of a second GND ID). See these three examples: Eilendorf (Q1304021), Rodenkirchen (Q885372), Roman Catholic Archdiocese of Paderborn (Q253765). I have thought about fixing violations in such entries by using start time (P580) and end time (P582) as separators. But in my understanding, these would refer to the time span when one identifier was used for the entity and not the time span in the existence of the entity the identifier applies to. Right? If this is so, how could I indicate in another way for which time span in the existence of the entity the identifier applies? Acka47 (talk) 10:59, 23 January 2019 (UTC)

@Acka47: we are also running into the single value constraint for professions, in all those cases where the GND has two items, one for men and one for women, e.g. political scientist (Q1238570). --Mfchris84 (talk) 22:21, 18 March 2019 (UTC)
In both cases - Eilendorf (Q1304021) and political scientist (Q1238570) - I would use the qualifier object named as (P1932). Adding the time would be great but it is more complicated and often the GND does not have exact information on start time (P580) and end time (P582). --Kolja21 (talk) 01:49, 19 March 2019 (UTC)
@Kolja21: Thanks! This makes sense and is really helpful. Acka47 (talk) 10:09, 20 March 2019 (UTC)

Parasynonyme (fr) / Quasisynonym (de)

For subject headings the GND uses quasi-synonym (Q2122467), which is useful for libraries but does not fit Wikidata. Example:

How should we mark this kind of duplicate?

book publisher (Q1320047)
GND ID (P227): 4063004-3
  1. Qualifier object named as (P1932): Verlag
  2. Qualifier criterion used (P1013): quasi-synonym (Q2122467)
    or mapping relation type (P4390): quasi-synonym (Q2122467)related match (Q39894604)

@Zolo, Jneubert, Emu, Wurgl: Any suggestions? --Kolja21 (talk) 04:10, 19 July 2019 (UTC)

@Kolja21: I wonder if there is any virtue in using these at all. The subject heading (Schlagwort) part of GND always seemed to be pretty broken to me (it actually becomes more and more broken, I am told), there isn’t much to be gained from reproducing this mess in Wikidata … --Emu (talk) 21:52, 21 July 2019 (UTC)
Dear @Emu: I agree with you, and for that reason I have largely ignored subject terms so far. But the type s GNDs are being imported here from deWP and show up in the duplicate lists, so we should find a uniform solution (P1013 vs. P4390). BTW: At the moment I am working on the list Wikidata:WikiProject Authority control/Tn. It lists the persons that had Tns which were deleted by bot. In many cases a valid Tp exists in parallel, but the bot did not look for it. Happy editing --Kolja21 (talk) 22:11, 21 July 2019 (UTC)
Hi @Kolja21: we've also suffered from this problem in STW Thesaurus for Economics (Q26903352) and other vocabularies linked from it - a nasty example was given here on the pub-thes W3C mailing list. There was no real consensus on how to deal with that situation in the SKOS community. In the end, we defined a custom property zbwext:altLabelNarrower, in order to at least be able to mark such situations (recognition however still is intellectual work). I'd suggest not using mapping relation type (P4390), which was intentionally restricted to the defined set of SKOS relations, but criterion used (P1013), as you suggested. --Jneubert (talk) 15:02, 25 July 2019 (UTC)

✓ OK Thanks for your feedback. I've updated Help:P227. --Kolja21 (talk) 19:19, 25 July 2019 (UTC)


For humans add GND if VIAF and other properties exist

Click on VIAF link in the result list to see if GND exists in VIAF, if yes click the WD link on VIAF page. If User:Bargioni/moreIdentifiers is installed adding "DNB" is done with two clicks.


Progress report 2020-05-23

400000 humans exist that have P7902 and P227, that is 100000 new. Each of the 400000 has a value for P21 (https://w.wiki/Raq), adding it to many long existing items. GND duplicates were down to 10 [7] of which a further two were fixed and one wrong GND assignment was resolved. Several merges added new name forms to longer existing items. I got several notifications due to new links to the new items. I also saw bots and users adding new data to the items.

Now running a second import of about 60000. Each has P21 - as before - and at least one value in the fields date of birth or date of death on the DtBio website. This adds many living artists, including many from filmportal.de. MrProperLawAndOrder (talk) 16:04, 23 May 2020 (UTC)

Please complete the items before creating more of them. Items with merely a GND identifier are mostly useless and saturate matching for any other application. --- Jura 09:23, 24 May 2020 (UTC)
Also see User_talk:Mike_Peel#Matching_existing_wikidata_objects_with_unconnected_articles (diff). --M2k~dewiki (talk) 10:23, 24 May 2020 (UTC)
As mentioned 19 May "the next things to add is VIAF and time information". GND is CC0 and offers a linked data service, so additional information is ready to be added by bots. I am working on that. MrProperLawAndOrder (talk) 15:09, 24 May 2020 (UTC)

Query (all items above Q95000000 should belong to this import):

SELECT ?person ?gnd
WHERE { 
  ?person wdt:P227 ?gnd . 
  ?person wdt:P7902 ?gnd .
  MINUS { ?person wdt:P569 ?b . }
  MINUS { ?person wdt:P570 ?d . }
}
ORDER BY DESC(?person)

--Epìdosis 11:03, 24 May 2020 (UTC)

Can we flag them somehow, similar to ORCID ones. --- Jura 11:50, 24 May 2020 (UTC)
@Epìdosis: How can you know, especially if so many other items are shown in your query, demonstrating that GND-DtBio items without P569 and P570 have existed for a long time? MrProperLawAndOrder (talk) 14:15, 24 May 2020 (UTC)

Query limited to instance of (P31)human (Q5):

SELECT ?person ?gnd
WHERE { 
  ?person wdt:P227 ?gnd . 
  ?person wdt:P7902 ?gnd .
  MINUS { ?person wdt:P569 ?b . }
  MINUS { ?person wdt:P570 ?d . }
  ?person wdt:P31 wd:Q5 .
}
ORDER BY DESC(?person)

--Epìdosis 15:34, 24 May 2020 (UTC)
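
For anyone who wants to run the query above outside the query service UI, here is a minimal sketch in Python. It only builds the request URL and parses a response in the standard SPARQL JSON results format; it assumes the public endpoint at https://query.wikidata.org/sparql, and `build_request_url` and `extract_rows` are hypothetical helper names. Actually sending the request (with a descriptive User-Agent) is left to the reader.

```python
import json
from urllib.parse import urlencode

# Assumed public WDQS endpoint
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

# Same query as above, limited to humans, ordered by the selected ?person variable
QUERY = """
SELECT ?person ?gnd
WHERE {
  ?person wdt:P227 ?gnd .
  ?person wdt:P7902 ?gnd .
  MINUS { ?person wdt:P569 ?b . }
  MINUS { ?person wdt:P570 ?d . }
  ?person wdt:P31 wd:Q5 .
}
ORDER BY DESC(?person)
"""


def build_request_url(query, endpoint=WDQS_ENDPOINT):
    """Return a GET URL asking the endpoint for JSON results."""
    return endpoint + "?" + urlencode({"query": query, "format": "json"})


def extract_rows(response_text):
    """Turn a SPARQL JSON result into a list of (QID, GND ID) tuples."""
    data = json.loads(response_text)
    rows = []
    for binding in data["results"]["bindings"]:
        # Entity IRIs end in the QID, e.g. .../entity/Q123
        qid = binding["person"]["value"].rsplit("/", 1)[-1]
        rows.append((qid, binding["gnd"]["value"]))
    return rows
```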

Progress report 2020-05-24

475000 humans exist that have P7902 and P227. Each has a value for P21 (https://w.wiki/Raq). Duplicates created due to a bug in QS have all been merged, Property_talk:P227/Duplicates#Human at 6. @Jura1, M2k~dewiki, Epìdosis: Adding more information is high on the priority list. As mentioned 19 May "the next things to add is VIAF and time information". GND is CC0 and offers a linked data service, so additional information is ready to be added by bots. As mentioned 23 May the new items have already proved useful to others who linked them. I am working with Kolja21 on DtBio; we made huge progress, fixing hundreds of wrong assignments of ids and creating new items to disambiguate. MrProperLawAndOrder (talk) 15:07, 24 May 2020 (UTC)

  • Please do not create any further items before time information is added. Items that merely have identifiers are useless to Wikidata. We can all find identifiers elsewhere if we want them. --- Jura 15:14, 24 May 2020 (UTC)
    @MrProperLawAndOrder: Thank you for the import. In less than one week (hopefully) I can work with @Bargioni: to import from GND ID (P227) the date of birth (P569) and/or the date of death (P570) for all items that do not already have them. I agree about waiting with the creation of new items until the existing ones have these data added. --Epìdosis 15:15, 24 May 2020 (UTC)
    @Epìdosis: please explain the benefit of waiting with the creation of new items. Other tools need them as a basis; also note that items without b/d existed before. It is one task to initialize the items and there are several other tasks to enrich them. Maybe Kolja21 can share more about DtBio humans, but AFAICT they are high value items, high quality - GND is not just any other identifier but controlled by DNB and gives access to CC0 LOD (linked open data). DtBio humans are a subset of GND humans that are found on dozens of third party websites in Germany. And of the DtBio, I chose yet another subset.
    It could be helpful if the enrichment work would be coordinated and not restricted to DtBio humans but performed for all GND humans. DtBio is just a subset.
    For me personally it would be easier to run each enrichment task only once. If creation of more DtBio humans is postponed, then the tasks have to be run again.
    MrProperLawAndOrder (talk) 15:46, 24 May 2020 (UTC)
    @MrProperLawAndOrder: OK, it seems to make sense. Could you report here when you think to have finished imports for a while, so that we can concentrate on adding dates of birth/death? Also for me and Bargioni "it would be easier to run each enrichment task only once" :) --Epìdosis 15:53, 24 May 2020 (UTC)
    I don't agree to that approach. The items already saturate other tasks and Wikidata capacity is limited. If you continue the import, I will ask for a reblock. --- Jura 15:57, 24 May 2020 (UTC)
    These properties are marked as Wikidata property for an identifier that suggests notability (Q62589316). I have seen a lot of questionable imports in Wikidata - some items are even created without a label - but imho MrProperLawAndOrder does great work. The new items he added are used for maintenance work and help with tracking family relationships, duplicates and incorrect life data. --Kolja21 (talk) 16:41, 24 May 2020 (UTC)
    @Kolja21, MrProperLawAndOrder: Exactly: these items are certainly notable and useful; of course, the more data are imported, the more useful the items are. So, has this subset of DtBio been completely imported? --Epìdosis 16:49, 24 May 2020 (UTC)
    • Let's see how the import of additional information for existing items goes before assessing this iteration of their efforts. --- Jura 16:50, 24 May 2020 (UTC)
The existence of GND-only objects created more manual work, since new articles can not be connected automatically anymore by Pi bot: every item has to be opened manually to check, from the information behind the GND, whether it describes the same person with the same year of birth/death, because this information is missing in the newly created objects (also see User_talk:Mike_Peel#Matching_existing_wikidata_objects_with_unconnected_articles (diff)).
In addition, when creating new GND-only-objects, it seems that already existing objects have not been taken into account and therefore now have to be merged manually, for example:
Also see User_talk:MrProperLawAndOrder#Mathilde_Welcker_(Q94753027)_and_Mathilde_Welcker_(Q94753026)_are_identical (diff) --M2k~dewiki (talk) 16:51, 24 May 2020 (UTC)
From my point of view, the problem with Quickstatements (?!?), which seems to create two or more identical GND-only objects in some cases, should also be analyzed and solved before creating new objects. Merging duplicates afterwards is only a workaround, not an actual solution to the initial problem. It would be better to avoid creating duplicates in the first place and solve the root cause first. --M2k~dewiki (talk) 17:01, 24 May 2020 (UTC)
@M2k~dewiki: it is not correct that existing objects were not taken into account. You mention 9 items to be merged "manually", do you know how many I merged manually? Why don't you merge them? "The existence of GND-only objects" - this section is about items having type=human, label en/de/nl, sex, GND ID, DtBio ID, if you refer to them as "GND-only" then you are not correctly portraying them. "... created more manual work," and reduced a lot of other manual work. "since new articles can not be connected automatically anymore by Pi bot" - actually, we talk about items that have a GND, the project hosting the article should ensure that the correct GND is attached to it, and that is much safer than working on name and year of birth. Re "Mathilde_Welcker_(Q94753027)_and_Mathilde_Welcker_(Q94753026)_are_identical" this thread was created due to a bug in QS and has been solved, you added unrelated information. Please put any relevant information here.
Re QS bug - I told you I have no control over the QS software and I don't see community consensus to disable the tool, but if you are interested in that, try it. It is not specific to P227/P7902 items. MrProperLawAndOrder (talk) 17:20, 24 May 2020 (UTC)
You can't create duplicates merely because you don't want to match your import against existing items with full dates. Q93871865 had all that, but you created Q95340356. This is different from the QS bug that apparently you are checking. --- Jura 17:29, 24 May 2020 (UTC)
I can as you can and did. I am doing high quality work, checking against "items with full dates" is not a measure to prevent duplicates. And nine duplicates when 175000 new items have been created is much better than your BLKÖ rate I guess. MrProperLawAndOrder (talk) 01:09, 25 May 2020 (UTC)

Regarding Quickstatements also see https://phabricator.wikimedia.org/T234162

--M2k~dewiki (talk) 17:42, 24 May 2020 (UTC)

Progress report 2020-06-01

@Bargioni: is starting now his import of statements from GND. The import will regard, in this first phase consisting of about 616k statements through QuickStatements (plus their references), only the items recently created by @MrProperLawAndOrder: and will add, if available, date of birth (P569), date of death (P570), VIAF ID (P214) and ISNI (P213) referenced from GND ID (P227). A second phase has been already programmed with the import of other statements (occupation (P106), kinships, languages spoken, written or signed (P1412) etc.). --Epìdosis 08:57, 1 June 2020 (UTC)

The import of DtBio helped to find some invalid GNDs, see Wikidata:Database reports/Constraint violations/P227. Reason: In some rare cases DtBio made drag&drop errors like GND 11876980 instead of GND 118769804. Example: Emil Wohlwill (Q95730). --Kolja21 (talk) 12:47, 1 June 2020 (UTC)
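
Truncated values like the one above can be caught before an import by checking each candidate against the format constraints. The sketch below is a hedged illustration in Python: it uses only the three patterns quoted in the sticky note at the top of this page (further permitted patterns may exist and are not covered), it does not verify the checksum, and `looks_like_gnd_id` is a hypothetical helper name.

```python
import re

# Patterns copied from the sticky "Explanation of format constraints" on this page.
GND_PATTERNS = [
    re.compile(r"(1|1[01])\d{7}[0-9X]"),    # former PND numbers and "GND-born" records
    re.compile(r"[47]\d{6}-\d"),            # former SWD numbers
    re.compile(r"[1-9]\d{0,7}-[0-9X]"),     # former GKD numbers
]


def looks_like_gnd_id(value):
    """Return True if the whole string matches one of the listed patterns.

    This is a format check only: the check digit is not validated.
    """
    return any(pattern.fullmatch(value) for pattern in GND_PATTERNS)


for candidate in ["118769804", "11876980", "2072525-5", "4063004-3", "1100357025"]:
    print(candidate, looks_like_gnd_id(candidate))
```

Because the dash is significant, both 160220440 and 16022044-0 pass this check, as the sticky note explains.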

Edoderoobot adding false values for P244 [8] MrProperLawAndOrder (talk) 18:42, 1 June 2020 (UTC)

@Epìdosis: no VIAF added [9] [10] is this because of Edoderoobot having inserted a value in P214? Everywhere where someone else added a value recently, the QS batch will not add a value+reference? MrProperLawAndOrder (talk) 20:35, 1 June 2020 (UTC)

@MrProperLawAndOrder: No, such additions by Edoderoobot, correct or incorrect, do not influence my QS batches; in the three cases you mention, no VIAF was added just because GND contained no VIAF; I also guess that the wrong additions by Edoderoobot were due to some error of programming which managed badly the cases where no VIAF existed and, instead of skipping these items, added this incorrect value. --Epìdosis 20:40, 1 June 2020 (UTC)
And Wilhelm Bader Sr. (Q2571811) seems to be a GND duplicate. --Epìdosis 20:46, 1 June 2020 (UTC)
@Epìdosis: Should be solved soon, see de:Wikipedia:GND/Fehlermeldung/Juli 2019. --Kolja21 (talk) 03:09, 11 June 2020 (UTC)

Progress report 2020-06-11

@MrProperLawAndOrder: Phase 1 of the aforementioned import has just finished successfully; phase 2 has just started (expected duration: about the same as phase 1). --Epìdosis 23:36, 10 June 2020 (UTC)

Looking forward to the missing 23% of DtBio-ID / GND. --Kolja21 (talk) 03:11, 11 June 2020 (UTC)
Kolja21, phase 1 was adding VIAF, ISNI, birth day. Phase 2 seems to refer to other kind of enrichment. I just got notification for [11], [12] and many more. MrProperLawAndOrder (talk) 04:18, 11 June 2020 (UTC)
As I've written in the section above this: "A second phase has been already programmed with the import of other statements (occupation (P106), kinships, languages spoken, written or signed (P1412) etc.)". --Epìdosis 09:03, 11 June 2020 (UTC)
Thank both of you for your great work. I've fixed the few remaining duplicates (Property talk:P227/Duplicates#human). --Kolja21 (talk) 21:10, 11 June 2020 (UTC)

Progress report 2020-06-15

@MrProperLawAndOrder, Kolja21: Phase 2 of the aforementioned import just finished totally successfully; I've corrected manually all the errors of the batches. Thanks again to @Bargioni: for having provided me all the batches. Good night, --Epìdosis 21:06, 15 June 2020 (UTC)

@Epìdosis: Impressive job in QS, many many batches! -- Bargioni 🗣 21:15, 15 June 2020 (UTC)

VIAF distinct value violations involving GND humans

Since DtBio is a subset, checking that first could help more. The recently created DtBio humans mostly have no VIAF yet. @Kolja21, Epìdosis: might be interesting for you. MrProperLawAndOrder (talk) 16:13, 27 May 2020 (UTC)

  1. https://w.wiki/RzE example: Peter L. Münch-Heubner (Q64711035) vs Peter Münch (Q64739515). Both are authors, both were born in 1960. No chance for the VIAF algorithm, and a human will go crazy checking these edits. It's hard enough focusing on one authority file, but a cluster kills you.
  2. https://w.wiki/RzF see also Property talk:P214/Duplicates --Kolja21 (talk) 15:29, 31 May 2020 (UTC)

VIAF batch merge using QS

The batches were created using SPARQL, see also above. The number following the link is the quantity of items merged. For #3 the query is provided.

  1. https://quickstatements.toolforge.org/#/batch/35884 887 "P7902 merge on VIAF Len yob"
    1. Alan Parker Q271284 != Q95346678 - reported by User:Raymond
    2. Thomas Fuchs Q1247533 != Q95345488 - reported by User:Raymond
  2. https://quickstatements.toolforge.org/#/batch/35935 397 "P7902 merge on VIAF Len yob"
    1. Michael Fischer Q95316658 != Q21588913 - reported by User:Emu
    2. Otto Keller Q2039491 != Q95342132 - reported by User:Emu
  3. https://quickstatements.toolforge.org/#/batch/36771 3206 "P7902 merge on VIAF Len dob https://w.wiki/TLN"
  4. https://quickstatements.toolforge.org/#/batch/36887 848 "P7902 merge on VIAF Len dob https://w.wiki/TLN"
  5. https://quickstatements.toolforge.org/#/batch/37095 602 "P7902 merge on VIAF Len dob https://w.wiki/TLN"

MrProperLawAndOrder (talk) 13:34, 8 June 2020 (UTC) MrProperLawAndOrder (talk) 20:53, 9 June 2020 (UTC)

Finally https://quickstatements.toolforge.org/#/batch/38022 46 "Merge duplicates based on VIAF and dates" based on https://w.wiki/UiJ --Epìdosis 22:05, 30 June 2020 (UTC)

and https://quickstatements.toolforge.org/#/batch/38277 552 "Merge based on GND" based on https://w.wiki/WKf --Epìdosis 15:05, 7 July 2020 (UTC)
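
Since a few pairs in these batches were later reported as distinct persons, it may help to filter merge candidates against a list of known non-matches before generating a batch. The following is only a sketch in Python, not the method actually used for the batches above: it assumes the QuickStatements V1 command form `MERGE<TAB>Q1<TAB>Q2`, and `merge_commands` is a hypothetical helper name.

```python
def merge_commands(candidate_pairs, known_distinct):
    """Build QuickStatements V1 MERGE lines, skipping pairs reported as distinct.

    Both arguments are iterables of (QID, QID) tuples; pair order does not matter.
    """
    blocked = {frozenset(pair) for pair in known_distinct}
    lines = []
    for first, second in candidate_pairs:
        if frozenset((first, second)) in blocked:
            continue  # reported as two different persons, do not merge
        lines.append(f"MERGE\t{first}\t{second}")
    return lines


# Pairs reported above as false matches
known_distinct = [
    ("Q271284", "Q95346678"),   # Alan Parker
    ("Q1247533", "Q95345488"),  # Thomas Fuchs
    ("Q95316658", "Q21588913"), # Michael Fischer
    ("Q2039491", "Q95342132"),  # Otto Keller
]
```

The candidate pairs themselves would come from a SPARQL query such as those linked in the batch descriptions; they are deliberately not invented here.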

Reinheitsgebot adding data from CERL

[13] - makes no sense at all. This is just a copy from GND DB. And Reinheitsgebot is not even doing it directly from CERL but from a MnM catalog. @Epìdosis, Kolja21: it's now some weeks that problems with that bot editing DtBio items have been made public, but no sign it is stopping. MrProperLawAndOrder (talk) 20:24, 1 June 2020 (UTC)

My request about DtBio is still here waiting. However, in my opinion the edit you report in this section regarding CERL is perfectly correct. --Epìdosis 20:32, 1 June 2020 (UTC)
CERL Thesaurus (Q60909659) focuses on the records of Europe's book heritage. It looks as if the project will not be developed further but it is still a reliable source. --Kolja21 (talk) 21:23, 1 June 2020 (UTC)

GND DB data quality - we know it - original research

Because of some values assumed by User:Jura1 and others to be wrong, User:Jura1 wrote "It's not an assumption the value is incorrect, we know it. We don't need low quality fields from databases [...]" [14]

What does the community think

  1. shall any field that ever had one wrong value in any external DB be viewed at as "We don't need"
    1. which other GND DB data fields had wrong values according to the paradigm of "we know it"
  2. shall WD store information found in external sources or shall it store "we know it." and how would that be referenced?

MrProperLawAndOrder (talk) 17:26, 25 May 2020 (UTC)

  • Do you actually disagree with the assessment that the value is incorrect?
The explanation you gave for why the value you uploaded was that way on GND is "someone with write access to GND added it that way".
I don't think this is a satisfactory explanation for this, nor did you provide any for all other samples listed. --- Jura 17:31, 25 May 2020 (UTC)
You are again offtopic. What individual contributors think about individual values is irrelevant here. The topic is how to use external databases etc. Please show the diff for your statement about me starting "The explanation ...". MrProperLawAndOrder (talk) 17:54, 25 May 2020 (UTC)
  • In Wikidata we do care about truth and don't like to copy mistakes from external databases. When we discover that there was a bad value in GND, the default is to deprecate the value on our side. Whenever we import data it makes sense to think about the data quality of our imports.
When it comes to big imports of data, the discussion of what should be imported is best done in a bot request. ChristianKl 17:05, 26 May 2020 (UTC)
@Christian: We in Wikidata know that. If you want to join this discussion please explain how you can help. --Kolja21 (talk) 17:54, 26 May 2020 (UTC)

Newly created duplicates BLKÖ via quickstatements without batch number

[15], user:Jura1, could you run your QS command in a way that makes them easier to review? How did you check to not create duplicates? MrProperLawAndOrder (talk) 01:03, 25 May 2020 (UTC)

Also, on the item above you removed "Ritter" from the name given in the BLKÖ article title, but on [16] you keep "Gräfin". What mechanism did you use? Since you asked others to "complete" their items before creating new ones, whilst they have a plan to enrich them, why did you not complete this one by adding 2x VIAF, 2x GND and 1x ISNI? Do you even have a plan to enrich your items with authority control numbers? A GND for the Gräfin has been stated in the Wikisource article about her since 2012 [17]. MrProperLawAndOrder (talk) 03:20, 25 May 2020 (UTC)

  • As far as P227 is concerned, existing items with GND had been linked. Adding GND or GND-based IDs to newly created items is currently not a priority, but some other things are being done (unrelated to Property talk:P227).
Contrary to the 160000 GND only items, all items already have additional information at Wikimedia. I don't expect @Bargioni: to complete them for me [18]. --- Jura 10:03, 25 May 2020 (UTC)
@Jura1: Anyway, work in progress. Unfortunately we have to access GND a lot of times to grab dates. -- Bargioni 🗣 10:29, 25 May 2020 (UTC)
@Bargioni: what do you mean by work in progress, did you already start? If so, what exactly are you importing from GND. There is much more to obtain than only birth and death information. MrProperLawAndOrder (talk) 12:59, 25 May 2020 (UTC)
@Jura1: can you answer the question regarding your system for keeping Gräfin but deleting Ritter? MrProperLawAndOrder (talk) 12:56, 25 May 2020 (UTC)
@Jura1: if you don't bother importing the high quality identifier GND could you at least add VIAF and sex? You are increasing the number of constraint violations. See Property_talk:P1818#New_items_without_sex,_GND,_VIAF. MrProperLawAndOrder (talk) 13:03, 25 May 2020 (UTC)
I think I answered as far as GND is concerned. --- Jura 13:27, 25 May 2020 (UTC)
@Jura1: GND was not the concern, VIAF and missing sex were. You created several new constraint violations by adding humans without sex. MrProperLawAndOrder (talk) 14:03, 25 May 2020 (UTC)
If it wasn't you adding GND entries, it must be me ;) --- Jura 14:14, 25 May 2020 (UTC)

GND human - GND ID contains -

https://w.wiki/UJU 37 GND humans have GND ID containing "-". @Kolja21, Mautpreller, Emu: sometimes I saw this on items where a dewiki-article existed and the nature of the article changed or it was an article about different things, not only a human. In the long-term the GND entity type should be stored in WD so mismatches can be detected. But "-" is just wrong for humans (piz) as far as I know. MrProperLawAndOrder (talk) 19:32, 16 June 2020 (UTC)

There are two types of cases. No change needed in cases like:
The second type are errors like the gallery "Elisabeth Kaufmann (Zürich)" confused with Elisabeth Kaufmann (Q28970638), a painter of that name. Restoring the property P107 (P107) would help to find these errors. BTW: What is the correct "reason for deprecation" in the case of Elisabeth Kaufmann? I only know applies to other person (Q35773207), but in this case the GND doesn't apply to another person. It applies to another item. --Kolja21 (talk) 22:03, 16 June 2020 (UTC)
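
The shape check discussed in this section can be sketched offline. The following is only an illustration, assuming the three patterns quoted in the sticky note "Explanation of format constraints" on this page; the function names `gnd_id_shape` and `needs_review_for_human` are hypothetical, and a dashed ID on a human item is treated as a reason to review, not as proof of an error (as noted above, some such cases need no change).

```python
import re

# Patterns as quoted in the sticky note on this page; anything else is "unknown".
PND_LIKE = re.compile(r"(1|1[01])\d{7}[0-9X]")  # former PND and GND-born records
SWD_LIKE = re.compile(r"[47]\d{6}-\d")          # former SWD numbers
GKD_LIKE = re.compile(r"[1-9]\d{0,7}-[0-9X]")   # former GKD numbers


def gnd_id_shape(gnd_id: str) -> str:
    """Return 'pnd', 'swd', 'gkd' or 'unknown' for a GND ID string."""
    # SWD is tested before GKD because every SWD-shaped ID also fits the GKD pattern.
    if PND_LIKE.fullmatch(gnd_id):
        return "pnd"
    if SWD_LIKE.fullmatch(gnd_id):
        return "swd"
    if GKD_LIKE.fullmatch(gnd_id):
        return "gkd"
    return "unknown"


def needs_review_for_human(gnd_id: str) -> bool:
    """Flag IDs on human items that do not have the undashed PND-like shape."""
    return gnd_id_shape(gnd_id) != "pnd"
```

Such a filter only narrows down the list; whether a flagged ID is really a mismatch still has to be checked by hand against the GND record.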

dewiki not detecting duplicate GND

✓ Done see de:Wikipedia:GND/Fehlermeldung/Juni 2020. --Kolja21 (talk) 03:16, 11 June 2020 (UTC)

Enriching GND humans from GND database

Re "Unfortunately we have to access GND a lot of times to grab dates. -- Bargioni 🗣 10:29, 25 May 2020 (UTC)"

@Bargioni, Epìdosis: could you explain

  1. on which GND humans the process is running
  2. from where information is obtained
  3. what information is obtained
  4. what is added

? MrProperLawAndOrder (talk) 13:29, 25 May 2020 (UTC)

  1. We are working on the humans listed in this query:
    SELECT ?person ?gnd
    WHERE { 
      ?person wdt:P227 ?gnd . 
      ?person wdt:P7902 ?gnd .
      MINUS { ?person wdt:P569 ?b . }
      MINUS { ?person wdt:P570 ?d . }
      ?person wdt:P31 wd:Q5 .
    }
    ORDER BY DESC(?person)
    
  2. The information will be obtained from GND (GND ID (P227))
  3. We will obtain date of birth (P569) and/or date of death (P570), maybe also other information (we are reasoning about that)
  4. We will add date of birth (P569) and/or date of death (P570) whenever they have day precision, month precision or year precision; other information (e.g. occupation (P106)) will maybe be added in the next weeks

--Epìdosis 13:47, 25 May 2020 (UTC)
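
Point 4 above depends on telling day, month and year precision apart. A minimal sketch of that step follows, assuming the dates arrive as plain strings of the form `YYYY`, `YYYY-MM` or `YYYY-MM-DD` (an assumption about the input, not something stated above) and that the target is the QuickStatements time notation with Wikidata precision codes 9 (year), 10 (month) and 11 (day). The function name `to_qs_time` is hypothetical, and padding unknown parts with `00` is one common convention, not the only one.

```python
import re
from typing import Optional

# Number of date parts present -> Wikidata precision code.
_PRECISION = {1: 9, 2: 10, 3: 11}  # 9 = year, 10 = month, 11 = day

_DATE = re.compile(r"(\d{4})(?:-(\d{2}))?(?:-(\d{2}))?")


def to_qs_time(date_str: str) -> Optional[str]:
    """Convert 'YYYY', 'YYYY-MM' or 'YYYY-MM-DD' to a QuickStatements time value.

    Returns None for anything else, so vaguer dates are skipped, not guessed.
    """
    match = _DATE.fullmatch(date_str.strip())
    if match is None:
        return None
    year, month, day = match.groups()
    parts = sum(1 for p in (year, month, day) if p is not None)
    return f"+{year}-{month or '00'}-{day or '00'}T00:00:00Z/{_PRECISION[parts]}"
```

Returning `None` for everything that is not one of the three shapes keeps vaguer values (approximate dates, ranges) out of the import instead of forcing them into a false precision.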

@Epìdosis: RE 1 from which URL do you read? MrProperLawAndOrder (talk) 13:50, 25 May 2020 (UTC)
I guess from the RDF data of each ID (e.g. https://d-nb.info/gnd/105281672X/about/lds for http://d-nb.info/gnd/105281672X), but I'm honestly not sure, because I'm not able to do such imports, while @Bargioni: is :) --Epìdosis 13:56, 25 May 2020 (UTC)
@Epìdosis: can you ask Bargioni? That place also contains VIAF and if available ISNI and relationships to other humans. MrProperLawAndOrder (talk) 14:00, 25 May 2020 (UTC)
@MrProperLawAndOrder, Bargioni: Good idea, we can add VIAF ID (P214) and ISNI (P213); we will have a look at genealogies. Probably we will start working on GND tomorrow. --Epìdosis 14:06, 25 May 2020 (UTC)
Is there a way to import GND's reference as well? --- Jura 14:12, 25 May 2020 (UTC)
@Jura1: Obviously statements will have references to GND like the ones you can see in Johann Friedrich Wilhelm Dornheim (Q94690240) to FAST or VIAF; @MrProperLawAndOrder: we will add VIAF and ISNI whenever present. --Epìdosis 14:28, 25 May 2020 (UTC)
@Epìdosis: That's not exactly what I had in mind. GND has (or had) that nice, but somewhat complicated feature, that, as a tertiary reference, it stored the reference for its information (it used to be a code that could be decoded with some other list). --- Jura 14:32, 25 May 2020 (UTC)
@Jura1: OK, now I understand: of course it is good that GND stores references for its statements. However, importing them in our references would probably require creating some new items and possibly other problems. For this reason, we prefer, at least for now, referencing imported information to GND; in the future it will obviously be possible, with more time available (now, as you justly note, it is crucial to add fundamental information such as birth/death dates as soon as possible), extracting also references listed in GND. Thank you very much for the suggestion! --Epìdosis 14:41, 25 May 2020 (UTC)
I think the number of such sources is rather limited (it could be ADB or BLKÖ) and allows to determine the quality of DNB. I agree that the priority should be the dates. References with them would be nice. --- Jura 15:04, 25 May 2020 (UTC)
The list of sources seems quite long indeed. --Epìdosis 15:32, 25 May 2020 (UTC)
@Epìdosis: and nobody has explained here how that could be useful. WD stores references for individual statements. MrProperLawAndOrder (talk) 15:39, 25 May 2020 (UTC)
Statements ideally have references. Wikipedia can be included in the reference section as a source, but it isn't considered a reference. The same goes for any other tertiary source. --- Jura 16:07, 25 May 2020 (UTC)


Reading LDS from DNB website

Storing LDS

@Epìdosis: will the whole LDS file be stored for later extraction of information? MrProperLawAndOrder (talk) 14:23, 25 May 2020 (UTC)

@MrProperLawAndOrder: I'm not sure what you mean, but (if I understand correctly) I think we can do it, if you are interested. --Epìdosis 14:28, 25 May 2020 (UTC)
@Epìdosis: it's the proper way of doing it. It was said above "Unfortunately we have to access GND a lot of times to grab dates.", if one finds an error in the process of writing to WD one can then go back to local dump instead of reading from DNB website again. MrProperLawAndOrder (talk) 14:51, 25 May 2020 (UTC)
OK, I understand. --Epìdosis 14:57, 25 May 2020 (UTC)
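
The "local dump" idea above can be sketched as a small read-through cache. This is only an illustration under assumptions: the URL pattern is the one quoted earlier in this thread (`https://d-nb.info/gnd/<ID>/about/lds`), the `.ttl` file extension and the names `lds_url`, `http_fetch` and `cached_lds` are hypothetical, and error handling and request throttling are left out.

```python
from pathlib import Path
from typing import Callable
from urllib.request import urlopen


def lds_url(gnd_id: str) -> str:
    """URL pattern as quoted in this thread for the RDF data of one GND ID."""
    return f"https://d-nb.info/gnd/{gnd_id}/about/lds"


def http_fetch(url: str) -> bytes:
    """Download one document; only called when the cache has no copy."""
    with urlopen(url) as response:
        return response.read()


def cached_lds(gnd_id: str, cache_dir: Path,
               fetch: Callable[[str], bytes] = http_fetch) -> bytes:
    """Return the LDS data for gnd_id, reading the DNB website at most once."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    path = cache_dir / f"{gnd_id}.ttl"  # assumed extension for the stored file
    if path.exists():
        return path.read_bytes()        # later runs reuse the local copy
    data = fetch(lds_url(gnd_id))
    path.write_bytes(data)
    return data
```

If a mistake turns up while writing to WD, the extraction step can then be rerun against the files in `cache_dir` without reading from the DNB website again.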

Which LDS items to read

  1. 764044 [19] items having GND IDs and sex=0 (unknown), 1 (male), 2 (female) plus some other info defined in "fl". Adjust rows to 1000000 to get all lines and fl to defgnd to only get the GND. Due to opposition and threats by one user above, they are not all in WD. But maybe you can download all of them.
  2. all other GND humans in WD that miss gender, VIAF, ISNI, b or d. MrProperLawAndOrder (talk) 14:51, 25 May 2020 (UTC)

I think you can find all the data in the files authorities-person_lds_20200213.jsonld.gz and/or authorities-person_lds_20200213.rdf.gz and/or authorities-person_lds_20200213.ttl.gz here. --Epìdosis 15:01, 25 May 2020 (UTC)

@Epìdosis: that's outdated. MrProperLawAndOrder (talk) 15:18, 25 May 2020 (UTC)
@Jura1: this section is about which LDS items to read. Sex is a required information for humans. Not reading the LDS for these is no help. "An exception could be made if that information is sourced." - if the field is missing in WD by definition it cannot be sourced. MrProperLawAndOrder (talk) 15:18, 25 May 2020 (UTC)
That means: if GND's reference for the information can be provided. It's the same field as discussed above and apparently the way GND determines this can't be explained (see #Wrong_gender_imported_from_GND). --- Jura 15:23, 25 May 2020 (UTC)
@Jura1: What do your statements have to do with the topic of this section which is named "Which LDS items to read"? MrProperLawAndOrder (talk) 15:29, 25 May 2020 (UTC)
Can you explain the difference between the two? --- Jura 15:31, 25 May 2020 (UTC)
@Jura1: This section is not for posting unrelated stuff and then asking others to explain differences between unrelated stuff and the topic. Still waiting for you to explain the relation. MrProperLawAndOrder (talk) 15:35, 25 May 2020 (UTC)
@Jura1, MrProperLawAndOrder: Anyway, this discussion is practically useless: as from this query, no human having GND ID (P227) and Deutsche Biographie (GND) ID (P7902) misses sex or gender (P21), so no sex or gender (P21) will be imported from GND. --Epìdosis 15:37, 25 May 2020 (UTC)
I think they probably already had been dumped into Wikidata. Makes one wonder about the quality of other available fields. --- Jura 15:41, 25 May 2020 (UTC)
@Epìdosis: this section isn't restricted to DtBio items nor is it about which fields to write to WD. See #2 "all other GND humans in WD that miss gender, VIAF, ISNI, b or d. " If one enriches from LDS, one can also do it in the same run for other GND humans. And there are DtBio humans not created yet, which in DtBio website have unknown sex. MrProperLawAndOrder (talk) 15:47, 25 May 2020 (UTC)
OK, so considering also items not having Deutsche Biographie (GND) ID (P7902) I see nearly 15k items needing sex or gender (P21). We will evaluate how to do the import. --Epìdosis 16:02, 25 May 2020 (UTC)
@Epìdosis: there are more GND-without-DtBio humans that could be enriched. P21 is only one field to write. VIAF, ISNI, d+b are very helpful to find duplicates. Please read LDS for each GND human missing any of the fields VIAF, d+b too. MrProperLawAndOrder (talk) 16:24, 25 May 2020 (UTC)

──────────────────────────────────────────────────────────────────────────────────────────────────── @Epìdosis, Bargioni: https://w.wiki/TC$ 14201 GND humans missing sex, maybe Bargioni can download all LDS files for these GND IDs, if he hasn't yet. Each DtBio human currently has it, tracking at Wikidata:Database reports/Complex constraint violations/P7902#Deutsche Biographie ID - type human should have sex. MrProperLawAndOrder (talk) 16:25, 6 June 2020 (UTC)

@Bargioni: https://w.wiki/Tvi 135000 GND humans have no VIAF. Do you have all of these downloaded already? GND human should have VIAF, only recently created and some having a process bug are not in VIAF. !! Sometimes a VIAF is in the item, but is deprecated. MrProperLawAndOrder (talk) 00:50, 13 June 2020 (UTC)

Field for occupation

https://d-nb.info/gnd/109640195

gndo:professionOrOccupation <https://d-nb.info/gnd/4226289-6>;

Stated as another GND object, note that GND distinguishes this field by sex, so one for Taxifahrer and one for Taxifahrerin.
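
Since the occupation is stated as a link to another GND object, the target IDs can be pulled out of a stored LDS file. The sketch below is a regex-based illustration only, assuming Turtle text shaped like the line quoted above; a real import would more safely use a proper RDF parser. The function name `occupation_gnd_ids` is hypothetical.

```python
import re
from typing import List

# One statement: the predicate followed by one or more <...> objects, comma-separated.
_STATEMENT = re.compile(
    r"gndo:professionOrOccupation\s+(<[^>]*>(?:\s*,\s*<[^>]*>)*)"
)
# One object: a d-nb.info GND URI; the captured group is the bare GND ID.
_GND_URI = re.compile(r"<https?://d-nb\.info/gnd/([^>]+)>")


def occupation_gnd_ids(turtle_text: str) -> List[str]:
    """Return the GND IDs of all occupations stated in a Turtle snippet."""
    ids: List[str] = []
    for statement in _STATEMENT.finditer(turtle_text):
        ids.extend(_GND_URI.findall(statement.group(1)))
    return ids
```

Because GND keeps separate objects for the male and female form of an occupation, both IDs would have to be mapped to the same WD occupation item in a later step.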

Writing LDS data to WD

Manner of writing LDS data to WD

@Epìdosis: - will it be done via QS or a bot? If by bot, will it be one edit adding several things? MrProperLawAndOrder (talk) 15:00, 25 May 2020 (UTC)

@Epìdosis: Through QS, as always: one edit for each statement + one edit for each reference to GND. --Epìdosis 15:01, 25 May 2020 (UTC)
@Epìdosis: Keep in mind that QS can add statements multiple times. Will you use the QS website interface? Using the web UI would mean copying and pasting statements in groups - I had to split my create item lists in several pieces because the web UI didn't accept longer lists. I don't know if one can import longer lists via the command line. MrProperLawAndOrder (talk) 15:13, 25 May 2020 (UTC)
@MrProperLawAndOrder: Bargioni tried importing through command line, but it failed, so we use the website interface; while it is true that batch mode can generate duplicates when creating items, we have never had a problem with duplicated statements when adding to existing items. It is true that QS doesn't accept big batches, so we will split the import. --Epìdosis 15:18, 25 May 2020 (UTC)
@Epìdosis: did it fail because of the size? I had issues with QS adding same statements multiple times not only when creating items but also when adding IDs, they could be found via unique constraints. Anyway, there are probably other tools that later will remove exact duplicates of statements. MrProperLawAndOrder (talk) 15:26, 25 May 2020 (UTC)

Writing via QS web UI: will you use run in background so there is a proper batch id and one can link to sets? MrProperLawAndOrder (talk) 15:26, 25 May 2020 (UTC)

Obviously Bargioni will use batch mode. --Epìdosis 15:30, 25 May 2020 (UTC)
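
The two practical points in this section, avoiding identical statements and splitting a long list into pieces the web UI accepts, can be prepared before anything is pasted into QS. The following is a sketch under assumptions: each command is one line of text, exact duplicates are identical after trimming whitespace, and the batch size is a hypothetical parameter (no actual QS limit is given above). It only cleans the input list; it does not prevent QS itself from writing a statement twice.

```python
from typing import Iterable, List


def dedupe_commands(lines: Iterable[str]) -> List[str]:
    """Drop blank lines and exact repeats, keeping the first occurrence in order."""
    seen = set()
    unique: List[str] = []
    for line in lines:
        command = line.strip()
        if command and command not in seen:
            seen.add(command)
            unique.append(command)
    return unique


def split_batches(commands: List[str], size: int) -> List[List[str]]:
    """Split the command list into consecutive chunks of at most `size` lines."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [commands[i:i + size] for i in range(0, len(commands), size)]
```

Each chunk can then be submitted as its own background batch, so that every piece has a batch id that can be linked and reviewed.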

Fields to write to WD

  1. P214 VIAF (priority, to find duplicates in WD since many existing humans have no GND, query in section Property talk:P227#VIAF distinct value violations involving GND humans)
  2. P213 ISNI (similar to VIAF)
  3. time (priority, to solve potential issues reported regarding tool usage)
    • date of birth (priority)
    • date of death (priority)
  4. place
    • place of birth
    • place of death
  5. P21 sex (priority, reduce constraint violations, but GND contains errors: known female had a value for male; probably safe to add "female")
  6. relationships
    • mother
    • father
  7. occupation

MrProperLawAndOrder (talk) 16:37, 25 May 2020 (UTC)

@Epìdosis: what do you think? MrProperLawAndOrder (talk) 16:58, 25 May 2020 (UTC)
It is OK. I hope Bargioni has his Internet connection fixed soon, since now it's unfortunately broken. Bye, --Epìdosis 17:01, 25 May 2020 (UTC)
I'd start with time, then occupation. Agree about "f". I don't think adding all clustered ids is priority. --- Jura 17:06, 25 May 2020 (UTC)


Coordinate writing to WD

@Edoderoo, Epìdosis: Edoderoo is also working on it, see Topic:Vn7dpnl9v9dw6fer. How can we coordinate to avoid duplicated work? It seems Edoderoo doesn't need QS. No idea how he does it. MrProperLawAndOrder (talk) 10:06, 28 May 2020 (UTC)

I wrote a script in python with Pywikibot. Edoderoo (talk) 12:01, 28 May 2020 (UTC)