Convert page id to string explicitly
Otherwise, when a line such as
{"dump_talk_page_title": "U-597", "talk_page_id": 4294172, "timestamp": "20140129153907", "project": "marca de projeto", "wp10": "1"}
is processed, we get an error like this:
Traceback (most recent call last):
  File "./utility", line 4, in <module>
    articlequality.main()
  File "/home/he7d3r/projects/articlequality/articlequality/articlequality.py", line 54, in main
    module.main(sys.argv[2:])
  File "/home/he7d3r/projects/articlequality/articlequality/utilities/fetch_text.py", line 48, in main
    run(labelings, output, session, verbose)
  File "/home/he7d3r/projects/articlequality/articlequality/utilities/fetch_text.py", line 53, in run
    for labeling in fetch_text(session, labelings, verbose=verbose):
  File "/home/he7d3r/projects/articlequality/articlequality/utilities/fetch_text.py", line 89, in fetch_text
    labeling['talk_page_id'] + " " + labeling['timestamp'])
TypeError: unsupported operand type(s) for +: 'int' and 'str'
Makefile:553: recipe for target 'datasets/ptwiki.labeled_revisions.with_text.9k_2020.json' failed
make: *** [datasets/ptwiki.labeled_revisions.with_text.9k_2020.json] Error 1
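A minimal sketch of the failure and of the explicit conversion. It assumes the concatenation on line 89 of fetch_text.py matches the fragment visible in the traceback (the rest of that expression is not shown there). `describe_labeling` is a hypothetical helper for illustration, not the project's actual code.

```python
import json

# json.loads yields an int for "talk_page_id" but a str for "timestamp".
line = ('{"dump_talk_page_title": "U-597", "talk_page_id": 4294172, '
        '"timestamp": "20140129153907", "project": "marca de projeto", "wp10": "1"}')
labeling = json.loads(line)


def describe_labeling(labeling):
    """Hypothetical helper mirroring the concatenation in fetch_text.py.

    str() converts the integer page id explicitly before it is joined
    with the timestamp string.
    """
    return str(labeling['talk_page_id']) + " " + labeling['timestamp']


try:
    # Without the conversion, int + str raises TypeError.
    labeling['talk_page_id'] + " " + labeling['timestamp']
except TypeError as e:
    print("TypeError:", e)

print(describe_labeling(labeling))
```

With the explicit str() call, the page id and timestamp are joined into a single string instead of raising.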
This is a curious example: the talk page was created one second BEFORE the content page:
- 2014-01-29T15:39:07 https://pt.wikipedia.org/w/index.php?title=Discuss%C3%A3o:U-597&oldid=38035204
- 2014-01-29T15:39:08 https://pt.wikipedia.org/w/index.php?title=U-597&oldid=38035205