Incident Reporting System/Minimum Viable Product User Testing Summary

The Incident Reporting System (IRS) aims to make it easy for users to report harmful incidents safely. A reporting system is a requirement of the new Universal Code of Conduct, as well as a recommendation of the Movement Strategy.

Earlier in 2023, we conducted research on the Indonesian and Korean Wikipedias with the goal of understanding harassment, how harassment is reported, and how responders to reports go about their work. In November 2023, editors were invited to test a Minimum Testable Product (MTP) for the Incident Reporting System.

We now have a first iteration of the IRS (see MVP prototype screens below) and we need to understand whether we are on the right path.

Thus in March 2024, the Trust & Safety Product team conducted user testing of the Minimum Viable Product (MVP) of the Incident Reporting System.

Executive summary

“There will always be people on the Internet who like to cause confusion. But if there is some way to report it, I believe it is a good thing.”

Reporting entry point

During user testing, all participants found the entry point for reporting an incident, and the current user flow was well understood.

Types of reports

There was some confusion over two of the reporting options: “someone might cause self-harm” and “public harm threatening message”.

Report processing and response

Two participants assumed the system would be automated. One was concerned about automation and wanted a human response, whereas the other felt reassured by the idea that it would check whether the abuser had any past history of threats and offences and delete the offensive comment accordingly. All participants expected a timely response (on average 2-3 days) after submitting a report.

Research goals

We wanted to learn the following:

  1. Do users know where to go to report an emergency incident?
  2. Does the user flow make sense and feel intuitive?
  3. Are users able to complete their task without needing additional help?
  4. What are users' expectations of submitting a report?

Participants and test format

An unmoderated, task-based test using a prototype and a card sort was conducted on Userlytics with editors from the Portuguese and Italian wikis.

Participants

  • 11 users were recruited
  • All participants were active editors (> 10 edits within the last 3 months)
  • All participants had interacted with other users on wikis
  • 7 participants were from Portuguese wikis
  • 4 participants were from Italian wikis
  • 2 participants were excluded due to technical difficulties


Task and questions

Each set of questions was paired with a prototype screen:

1. Finding an abusive comment on your or someone else’s talk page (screenshot of a user talk page displaying an example abusive comment):
  • What do you do in this situation?
  • Who would you reach out to for help in this circumstance?
  • If you see a hateful comment like in this example, what would you do?
2. Getting help (screenshot of a user talk page displaying an example abusive comment and the reporting menu):
  • What options do you see on this screen?
  • Which of the options do you think will get you help?
  • What do you think will happen when you select that option?
3. Report an incident (screenshot of the incident reporting form with radio buttons):
  • Can you tell us what each of the options means?
  • Which option will you choose in this case?
4. What will you report? (screenshot of the emergency report form):
  • What information will you generally include to report such a hate message?
  • Who do you think this report will go to?
  • What will you do next?
5. What happens now? (screenshot of the report submission success message):
  • What do you think happened?
  • What do you think will happen next?
  • How long do you think it will take?
  • How do you feel about it?

Card sort

We wanted to understand what information and guidance communities need for reporting non-life-threatening user behaviour. Participants were presented with a set of cards under the category "User behaviour reporting" and were asked to rank them by importance (1 being the most important).

Ranking of information needed for reporting user behaviour (1 = most important, 5 = least important):
  1. Instruction on what you should do in this case
  2. Name and contact of the people you can reach out to
  3. Link to pages where you can post about abuse on your talk page
  4. Guidelines on how to have a discussion with the abuser
  5. Other self-help options

Key findings

Summary

  • All participants found the entry point for reporting an incident.
  • Nearly all participants understood which category to report the incident under.
  • Some participants were confused by the categories “someone might cause self-harm” and “public harm threatening message”.
  • Some participants said they wouldn’t report minor user behaviour issues.
  • For most participants, the ideal time period to receive a response was 2-3 days; the maximum expected was 1 week and the minimum was within a few hours.
  • Most participants thought the following would be most important on a community guidance page when reporting non-life-threatening user behaviour:
    1. Instruction on what you should do in this case
    2. Name and contact of the people you can reach out to
  • Automation was a theme among two participants. While one participant was concerned about their report not being read by a human, another participant thought an automated system would check if the abuser had any past history of threats and offences, and delete the offensive comment accordingly.

Recommendations

“I'm a little afraid that it will be sent to an automated system and that it won't actually be read by human beings.”

Now

  1. Keep the user flow as it is, since all participants understood where to report an incident.
  2. Provide more clarity about the categories “someone might cause self-harm” and “public harm threatening message”: either by changing the wording or considering other ways to categorise the types of abuse users want to report.
  3. Set the expectation that a human, not an automated system, will respond, to alleviate user concerns.

Future ideas

  1. Build an intelligent system to reduce the response/action time based on a report.
  2. Utilise a ‘user reputation score’ to decide the likelihood of a report being true or false.
  3. Use natural language queries from the reporter to identify the best course of action (e.g. “Someone called me a pig on my talk page”, “I see someone outside my door with a baseball bat”...)
  4. Allow additional evidence to be added, such as images, voice notes, etc.
  5. A/B test removing an abusive message after it has been reported vs keeping it on the talk page with assurance that it has been reported.