Spike: Password Reset Project [Timebox: 12 hours]
Closed, Resolved (Public)

Description

As a developer, I want to know the technical, security, and UX considerations for the password reset project, so that the CommTech team can properly prepare for the project.

Requirements:

  • Investigate the technical and UX considerations of requiring both a username and email address to successfully generate a password reset request email
  • Investigate how accounts with 2FA may be impacted by password reset changes and how we can maintain a smooth password reset process for them
  • Investigate if we can have an email only reset option (i.e. username is not an available option). If yes, what would be the consequences (technical and UX)?
  • Investigate the work required to have a default opt-in for new users with an associated email address (while old users remain opt-out by default)
  • Connect with Security team to determine if there are additional risks to take into account (note: we have had preliminary chats with Sam Reed in Security, but we should reach out again for this spike, if possible)
  • Investigate what sort of logging may be helpful for Community Engagement or Anti-Harassment after this work is complete
  • Query for what percentage of accounts:
    • Don't have any email address associated with an account
    • Don't have a confirmed email address (i.e. they have an email address associated with an account but it has not been confirmed)
    • Have a confirmed email address
    • Have a confirmed email address shared by another account (and, if possible, details related to distribution -- for example, perhaps some emails have 100s of accounts?)

Event Timeline

ifried updated the task description. (Show Details)
ifried renamed this task from Spike: Password Reset Project to Spike: Password Reset Project [12 hours]. Aug 15 2019, 5:27 PM
ifried moved this task from Needs Discussion to Up Next (June 3-21) on the Community-Tech board.
ifried renamed this task from Spike: Password Reset Project [12 hours] to Spike: Password Reset Project [Timebox: 12 hours]. Aug 15 2019, 7:47 PM
This comment was removed by MaxSem.

Investigate the technical and UX considerations of requiring both a username and email address to successfully generate a password reset request email

Technically, it's pretty easy, even if we want to make this a per-user option.

Investigate how accounts with 2FA may be impacted by password reset changes and how we can maintain a smooth password reset process for them

They're currently handled like all other users during password reset because 2FA is for logging in, not resetting your password. I don't think we need or want to change that.

Investigate if we can have an email only reset option (i.e. username is not an available option). If yes, what would be the consequences (technical and UX)?

Very easy: all the bits and pieces are already here, we would just need to connect them a bit differently. However, because MediaWiki doesn't require an email address to be unique to one user, several users could match the same address. In that case the address receives a separate email for every matching user.
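
A minimal sketch of that email-only lookup, written in Python rather than MediaWiki's PHP. The `Account` type, both helper functions, the exact-match comparison, and the message text are hypothetical; the sketch only illustrates that one address can match several accounts and that each match produces its own message.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Account:
    username: str
    email: str


def accounts_for_email(accounts, email):
    """Return every account registered with exactly this address."""
    return [a for a in accounts if a.email == email]


def reset_messages(accounts, email):
    """Build one (recipient, body) pair per matching account."""
    return [
        (a.email, f"Password reset requested for account {a.username}")
        for a in accounts_for_email(accounts, email)
    ]


sample = [
    Account("Alice", "a@example.org"),
    Account("AliceBot", "a@example.org"),
    Account("Bob", "b@example.org"),
]
# Two accounts share a@example.org, so that address gets two messages.
messages = reset_messages(sample, "a@example.org")
```

With the sample data, `messages` holds two entries for the same recipient, one naming `Alice` and one naming `AliceBot`.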

Investigate the work required to have a default opt-in for new users with an associated email address (while old users remain opt-out by default)

This is gonna be ugly no matter how we decide to implement this because MW currently assumes that all users have the same default settings:

  • If we want to change this setting based on account creation date, we would need to wrestle with this assumption and the bugs that breaking it might cause. Also, we would need to plug the global account creation date into this, because using local dates would make different wikis behave differently, which is not good.
  • If we hook into account creation and set a global option for all new accounts, we would create a lot of garbage rows in the database for users who would never use their accounts (we had 4M new global accounts in 2018).
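
To make the first option concrete, here is a rough Python sketch of resolving the default from the global registration date, so that no row is stored for users who never touch the setting. The cutoff date, the function name, and the treatment of accounts with an unknown date are all assumptions for illustration, not existing MediaWiki behaviour.

```python
from datetime import date

# Hypothetical cutoff: accounts registered globally on or after this date
# default to opt-in, older accounts default to opt-out.
CUTOFF = date(2020, 1, 1)


def email_required_for_reset(global_registration, stored_value=None):
    """Resolve the preference for one user.

    An explicitly stored choice always wins. Otherwise the default is
    derived from the global (not per-wiki) registration date.
    """
    if stored_value is not None:
        return stored_value
    if global_registration is None:
        # Assumption: accounts without a known date are treated as old.
        return False
    return global_registration >= CUTOFF
```

The second option would instead write `stored_value=True` for every new account at creation time, which is where the extra database rows come from.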

Connect with Security team to determine if there are additional risks to take into account (note: we have had preliminary chats with Sam Reed in Security, but we should reach out again for this spike, if possible)

Emailed them, will post the outcome separately.

Investigate what sort of logging may be helpful for Community Engagement or Anti-Harassment after this work is complete

From a purely technical point of view, we wouldn't need anything more than standard logging to ELK. If product or AHT need something else, we could always implement more logging or numerical metrics.

Query for what percentage of accounts:

For reference, the total number of global accounts is:

select count(*) from globaluser;
+----------+
| count(*) |
+----------+
| 59336663 |
+----------+

Don't have any email address associated with an account

select count(*) from globaluser where gu_email is null or gu_email='';
+----------+
| count(*) |
+----------+
| 14396798 |
+----------+

Don't have a confirmed email address (i.e. they have an email address associated with an account but it has not been confirmed)?

select count(*) from globaluser where gu_email is not null and gu_email<>'' and (gu_email_authenticated is null or gu_email_authenticated = '');
+----------+
| count(*) |
+----------+
| 28248914 |
+----------+

Have a confirmed email address?

select count(*) from globaluser where gu_email is not null and gu_email<>'' and gu_email_authenticated is not null and gu_email_authenticated <> '';

+----------+
| count(*) |
+----------+
| 16691054 |
+----------+

Have a confirmed email address shared by another account (and, if possible, details related to distribution -- for example, perhaps some emails have 100s of accounts?).

select matching, count(*) num from (select count(*) matching from globaluser where gu_email is not null and gu_email<>'' and gu_email_authenticated is not null and gu_email_authenticated <> '' group by gu_email having matching > 1) t1 group by matching order by matching desc, num desc;
+----------+--------+
| matching | num    |
+----------+--------+
|      393 |      1 |
|      286 |      1 |
|      216 |      1 |
|      176 |      1 |
|      173 |      1 |
|      170 |      1 |
|      166 |      1 |
|      143 |      2 |
|      140 |      1 |
|      130 |      1 |
|      125 |      1 |
|      121 |      1 |
|      118 |      1 |
|      112 |      2 |
|      107 |      1 |
|      106 |      1 |
|      100 |      1 |
|       99 |      2 |
|       98 |      1 |
|       95 |      1 |
|       94 |      1 |
|       92 |      1 |
|       91 |      1 |
|       90 |      2 |
|       87 |      2 |
|       84 |      2 |
|       83 |      2 |
|       82 |      2 |
|       81 |      1 |
|       80 |      4 |
|       79 |      1 |
|       78 |      1 |
|       77 |      2 |
|       75 |      1 |
|       74 |      3 |
|       73 |      1 |
|       72 |      2 |
|       71 |      4 |
|       70 |      3 |
|       69 |      1 |
|       68 |      3 |
|       67 |      1 |
|       66 |      1 |
|       65 |      4 |
|       63 |      4 |
|       62 |      2 |
|       61 |      3 |
|       60 |      4 |
|       59 |      1 |
|       58 |      5 |
|       57 |      6 |
|       56 |      1 |
|       55 |      5 |
|       54 |      5 |
|       53 |      3 |
|       52 |      1 |
|       51 |      5 |
|       50 |      5 |
|       49 |      2 |
|       48 |      7 |
|       47 |      8 |
|       46 |      5 |
|       45 |      9 |
|       44 |     11 |
|       43 |     10 |
|       42 |      4 |
|       41 |      9 |
|       40 |      4 |
|       39 |      8 |
|       38 |      8 |
|       37 |      9 |
|       36 |     16 |
|       35 |      9 |
|       34 |     13 |
|       33 |     14 |
|       32 |     19 |
|       31 |     17 |
|       30 |     17 |
|       29 |     23 |
|       28 |     28 |
|       27 |     37 |
|       26 |     35 |
|       25 |     34 |
|       24 |     41 |
|       23 |     46 |
|       22 |     60 |
|       21 |     63 |
|       20 |     71 |
|       19 |     80 |
|       18 |    102 |
|       17 |    101 |
|       16 |    141 |
|       15 |    126 |
|       14 |    196 |
|       13 |    246 |
|       12 |    292 |
|       11 |    367 |
|       10 |    485 |
|        9 |    712 |
|        8 |   1018 |
|        7 |   1445 |
|        6 |   2530 |
|        5 |   5069 |
|        4 |  15037 |
|        3 |  81041 |
|        2 | 771294 |
+----------+--------+

I suspect that most of the high-count email addresses belong to sockfarms.

Ooh, thanks for this @MaxSem 🤩 So the proportion of no (~¼), unconfirmed (~½), and confirmed (~¼) emails looks something like this:

Screenshot 2019-08-26 at 5.47.49 PM.png (732×1 px, 59 KB)

Note that the sum of the other three queries is off by about a hundred from the total number of accounts. That's pretty minor given the totals, but I thought I'd point it out.
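
For anyone who wants to recheck the arithmetic, a short Python snippet using only the counts from the query results above. The bucket labels are mine.

```python
# Counts copied from the query results above.
total = 59336663
counts = {
    "no email": 14396798,
    "unconfirmed email": 28248914,
    "confirmed email": 16691054,
}

# Share of each bucket relative to the total number of global accounts.
for label, n in counts.items():
    print(f"{label}: {n / total:.1%}")

# Difference between the sum of the three buckets and the total.
discrepancy = sum(counts.values()) - total
print("discrepancy:", discrepancy)  # prints "discrepancy: 103"
```

The shares come out at roughly 24%, 48%, and 28%, which matches the rough ¼, ½, ¼ split.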


I am not really sure how to read the last table. My current understanding is that the first column is how many accounts share an email address, and the second column is how many email addresses are shared by that many accounts. So one email being shared by 393 accounts has happened only once (so 393 user names?), but an email being shared by 43 accounts has happened 10 times (essentially 430 user names?). Is this correct?

Is this correct?

Yes.
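
That reading can be written down as a small Python sketch: each row contributes `matching × num` accounts. The three rows are copied from the table above; the function name is made up for illustration.

```python
def accounts_covered(rows):
    """Total accounts represented by (matching, num) histogram rows."""
    return sum(matching * num for matching, num in rows)


# (accounts sharing one address, number of such addresses)
rows = [(393, 1), (43, 10), (2, 771294)]

per_row = [matching * num for matching, num in rows]
# per_row is [393, 430, 1542588]
```

So the single 393-account address covers 393 user names, the ten 43-account addresses cover 430, and the addresses shared by exactly two accounts cover 1,542,588.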

Cool! Another question: do these numbers include only confirmed emails, or all of them? I think for the password reset case having numbers for just the confirmed ones would be helpful.

Investigate the work required to have a default opt-in for new users with an associated email address (while old users remain opt-out by default)

This is gonna be ugly no matter how we decide to implement this because MW currently assumes that all users have the same default settings:

  • If we want to change this setting based on account creation date, we would need to wrestle with this assumption and the bugs that breaking it might cause. Also, we would need to plug the global account creation date into this, because using local dates would make different wikis behave differently, which is not good.
  • If we hook into account creation and set a global option for all new accounts, we would create a lot of garbage rows in the database for users who would never use their accounts (we had 4M new global accounts in 2018).

Is it something that can be default-on when someone confirms their email address? Then it shouldn't matter how old the account is: once they enter an email, this setting (1) shows up, and (2) is on by default.

Is it something that can be default-on when someone confirms their email address? Then it shouldn't matter how old the account is: once they enter an email, this setting (1) shows up, and (2) is on by default.

The whole point of the original question was that not every user wants to always type username and email.

Investigate if we can have an email only reset option (i.e. username is not an available option). If yes, what would be the consequences (technical and UX)?

Very easy, [..]. MediaWiki doesn't require that only one user uses the same email address, several users could match the email. The given address will receive separate emails for every user matching.

If the user using the system is the owner of the e-mail address, I think this is acceptable UX.

However, from a security and privacy viewpoint, this might pose a problem. In particular, we need to be careful not to reveal whether the address matched any accounts at all. Otherwise we would disclose whether the owner of that address has a Wikipedia account registered with it, which would be an information leak.

Right now, however, it seems we already support e-mail-only resets. At https://en.wikipedia.org/wiki/Special:PasswordReset, entering only a non-existent e-mail and submitting results in "If this email address is associated with your account, then a password reset email will be sent.". Trying it with a Gmail address of mine, I got two e-mails (as I have two accounts with that address). So that part works already as well.
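
The behaviour observed above, the same notice whether or not anything matched, could be sketched like this in Python. The notice text is the one quoted above; `request_reset`, the set of known addresses, and the `outbox` list are hypothetical stand-ins, not MediaWiki code.

```python
GENERIC_NOTICE = (
    "If this email address is associated with your account, "
    "then a password reset email will be sent."
)


def request_reset(known_emails, email, outbox):
    """Queue mail when the address is known, but never say so.

    The caller always gets the same notice, so the response does not
    reveal whether the address belongs to any account.
    """
    if email in known_emails:
        outbox.append(email)
    return GENERIC_NOTICE


outbox = []
known = {"a@example.org"}
hit = request_reset(known, "a@example.org", outbox)
miss = request_reset(known, "nobody@example.org", outbox)
```

Here `hit` and `miss` are identical strings, while `outbox` only contains the address that actually matched.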

A couple of things I've been wondering about:

  • It's possible to register a new account with an email address that's already in use (and confirmed) for other accounts. This doesn't matter for this project, does it? Because we're assuming the harasser doesn't know the email address.
  • A password can be reset with an unconfirmed email address. This is also not a worry, because we won't allow the email-required preference unless the address is confirmed.
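
A rough Python sketch of the second point, assuming the preference ends up as a per-user flag. The `User` fields and both function names are hypothetical; the sketch only shows the flag being refused without a confirmed address, and a reset request needing both username and email once the flag is set.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class User:
    username: str
    email: Optional[str] = None
    email_confirmed: bool = False
    require_email_for_reset: bool = False


def enable_email_required(user):
    """Turn the preference on, but only for a confirmed address."""
    if not (user.email and user.email_confirmed):
        raise ValueError("a confirmed email address is required")
    user.require_email_for_reset = True


def may_send_reset(user, username=None, email=None):
    """Decide whether a reset request identifies this user well enough."""
    email_ok = email is not None and email == user.email
    if user.require_email_for_reset:
        # Opted-in users must supply both values.
        return username == user.username and email_ok
    # Everyone else keeps the current either-or behaviour.
    return username == user.username or email_ok
```

For an opted-in user, a request carrying only the username returns `False`, which is the case the preference is meant to block.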

Just a note: Max and I were talking about this a few days ago and how it might impact the API. It appears all of the code from the API and the UI ends up in the same class/function so the changes here shouldn't be difficult to apply to both scenarios.