Description

This task is for tracking how we test/QA the new EditorJourney schema, as well as our ability to use its data along with the data from other relevant schemas. If we find bugs, we should file those separately. On this task, we'll just discuss issues related to our ability to test.

Yesterday, I created a new account in Test Wiki (MMiller Test 01 test).

I performed about 25 actions and recorded everything I clicked on, with timestamps.

Then this morning, I looked in Hadoop for the events and compared them against my record to see whether everything ended up in the database.

There were some discrepancies that we should go over together in my spreadsheet. Here are the issues at a high level:

We are treating the Main Page as a normal article page and obfuscating its information. That might cause some unneeded challenges in our analyses later, when we want to see if people visited the homepage. Since it's not a sensitive page, we might want to special-case it so it doesn't get hashed.

In a couple of cases, an event seems to be recorded twice.

Redirects seem to be handled inconsistently. Some of the fields record the identifiers of the link that was clicked on, and some record the identifiers of the destination. We should discuss.

The event logged when I clicked on "Recent Changes" contained unexpected information.

No event is recorded for logging out, but we would like to know when users log out.

@kostajh will address some of the edge cases in which Special pages, such as Special:RecentChanges, get logged with the wrong information. The proposed solution sets the namespace to -1 and the page_id to 0.
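As a rough illustration of the proposed values only, here is a minimal Python sketch. It is not the actual fix, which would live in the logging code. The function name and the `title` and `namespace` keys are assumptions made for this example; only `page_id`, the namespace value -1, and the page_id value 0 come from the proposal above.

```python
SPECIAL_NAMESPACE = -1  # MediaWiki's namespace ID for Special pages
SPECIAL_PAGE_ID = 0     # value proposed in this task for Special pages


def normalize_special_page(event: dict) -> dict:
    """Return a copy of the event with the proposed Special-page values applied.

    Assumes (hypothetically) that the event carries a plain-text "title" key.
    """
    title = event.get("title", "")
    if title.startswith("Special:"):
        # Override whatever was logged with the proposed fixed values.
        return {**event, "namespace": SPECIAL_NAMESPACE, "page_id": SPECIAL_PAGE_ID}
    # Non-special pages are returned unchanged (as a copy).
    return dict(event)
```

With this, an event for Special:RecentChanges would always carry namespace -1 and page_id 0, regardless of what was logged originally.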

@Etonkovidova will check how logging goes when a user tries to edit a semi-protected or protected page.

@nettrom_WMF and @kostajh will look into why a certain event got recorded twice, with exactly the same content, but three seconds apart. The event in question was when I clicked to edit/create my User Talk page as User:MMiller Test 01 test.

I looked at this by pulling up https://www.mediawiki.org/w/index.php?title=User:KHarlan_(WMF)/Blah and looking at network requests after clicking "Edit". I end up with two network requests a few seconds apart with the request URL of https://www.mediawiki.org/w/index.php?title=User:KHarlan_(WMF)/Blah&action=edit. The first one is MediaWiki returning the edit page; the second one is generated by VisualEditor, which makes another request to the same URL during its load process. I don't think there's anything we can do about this other than to account for it in analysis.

I checked the EditAttemptStep schema regarding this, and it has three events stored around the same timestamps for each of the three stages of the editor's loading: init, loaded, and ready. These are all part of the same single edit. As @kostajh points out, the VisualEditor makes two requests, so we'll see those in EditorJourney. I'm sure I can figure out some way to combine the two data sources to account for this in our analysis.
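One simple way to account for the double request in analysis, without joining against EditAttemptStep, might be to collapse events whose content is identical and whose timestamps fall within a few seconds of each other. The sketch below is a minimal illustration under assumptions: the function name, the `timestamp` key, and the 5-second window are hypothetical, not part of the EditorJourney schema, and the sample events are made up.

```python
from datetime import datetime, timedelta


def collapse_duplicates(events, window_seconds=5):
    """Drop events that repeat the content of a recent event within the window.

    Each event is a dict with a datetime under "timestamp" (assumed key);
    all other fields together form the content used for comparison.
    """
    window = timedelta(seconds=window_seconds)
    last_seen = {}  # content key -> timestamp of the most recent occurrence
    kept = []
    for event in sorted(events, key=lambda e: e["timestamp"]):
        # Everything except the timestamp identifies the event's content.
        key = tuple(sorted((k, v) for k, v in event.items() if k != "timestamp"))
        previous = last_seen.get(key)
        if previous is None or event["timestamp"] - previous > window:
            kept.append(event)
        # Update even for dropped events so a chain of repeats collapses to one.
        last_seen[key] = event["timestamp"]
    return kept


# Made-up sample: the same edit click logged twice, three seconds apart,
# followed by a genuinely separate click five minutes later.
sample = [
    {"timestamp": datetime(2019, 1, 1, 10, 0, 0), "action": "edit", "page": "hash-a"},
    {"timestamp": datetime(2019, 1, 1, 10, 0, 3), "action": "edit", "page": "hash-a"},
    {"timestamp": datetime(2019, 1, 1, 10, 5, 0), "action": "edit", "page": "hash-a"},
]
print(len(collapse_duplicates(sample)))  # 2
```

The window needs to be wide enough to cover VisualEditor's load time but narrow enough that a user's deliberate second click is not discarded, so the right value would have to be checked against real data.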