CU 2.0: Create queries required for the Timeline tab [medium]
Closed, ResolvedPublic2 Estimated Story PointsMar 25 2020
Actions

Assigned To

Authored By

	Mooeypoo
	Mar 13 2020, 9:34 PM

Description

As a step towards implementing the mockup detailed at the parent task T237595, this task concentrates on developing the required queries to fetch the timeline data.

Acceptance criteria:

Create the query needed to fetch a list of contributions of the given users and IPs under investigation, ordered by time
Output the results in a standardized way for the Frontend to pull this data for display (display itself will be a followup task)
The output should note each entry's type (log, edit, page creation) in a sensible way for (later) the frontend to be able to adjust its formatting.
Create integration test for the queries to make sure the data is outputted correctly

Details

Due Date: Mar 25 2020, 4:00 AM

	Subject	Repo	Branch	Lines +/-
	Add TimelineService queries	mediawiki/extensions/CheckUser	master	+170 -0

Customize query in gerrit

Related Objects
Search...

Status	Subtype	Assigned	Task
Resolved		None	T139810 RFC: Overhaul the CheckUser extension
Resolved		Niharika	T236225 [Epic] CheckUser 2.0 Improvements
Resolved		Niharika	T237595 CU 2.0: Timeline tab
Resolved	Jun 2 2020	dmaza	T247641 CU 2.0: Add variant display to the Timeline result list
Resolved	Jun 2 2020	dmaza	T247640 CU 2.0: Add the timeline tab with a list of generic structured results [medium]
Resolved	Mar 25 2020	dmaza	T247639 CU 2.0: Create queries required for the Timeline tab [medium]
Resolved	May 12 2020	Tchanders	T252030 Display comments after checking that the user has the right permissions if the revision was suppressed
Resolved	Jun 2 2020	Tchanders	T252031 Make TimelineRowFormatter::getLinks do the same checks as it is currently done in SpecialCheckUser
Resolved	Jun 2 2020	dbarratt	T252485 Use TitleFormatter and SpecialPageFactory
Declined	May 12 2020	dmaza	T252033 Have user links follow the same behavior as in SpecialCheckUser
Resolved		dmaza	T252089 Create a factory for the row formatter

Event Timeline

Mooeypoo created this task.Mar 13 2020, 9:34 PM

Mooeypoo mentioned this in T247640: CU 2.0: Add the timeline tab with a list of generic structured results [medium].Mar 13 2020, 9:37 PM

Mooeypoo updated the task description. (Show Details)Mar 13 2020, 9:41 PM

Niharika triaged this task as Medium priority.Mar 13 2020, 10:21 PM

Niharika moved this task from Untriaged to Triage/To be Estimated on the Anti-Harassment board.

Niharika updated the task description. (Show Details)

ARamirez_WMF renamed this task from CU 2.0: Create queries required for the Timeline tab to CU 2.0: Create queries required for the Timeline tab [medium].Mar 18 2020, 4:39 PM

ARamirez_WMF set the point value for this task to 2.

ARamirez_WMF moved this task from Triage/To be Estimated to Cards ready for development on the Anti-Harassment board.

Tchanders added a parent task: T247640: CU 2.0: Add the timeline tab with a list of generic structured results [medium].Mar 18 2020, 4:42 PM

ARamirez_WMF moved this task from Cards ready for development to The Letter Song on the Anti-Harassment board.Mar 18 2020, 5:15 PM

ARamirez_WMF edited projects, added Anti-Harassment (The Letter Song); removed Anti-Harassment.

ARamirez_WMF set Due Date to Mar 25 2020, 4:00 AM.

ARamirez_WMF changed the subtype of this task from "Task" to "Deadline".

dmaza claimed this task.Mar 19 2020, 2:12 PM

dmaza moved this task from Ready 🎬 (ONLY IF YOU HAVE NO MORE CODE TO REVIEW) to In Progress 💪 on the Anti-Harassment (The Letter Song) board.

Looking at the current "edits" option on CheckUser, there is no pagination and a limit of 5000 rows.

Is this what we want here? No pagination and a sensible limit?
The old implementation only allows 1 user at a time; should the 5000 limit be split by the number of users in the current investigation or do we want to allow 5000 results per user?

From the acceptance criteria, I'm unsure what the following means. Is it to display the raw data without format/links/date grouping/etc the intention?

Output the results in a standardized way for the Frontend to pull this data for display (display itself will be a followup task)

Is this what we want here? No pagination and a sensible limit?

When we discussed T247640 in estimation we planned to make a pager, though it looks like this hasn't been documented in the task yet.

I would avoid doing things that have no (or huge) limit. Honestly, I think it's a product problem more than a performance one, but since we are checking more than one target (unlike the original) I'd be slightly more careful, and go with pagination.

We did talk about it during estimation, I think it was talked about as if it's obvious and maybe it isn't :) we should add it in. I don't want to change the acceptance criteria while you're working on it, @dmaza in case it doesn't make sense with the way you're implementing things -- does it sound okay to add a pager to this task? or will it change the entire estimation or make it too large/different?

@Mooeypoo @dmaza Adding the pager to this task would make this task a lot larger than the others, and would be a departure from what we did for the Preliminary Check and Compare tabs.

When we discussed breaking up the work in estimation we broadly planned to do as follows:

T247639 (this task) - service
T247640 - pager
T247641 - refinement of results display

This breaks up the first two tasks nice and evenly. As we did previously, we can write TimelineService::getQueryInfo etc knowing that they will be called by a pager.

In T247639#5988424, @Tchanders wrote:

Is this what we want here? No pagination and a sensible limit?

When we discussed T247640 in estimation we planned to make a pager, though it looks like this hasn't been documented in the task yet.

A pager makes sense, but do we want to limit the number of records to fetch for each user? To quote the inline comment on SpecialCheckUser:747

// Ordered in descent by timestamp. Can cause large filesorts on range scans.
// Check how many rows will need sorting ahead of time to see if this is too big.
// Also, if we only show 5000, too many will be ignored as well.

In T247639#5989794, @Mooeypoo wrote:

@dmaza in case it doesn't make sense with the way you're implementing things -- does it sound okay to add a pager to this task? or will it change the entire estimation or make it too large/different?

Not really. I'm adding a pager

Change 583511 had a related patch set uploaded (by Dmaza; owner: Dmaza):
[mediawiki/extensions/CheckUser@master] Add TimelineService queries

https://gerrit.wikimedia.org/r/583511

gerritbot added a project: Patch-For-Review.Mar 26 2020, 4:28 AM

So to summarize what has been discussed. This task is only to generate the query that will be consumed by the pager. It does not involved any data massaging or any extra handled on the pager. That work will be done as part of T247640: CU 2.0: Add the timeline tab with a list of generic structured results [medium]
With that said, is it fair to say that the last 3 bullet points on the Acceptance Criteria can be removed since they'll be handled in T247640?

@dmaza I think we can keep the last three acceptance criteria, but change the last one slightly. The way I interpret the other two, the patch already does them:

Output the results in a standardized way for the Frontend to pull this data for display (display itself will be a followup task)

I think this one is saying return the results in a way that the pager can interpret.

The output should note each entry's type (log, edit, page creation) in a sensible way for (later) the frontend to be able to adjust its formatting.

I think this one is similar to the above - making sure the type is passed onto the pager.

Create integration test for the queries to make sure the data is outputted correctly

I agree integration tests and output formatting tests need to wait for T247640. Could we change this here to:

Create unit test for the queries to make sure the query is built correctly

(As the patch already does.)

Tchanders moved this task from In Progress 💪 to Code Review 🔍 on the Anti-Harassment (The Letter Song) board.Mar 26 2020, 8:42 PM

Tested the query that I'd expect the pager to build based on this patch on testwiki on the 5 most prolific users and 5 most prolific IPs - it runs instantaneously on a dataset that size.

I think we should do further tests as part of T248588 (which should be a release blocker) before enabling on larger wikis. We may need to do something along the lines of T245499, as pointed out by @dmaza.

Change 583511 merged by jenkins-bot:
[mediawiki/extensions/CheckUser@master] Add TimelineService queries

https://gerrit.wikimedia.org/r/583511

ReleaseTaggerBot added a project: MW-1.35-notes (1.35.0-wmf.26; 2020-03-31).Mar 27 2020, 7:01 PM

Maintenance_bot removed a project: Patch-For-Review.Mar 27 2020, 7:11 PM

Tchanders moved this task from Code Review 🔍 to QA/Testing 🐞 on the Anti-Harassment (The Letter Song) board.Mar 27 2020, 9:04 PM

Nothing to test on the frontend. For the backend work, I can look at the output of TimelineService::getQueryInfo(), and it looks fine.

E.g. output for an IP, IP range and username:

TimelineService::getQueryInfo( ["1.2.3.4", "1.2.3.4/16", "User1"] );                                                                                                                                                 
=> [
     "tables" => "cu_changes",
     "fields" => [
       "cuc_namespace",
       "cuc_title",
       "cuc_user",
       "cuc_user_text",
       "cuc_comment",
       "cuc_actiontext",
       "cuc_timestamp",
       "cuc_minor",
       "cuc_page_id",
       "cuc_type",
       "cuc_this_oldid",
       "cuc_last_oldid",
       "cuc_ip",
       "cuc_xff",
       "cuc_agent",
     ],
     "conds" => "cuc_ip_hex = '01020304' OR ((cuc_ip_hex >= '01020000') AND (cuc_ip_hex <='0102FFFF')) OR cuc_user = 5",
   ]

Acceptance criteria:

Create the query needed to fetch a list of contributions of the given users and IPs under investigation, ordered by time

The above query doesn't order by time, but I guess this will be done by whatever is going to call getQueryInfo().

Output the results in a standardized way for the Frontend to pull this data for display (display itself will be a followup task)

The output should note each entry's type (log, edit, page creation) in a sensible way for (later) the frontend to be able to adjust its formatting.

Above query returns things like cuc_type, cuc_minor, cuc_actiontext from which this can be derived.

Create integration test for the queries to make sure the data is outputted correctly

This new class has unit tests. Not sure about integration tests. Thalia notes this above.

dbarratt closed this task as Resolved.Apr 7 2020, 1:51 PM

CU 2.0: Create queries required for the Timeline tab [medium]Closed, ResolvedPublic2 Estimated Story PointsMar 25 2020Actions