Performance review of IP Info extension
Closed, Resolved (Public)

Authored By

	Niharika
	Aug 19 2020, 4:49 PM

Description

Basic description

A new extension for providing information about an IP address without (1) the need for the user to use an external service themselves and (2) exposing the IP address itself to the user. (Actually hiding the IP addresses is beyond the scope of this project.)

This provides the user with information they could have retrieved from knowing the IP address. This information could be displayed in various ways (popup on hover/click, special page, etc.).

Data

Based on our investigation in T259726, the data our users are looking for is not accessible from freely licensed datasets alone. Ideally IPInfo would combine several datasets, but this depends on agreeing licences with different providers, and will only be considered in future iterations.

The first iteration of IPInfo will use only MaxMind's GeoIP2 databases, which we already have licences for, and which are already available on our servers: T263263#6534392. A PHP package providing an API for these databases is undergoing a security readiness review: T262963.

For third parties who do not have access to the proprietary datasets, IPInfo will use MaxMind's free GeoLite2 databases.

API

IPInfo provides two API endpoints, taking an edit id or a log id. If the edit or log was performed by an anonymous user (or the log target is an anonymous user), the API returns data about the relevant IP address(es).

Client-side UI

Currently IPInfo adds a button next to IP addresses on Special:Log and history pages. The data are retrieved on clicking this button, and displayed in a popup. This design may change.

Deployment

We expect the feature to be deployed to all wikis, and be available on certain pages that show IP addresses. It will initially be available only to checkusers.

Preventing abuse

Sending an edit/log ID rather than the IP address will allow the IP address to remain hidden (once they become hidden in the future), and is intended to prevent IPInfo from being used as an API for getting information about arbitrary IP addresses.

Only users with the 'ipinfo' right can see the information. At first this right will only be given to checkusers. In the future when IP addresses are no longer visible, more users may need to access this information, e.g. for patrolling to fight vandalism. This will depend on user research and testing.

Users will sign an agreement when first using this tool, and a record will be kept of which users have access. Access may be taken away from a user, and a record will be kept of this too: T264150.
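As a rough illustration of the ID-based lookup and the 'ipinfo' right described in this section, the server-side check could be sketched as below. This is a Python sketch rather than the extension's PHP, and the REVISIONS and GEO_DATA dictionaries, the function names, and the sample values are all hypothetical stand-ins, not the extension's actual code.

```python
import ipaddress
from typing import Optional

# Hypothetical in-memory stand-ins for the revision table and the GeoIP lookup.
REVISIONS = {
    101: {"actor": "203.0.113.7"},  # anonymous edit: the actor name is an IP address
    102: {"actor": "ExampleUser"},  # logged-in edit
}
GEO_DATA = {"203.0.113.7": {"country": "Example Country"}}


def is_ip(actor: str) -> bool:
    """Return True if the actor name parses as an IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(actor)
        return True
    except ValueError:
        return False


def info_for_revision(rev_id: int, has_ipinfo_right: bool) -> Optional[dict]:
    """Return IP data for an anonymous edit, without echoing the IP itself."""
    if not has_ipinfo_right:
        raise PermissionError("missing 'ipinfo' right")
    rev = REVISIONS.get(rev_id)
    # Unknown revisions and logged-in authors yield no data.
    if rev is None or not is_ip(rev["actor"]):
        return None
    return GEO_DATA.get(rev["actor"], {})
```

Because the caller only ever supplies an ID, the function cannot be used to look up an arbitrary address, which is the property this section is after.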

Preview environment

(Insert one or more links to where the feature can be tested, e.g. on Beta Cluster.)

Hosting the changes on Beta Cluster is a requirement prior to performance review. Please ensure that the feature can be used directly on the link(s) provided, without any data entry such as having to create an article. Any sample content needed should already be present.

If the changes cannot be hosted on Beta Cluster, explain why and provide links to an alternate public environment that the Performance Team can also SSH into. Links to code alone are insufficient for a performance review.

IPInfo is available to logged-in users on our test environment: https://thegoodplace.wmcloud.org/index.php?title=Special:Log

Clicking buttons next to the IP addresses will result in a popup displaying either data or an error message.

Which code to review

(Provide links to all proposed changes and/or repositories. It should also describe changes which have not yet been merged or deployed but are planned prior to deployment. E.g. production Puppet, wmf config, or in-flight features expected to complete prior to launch date, etc.).

Gerrit: mediawiki/extensions/IPInfo (https://gerrit.wikimedia.org/r/admin/repos/mediawiki%2Fextensions%2FIPInfo)
GitHub: https://github.com/wikimedia/mediawiki-extensions-IPInfo

Performance assessment

Please initiate the performance assessment by answering the below:

What work has been done to ensure the best possible performance of the feature?

Data for each IP address is requested on demand, rather than holding up page load.

Data is only requested once for each log entry or edit on the page.

What are likely to be the weak areas (e.g. bottlenecks) of the code in terms of performance?

We may be limited by the performance of the GeoIP2 library.

In future versions, if more datasets and services are used, a likely bottleneck will be the performance and availability of the third-party web services.

Are there potential optimisations that haven't been performed yet?

Depending on the license and the product requirements, we may be able to cache revisions/logs made by anon users and make requests to our API without credentials (which will give all users the information from the varnish cache).

A new popup widget is appended for each button. The data for each log entry or edit is cached in the widget, but data for the same IP address may be requested more than once if it occurs in a different log entry or edit. We could re-use the same popup widget, and we could store a map of the data for each IP address rather than perform several requests for the same IP address (but different log/edit IDs). The design is subject to change, and we do not yet know how the tool will be used.
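The "map of the data for each IP address" idea could look roughly like the sketch below. It is written in Python for brevity (the real client code is JavaScript), and the class name and the fetch_by_entry_id callback are hypothetical.

```python
from typing import Callable, Dict


class PerIPCache:
    """Remember IP data by address so repeated addresses cost one request."""

    def __init__(self, fetch_by_entry_id: Callable[[int], dict]) -> None:
        self._fetch = fetch_by_entry_id  # hypothetical wrapper around the API call
        self._by_ip: Dict[str, dict] = {}

    def get(self, ip: str, entry_id: int) -> dict:
        # Only hit the API the first time this address is seen on the page;
        # later log entries or edits from the same address reuse the result.
        if ip not in self._by_ip:
            self._by_ip[ip] = self._fetch(entry_id)
        return self._by_ip[ip]
```

This only works while the IP address is still visible to the client; once addresses are hidden, the map would need a different key.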

Please list which performance measurements are in place for the feature and/or what you've measured ad-hoc so far. If you are unsure what to measure, ask the Performance Team for advice: performance-team@wikimedia.org.

We plan on implementing logging via an EventLogging schema, including how long the requests take and how often the tool is used.

Related Objects

Status	Subtype	Assigned	Task
Resolved		STran	T298955 Return different data from IPInfo api based on context [M]
Resolved		STran	T300825 IPInfo logger should queue a logging job instead of writing directly
Open		None	T261555 Allow CheckUser to provide IP addresses for logged in edits/actions
Open		None	T285977 IP Info
Resolved		Niharika	T287648 Reviews and deployment
Resolved		Tchanders	T260597 Deploy IP Info extension to all wikis (as a beta feature)
Resolved		Niharika	T261372 Replace mock data returned from API endpoint with live data
Resolved		• AGueyte	T306212 Remove AdHocDebug logging for investigating duplicate log entries
Resolved		STran	T296480 Enable the ipinfo instrument in production
Resolved		STran	T304438 Enable IP Info instrumentation in testwiki
Resolved		• AGueyte	T307493 Update IPInfo disclaimer per Legal
Resolved		Niharika	T292755 Epic: IP Info access
Resolved		STran	T292842 Log when the user's access level changes
Resolved		Niharika	T263756 Create a table to store which users have access to IPInfo, and the timestamp when access was granted [L]
Resolved		Tchanders	T264795 Decide how we want to store how we log which users have access to IP Info [8 hours]
Invalid		None	T291827 Create IPInfo access manager
Invalid		None	T291854 Create revoke user access maintenance script
Invalid		None	T296186 Create a group that revokes all ipinfo* rights
Resolved		STran	T264150 User needs to request access to IP information [L]
Resolved		STran	T291582 Implement condition agreements in Special:Preferences [M]
Resolved	Spike	phuedx	T291583 [SPIKE] Check that IPInfo-related user properties aren't replicated to publicly accessible databases [8H]
Resolved		STran	T294658 Create a log entry when the RevisionHandler is called
Resolved		• TThoabala	T294657 Create a log entry when the LogHandler is called
Resolved		phuedx	T294680 IP Info: Create a logger function to be used in both the popup and accordion presenters [M]
Declined		Tchanders	T296184 Automatically promote users to the group that grants basic rights
Resolved		STran	T296085 Create a group that grants basic ipinfo* rights
Resolved		STran	T296499 Grant certain groups the ipinfo-view-full right
Resolved		Tchanders	T260598 Deploy IP Info extension to test.wikipedia.org
Resolved		Tchanders	T260599 Deploy IP Info extension to beta cluster
Resolved		sbassett	T262963 Security Readiness Review For geoip2/geoip2
Resolved	Dec 15 2020	Tchanders	T266136 Update CI config once geoip2 is available
Resolved		Tchanders	T269475 Add geoip2/geoip2 library to mediawiki/vendor
Resolved		Tchanders	T260602 Complete the necessary reviews for the IP Info extension
Invalid		None	T260601 Remove the static mock data from the popup and make a request to the API endpoint for dynamic mock data
Resolved	Sep 22 2020	Tchanders	T260603 Create an API endpoint that accepts a log id or revision id and returns mock data about the IP address
Resolved	Sep 22 2020	Tchanders	T260604 Display a popup with fixed mock data for an IP address on the specified page(s)
Resolved		Tchanders	T260605 Add IP Info extension to translatewiki.net
Resolved		dbarratt	T260607 Deploy the IP Info extension to The Good Place
Resolved		Tchanders	T260609 Create a basic extension.json file and a new wikipage for the IP Info extension
Resolved		dbarratt	T260611 Request a repository for the IP Info Extension
Resolved		Tchanders	T260610 Request the necessary reviews of the IP Info extension
Resolved		sbassett	T260822 Security review of IP Info extension
Resolved		Niharika	T260821 Performance review of IP Info extension
Invalid		None	T270347 Remove 'ipinfo' right from administrators on the Beta Cluster
Stalled		None	T309318 Can global sysops have ipinfo-view-full and ipinfo-view-log access?
Resolved	Security	Urbanecm	T309411 sysop access to ipinfo logs can leak IP addresses that sysops should not have access to
Resolved		Urbanecm	T309928 Remove `ipinfo-view-log` from sysops

Event Timeline

Niharika created this task. Aug 19 2020, 4:49 PM

Restricted Application added a subscriber: Aklapper. Aug 19 2020, 4:49 PM

Niharika triaged this task as Medium priority. Aug 19 2020, 4:50 PM

Niharika added a project: IP Info.

Niharika renamed this task from Performance review of IP Info to Performance review of IP Info [WIP]. Aug 19 2020, 4:54 PM

Niharika renamed this task from Performance review of IP Info [WIP] to Performance review of IP Info [WIP] [4H]. Aug 19 2020, 5:07 PM

Niharika moved this task from Triage/To be Estimated to Cards ready for development on the Anti-Harassment board.

Niharika changed the edit policy from "Custom Policy" to "All Users". Aug 19 2020, 5:09 PM

Niharika added a parent task: T260610: Request the necessary reviews of the IP Info extension.

Tchanders mentioned this in T259725: Technical investigation for learning from extensions that use external APIs [8H]. Aug 20 2020, 3:07 PM

ARamirez_WMF moved this task from Cards ready for development to The Letter Song on the Anti-Harassment board. Aug 20 2020, 4:49 PM

ARamirez_WMF edited projects, added Anti-Harassment (The Letter Song); removed Anti-Harassment.

dbarratt claimed this task. Aug 21 2020, 6:00 PM

dbarratt moved this task from Ready 🎬 (ONLY IF YOU HAVE NO MORE CODE TO REVIEW) to In Progress 💪 on the Anti-Harassment (The Letter Song) board.

dbarratt updated the task description. Aug 25 2020, 9:07 PM

dbarratt updated the task description.

dbarratt updated the task description. Aug 25 2020, 9:09 PM

Tchanders mentioned this in T260822: Security review of IP Info extension. Aug 28 2020, 2:59 PM

dbarratt updated the task description. Aug 28 2020, 9:09 PM

dbarratt updated the task description. Aug 29 2020, 4:07 PM

dbarratt moved this task from In Progress 💪 to Code Review 🔍 on the Anti-Harassment (The Letter Song) board.

dbarratt updated the task description.

Tchanders updated the task description. Aug 31 2020, 11:45 AM

Thanks for laying out this information @dbarratt.

We should bear in mind that this review will happen next quarter, once we've implemented something preliminary. We should also bear in mind that the approach we take will be constrained by the licenses; if they don't allow us to store data, we'll have to use option 3 - which is why we're taking that approach to begin with.

(Note that, although we're storing the MaxMind data, @aezell found that our license won't allow us to give our users information about other IP addresses from that dataset. Currently we only send details about their own IP address.)

With that in mind I'd suggest a few changes:

There are at least three ways to accomplish this: [...]

We should make it clear that this might be out of our hands, and highlight that we're particularly interested in discussing option 3, since this seems the most risky performance-wise, but also the most likely license-wise.

Are there any objections to any of the methods above?
If so, are there ways we can mitigate these objections?
Can you think of any other ways we might be able to solve this problem?
Which solution would your team prefer?

These questions might be a bit too broad by the time we have the review... We'll most likely have implemented a basic (though not finalised) version of option 3 by then. We may be in a position to discuss alternative approaches, but it'll depend on the licenses.

@Tchanders I added some clarity on which option we would be pursuing and removed the questions. How does that look?

dbarratt updated the task description. Aug 31 2020, 9:17 PM

dbarratt updated the task description. Sep 4 2020, 12:51 AM

Tchanders renamed this task from Performance review of IP Info [WIP] [4H] to Performance review of IP Info extension. Sep 9 2020, 5:45 PM

Tchanders updated the task description.

Tagging Performance-Team. We're at the start of the project so we'll add more details as we can, but wanted to request this sooner rather than later. Thanks.

Tchanders edited projects, added Anti-Harassment; removed Anti-Harassment (The Letter Song). Sep 9 2020, 6:01 PM

Tchanders moved this task from Untriaged to Tracking work by others on the Anti-Harassment board.

Are you aware of this history: https://phabricator.wikimedia.org/T100902#2566841 ?

I know that we are paying for the commercial version of MaxMind for some of our servers, but I don't know under which license terms.

The performance of querying such a dataset depends on which one it is. For example, querying MaxMind directly is pretty fast and potentially faster than a homemade solution, because it stores this data efficiently and has dedicated tooling for querying it. Ensuring that you have proper caching at the ATS/Varnish layer might be a sufficient caching layer in front of the API that queries the IP database. The Traffic team might have a different opinion about that, so it's worth consulting them. If frontend cache is considered insufficient, given that the same IPs will be requested over and over again, a local or DC-local intermediary cache (e.g. memcached) seems like a good solution to put in front of the IP database.
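To make the intermediary-cache idea concrete, here is a minimal in-process sketch of a cache with expiry sitting in front of the IP lookup. It is not memcached itself; the class name, the lookup callback, and the TTL are assumptions for illustration only.

```python
import time
from typing import Callable, Dict, Tuple


class TTLCache:
    """Serve repeated lookups from memory until an entry is ttl_seconds old."""

    def __init__(
        self,
        lookup: Callable[[str], dict],
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._lookup = lookup  # hypothetical call into the IP database
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested
        self._entries: Dict[str, Tuple[float, dict]] = {}

    def get(self, ip: str) -> dict:
        now = self._clock()
        hit = self._entries.get(ip)
        # Fresh entry: answer without touching the database.
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]
        # Missing or expired entry: look it up and remember when.
        value = self._lookup(ip)
        self._entries[ip] = (now, value)
        return value
```

Whether a layer like this is worth having depends on how slow the underlying lookup turns out to be once instrumented.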

Making an API that can return information about any IP address is probably out of the question, given that we shut down a service that did just that 4 years ago due to abuse. I presume that this data will be presented on demand in a lot of contexts, like with hovercards. In that case, it's counter-productive in terms of performance to look up the information about all the IPs present at the time the HTML is generated, if only a small percentage of users will hover those IPs to see that information. That means you'll need an API to let the client side fetch that information on demand. In that case, the simplest course of action is probably to add an encrypted signature/key to the HTML, specific to the lookup of that particular IP. The API will consume that key, verifying that it came from our own HTML (based on a secret key that 3rd parties can't have) and that the request was specifically for that IP. In that scenario, however, we have to be mindful of the time consumed to generate the encrypted signatures. If it is greater than the time it takes to get the IP information, then we might as well include it in the HTML to begin with and not have any API call at all for hovercards.
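One common way to build such a per-IP key is an HMAC over the address. The comment above does not name a scheme, so HMAC-SHA256, the key value, and the function names below are assumptions; this is a sketch, not a proposal for the actual implementation.

```python
import hashlib
import hmac

# Hypothetical server-side secret; in practice it would come from private config.
SECRET_KEY = b"hypothetical-server-side-secret"


def sign_ip(ip: str, key: bytes = SECRET_KEY) -> str:
    """Return a hex signature to embed in the HTML next to this IP."""
    return hmac.new(key, ip.encode("utf-8"), hashlib.sha256).hexdigest()


def verify_ip(ip: str, signature: str, key: bytes = SECRET_KEY) -> bool:
    """Check that the signature was issued by us for exactly this IP."""
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(sign_ip(ip, key), signature)
```

The cost of sign_ip per address is what would need to be measured against the cost of the IP lookup itself, as noted above.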

• Gilles moved this task from Inbox, needs triage to Doing (old) on the Performance-Team board. Sep 11 2020, 11:27 AM

In T260821#6453709, @Gilles wrote:

Making an API that can return information about any IP address is probably out of the question, given that we shut down a service that did just that 4 years ago due to abuse.

The plan is to create one or more endpoints that accept a log id or revision id (see T260603). Effectively, users would only be able to look up IP addresses that have made logged actions or revisions.

We may create an endpoint that would allow for any IP address, but this would be limited to use by privileged users (probably only checkusers), and would only exist until T261555 is completed.

Thanks for the information and suggestions @Gilles.

We had read about that previous service - thanks for bringing it up. We're planning to accept a log/revision ID as @dbarratt says. We'd also be adding an 'ipinfo' permission, that is only available to certain user groups (starting with checkusers). In this situation, do you think we'd still want to verify that the revision/log ID is present on the page?

The only reason it might accept an IP address would be for checkusers who are seeing the IP address of a logged-in user on Special:CheckUser or Special:Investigate - that IP address wouldn't be associated with a revision/log ID. However, the only reason to provide this tool on these pages would be for the checkusers' convenience, and not for the sake of IP masking. Unless this tool could provide better information than the information that checkusers can get by copying and pasting the IP address into their preferred tools, that would arguably be pointless. @Niharika - do you think we could revisit whether this should be available for checkusers investigating logged-in users' IP addresses? If not then it wouldn't need to accept IP addresses.

@aezell is currently discussing the MaxMind license terms, but under the present terms we would not be able to make use of our MaxMind dataset for this tool. I think one of the most significant questions here will be whether we are able to agree a license with MaxMind or any other service that allows us to do any type of storage/caching of their data. If not, then the third party would need to be queried on demand. Since we're having to start building this product before these questions are resolved, it seems sensible to me to assume the worst-case scenario for now, that we can't store any data. I suspect we'd be open to changing the product specifications a bit if it was felt that this was beyond what was acceptable performance-wise - e.g. not using hover (@Niharika or @Prtksxna might want to comment further).

@Niharika I suppose there's a tension here in that, from a product perspective we might prefer to use the service(s) with the highest quality data, but from a performance perspective we might prefer to use the service(s) with the most lenient license.

I'm told that MaxMind has a new competitor, https://db-ip.com/ which makes their dataset available in MaxMind's mmdb format: https://db-ip.com/db/ which means that the related tooling and libraries work with it. It's probably worth looking into their T&C as well. The free version of that database is CC BY licensed: https://db-ip.com/db/ip-to-location That could be a great starting point.

Chances are that this PHP library https://github.com/maxmind/GeoIP2-php pointed at the db-ip free database would perform well. Your new endpoint/API will need to be instrumented anyway, if the PHP library doesn't perform well it will be obvious.

In T260821#6461554, @Gilles wrote:

I'm told that MaxMind has a new competitor, https://db-ip.com/ which makes their dataset available in MaxMind's mmdb format: https://db-ip.com/db/ which means that the related tooling and libraries work with it. It's probably worth looking into their T&C as well. The free version of that database is CC BY licensed: https://db-ip.com/db/ip-to-location That could be a great starting point.

They have a number of competitors (See T248525). This isn't one I was aware of, but I can add it to my spreadsheet.

Chances are that this PHP library https://github.com/maxmind/GeoIP2-php pointed at the db-ip free database would perform well. Your new endpoint/API will need to be instrumented anyway, if the PHP library doesn't perform well it will be obvious.

It looks like we may be able to use our existing MaxMind license. I've created T262963 to have their library reviewed by the Security-Team in order to have it (and the dataset we currently pay for) deployed to production. :)

@aezell is currently discussing the MaxMind license terms, but under the present terms we would not be able to make use of our MaxMind dataset for this tool.

Update from @aezell is apparently that we can use it, so that's what we'll do - simplifies the problem a lot.

According to @Niharika we may want to look into using multiple services in the future, but not for the first iteration.

Tchanders mentioned this in T263634: Make IPInfo tool available on Special:Log page. Sep 28 2020, 7:37 PM

Tchanders updated the task description. Nov 2 2020, 2:48 PM

Unrelated to the performance review, but there seems to be a current limitation in IPv6 address comparison and 0-padding that I saw on the test wiki:

Screenshot 2020-11-03 at 13.11.35.png (397×349 px, 46 KB)
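The screenshot itself is not reproduced here, so the exact failure is not known from this thread. As a general illustration of why zero-padded and compressed IPv6 forms should be compared as parsed addresses rather than as strings, here is a small Python sketch using the standard ipaddress module; the helper names are hypothetical.

```python
import ipaddress


def same_address(a: str, b: str) -> bool:
    """Compare two textual IPs by value, ignoring padding and letter case."""
    return ipaddress.ip_address(a) == ipaddress.ip_address(b)


def canonical(ip: str) -> str:
    """Return one consistent textual form suitable for use as a lookup key."""
    return ipaddress.ip_address(ip).compressed
```

For example, "2001:db8::1" and "2001:0DB8:0000:0000:0000:0000:0000:0001" differ as strings but are the same address once parsed.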

I see that the API responses don't contain any caching information:

Screenshot 2020-11-03 at 13.14.39.png (404×525 px, 55 KB)

While the JS seems to account for not re-requesting past API requests for a given IP in the current pageview, this means that after a page refresh or visiting another page with the same IP, you would be missing out on leveraging the browser cache. I think it would make sense to cache these API responses at least in the user agent, and it could be done for a long period, since the IP geo database isn't updated that frequently.
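A sketch of what user-agent caching could mean in terms of response headers follows. The one-day lifetime is an assumption (the comment above only says "a long period"), and the helper name is hypothetical.

```python
from typing import Dict

ONE_DAY = 24 * 60 * 60  # assumed lifetime in seconds; not a value from this thread


def cache_headers(max_age_seconds: int = ONE_DAY) -> Dict[str, str]:
    """Build headers that let the user's own browser reuse the API response."""
    # "private" allows the browser cache to store the response while telling
    # shared caches not to, which matters because access is permission-gated.
    return {"Cache-Control": f"private, max-age={max_age_seconds}"}
```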

You should instrument the time it takes between a click on the (i) button and when the information is actually displayed. This can easily be done with mw.track. By measuring performance as experienced by real users this way, you will be able to see if there is any degradation once this feature gets a lot more use in production. You also get to track usage for free by doing this.

I see that the ResourceLoader module is correctly loaded only when dealing with the Special:Log page and a user who has the right privileges, that's great.

I would recommend against the placeholder animation with diagonal grey lines currently displayed before the results of the API request come in, since the API responses are usually very fast:

Screenshot 2020-11-03 at 13.33.55.png (155×433 px, 31 KB)

It's best to avoid drawing attention to short waits. Animated progress indicators become useful when the user is waiting for a long time (eg. waiting for the result of an expensive advanced search query). If you're worried that some users might wait 5-10 seconds for those API responses sometimes, then it's best to wait a couple of seconds before displaying the animation. This would ensure that most users (who will have a response under 2 seconds, for example) don't see the animation, and those who have very slow internet connections would see it, as they would be waiting a while for a result.
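The timing logic for a delayed indicator can be sketched as follows. The real client is JavaScript; this Python asyncio version only illustrates the control flow, and the function name, the show_spinner callback, and the 2-second default are assumptions.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def load_with_delayed_spinner(
    request: Awaitable[T],
    show_spinner: Callable[[], None],
    delay_seconds: float = 2.0,
) -> T:
    """Await the request, showing a spinner only if it outlasts the delay."""
    task = asyncio.ensure_future(request)
    # Wait up to delay_seconds; if the request finishes first, no spinner.
    done, _pending = await asyncio.wait({task}, timeout=delay_seconds)
    if not done:
        show_spinner()
    # Hiding the spinner again once the result arrives is left to the caller.
    return await task
```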

Looking at init.js, some of the variable initialisations could be done more lazily: https://github.com/wikimedia/mediawiki-extensions-IPInfo/blob/master/modules/ext.ipInfo/init.js#L4-L10 You're immediately looking for revIdAncestor and logIdAncestor in the DOM, as well as creating the button, when there are conditions - like not being a valid IP address - that will leave that unused. It's best to postpone doing those DOM lookups and object creation until you're 100% sure that they're going to be needed.
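The ordering being suggested can be sketched like this. It is not a port of init.js; the function names and the find_entry_id and create_button callbacks are hypothetical stand-ins for the DOM lookup and widget creation.

```python
import ipaddress
from typing import Callable, Optional


def is_valid_ip(text: str) -> bool:
    """Return True if the text parses as an IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(text)
        return True
    except ValueError:
        return False


def maybe_add_button(
    user_text: str,
    find_entry_id: Callable[[], Optional[int]],
    create_button: Callable[[int], object],
) -> Optional[object]:
    # Cheapest check first: skip everything for non-IP user names.
    if not is_valid_ip(user_text):
        return None
    # Only now pay for the (hypothetical) DOM lookup.
    entry_id = find_entry_id()
    if entry_id is None:
        return None
    # Create the button last, once it is known to be needed.
    return create_button(entry_id)
```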

which will give all users the information from the varnish cache

Is the plan for this feature to be ultimately accessible to everyone, including logged-out users?

Is the plan for this feature to be ultimately accessible to everyone, including logged-out users?

No. This feature would be available to some logged-in users. We plan to make it available to checkusers only first and then potentially admins.

@Gilles Thanks for the thorough review - and not just of performance!

In T260821#6600060, @Gilles wrote:

Unrelated to the performance review, but there seems to be a current limitation in IPv6 address comparison and 0-padding that I saw on the test wiki:

We're working on this - filed as T266881 and T267015

I think it would make sense to cache these API responses at least in the user agent, and it could be done for a long period, since the IP geo database isn't updated that frequently.

Filed as T267230

You should instrument the time it takes between a click on the (i) button and when the information is actually displayed.

WIP task - T267235. This is WIP until the design is finalised (the button+popup might be changed).

I would recommend against the placeholder animation with diagonal grey lines currently displayed before the results of the API request come in, since the API responses are usually very fast

Filed as T267237

Looking at init.js, some of the variable initialisations could be done more lazily

Filed as T267240

which will give all users the information from the varnish cache

Is the plan for this feature to be ultimately accessible to everyone, including logged-out users?

I believe David was just thinking theoretically here.

dbarratt unsubscribed. Nov 4 2020, 5:20 PM

Awesome, thanks for filing all these tasks. That's all for now. Since you mention that the design might change, please ping me when it does, so I can look at the changes. If there's any other major change, please move the status of this task back to "Open" and point me to what needs to be looked at.

Krinkle moved this task from Doing (old) to Doing: Goals on the Performance-Team board. Mar 2 2021, 9:05 PM

• Gilles removed • Gilles as the assignee of this task. Jun 22 2021, 12:41 PM

Niharika added a parent task: T285977: IP Info. Jul 14 2021, 3:43 PM

Niharika added a parent task: T287648: Reviews and deployment. Jul 28 2021, 11:34 PM

Niharika removed a parent task: T285977: IP Info.

Hi @Niharika: I see some movement in other tasks and such, is there a new expected timeline when Performance needs to do their review?

greg moved this task from Doing: Goals to To-do: Goals, prioritized next 4 Quarters on the Performance-Team board. Sep 20 2021, 6:26 PM

In T260821#7366619, @greg wrote:

Hi @Niharika: I see some movement in other tasks and such, is there a new expected timeline when Performance needs to do their review?

Oh yes! We are working on the MVP and hope to have the tool ready for soft launch in the next couple months. I am not exactly sure when we will be ready for a performance review so I'll ping @phuedx who knows better.

In T260821#7377718, @Niharika wrote:

In T260821#7366619, @greg wrote:

Hi @Niharika: I see some movement in other tasks and such, is there a new expected timeline when Performance needs to do their review?

Oh yes! We are working on the MVP and hope to have the tool ready for soft launch in the next couple months. I am not exactly sure when we will be ready for a performance review so I'll ping @phuedx who knows better.

@greg Are you the right person for us to contact about a re-review?

We have a proposed plan to deploy IPInfo to testwiki in a month and to the rest of our production sites as a BetaFeature by the end of the quarter. However, this may be blocked on a performance review, as more features have been added since T260821#6605932. What would you advise we do?

Also copying in @JayCano, who is our (not-so) new engineering manager.

We talked about this within the team and it seems like we have not made any major changes that may impact performance since the last review. With that in mind, we have decided to go ahead with the deploys without requesting another review. Thank you.

Krinkle mentioned this in T306746: Layout shift when IP Info arrives into expanded state (insufficient height reserved). Apr 23 2022, 7:39 PM

Tchanders mentioned this in T310813: Investigate: IPInfo buttons not always showing on Special:RecentChanges [8H]. Jul 27 2022, 3:57 PM

Niharika moved this task from Backlog to Closed on the IP Info board. Aug 8 2022, 8:02 PM

	F32426575: Screenshot 2020-11-03 at 13.11.35.png
	Nov 3 2020, 12:48 PM

	F32426584: Screenshot 2020-11-03 at 13.14.39.png
	Nov 3 2020, 12:48 PM

	F32426634: Screenshot 2020-11-03 at 13.33.55.png
	Nov 3 2020, 12:48 PM
