Performance review of Quickview panel on Special:Search (SearchVue extension)
Open, Medium, Public

Description

The structured data team is making UI/UX improvements to the Special:Search page on the Wikipedias. As part of that work, we are adding a new vue.js app, shipped in the SearchVue extension, that creates a "QuickView" panel displaying more information about the search results. Here is the MediaWiki page describing the project. T306341 is the Epic describing the feature and the user experience we are aiming for.

To achieve the experience we are after and to align ourselves with the work of the design systems and WMDE teams, we are creating the new result QuickView using vue.js.

Vue.js was selected as the new javascript framework for use with MediaWiki ( T241180 ) and is now served as part of core. MediaSearch was the first production extension to leverage vue.js and has now been followed by machinevision, globalwatchlist, and the nearby pages extension.

This extension will add some JS code & JS config vars to Special:Search. JS will pick up clicks on (NS_MAIN) results and open a side panel with additional information about the result (much like Quick View on Special:MediaSearch).
For the first few months after the extension is released, we are also going to show users a tutorial to make sure they know how the new feature works.

Preview environment

The changes will be made on Special:Search on the Wikipedias, starting with RU, PT, and ID wikis, and a link to the feature on beta cluster will be made available when ready.

Which code to review

(Provide links to all proposed changes and/or repositories. It should also describe changes which have not yet been merged or deployed but are planned prior to deployment. E.g. production Puppet, wmf config, or in-flight features expected to complete prior to launch date, etc.).

The code doesn't exist yet, but here is the ticket where we will be investigating the specifics of adding vue.js for Special:Search: T307560

Here is the ticket where we will actually be adding the app: T307053

Here is the epic for all of the UI/UX changes related to this feature: T306341

Performance assessment

Please initiate the performance assessment by answering the below:

  • What work has been done to ensure the best possible performance of the feature?
    • As mentioned above, vue.js was selected as the new javascript framework for use with MediaWiki ( T241180 ) and is now served as part of core. Several extensions already use vue.js with good performance results. We will also make sure that there is a fallback version of the experience that does not rely on javascript.
  • What are likely to be the weak areas (e.g. bottlenecks) of the code in terms of performance?
    • API calls to fetch the additional information to display in the side panel (once a search result has been clicked). This includes:
      • Article description (via action api, prop=pageprops -> wikibase-shortdesc)
      • Associated Wikidata ID (via action api, prop=pageprops -> wikibase_item)
      • Article thumbnail with max width of 420px (via action api, prop=pageimages -> thumbnail -> source)
      • Larger search snippet (via action api, prop=cirrusdoc)
      • TOC/list of sections (via action api, prop=cirrusdoc -> headings)
      • Links to interwiki results based on same related Wikidata ID (source TBD)
      • Commons request to fetch images (via Commons' action api, generator=search&gsrsearch=filetype:bitmap|drawing%20custommatch:depicts_or_linked_from=QXXX&gsrnamespace=6&prop=imageinfo&...)
  • Are there potential optimisations that haven't been performed yet?
    • /
  • Please list which performance measurements are in place for the feature and/or what you've measured ad-hoc so far. If you are unsure what to measure, ask the Performance Team for advice: performance-team@wikimedia.org.
    • We're talking to Search team WRT additional load on their end (additional calls for Commons results + fetching indexed content for larger snippets) and will adjust as needed based on their input.
    • No other measurements at this point - any advice is most welcome!

Additional details on API call parameters
Action: Query request to include the following parameters:

action: 'query',
format: 'json',
titles: title,
prop: 'pageimages|pageprops|cirrusdoc',
formatversion: 2,
pithumbsize: 420,
piprop: 'thumbnail|name|original',
cdincludes: 'heading'

Commons API:

action: 'query',
format: 'json',
generator: 'search',
gsrsearch: 'filetype:bitmap|drawing custommatch:depicts_or_linked_from=QXXX',
gsrnamespace: 6,
gsrlimit: 6,
prop: 'imageinfo',
iiprop: 'url',
iiurlheight: 364

Interwiki: TBC (in the future; not part of initial release)
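
As an illustration only, the two parameter sets above could be turned into request URLs roughly like this. The helper names, the example hosts and the example QID are assumptions for this sketch, not code from the extension; the parameter values are copied from the lists above.

```javascript
// Minimal sketch: build the two query strings from the parameters listed
// in the task description. Helper names are hypothetical.
function buildPageQuery(title) {
  return new URLSearchParams({
    action: 'query',
    format: 'json',
    titles: title,
    prop: 'pageimages|pageprops|cirrusdoc',
    formatversion: '2',
    pithumbsize: '420',
    piprop: 'thumbnail|name|original',
    cdincludes: 'heading'
  });
}

function buildCommonsQuery(qid) {
  return new URLSearchParams({
    action: 'query',
    format: 'json',
    generator: 'search',
    gsrsearch: `filetype:bitmap|drawing custommatch:depicts_or_linked_from=${qid}`,
    gsrnamespace: '6',
    gsrlimit: '6',
    prop: 'imageinfo',
    iiprop: 'url',
    iiurlheight: '364'
  });
}

// Example hosts and the QID are assumptions, used for illustration only.
const pageUrl = `https://ru.wikipedia.org/w/api.php?${buildPageQuery('Cat')}`;
const commonsUrl = `https://commons.wikimedia.org/w/api.php?${buildCommonsQuery('Q146')}`;
```

URLSearchParams takes care of escaping the pipe and space characters, so the values can stay readable in the source. A cross-wiki request made from the browser would likely need an additional origin parameter, which is left out here.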

Data interaction diagram

SearchVue.drawio (1).png (821×775 px, 84 KB)

Event Timeline

FYI @Krinkle, the team is targeting rollout for the end of next Q, but will probably release under feature flag after the end of the current Q.

Hi @larissagaulia and @Krinkle - we are hoping to have this in production on our three target wikis (PT, ID, RU) at the end of October, and we're wondering if it's okay with your team if we start branching the extension now (T310367), prior to the review being complete -- this will allow us to deploy rapidly once all testing and review have been completed and give us more time to prepare for the late October release.

Krinkle renamed this task from Performance review of Quickview panel on Special:Search to Performance review of Quickview panel on Special:Search (SearchVue extension). Sep 19 2022, 6:38 PM
Krinkle updated the task description.

"Performance assessment" is rather empty right now. We're still not clear on many things, but I'll try my best to provide at least some more detail below.

Our ask at this point is whether, even though we cannot yet provide code or even complete information about the extension, this review is a blocker for beginning to branch our code.
We must absolutely respect this (and any other) review before deploying to production - no code will run in production before all reviews are complete!
It would be great, however, if we could already start to have code branched & ready to deploy by the time these reviews complete - it otherwise takes another 2 (or so) weeks until code is available in production branches. We're hoping this is fine, since none of this code will actually run anyway.

Question #2 (assuming that we can start branching before this review completes) - would it also be acceptable to enable this extension on beta prior to this review completing? The nature of this extension (combining things from multiple extensions & wikis) makes it almost impossible for design/product to get insight into how things work, look & feel without a somewhat-closely-resembling-production setup...
If this is at all a possibility, we're more than happy to take into account any consideration you may have to ensure this has no substantial impact on our beta cluster!


Short summary: this extension will add some JS code & JS config vars to Special:Search. JS will pick up clicks on (NS_MAIN) results and open a side panel with additional information about the result (much like Quick View on Special:MediaSearch)

More (preliminary) performance assessment info:

  • What are likely to be the weak areas (e.g. bottlenecks) of the code in terms of performance?
    • API calls to fetch the additional information to display in the side panel (once a search result has been clicked). This includes:
      • Article description (via pageprops wikibase-shortdesc)
      • Associated Wikidata ID (via pageprops wikibase_item)
      • Article thumbnail (via PageImages or pageprops page_image_free)
      • Larger search snippet (via action api, prop=cirrusdoc)
      • TOC/list of sections (via parse API, I assume)
      • Links to interwiki results based on same related Wikidata ID (source TBD)
      • Related images on commons (external search call to Commons)
  • Are there potential optimisations that haven't been performed yet?
    • We're still in the process of figuring out exactly what data we'll be showing (or what is even available) and how best to combine it.
      • The first 4 should be easy to come by already via action api & props.
      • TOC still TBD - we may find we need to build a new endpoint that aggregates this data, along with the other 4, more efficiently (haven't looked into this in much detail yet).
      • Interwiki & Commons results will likely be async JS calls.
  • Please list which performance measurements are in place for the feature and/or what you've measured ad-hoc so far. If you are unsure what to measure, ask the Performance Team for advice: performance-team@wikimedia.org.
    • We're talking to Search team WRT additional load on their end (additional calls for Commons results + fetching indexed content for larger snippets) and will adjust as needed based on their input.
    • No other measurements at this point - any advice is most welcome!
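
To make the "async JS calls" point concrete, here is a minimal sketch of how the panel could render page data first and fill in Commons results later. The functions fetchPage, fetchMedia and render are hypothetical placeholders passed in by the caller, and the qid field name is an assumption; none of this is taken from the extension's code.

```javascript
// Minimal sketch: show the panel as soon as page data arrives, then load
// Commons results without blocking or breaking the panel.
async function loadQuickView(title, fetchPage, fetchMedia, render) {
  const page = await fetchPage(title);
  // First paint: page data only, media still pending.
  render({ page, media: null });

  // Without a Wikidata ID there is nothing to look up on Commons.
  if (!page.qid) {
    return;
  }

  try {
    const media = await fetchMedia(page.qid);
    render({ page, media });
  } catch (err) {
    // A failed Commons lookup should degrade to an empty media list.
    render({ page, media: [] });
  }
}
```

The Commons lookup depends on the Wikidata ID from the page request, so the two calls run in sequence here, but the panel never waits for the second one before showing content.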

The extension repo can be created (https://www.mediawiki.org/wiki/Gerrit/New_repositories), with patches reviewed and merged, and be deployed to beta, all before this review. In fact, it's helpful for gauging possible performance problems if the extension is either easy to setup locally or already setup on beta. Generally, the main reason to consult Performance team before initial coding would be major uncertainty about whether an idea is tenable to implement at all (e.g. to avoid wasting time making thousands of lines of code that can't be used). Are there such concerns here?

I see that a SearchVue repo currently exists with a fair bit of JS code. How functionally complete is it? It seems like the task description is outdated.

Also, I don't see anything wrong with having X.XX.X-wmf branch for SearchVue, with the code present on the servers but not enabled in mediawiki-config (subject to a quick review by the security team).

Here are some clarifications following the initial development of the patches required to fetch data:

Summary
This extension will add some JS code & JS config vars to Special:Search. JS will pick up clicks on (NS_MAIN) results and open a side panel with additional information about the result (much like Quick View on Special:MediaSearch)
In addition to the short summary above: for the first few months after the extension is released, we are also going to show users a tutorial to make sure they know how the new feature works.

Updated information with actual development decision and improvement:

What are likely to be the weak areas (e.g. bottlenecks) of the code in terms of performance?

  • API calls to fetch the additional information to display in the side panel (once a search result has been clicked). This includes:
    • Article description (via pageprops wikibase-shortdesc)
    • Associated Wikidata ID (via pageprops wikibase_item)
    • Article thumbnail with max width of 420px (via thumbnail -> source)
    • Larger search snippet (via action api, prop=cirrusdoc)
    • TOC/list of sections (via action api, prop=cirrusdoc -> headings)
    • Links to interwiki results based on same related Wikidata ID (source TBD)
    • Commons request to fetch images (via "custommatch:depicts_or_linked_from=QXXX")

Are there potential optimisations that haven't been performed yet?

  • There may be a possibility to tailor the cirrusdoc request to just return the required data and save payload.
  • The first 5 should be easy to come by already via action api & props.
  • Interwiki & Commons results will likely be async JS calls.
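
As a rough illustration of keeping only the required data, the sketch below reduces a query response to the fields the panel needs. The response shape (formatversion=2, with cirrusdoc entries carrying a source object) and the function name are assumptions for this example and should be checked against a real response.

```javascript
// Minimal sketch: reduce an Action API response to the fields listed above.
// The nesting used here is an assumed response shape, not a verified one.
function pickQuickViewData(response) {
  const page = response?.query?.pages?.[0];
  if (!page) {
    return null;
  }
  const props = page.pageprops || {};
  const doc = (page.cirrusdoc && page.cirrusdoc[0]) || {};
  return {
    title: page.title,
    description: props['wikibase-shortdesc'] ?? null,
    qid: props.wikibase_item ?? null,
    thumbnail: page.thumbnail?.source ?? null,
    headings: doc.source?.heading ?? []
  };
}
```

Missing properties fall back to null or an empty list, which matters on wikis where many articles are not linked with Wikidata.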

Please list which performance measurements are in place for the feature and/or what you've measured ad-hoc so far. If you are unsure what to measure, ask the Performance Team for advice: performance-team@wikimedia.org.
We're talking to Search team WRT additional load on their end (additional calls for Commons results + fetching indexed content for larger snippets) and will adjust as needed based on their input.
No other measurements at this point - any advice is most welcome!

Additional details on API call parameters:

  • Action: Query request to include the following parameters:
		action: 'query',
		format: 'json',
		titles: title,
		prop: 'pageimages|pageprops|cirrusdoc',
		formatversion: 2,
		pithumbsize: 420,
		piprop: 'thumbnail'
  • Commons API:
		action: 'query',
		format: 'json',
		generator: 'search',
		gsrsearch: 'filetype:bitmap|drawing custommatch:depicts_or_linked_from=QXXX',
		gsrnamespace: 6,
		gsrlimit: 6,
		prop: 'imageinfo',
		iiprop: 'url',
		iiurlheight: 364

Interwiki: TBC

Note: the interwiki stuff mentioned above will not be part of the initial release.

@aaron

major uncertainty about whether an idea is tenable to implement at all (e.g. to avoid wasting time making thousands of lines of code that can't be used). Are there such concerns here?

Not really.

I see that a SearchVue repo currently exists with a fair bit of JS code. How functionally complete is it?

It will mostly be functionally complete by end of next week, 7 October (minus completing UI, finishing touches...)

It seems like the task description is outdated.

I'll copy @SimoneThisDot's details to the description.

@larissagaulia is there still more info your team needs, or can you share an estimate of when you might be able to complete the performance review? thanks!

Hey @CBogen , could you provide the version on the beta cluster for us to take a look, please?

Krinkle triaged this task as Medium priority. Oct 17 2022, 7:12 PM
Krinkle moved this task from Inbox, needs triage to Doing: Goals on the Performance-Team board.

Hey @CBogen , could you provide the version on the beta cluster for us to take a look, please?

You should be able to see this in action here:
https://en.wikipedia.beta.wmflabs.org/w/index.php?fulltext=1&search=Cat&title=Special:Search&ns0=1&quickView=Cat

Clicking the snippet (note: there will be a visual indication for this soon) of a search result should result in a new panel appearing on-screen. That panel is pretty much the entire extension!
Note: data on beta is not as rich & connected as it is in prod - many articles are not linked with Wikidata; nor are many Commons files. I've made sure that "Cat" is fully wired up and representative, but other results may not have much going on.

I'll finish looking at the backend/API access today.

A few things I noticed:

  • The query for thumbnails of files tagged with a QID should probably use a standard thumbnail size (wgThumbLimits). This avoids the delay of generating thumbnails, avoids wasted disk space, and also lowers any risk of hitting thumbnail generation throttling for the client.
  • The TOC links should probably use canonical /wiki/X#Y URLs instead of ?title=X#Y URLs. This grants better CDN caching.
  • Could use a core MW rest.php API result?
  • Standard hovercards should probably be disabled for search for UX and API overhead reasons (hovercards restbase requests trigger instantly on hover). The search popups should "take over" rather than compete.
  • The core query/prop API endpoint does not have caching (given all the combinations being hit). Maybe this feature could have a rest.php entrypoint, wrapping the backend API code, that returns exactly the required data, with a ~1 hour cache TTL and a simple URL structure. See https://www.mediawiki.org/wiki/API:REST_API .
  • Could use a core MW rest.php API result?

The existing ones don't have nearly enough data, and some are probably too specific (and too large a payload, e.g. elastic field contents) to add.
Which leaves:

  • The core query/prop API endpoint does not have caching (given all the combinations being hit). Maybe this feature could have a rest.php entrypoint, wrapping the backend API code, that returns exactly the required data, with a ~1 hour cache TTL and a simple URL structure. See https://www.mediawiki.org/wiki/API:REST_API .

We could create 2 new rest.php entrypoints from within this extension: one for the article data, another for the Commons results. (And further down the line likely a third for related interwiki results)

We'd rather not re-implement all the actual logic, though: all of that data is already made available & supported through various extensions and services, and reimplementing those things (e.g. fetching the document from elastic) would mean duplicating a bunch of existing code, in a separate extension that's not supposed to be an additional worry for the maintainers of those things.
Instead, we'd rather build a rest.php endpoint that just wraps around the exact same API call (performed on the backend) & filter whatever data is not needed (which, IIRC, isn't much)

Essentially something along the lines of:

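// Build an internal request with the same parameters as the client-side API call.
$request = new FauxRequest( [ 'action' => 'query', ... ] );
$api = new ApiMain( $request );
// Run the API module in a context derived from the main request.
$context = new DerivativeContext( RequestContext::getMain() );
$context->setRequest( $request );
$api->setContext( $context );
$api->execute();
// Fetch the result data with API metadata stripped.
$response = $api->getResult()->getResultData( [], [ 'Strip' => 'all' ] );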

Does this make sense?
I assume that that is exactly what you were hinting at, but I just wanted to make sure we're on the same page!

Variables for the page endpoint are:

  • page title
  • (not yet, but soon) elastic field from which the search snippet originated

So that endpoint could be something like rest.php/searchvue/v1/page/{page_title}/{snippet_field}

For the Commons API endpoint, that would be:

  • QID

So that one could look like rest.php/searchvue/v1/media/{qid}
Note: because it's targeting another wiki, wrapping this one would have to be a little different than the code snippet above; I imagine that would probably be an actual HTTP request that's efficiently routed within the same DC anyway?
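
For illustration, the client side of those two proposed routes could build its URLs as in the sketch below. The route patterns come from the comment above; the function names, the script path argument and the QID check are assumptions for this example.

```javascript
// Minimal sketch: build URLs for the two proposed rest.php routes.
// scriptPath (e.g. '/w') is an assumption about the wiki's configuration.
function pageEndpoint(scriptPath, pageTitle, snippetField) {
  return `${scriptPath}/rest.php/searchvue/v1/page/` +
    `${encodeURIComponent(pageTitle)}/${encodeURIComponent(snippetField)}`;
}

function mediaEndpoint(scriptPath, qid) {
  // Reject anything that is not shaped like a Wikidata item ID.
  if (!/^Q[1-9]\d*$/.test(qid)) {
    throw new Error(`Not a Wikidata item ID: ${qid}`);
  }
  return `${scriptPath}/rest.php/searchvue/v1/media/${qid}`;
}
```

Because every variable sits in the path and there is no free-form query string, the same data always maps to the same URL, which is what makes the responses straightforward to cache.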

One last question: about caching/TTL - I'm not sure how to set that up!
Is rest.php output already cached for ~1h by default, or where would we need to configure that? (Or did you mean caching responses ourselves, in a key-value store?)

  • The query for thumbnails of files tagged with a QID should probably use a standard thumbnail size (wgThumbLimits). This avoids the delay of generating thumbnails, avoids wasted disk space, and also lowers any risk of hitting thumbnail generation throttling for the client.
  • The TOC links should probably use canonical /wiki/X#Y URLs instead of ?title=X#Y URLs. This grants better CDN caching.
  • Standard hovercards should probably be disabled for search for UX and API overhead reasons (hovercards restbase requests trigger instantly on hover). The search popups should "take over" rather than compete.

No questions here - we'll start creating tickets and address them!
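
The first two points could be handled with small helpers along these lines. This is a sketch under assumptions: the width list is a placeholder for whatever wgThumbLimits is set to on the target wiki, the /wiki/ article path is assumed, and the function names are hypothetical.

```javascript
// Placeholder list of standard widths; the real values come from the
// wiki's wgThumbLimits configuration.
const ASSUMED_THUMB_LIMITS = [120, 150, 180, 200, 250, 300];

// Pick the smallest standard width that covers the requested width,
// falling back to the largest standard width.
function snapToStandardWidth(requested, limits = ASSUMED_THUMB_LIMITS) {
  const sorted = [...limits].sort((a, b) => a - b);
  const match = sorted.find((w) => w >= requested);
  return match !== undefined ? match : sorted[sorted.length - 1];
}

// Build a /wiki/X#Y link instead of ?title=X#Y.
function canonicalSectionUrl(title, anchor) {
  const path = encodeURIComponent(title.replace(/ /g, '_'))
    .replace(/%2F/g, '/')
    .replace(/%3A/g, ':');
  return `/wiki/${path}#${encodeURIComponent(anchor.replace(/ /g, '_'))}`;
}
```

In the browser, mw.util.getUrl is probably the better way to produce the page part of the link, since it respects the wiki's configured article path; the hand-rolled version here only keeps the sketch self-contained.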