ResourceLoader/Architecture

This page documents the architecture and features of ResourceLoader. ResourceLoader is Wikipedia's delivery system for JavaScript, CSS, localised interface icons, and localised interface text.

See also Presentations for recorded tech talks and slide decks that explain these features in audio-visual form.

Principles

Main article: Frontend performance § Principles on Wikitech

ResourceLoader principles, in order of their relative importance:

  1. Users. (Perceived performance and overall user experience. This includes backend latency response times.)
  2. Developers. (Engineering productivity; ease of learning, maintaining, and debugging.)
  3. Servers. (Such as disk space, memory usage, CPU load, number of servers, etc.)

These are inspired by the W3C Design Principles.

Modules

Visual representation of how most modules are composed.

ResourceLoader works with a concept of modules. A module is a bundle of resources identified by a symbolic name. A module can contain scripts, styles, and messages; each resource type is described in its own section below.

Aside from that, a module may have several properties, such as its dependencies and its request group.

All in all, this makes it possible to enqueue or load a module bundle by just using its name (instead of listing out all the resources and/or dependencies etc.).

Multiple module bundles are delivered to the client in a single request. More about this follows in the resource sections below. The response is unboxed by the Client.

Resource: Scripts

Module scope

Scripts are automatically wrapped in a closure to ensure each module bundle has its own local variable context (compatible with Node.js execution, and similar to ES6 modules).

When the browser downloads a module, the script is not immediately executed when the browser parses the script response. Instead, the closure is passed to the ResourceLoader Client. This allows the browser to download multiple modules and their dependencies in parallel, whilst still controlling the order in which they execute (even if a module arrives quicker than one of its dependencies). See the ResourceLoader Client section to learn about the loading procedure and a walkthrough of example scenarios.
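
The separation between download and execution can be sketched as follows. This is a minimal illustration and not the actual ResourceLoader Client; the names createRegistry, implement, and log are hypothetical.

```javascript
// Hypothetical mini-registry: closures arrive in any order,
// but run only after all of their dependencies have run.
function createRegistry() {
  const pending = new Map();   // name -> { deps, fn }
  const executed = new Set();  // names that have already run
  const log = [];              // execution order, for illustration

  function runReady() {
    let progressed = true;
    while (progressed) {
      progressed = false;
      for (const [name, mod] of pending) {
        if (mod.deps.every((dep) => executed.has(dep))) {
          pending.delete(name);
          mod.fn();            // each closure has its own local scope
          executed.add(name);
          log.push(name);
          progressed = true;
        }
      }
    }
  }

  return {
    // Called when a module's closure arrives from the network.
    implement(name, deps, fn) {
      pending.set(name, { deps, fn });
      runReady();
    },
    log,
  };
}

// "app" arrives before its dependency "lib", yet "lib" still runs first.
const scopeRegistry = createRegistry();
scopeRegistry.implement('app', ['lib'], function () { const local = 'app'; });
scopeRegistry.implement('lib', [], function () { const local = 'lib'; });
console.log(scopeRegistry.log); // [ 'lib', 'app' ]
```

Because each module is a function, the variable named local in one module cannot collide with the one in the other.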

Minification

All scripts are minified before being included in the bundle. For this we use the JavaScriptMinifier library. In case of an internal cache-miss, the application backend will minify the code on-the-fly on the web server. See also the Caching section to learn more about the performance of packaging, and the caching infrastructure around it.

Conditions

Scripts can be conditionally included in a module based on the context of the requesting client (e.g. language code and skin ID). This keeps responses relatively small by only including components relevant to the client context.

Example uses of conditions:

  • The bundle containing a language grammar parser includes a different implementation based on the user language.
  • The bundle containing the logic for rendering Notification includes an extra stylesheet file optionally provided by the user's currently preferred skin. The Vector skin component can register this custom stylesheet which is picked up by the Notification bundle owned by a different component.
  • Moment.js has regional definitions for 62 different languages. Only one of the regional definition files will be included at run-time. Language chaining and fallbacks are handled by MediaWiki's localization framework.

For how to use these and other module options (such as deprecated and skipFunction), see ResourceModules in the MediaWiki API docs.
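
As a rough sketch of the idea, the following selects script files for a bundle based on the requesting context. The module shape, option names, and file names are illustrative assumptions; consult the ResourceModules documentation for the real options.

```javascript
// Hypothetical module definition: which script files to ship depends on
// the requesting context (language code and skin ID).
const grammarModule = {
  scripts: ['grammar.core.js'],
  languageScripts: { fi: ['grammar.fi.js'], ru: ['grammar.ru.js'] },
  skinScripts: { vector: ['grammar.vector.js'] },
};

// Return only the files relevant to this context.
function selectScripts(module, context) {
  return [
    ...module.scripts,
    ...(module.languageScripts[context.lang] || []),
    ...(module.skinScripts[context.skin] || []),
  ];
}

console.log(selectScripts(grammarModule, { lang: 'fi', skin: 'vector' }));
// [ 'grammar.core.js', 'grammar.fi.js', 'grammar.vector.js' ]
```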

Resource: Styles

Compiling

Starting with MediaWiki 1.22, ResourceLoader natively supports LESS files. When registering a module's stylesheets, you can transparently include .less files among or instead of any .css files, and these will be automatically compiled, cached, and invalidated accordingly. Image embedding and CSS localisation flipping are supported in combination and are applied after the LESS compiler.

Embedding

Visual comparison of what @embed does.

ResourceLoader offers the ability to automatically embed an image file into your stylesheet, by leveraging Data URI embedding. This can sometimes result in faster experience, through a shorter page load time, reaching visual completion sooner, and reducing the overall transfer size (bandwidth cost). As of 2016, it is generally believed that embedding adds more cost than benefit and so it is recommended to avoid use of @embed in new code.[1] Consult Wikimedia's Frontend performance practices guide to understand when its benefits may still be worthwhile.

To apply embedding to an image, use the "@embed" annotation in a CSS comment over the relevant CSS declaration:

.mw-foo-bar {
    /* @embed */
    background: url( images/icon-home.png ) 0 50% no-repeat;
    padding: 4px 0 4px 32px;
}

When you enable "embed" mode for an image, its binary contents are automatically base64-encoded and inlined into the stylesheet.
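
The transformation can be sketched roughly as below, assuming Node.js. This is not the CSSMin implementation; the function name embedImages and the files lookup (a stand-in for reading from disk) are hypothetical, and the sample bytes are only the first four bytes of a PNG header.

```javascript
// Sketch: inline images marked with /* @embed */ as base64 data URIs.
// `files` stands in for the file system: path -> Buffer of file contents.
function embedImages(css, files) {
  const pattern = /\/\*\s*@embed\s*\*\/(\s*[\w-]+\s*:[^;]*?)url\(\s*([^)\s]+)\s*\)/g;
  return css.replace(pattern, (match, before, path) => {
    const data = files[path];
    if (!data) {
      return match; // unknown file: leave the declaration untouched
    }
    const mime = path.endsWith('.svg') ? 'image/svg+xml' : 'image/png';
    return `${before}url(data:${mime};base64,${data.toString('base64')})`;
  });
}

const embedCss = `.mw-foo-bar {
    /* @embed */
    background: url( images/icon-home.png ) 0 50% no-repeat;
}`;
const embedFiles = { 'images/icon-home.png': Buffer.from([0x89, 0x50, 0x4e, 0x47]) };
const embedded = embedImages(embedCss, embedFiles);
console.log(embedded);
```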

Initial icons are immediately visible

Without embedding, a browser would generally first render the page without icons. After the stylesheet is completely downloaded, the browser will start rendering portions of the HTML. After any rendered portion, the browser discovers missing images from now-active style rules and requests them.

No delay or flash for interaction states

When interacting with a component (e.g. CSS active, focus, or hover; or click in JavaScript), and a different CSS rule applies, the same missing image scenario as for the initial page load presents itself. The old rule would no longer apply, and the new rule would apply but cause a flash while the missing image is discovered and downloaded by the browser. By embedding the image file as a data URI, it is instantly visible.

Lower transfer costs

Comparison loading CSS and referenced images with and without Embedding enabled. 27.3% reduction in total size after compression, 97.2% reduction in number of HTTP requests.

By sending the image files as part of a single CSS response, the Gzip compressor applies to it as a whole. This means PNG binary headers and SVG syntax can be liberally compressed, even across unrelated icons. If images were downloaded separately, Gzip would be limited to compressing each image individually.

We also eliminate the URL from the cost. Generally speaking, when referencing a file in CSS, that URL reference is in itself also data. That data is not actually useful though, as it is merely an instruction to download the actual file. This is sensible from a historical perspective, as this avoids an unwelcome download for large files such as background photos or fonts that may not be needed on some pages, and allows re-using this file from the browser cache later on.[2] We preserve these cache and re-use benefits through the cached CSS, rather than a cached image file.

By embedding the file, we only pay for the file data, instead of the file URL and the file data.

For comparison:

  • url('/w/extensions/MyExample/modules/oojs-ui.styles.icons/images/menu-progress.svg?12345');
  • url('/w/load.php?modules=oojs-ui.styles.icons-layout&image=menu&variant=progress&format=rasterized&skin=vector&version=12345');
  • url('data:image/svg+xml,%3Csvg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 20 20"%3E %3Ctitle%3E bell %3C/title%3E %3Cpath d="M16 7a5.38 5.38 0 0 0-4.46-4.85C11.6 1.46 11.53 0 10 0S8.4 1.46 8.46 2.15A5.38 5.38 0 0 0 4 7v6l-2 2v1h16v-1l-2-2zm-6 13a3 3 0 0 0 3-3H7a3 3 0 0 0 3 3z"/%3E %3C/svg%3E');

In addition to the URL, we also save costs from having fewer HTTP requests. The CSS and the images naturally share a single HTTP request. As such, there are no additional HTTP transfers with their own request and response header data.

What about inflation?

In theory, this approach may seem inefficient. When using separate HTTP requests, files can be transferred as binary data. For PNG files, this is 30% more compact than base64-encoding, which inflates the binary data.

In practice, due to use of Gzip compression, the CSS response essentially acts as a sprite image, combining the PNG headers across multiple files. This and the other cost savings listed above, make the end result smaller than the sum of the individually compressed and transferred CSS and image binary would have been, this despite the inflation from base64-encoding. In addition to being smaller, it also improves the user experience by loading and appearing faster.
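
Both effects can be observed with a small experiment, sketched here for Node.js. The two icons are made-up examples that share most of their SVG boilerplate; actual savings depend on the real files.

```javascript
const zlib = require('zlib');

// Two hypothetical icons that share most of their SVG boilerplate.
const icons = [
  '<svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 20 20"><path d="M1 1h18v18H1z"/></svg>',
  '<svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 20 20"><circle cx="10" cy="10" r="9"/></svg>',
];

// base64 emits 4 output characters for every 3 input bytes.
const rawIcon = Buffer.from(icons[0]);
const encodedIcon = rawIcon.toString('base64');
console.log(rawIcon.length, encodedIcon.length); // encoded is about one third larger

// Compressing each icon on its own repeats the shared boilerplate
// and the Gzip framing for every file.
const separateSize = icons
  .map((svg) => zlib.gzipSync(svg).length)
  .reduce((a, b) => a + b, 0);

// Compressing them together lets Gzip reuse what the icons have in common.
const combinedSize = zlib.gzipSync(icons.join('\n')).length;
console.log({ separateSize, combinedSize });
```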

What about sprites?

Example of how sprites work. Using a custom repeat or positioning would "leak" other parts of the sprite. This is one of the reasons why we don't use sprites.

This technique makes traditional sprite images obsolete. While the motivation behind sprites is good (fewer HTTP requests, better compression), it comes with a few caveats:

  • Maintenance. If an image needs to be updated, one has to regenerate the sprite file, update the background positions in the CSS output, etc.
  • Produces overly complex CSS.
  • Imposes restrictions on image usage. Properties background-repeat, background-size or background-position may not be used due to these leaking other images in the same sprite.

These caveats aren't the end of the world (sprites were in wide use, clearly they did work), and some other resource delivery systems do use sprites, and might even automate their maintenance. Our embedding technique, however, provides the best of both worlds. It holds up the advantages of sprites:

  • Reduce number of HTTP requests.
  • Improve compression by combining images in one file.

With the additional benefits of:

  • No restrictions on image use, and no sprite "leakage" bugs. Properties such as background-repeat, background-size, or background-position can be used freely, because each image stands on its own.
  • Clean CSS.
  • No maintenance.
  • Even fewer HTTP requests. There isn't even 1 image request, the CSS and the images share one request.

Remapping

For icons that are not embedded, ResourceLoader transforms the relative file path into an absolute one. This is necessary because file references in CSS are meant to be relative to where the stylesheet is served from, which changes meaning when these files are bundled and served from a different URL.

The URLs are also made immutable by appending a truncated content hash as query parameter. This can be used by a cache proxy to disambiguate between multiple versions of the same file during a deployment, and to prevent a multi-server cluster from having the response from an "old" server populate the URL a browser got from a "new" server whilst mid-deployment (T102578, T47877).
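
A minimal sketch of remapping, assuming Node.js, is shown below. The function name remapUrls, the basePath and readFile parameters, the choice of SHA-1, and the five-character truncation are all assumptions for illustration; the real logic lives in the CSSMin library.

```javascript
const crypto = require('crypto');

// Sketch: make a relative url() absolute and append a truncated content hash.
// `basePath` stands in for the module's directory, `readFile` for disk access.
function remapUrls(css, basePath, readFile) {
  return css.replace(/url\(\s*([^)\s]+)\s*\)/g, (match, path) => {
    if (/^(data:|https?:|\/)/.test(path)) {
      return match; // already absolute or embedded
    }
    const hash = crypto
      .createHash('sha1')
      .update(readFile(path))
      .digest('hex')
      .slice(0, 5);
    return `url(${basePath}/${path}?${hash})`;
  });
}

const remapped = remapUrls(
  '.foo { background: url(images/icon.svg); }',
  '/w/extensions/MyExample/modules',
  () => '<svg/>'
);
console.log(remapped);
```

When the file contents change, the hash and therefore the URL change with them, so a cached copy of the old URL is never mistaken for the new file.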

Flipping

See also Directionality support for more information about directionality support in MediaWiki.
 
Example of what CSSJanus does.

With the Flipping functionality it is no longer necessary to manually maintain a copy of the stylesheet for right-to-left languages. ResourceLoader automatically changes direction-sensitive CSS declarations (and more). Internally, the CSSJanus library provides that smart "flipping" logic.

Aside from flipping direction-oriented values, it also converts property names and shorthand values. It also converts references to filenames ending in -ltr into filenames ending in -rtl, thereby loading direction-specific iconography.

Consider the following example:

.foo { 
    float: left;
    padding-right: 0.5em;
    margin: 1px 2px 3px 4px;
    background-image: url(foo-ltr.png);
}

When loaded by ResourceLoader, without any additional changes or configuration, it is automatically turned into the following for users with a right-to-left interface language set:

.foo { 
    float: right;
    padding-left: 0.5em;
    margin: 1px 4px 3px 2px;
    background-image: url(foo-rtl.png);
}
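
The four transformations in this example can be sketched in a few lines. This is a deliberately minimal illustration, not the CSSJanus algorithm, which handles many more cases; the function name flipDeclaration is hypothetical.

```javascript
// Minimal sketch of direction flipping for a single CSS declaration.
function flipDeclaration(property, value) {
  const swapSide = (text) =>
    text.replace(/\b(left|right)\b/g, (side) => (side === 'left' ? 'right' : 'left'));

  if (value.includes('url(')) {
    // Direction-specific file names: foo-ltr.png becomes foo-rtl.png.
    return [swapSide(property), value.replace(/-ltr\./g, '-rtl.')];
  }
  const parts = value.trim().split(/\s+/);
  if (/^(margin|padding)$/.test(property) && parts.length === 4) {
    // Shorthand order is top, right, bottom, left: exchange right and left.
    return [property, [parts[0], parts[3], parts[2], parts[1]].join(' ')];
  }
  return [swapSide(property), swapSide(value)];
}

console.log(flipDeclaration('float', 'left'));              // [ 'float', 'right' ]
console.log(flipDeclaration('padding-right', '0.5em'));     // [ 'padding-left', '0.5em' ]
console.log(flipDeclaration('margin', '1px 2px 3px 4px'));  // [ 'margin', '1px 4px 3px 2px' ]
console.log(flipDeclaration('background-image', 'url(foo-ltr.png)'));
// [ 'background-image', 'url(foo-rtl.png)' ]
```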

Sometimes you may want to exclude a rule from being flipped. For that one can use the @noflip annotation. This instructs CSSJanus to skip the next CSS declaration. Or, when used in the selector part, it skips the entire following CSS ruleset.

For example:

.foo { 
    float: left;
    /* @noflip */
    padding-right: 0.5em;
    margin: 1px 2px 3px 4px;
    background-image: url(foo-ltr.png);
}

/* @noflip */
.bar { 
    float: left;
    padding-right: 0.5em;
    margin: 1px 2px 3px 4px;
    background-image: url(foo-ltr.png);
}

Output will be:

.foo { 
    float: right;
    /* @noflip */
    padding-right: 0.5em;
    margin: 1px 4px 3px 2px;
    background-image: url(foo-rtl.png);
}

/* @noflip */
.bar { 
    float: left;
    padding-right: 0.5em;
    margin: 1px 2px 3px 4px;
    background-image: url(foo-ltr.png);
}

Note: When using Less CSS and nested selectors, the noflip annotation must be placed above each individual rule, not above the selector.

Bundling

Resources are combined in a single bundle. The loader response from the server bundles both scripts and styles from the requested module(s) in the same request. The Client receives this and loads the stylesheets into the DOM at the right time, so that they are in memory when the scripts that use these CSS classes execute.

This means that neither the JS nor the CSS will run if JS is disabled. However, if you need the CSS to still apply, you can add one or more CSS-only modules with OutputPage->addModuleStyles( $modules ).

Minification

All stylesheets are minified before being put in the bundle. For this we use the CSSMin library, which was especially developed for ResourceLoader.

Conditions

See the Conditions section under Scripts for more information.

Similar to scripts, style bundling also features the ability to compose the module dynamically based on the context.

Resource: Messages

Messages are exported as a JSON blob, mapping the message keys to the correct translation. They're fetched on the server from MediaWiki's localization cache (including its language fallback logic). Only message keys used by the module are included in the bundle.
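
As an illustration only, the export step might look like the sketch below. The message data, the fallback chain, and the function name exportMessages are made up for this example; the real lookup goes through MediaWiki's localization cache.

```javascript
// Stand-in for the localization cache: language code -> key -> text.
const messageCache = {
  en: {
    'foo-intro': 'This is an introduction to foo.',
    'foo-msg': 'Hello $1, check out $2!',
    'bar-title': 'Bar',
  },
  de: { 'foo-intro': 'Dies ist eine Einführung in foo.' },
};

// Export only the keys a module uses, resolved through a fallback chain.
function exportMessages(keys, fallbackChain) {
  const blob = {};
  for (const key of keys) {
    const lang = fallbackChain.find(
      (code) => messageCache[code] && key in messageCache[code]
    );
    if (lang) {
      blob[key] = messageCache[lang][key];
    }
  }
  return JSON.stringify(blob);
}

console.log(exportMessages(['foo-intro', 'foo-msg'], ['de', 'en']));
```

Here 'foo-msg' has no German translation, so it falls back to English, and 'bar-title' is left out because the module does not use it.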

Bundling

All resources are bundled in the same request. The Client then takes the messages and registers them in the localization system on the client side, before the module's JavaScript code is executed.

Conditions

As with the other two resource types, the messages component is also optimized to load only what is necessary for the requesting context. This is especially important considering that MediaWiki is localized in over 300 languages. Only 1 unique set of messages is delivered to the client.

Front-end

So, how does all this play out in the front-end? Let's walk through a typical page view in MediaWiki, focusing on the ResourceLoader Client.

Startup Module

The page loading process with ResourceLoader.

The startup module is the first and only hardlinked script being loaded on every page from a <script> tag. It is a lightweight module that does three things:

  1. Validity check
    It starts by performing a quick validity check that stops if the current browser cannot support the base environment. This avoids incomplete interfaces and script errors, by preserving the natural non-javascript fallback behavior. For incompatible browsers, the startup module is the first and last script to be loaded. (view source).
  2. Module manifest
    It exports the module manifest. This contains the dependency information of all modules, request groups (if any) and the current version hash for each module. (see mw.loader.moduleRegistry in the console).
  3. Define the loader
    It defines the ResourceLoader Client.

The use of this manifest allows ResourceLoader to naturally avoid the Cascading Cache Invalidation problem that some bundlers suffer from. It also allows for "perfect" cache fragmentation and cache re-use through a defragmented module store.

Client

The ResourceLoader Client is a tiny JavaScript library in charge of loading and executing modules from the server. It reads the module manifest and dependency tree as its input. This client is instructed by the HTML to load modules for the current page.

The client defines mw.loader which can be given a list of module names to load. It automatically handles dependency resolution using the internal dependency map. It also naturally de-duplicates and will not start loading or executing any module more than once.

The loading process is fully asynchronous (as of 2015, blog post), and modules are also requested from the server in batches.
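
Dependency resolution with de-duplication can be sketched as a depth-first walk over the manifest. The manifest contents and the function name resolve are hypothetical; this is not the mw.loader source.

```javascript
// Hypothetical manifest: module name -> list of dependencies.
const manifest = {
  'ext.foo': ['lib.widgets', 'lib.util'],
  'lib.widgets': ['lib.util'],
  'lib.util': [],
};

// Return the requested modules plus all dependencies, each listed once,
// with every dependency placed before the modules that need it.
function resolve(names, registry, seen = new Set(), order = []) {
  for (const name of names) {
    if (seen.has(name)) {
      continue; // already queued: never load a module twice
    }
    if (!(name in registry)) {
      throw new Error(`Unknown module: ${name}`);
    }
    seen.add(name);
    resolve(registry[name], registry, seen, order);
    order.push(name);
  }
  return order;
}

console.log(resolve(['ext.foo', 'lib.util'], manifest));
// [ 'lib.util', 'lib.widgets', 'ext.foo' ]
```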

Store

See also Research:Module storage performance on Meta-Wiki.

The ResourceLoader client caches the contents of individual modules within the web browser (i.e. HTML5 localStorage). This drastically reduces, and in practice effectively eliminates, cache fragmentation.

For example, imagine two unrelated modules A and B that both make use of a third module C that is exceptionally large. Module A is used on page "Foo", and module B on page "Bar". Without a module store, the following would happen:

  1. User views article Foo. Browser makes network request for modules=A+C.
  2. User views article Bar. Browser makes network request for modules=B+C. (Thus downloading a second copy of C.)
  3. User views article Foo. Browser uses cache for modules=A+C.
  4. User views article Bar. Browser uses cache for modules=B+C.

On the second page view, the browser was unable to use C from its cache, because it is stored under a batch request URL. In the above scenario, the user would fully download the big "C" module multiple times, despite it not having changed, and despite it already being in the cache somewhere as part of "A+C".

With ResourceLoader's module store, the client caches each part of the response to the batch request separately in a local cache (backed by a localStorage blob). This is not affected by other modules in the same batch request. Let's reconsider the same scenario with these improvements:

  1. User views article Foo. Browser makes network request for modules=A+C. These two are unpacked on arrival and locally stored separately, as "A" and "C".
  2. User views article Bar. Browser executes "C" from local store, and makes network request for modules=B. Then, "B" is also added to the store.
  3. User views article Foo. Browser executes "A" and "C" from store. No network request.
  4. User views article Bar. Browser executes "B" and "C" from store. No network request.
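
The scenario above can be simulated with a short sketch. A plain Map stands in for localStorage, and the names createClient, view, and networkRequests are hypothetical.

```javascript
// Sketch of a per-module store. `versions` stands in for the module manifest.
function createClient(versions) {
  const store = new Map();       // "name@version" -> module contents
  const networkRequests = [];    // batch requests that were actually made

  return {
    networkRequests,
    view(modules) {
      const missing = modules.filter(
        (name) => !store.has(`${name}@${versions[name]}`)
      );
      if (missing.length > 0) {
        networkRequests.push(missing.join('+'));
        // Unpack the batch response and store each module separately.
        for (const name of missing) {
          store.set(`${name}@${versions[name]}`, `/* code for ${name} */`);
        }
      }
    },
  };
}

const storeClient = createClient({ A: 'v1', B: 'v1', C: 'v1' });
storeClient.view(['A', 'C']); // Foo
storeClient.view(['B', 'C']); // Bar
storeClient.view(['A', 'C']); // Foo again
storeClient.view(['B', 'C']); // Bar again
console.log(storeClient.networkRequests); // [ 'A+C', 'B' ]
```

Keying the store by name and version means a changed module simply misses the store and is fetched again, without affecting its neighbours.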

Execution

This section is incomplete
  • Execution separated from loading/parsing.
  • Direct or delayed execution as appropriate based on module dependencies.
  • Insert messages and styles into memory before script execution.

Back-end

See also: § Caching

Backend performance

Pie chart of Wikipedia's cluster load for ResourceLoader, in July 2011. 

ResourceLoader is highly scalable and very cost efficient. Its server responses are cacheable and applicable to all users, allowing it to scale to a large deployment (such as Wikipedia) with only a handful of backend servers. This means that unlike for MediaWiki page views, ResourceLoader assets fully utilize the CDN even for registered users (i.e. "logged-in" page views).

In July 2011, Wikipedia had about 400 servers (CDN edge servers and application servers). Our CDN served 90,000 requests per second at peak, of which 40,000 were for ResourceLoader (e.g. JS and CSS resources). These 40,000 requests were served worldwide by only 9 Varnish frontend servers and 4 backend application servers. The cache hit ratio was 99.82%, resulting in only 73 req/s cache misses toward the backends.

As of September 2019, ResourceLoader enjoys a cache hit ratio of 99.86% over a 48-hour period, with a peak of 31,000 requests per second (in total our CDN edges saw 7,500,000 cache hits and 9,800 cache misses over the 2 days, data). Since 2011, our various optimisations and cache defragmentation have thus lowered the absolute request volume to ResourceLoader and increased its cache efficiency, despite Wikipedia's overall growth in traffic over the same decade.

On-demand package generation

ResourceLoader features on-demand generation of the module bundles. On-demand generation is very important in MediaWiki because cache invalidation can come from many places. Here are a few examples:

  • Core
  • Extensions
    Core and extensions generally only change when a wiki is upgraded. But especially on large sites such as Wikipedia, deployments happen many times a day (even updates to core).
  • Users[3]
    Wiki users granted certain user rights (interface administrators by default) have the ability to modify the "site" module (which is empty by default and will be loaded for everybody when non-empty). This is all without server-side access; these scripts/styles are stored as wiki pages in the database.
    On top of that, each user also has their own module space that is only loaded for that user.
  • Translators
    The interface messages are shipped with MediaWiki core and are generally considered part of core (and naturally update when upgrading/deploying core). However wikis can customize their interface by using the MediaWiki message namespace to modify interface messages (or create new ones to use in their own modules).

Response

GET /load.php?modules=foo|bar|quux&lang=en&skin=vector&version=…
mw.loader.implement('foo',
    function () {
        // Code for foo module
        mw.Foo = { ... };
    },
    // CSS for foo module
    [
        '.foo{color:blue;background-image:url(...FTkSuQmCC);}',
        '@media print { .foo{display:none;} }'
    ],
    // Messages for foo module
    {
        'foo-intro': 'This is an introduction to foo.',
        'foo-msg': 'Hello $1, check out $2!'
    }
);
mw.loader.implement('bar', ...);
mw.loader.implement('quux', ...);

Caching

Guarantees

ResourceLoader offers the following guarantees to developers and site operators:

  • The page HTML must be highly-cacheable for serving through a CDN, and generally only purged after the actual page content was changed.
  • References from page HTML to ResourceLoader URLs must not vary by user.
  • When people browse around on the website, there is a consistent experience with regards to the versions of styles and scripts.
  • Changes to styles and scripts take effect in most browsers within 5 minutes, and in all browsers within 10 minutes.

These guarantees allow ResourceLoader to operate within the constraints of Wikipedia's global caching strategy.

Change propagation

The pipeline for change propagation both starts and ends with the startup module. The page HTML doesn't change one way or the other. And when the client loads individual modules, the version parameter is always a strict part of all relevant identifiers, cache keys, and URLs. As such, the details of how the Client works don't have to be considered when thinking about cache invalidation. The mw.loader logic, the browser cache, and the local module store will never serve stale content.

From HTML to stylesheet

We start from the page HTML, which directly loads the bundled stylesheets from load.php. The stylesheet URLs here are without a "version" parameter. This is important because we can't change the HTML after every deployment. We allow browsers to cache the stylesheet and use it offline unconditionally for at least 5 minutes. After that the browser will perform an HTTP revalidation request to renew it for another 5 min if it hasn't changed. More details on that below.

All files referenced within the stylesheet (e.g. for SVG icons) are automatically versioned by ResourceLoader by incorporating the file hash. This way, the versions of the icon and the CSS code are always in sync with each other when applied to a page. There is no need for forward- or backward- compatibility between the CSS code and the files that it references. This is most obvious when the icons are embedded using @embed, but even without embedding these are kept in sync through the automatically versioned URL references. These icons and such may be cached and reused offline indefinitely.

From HTML to startup

The JavaScript story starts from the page HTML, which loads the Startup module. The startup module contains the module manifest of version hashes, which we later use to construct URLs to load modules from e.g. /load.php?modules=foo&version=xyz.

The startup module is the only script linked from the HTML, and it is also the only script we load without a "version" parameter in its URL. This is important because we can't change the HTML of all articles after every deployment (noting that MediaWiki also features site scripts and gadgets, which allow people to edit the source code of some modules via on-wiki pages!).

To avoid browsers having to download the module manifest repeatedly, we use the browser's offline HTTP cache, with an occasional HTTP revalidation request.

Startup offline cache

During the first pageview in a browsing session, we download and cache the module manifest in the browser's HTTP cache for 5 minutes. During that time the browser re-uses it unconditionally, fully local and fully offline. The CDN also caches its copy from the backend for 5 minutes. These two sliding windows together ensure that all pageviews incorporate the latest changes after no more than 10 minutes.

At WMF, this is optimised such that the browser's 5-minute window intuitively starts from when the browser downloaded it. This is unlike standard HTTP Cache-Control, which by default offsets the browser to count down from when the CDN originally got its copy, which would mean most users only get to use their offline copy for 1-2 minutes (T105657).

Startup revalidation

If during a pageview the manifest is more than 5 minutes old, the browser will make a conditional HTTP request. While the original request simply downloaded the manifest, a conditional request will only respond with the manifest if it has changed. Browsers generally keep cached resources until long after they expire, specifically to make this possible. In the original download we send an ETag response header, which the browser passes back to us in the If-None-Match request header. If the manifest hasn't changed, we get a cheap 0-byte HTTP 304 response from the CDN, which lets the browser upgrade its stale copy back to "fresh" for another 5 minutes, and the cycle repeats.
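
The server side of this exchange can be sketched as below, assuming Node.js. The function name respond, the use of SHA-1 for the ETag value, and the exact Cache-Control string are assumptions for illustration; only the ETag, If-None-Match, and 304 mechanics are standard HTTP.

```javascript
const crypto = require('crypto');

// Sketch of a conditional response. `body` is the current startup manifest
// and `ifNoneMatch` is the value of the If-None-Match request header, if any.
function respond(body, ifNoneMatch) {
  const etag = `"${crypto.createHash('sha1').update(body).digest('hex')}"`;
  const headers = { ETag: etag, 'Cache-Control': 'public, max-age=300' };
  if (ifNoneMatch === etag) {
    return { status: 304, headers, body: '' }; // unchanged: nothing to resend
  }
  return { status: 200, headers, body };
}

const first = respond('manifest v1');
const unchanged = respond('manifest v1', first.headers.ETag);
const changed = respond('manifest v2', first.headers.ETag);
console.log(first.status, unchanged.status, changed.status); // 200 304 200
```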

Cache invalidation

Every module has a version hash. This version hash is how we decide whether to bust the cache, or to allow re-use (if the version and module content remained the same). For most modules, the version hash tracks the following:

  • Content of CSS files, JavaScript files, and other files in the module definition. This works by taking a checksum of these files.
  • Content of imported files. For example, a LESS file may import other files when compiled. These imported files are added to our internal list of files to hash for this module, since changes to the imports may change the output of the stylesheet.
  • Images referenced in stylesheets. ResourceLoader automatically adds a version query string to image references in stylesheets (to allow maximum caching). If the referenced image changes, then even if none of the CSS is explicitly changed, we will regenerate the stylesheet to link to the new image version (see also: Remapping).
  • Content of interface messages from the localisation cache.
  • Module definition. The definition includes the order in which files are included, and other metadata that may influence the module output without the files themselves changing.
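
A version hash over these inputs can be sketched as follows, assuming Node.js. The shape of the module object and the function name getVersionHash are assumptions for this example, and the real version string may be shortened.

```javascript
const crypto = require('crypto');

const sha1Hex = (text) => crypto.createHash('sha1').update(text).digest('hex');

// Sketch: summarize everything that can change a module's output,
// then hash the summary. No bundle has to be built to compute it.
function getVersionHash(module) {
  const summary = {
    files: module.files.map((file) => [file.path, sha1Hex(file.content)]),
    messages: module.messages,
    definition: module.definition,
  };
  return sha1Hex(JSON.stringify(summary));
}

const fooModule = {
  files: [{ path: 'foo.js', content: 'mw.Foo = {};' }],
  messages: { 'foo-intro': 'This is an introduction to foo.' },
  definition: { dependencies: ['bar'] },
};
const before = getVersionHash(fooModule);

// Changing only a message is enough to produce a new version.
const after = getVersionHash({
  ...fooModule,
  messages: { 'foo-intro': 'Welcome to foo.' },
});
console.log(before !== after); // true
```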

The reason we track these changes, as opposed to building the module on-demand and hashing the output, is that clients need to download the modules from a URL, and that URL must be cacheable, and thus versioned. If we hashed the output instead, we'd have to rebuild all modules on the server when merely delivering the startup manifest (Wikipedia has over 1000 registered module bundles). As of August 2019, this remains infeasible to precompute at "build" or "deployment" time. Wikimedia Foundation hosts over 900 wiki configurations, 400 languages, 5 skins, and 1000 modules. Precomputing each of these variants would take hours. And that's before we factor in that wikis are in constant flux through on-wiki abilities to edit localisation messages, change site configuration, and edit certain script pages.

Module batch request

Balance is important.

When the client has no locally cached copy of a module+version, it will load it in a batch with other uncached modules from a load.php URL that carries a version parameter. This allows web browsers and the CDN to effectively consider them immutable. Browsers may cache and unconditionally re-use these responses on subsequent page views based on the far-future Expires header (or the "max-age" Cache-Control directive).

These batch URLs sort the module names in alphabetical order to increase the chances of hitting a previous cache entry in the browser cache or the CDN, as well as to improve Gzip compression by letting the server naturally place related module bundles together in the response.

The "version" parameter on these URLs is a combined version hash, a "hash of hashes".
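
Putting the sorting and the combined hash together, a batch URL could be built roughly like this (Node.js assumed). The function name buildBatchUrl, the way the hashes are combined, and the truncation length are assumptions; the real client may differ in these details.

```javascript
const crypto = require('crypto');

// Sketch: build a batch URL from module names and their version hashes.
// `versions` stands in for the manifest.
function buildBatchUrl(names, versions, context) {
  const sorted = [...names].sort();
  // "Hash of hashes": combine the individual versions in sorted order.
  const combined = crypto
    .createHash('sha1')
    .update(sorted.map((name) => versions[name]).join(''))
    .digest('hex')
    .slice(0, 5);
  const query = new URLSearchParams({
    modules: sorted.join('|'),
    lang: context.lang,
    skin: context.skin,
    version: combined,
  });
  return `/load.php?${query}`;
}

const batchVersions = { foo: 'a1b2c', bar: 'd3e4f', quux: '01234' };
const batchContext = { lang: 'en', skin: 'vector' };
console.log(buildBatchUrl(['quux', 'foo', 'bar'], batchVersions, batchContext));
```

Because the names are sorted before hashing and joining, the same set of modules yields the same URL regardless of the order in which they were requested.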

Groups

The module request "group" can be used to optimise cache fragmentation. By default any two modules are allowed to be loaded together in the same batch request. The client store prevents most cache fragmentation automatically, which is why you should generally not set the "group" option.

If fine-tuning is needed, then one or more module bundles can be forced to be split into a dedicated request group. This increases device cost and transfer size because it involves more HTTP requests, and reduces compression efficiency. However, a module that is too large to fit in the client store (e.g. VisualEditor core) may benefit from having its own group, as it ensures its requests look the same even when other modules in the queue vary by page, thus allowing more optimal use of browser cache.

Any freeform string can be used as a group name. Modules with the same request group assigned may be loaded in the same request.

It is conventional to use a lowercase dashed name, typically derived from a substring of the related module names (e.g. "jquery-ui" or "ext.foo").
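
The effect of groups on batching can be sketched as follows. The function name splitByGroup and the module and group names are hypothetical.

```javascript
// Sketch: split the load queue into one batch per request group.
// Modules without a group share the default batch (keyed here as '').
function splitByGroup(queue, groups) {
  const batches = new Map();
  for (const name of queue) {
    const group = groups[name] || '';
    if (!batches.has(group)) {
      batches.set(group, []);
    }
    batches.get(group).push(name);
  }
  return [...batches.values()];
}

console.log(splitByGroup(
  ['ext.foo', 'ext.bigEditor.core', 'ext.bar'],
  { 'ext.bigEditor.core': 'ext.bigEditor' }
));
// [ [ 'ext.foo', 'ext.bar' ], [ 'ext.bigEditor.core' ] ]
```

The large module always ends up in a request of its own, so its URL stays the same no matter which other modules a page needs.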

Beware of the reserved names below. Modules in a reserved group are disqualified from client store optimisations, and each reserved group has additional behaviour of its own:

  • user. Reserved for modules that vary by username (e.g. user scripts). These HTTP requests get an extra "user" query parameter. This parameter is available in the ResourceLoaderContext object passed to content methods (e.g. getScript, getStyles). Due to the extra parameter, they don't share cache with other users or logged-out users. The cache will be public. The stylesheets in this module group are loaded after all other modules (last cascading order), through the DOM's "DynamicStyles" marker.
  • private. Reserved for modules that are not allowed to be loaded from the public load.php endpoint (e.g. for CSRF tokens). Modules in this group are automatically embedded by OutputPage in the HTML when loaded. They cannot be loaded on demand.
  • site. Reserved for stylesheets that are user-generated content, but are not user-specific (rather for the entire site). The stylesheets in this module group are loaded after all other modules (last cascading order), using the "ResourceLoaderDynamicStyles" marker as separation.
  • noscript. Reserved for stylesheets that are user-generated content, but are not user-specific (rather for the entire site). The stylesheets in this module group are loaded after all other modules (last cascading order), using the "ResourceLoaderDynamicStyles" marker as separation.

Debugging

Disable on a single page

To make it easier to debug a specific page without the influence of site-wide or user-specific gadgets, scripts, or styles, it is possible to temporarily disable them by setting the "safemode=1" query parameter on any page, e.g. https://www.mediawiki.org/w/index.php?title=Project:Sandbox&safemode=1.

Note that if $wgAllowSiteCSSOnRestrictedPages is set to true, site-wide stylesheets will be considered "safe" and this parameter will not disable them.
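
As a small sketch, the parameter can be appended to any page URL with the standard URL API. The helper name `withSafeMode` is hypothetical and not part of MediaWiki.

```javascript
// Return a copy of the given page URL with safe mode enabled.
function withSafeMode(pageUrl) {
  const url = new URL(pageUrl);
  // set() replaces an existing "safemode" value instead of adding a duplicate.
  url.searchParams.set('safemode', '1');
  return url;
}

const debugUrl = withSafeMode('https://www.mediawiki.org/w/index.php?title=Project:Sandbox');
console.log(debugUrl.searchParams.get('safemode')); // "1"
console.log(debugUrl.searchParams.get('title')); // "Project:Sandbox"
```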

Debug mode

To make development easier, there is a debug mode in ResourceLoader, which you can enable by setting the debug query parameter, the resourceLoaderDebug cookie, or the $wgResourceLoaderDebug configuration variable (in decreasing order of precedence; the config variable supports legacy mode only).

In modern mode (debug=2), minification and batching are disabled, and JSON data and templates are pretty-printed. Any extra files specified in a module's debugScripts option will also get loaded. It will also disable most caching, treat PHP warnings more aggressively, and enable mw.log() to send debug logs to the browser console.

In legacy mode (debug=1), in addition to the behaviours of modern mode, ResourceLoader also tries to load JavaScript source files in their raw form, directly from the file URL (instead of via load.php). This was originally done to provide detailed exception stack traces (from a time before source maps support, T47514). Note that this affects variable scope and thus may change script behavior.

Setting debug=true is an alias for legacy mode (in the future, this might change to modern mode). Other values, including false and 0, turn off debug mode; this can be used to temporarily override a cookie or LocalSettings.php configuration that enables debug mode.
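
The rules above can be summarised in a short sketch. This is an illustration of the described behaviour, not the MediaWiki implementation: `parseDebugValue` and `resolveDebugMode` are hypothetical names, and it assumes the cookie value is interpreted the same way as the query parameter.

```javascript
// Map a raw "debug" value to a mode, following the rules described above.
function parseDebugValue(value) {
  if (value === '2') {
    return 'modern';
  }
  if (value === '1' || value === 'true') {
    return 'legacy'; // "true" is currently an alias for legacy mode
  }
  return 'off'; // any other value, including "false" and "0"
}

// Hypothetical resolver; sources are checked in decreasing order of precedence.
function resolveDebugMode({ query, cookie, config = false } = {}) {
  if (query !== undefined) {
    return parseDebugValue(query);
  }
  if (cookie !== undefined) {
    return parseDebugValue(cookie);
  }
  return config ? 'legacy' : 'off'; // the config variable supports legacy mode only
}

console.log(resolveDebugMode({ query: '2' })); // "modern"
console.log(resolveDebugMode({ query: 'false', cookie: '1' })); // "off"
console.log(resolveDebugMode({ config: true })); // "legacy"
```

The second call shows how an explicit debug=false in the query string overrides a cookie that would otherwise enable debug mode.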

Conclusion

In conclusion we'd like to think of ResourceLoader as creating a development environment that is optimized for:

  • Happy developers
    Easy to work with modules without worrying about optimization, maintenance, building, and so on.
  • Happy servers
    The application itself scales well, and is optimized to run on-demand.
  • Happy users
    Faster pages!

Annex: JavaScriptMinifier

JavaScriptMinifier and CSSMin are part of a standalone library, published as wikimedia/minify on Packagist.org. It is freely licensed open-source software.
 
JSMin.php is slow compared to the new JavaScriptMinifier.php script.

Although the re-generation of a module bundle should be relatively rare (since cache is very well controlled), when it does happen it has to perform well from a web server.

For that reason we don't use the famous JSMin.php library (based on Douglas Crockford's JSMin) because it is too slow to run on-demand while a page is loading. JSMin.php takes about 1 second for jquery.js[4], which is okay if you're on the command-line, but when working on-demand in a web server response (with hundreds of large files needing to be minified) waiting that long is unacceptable. Even more so under heavy load (to avoid a cache stampede).

Instead, ResourceLoader uses Paul Copperman's JavaScriptMinifier. This runs up to 4X faster than JSMin. In addition to the speed, time has told that JavaScriptMinifier interprets the JavaScript syntax more correctly and succeeds in situations where JSMin outputs invalid JavaScript. The output size of JavaScriptMinifier is slightly larger than JSMin (about 0.5%, based on a comparison by minifying jquery.js[4], where the difference was +0.8KB out of 160KB). This is considered acceptable given the bigger picture.

ResourceLoader doesn't aim to compress as small as possible no matter the cost. Instead, it aims for a balance, getting large gains in a wide range of areas while also featuring instant cache invalidation, fast on-demand bundle generation, and a transparent "build"-free environment for developers. The slight increase in payload then becomes an acceptable trade-off, expected to be more than regained by the efforts it enables.

Annex: Strategic decisions and future consideration

This is a list of ResourceLoader's strategic null decisions for future consideration. The ResourceLoader Architecture page above provides a reasonably complete and high-level overview of ResourceLoader as it exists in the MediaWiki platform today, including what each part is for, and how the components evolved over time. What it doesn't describe is decisions about what not to build or support. The list below serves as a public record with independently verifiable reasons to remove an existing feature (thus it is no longer in the architecture), or reasons to not change something (thus no change is mentioned in the architecture).

Understanding what we didn't do, or no longer do, can be just as important to understanding the system, as what we did do.

  • <script defer>: The defer script attribute is sometimes preferred over the async attribute. In 2015 we evaluated both options, and chose script-async (T107399). This decision was re-evaluated in 2023, where we found that script-async remains the optimal choice for Wikipedia, based on a faster First Contentful Paint and overall Page Load Time (T325480#8528502).
  • Synchronous scripts: When ResourceLoader first launched in 2010, it needed to co-exist with "legacy" scripts that loaded raw from JS files (no versioning, bundling, or minification). In order to transition such scripts piecemeal, and to allow migration of "base" modules (such as jquery, wikibits, and mw.config) to reap the benefits of ResourceLoader, we launched with both a "synchronous" and "asynchronous" mode, whereby new code is asynchronous by default, but any base modules or transitioned legacy scripts have to use the synchronous mode since their load order requires being positioned prior to other legacy scripts. This was eventually generalised into a "top" and "bottom" position (T37065). The "top" modules were loaded through a synchronous <script> tag, combined with document.write(), which inserted additional blocking scripts before letting the browser parse and render article content. Continued adoption however slowed down page load time, and violated our frontend principles. In 2015, we prioritized conversion of legacy features to either use modern CSS or PHP-rendered HTML, with any JS as an optional progressively-enhancing async module. We also invested in a PHP package for OOUI and HTMLForm, so that individual features can re-use these as CSS-only components. See also Perf Matters 2015, T107399, and T109837.
  • globalEval, domEval, "new Function": There are a number of different ways to execute ad-hoc JavaScript code. Spread over a decade from 2011 to 2023, we've analyzed and tried all of them. For lessons learned, trade-offs, and requirements, you can check out the tasks, commit messages, and code comments linked from T60259#9988639.
  • Sass, Compass and other CSS compilers: We decided to use Less instead (T48545), per the 2013 decision at Requests for comment/LESS.
  • Less compiler for user scripts: ResourceLoader supports the Less stylesheet format as a developer productivity feature by enabling automated and centralised re-use of variables, mixins, and dynamic logic in stylesheets. Stylesheets that are developed in MediaWiki core, skins and extensions can transparently list a style.less file among CSS files. ResourceLoader automatically compiles these to CSS (#nobuild, no build step since 2010!) via the wikimedia/less.php package. Due to inherent security and performance risks in the compiler, this is limited to code-reviewed and server-side deployed code. As such, this is not permitted in user-generated code such as user scripts, Gadgets, and TemplateStyles (2013 decision: T272774, 2021 revisited: T56864). Emerging CSS4 standards like CSS Variables and CSS Nesting will reduce reliance on compilers like Less.php in favour of native web standards. This can bring Codex tokens and skin-controlled variables (e.g. dark mode) to Gadgets and TemplateStyles (T340477).
  • Prefetching JavaScript modules: Loading code early competes with critical resources. Loading code late disrupts frame rate, may increase input delay at the wrong moment, and wastes client bandwidth. Our offline and de-fragmented module cache generally eliminates network latency on warm-ish clients. For first views, the use cases that have come up so far were not noticeably bottlenecked by loading JavaScript. For example, VisualEditor loads its code concurrent with the Parsoid document and masks the latency that way. (2015 example: T59952, 2017 example: T176262).
  • Late-loading styles: Further research shows the effect of splitting up the style queue is net-negative. We're better off having a single request higher up in the HTML head, as opposed to having two, where one is toward the end of the body (2015 experience: T97420).
  • Inline script before stylesheets: Why do we place a seemingly non-essential inline <script> above an important stylesheet, in the HTML head? Our critical stylesheets (i.e. the ones styling page layout and article content) are more "important" than our JavaScript. ResourceLoader is specifically designed under our frontend principles with all JavaScript code as optional, progressively-enhancing, and asynchronous. This is reflected in how our HTML content does not directly request any JavaScript modules via script tags, not even async. Instead, the startup module does this later in the background (the indirection serves our HTML caching strategy). This leaves our HTML content with 1 inline script, 1 small async script ("startup"), and our stylesheets. Given they are next to each other in the HTML head, browsers discover these together in the first bytes. Changing their relative order does not affect how soon browsers discover each one. However, changing their order does radically influence page load time. Browsers are required to execute inline scripts before parsing the content below it. Browsers are also required to provide information to JavaScript about parsed stylesheets before the current script in the DOM. As such, the alternative order "CSS, inline script, article content" would cause the browser to have to halt all HTML parsing while it synchronously fetches and parses the CSS, to unblock the inline script, which in turn unblocks beginning the processing of article HTML into a DOM. Since 2015 (change 231434) we use "inline script, CSS, article content" which means browsers fetch the CSS async in the background, execute the tiny standalone inline script immediately, and at the same time parse and render the HTML content (declined: T110938).
  • Language-dependent styles: In order to maintain an easy-to-understand API we avoid creating multiple ways to solve the same problem. The use cases that have come up so far were adequately or better solved by approaching the problem from a different perspective using other MediaWiki platform capabilities (e.g. server-side localisation of HTML, CSS lang pseudo-class, or CSS classes). An arbitrary by-lang stylesheet loader could be needed if the number of variants and conditional sizes grows beyond a certain point. The list of wgResourceLoader options has grown. We can add more, but given the infinite number of options we could add, we should only add the ones where we have a concrete, proven, and non-trivial performance gain (2018: T101936).
  • Advanced minification: ResourceLoader is designed with web standards in mind, such that its processing step is logically optional and for performance improvements only (#nobuild, no build step since 2010!). However, this does not mean we have to run processing steps on-demand. What if we could spend additional time asynchronously to optimize payloads further with an advanced minifier? This is an idea we first considered in 2013 and evaluated more deeply in 2020 (T49437). The idea was declined for five independent reasons detailed at T49437#10216096. In short: We focus our key metrics on the user experience. The bottleneck for JavaScript is execution time, not transfer size.[5][6] We load all JavaScript asynchronously (in the background or concurrently with other tasks), not in the critical path where the user is waiting for it. As such, marginal reduction of the size and download time does not visibly improve the user experience. As an opportunity cost, it makes more sense to invest in removing code, reducing what we load, and improving offline caches.
  • Preload links: In 2017, we introduced use of HTML Preload links to improve page load time by letting the browser discover critical subresources earlier (blog post). This includes the project logo via the stylesheet (T100999), and JavaScript base modules (T164299). We use it via the Link: rel=preload HTTP header on CSS or JS subresources (which works!). We avoid it on the HTML resource directly, as this would severely compromise either our caching strategy or the architecture guarantee of rapid global deployments.
  • HTTP/2 Push: HTTP/2 Push was evaluated but considered undesirable due to wasting valuable client bandwidth. Some CDN companies tried to spread adoption of HTTP/2 as a potential performance improvement by using non-standard mechanisms to automatically send an HTTP/2 Push for every declared "preload" link. HTTP/2 Push has now been long-deprecated in the industry. However, its well-deserved negative sentiment is sometimes wrongly associated with HTML Preload. This was re-evaluated in 2023, and HTML Preload remains highly effective in our architecture at virtually no cost or downside (T325960).
  • Cache-Control immutable: This new HTTP feature landed in browsers and gained momentum in 2016. We re-evaluated it in 2022 and decided against adopting it (T149837#8221190).
  • mediawiki.Uri API: The mediawiki.Uri module was developed in 2010 and has since become a widely adopted stable JavaScript API used throughout the MediaWiki ecosystem. In 2015, the WHATWG URL specification gained momentum in browsers and has filled a much-needed gap in development for the Web, by offering a high-level URL class built-in to every browser. There are subtle differences between the WHATWG URL standard and our mediawiki.Uri module. We decided not to break compatibility of our stable module (2021 example: T287935, 2022 example: T66884). Instead, we encourage developers to adopt the user-friendly URL class directly in modern browsers (by using it instead of, or transitioning away from, the older mediawiki.Uri module). To allow adoption today, ResourceLoader ships an automatically skipped polyfill for older browsers via the web2017-polyfills module (T103379, powered by a Skip function).
  • Inline critical styles: In 2021, experiments demonstrated that inlining critical styles would be detrimental for the page load time on a majority of page views, i.e. not first/cold clients. The benefit for cold views was minimal compared to the outsized regression elsewhere (T124966). There is also the downside of incurring day-to-day complexity and requiring continuous and widespread awareness among all frontend developers to upkeep such a "critical styles" system (it is intolerant to even small imperfections, which would undo the benefit whilst leaving the regression). Inlining critical styles might benefit small static sites, small web apps, or web apps with large amounts of other unresolved performance problems. Wikipedia already has one of the best page load times among top 100 sites worldwide. The Web is designed to afford us one roundtrip for stylesheets, and we utilize that. In return this grants us a scalable caching strategy, an architecture guarantee that deployments take effect globally within 5 minutes, and a consistent atomic experience between pages for individual clients. These benefits allow engineering teams to cut meaningful complexity and reduce required expertise in each team, through empowering and non-leaky abstractions. The project is declined on the basis of opportunity cost better spent elsewhere, unresolved performance regressions on repeat views, and unjustified externalized costs (2021 decision: T124966).
  • User-customization load order: There is a mutually exclusive choice between loading site-wide customisations first (allowing personal scripts to override or extend these) and loading user scripts first (with site-wide code applied later). Changing the order between individual stages or scripts is generally a net loss because changing it is highly disruptive and breaks large numbers of existing scripts. There is not a perfect load order for every use case. It is not worth it to make an uncommon use case that already works today marginally easier to maintain by breaking large amounts of other code in ways that are impossible to prepare or transition ahead of time. The load order we have today is generally believed to make important and common use cases easy and intuitive. Additionally, we provide tools like mw.hook and mw.loader.using that help gadget authors build anything else without load order constraints. (2022 example: T294173).
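
To illustrate the last point, why a hook-style tool removes load order constraints, here is a minimal self-contained sketch. It is a simplified stand-in written for this example, not the real mw.hook implementation; `createHook` is a hypothetical name. The assumption it demonstrates is that a handler registered after an event has fired is still called with the most recent data.

```javascript
// Simplified stand-in for a hook registry; not the real mw.hook implementation.
function createHook() {
  const handlers = [];
  let lastArgs = null; // remembers the arguments of the most recent fire() call

  return {
    add(handler) {
      handlers.push(handler);
      // A late subscriber is still called with the last fired data.
      if (lastArgs !== null) {
        handler(...lastArgs);
      }
      return this;
    },
    fire(...args) {
      lastArgs = args;
      for (const handler of handlers) {
        handler(...args);
      }
      return this;
    }
  };
}

const seen = [];
const hook = createHook();
hook.add((msg) => seen.push('early:' + msg)); // registered before the event
hook.fire('ready');
hook.add((msg) => seen.push('late:' + msg)); // registered after the event
console.log(seen.join(', ')); // early:ready, late:ready
```

Both handlers run regardless of whether the script that registers them executes before or after the script that fires the event.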

References

  1. T121730: Audit use of @embed by Timo Tijhof (2019), phabricator.wikimedia.org.
  2. The Dangers of Data URIs, Andy Davies, 22 December 2020.
  3. These user-module features are disabled by default, but can be enabled from the server configuration. See: $wgUseSiteCss, $wgUseSiteJs, $wgAllowUserCss, $wgAllowUserJs. Wikipedia, for example, has these enabled. Also, the Gadgets extension makes it possible for users to create opt-in modules and/or separate features into multiple modules (as opposed to the "site" module, which is only 1 module and loaded for everyone).
  4. Based on a git distribution of jquery.js (160 KB) from July 2011. JSMin.php took 0.85s to minify that.
  5. Can You Afford It?: Real-world Web Performance Budgets, Alex Russell, 2019.
  6. Making instagram.com faster: Code size and execution optimizations (Part 4), 2019.