Double-keyed HTTP cache #904

annevk · 2019-05-09T15:18:37Z

The idea here is that the "browser's address bar origin" is an additional key for its HTTP cache, to prevent certain classes of attacks.

Safari ships a variant of this (uses registrable domain, not origin), but seems willing to adjust to origin. Other implementers are interested in shipping and are at various stages of experimentation.

This will require making all accesses of "the HTTP cache" more contextual, by accessing the HTTP cache of X whereby X is some defined origin. (Other ideas welcome, @mnot?)
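Accessing "the HTTP cache of X" can be pictured as a lookup that returns a separate cache per top-level origin. Below is a minimal Python sketch. It assumes origins are plain strings and ignores freshness, validation, and eviction. `HttpCache` and `PartitionedCaches` are hypothetical names, not spec terms.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class HttpCache:
    """One logical HTTP cache: request URL -> stored response body."""
    entries: Dict[str, bytes] = field(default_factory=dict)

    def lookup(self, url: str) -> Optional[bytes]:
        return self.entries.get(url)

    def store(self, url: str, body: bytes) -> None:
        self.entries[url] = body


class PartitionedCaches:
    """Hands out one HttpCache per top-level origin ("the HTTP cache of X")."""

    def __init__(self) -> None:
        self._caches: Dict[str, HttpCache] = {}

    def cache_for(self, top_level_origin: str) -> HttpCache:
        # Create the partition lazily the first time an origin is seen.
        return self._caches.setdefault(top_level_origin, HttpCache())
```

With this shape, a script cached while `https://a.example` is in the address bar is simply not visible when `https://b.example` asks its own cache for the same URL.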

I'm not sure where to store the defined origin. We could do a browsing context ancestor walk, and that might be okay, as I think all fetches always require a fully active document, but it would be nice to have that confirmed.

(I'm also assuming that auxiliary browsing contexts are not special here and behave like other top-level browsing contexts for the purposes of this.)
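The ancestor walk mentioned above could look roughly like this. This is a minimal sketch that assumes a hypothetical `BrowsingContext` with `parent` and `opener` links. Only `parent` is followed, so an auxiliary (popup) browsing context is treated as its own top level, which matches the assumption stated above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BrowsingContext:
    origin: str
    parent: Optional["BrowsingContext"] = None  # set for nested (iframe) contexts
    opener: Optional["BrowsingContext"] = None  # set for auxiliary contexts (popups)


def top_level_origin(context: BrowsingContext) -> str:
    # Walk parent links up to the top. The opener is deliberately ignored,
    # so a popup is keyed by its own origin rather than its opener's.
    while context.parent is not None:
        context = context.parent
    return context.origin
```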

cc @youennf @whatwg/security

mikewest · 2019-05-09T15:24:31Z

/cc @jkarlin for Chrome's experimental implementation.

youennf · 2019-05-09T15:28:04Z

Safari is partitioning service workers so that a service worker of an iframe B in a top-level page A will use the partition of top-level page A.
This makes loading more consistent in general (whether intercepted or not, the iframe loads will use the same partition) and improves privacy.

jkarlin · 2019-05-09T16:26:03Z

Very interested. We'd like to mitigate security issues such as x-site search.

Chrome's experimenting with double keying by (top-frame origin, url) as well as triple-keying (top-frame origin, initiator origin, url). Triple keying protects caches of frames within a page from each other. The performance hit from double-keying seems reasonable at first glance, but it's important that we address x-origin prefetch in a multi-keyed cache world. We don't yet have data on triple-keying.
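The two experiments differ only in what goes into the cache key. Here is a hypothetical sketch in which origins are plain strings and the example hostnames are made up for illustration:

```python
from typing import Tuple


def double_key(top_frame_origin: str, url: str) -> Tuple[str, str]:
    # Partitioned only by the page shown in the address bar.
    return (top_frame_origin, url)


def triple_key(top_frame_origin: str, initiator_origin: str, url: str) -> Tuple[str, str, str]:
    # Also partitioned by the frame that initiated the request, so
    # different frames embedded in the same page do not share entries.
    return (top_frame_origin, initiator_origin, url)
```

If two iframes on `https://news.example` (one from `https://ads.example`, one from `https://widgets.example`) both load `https://cdn.example/lib.js`, they share one entry under double keying and get separate entries under triple keying. That separation is the frame-versus-frame protection described above.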

wanderview · 2019-05-09T17:34:09Z

This issue should also consider the differences between a partitioned HTTP cache and partitioned origin storage. I believe WebKit's default partitioning includes both HTTP and origin storage. I don't think the Chrome experiment includes any origin storage partitioning, though.

mnot · 2019-05-09T22:57:37Z

+1

It would be nice to try to factor out cache key computation to support not only this, but things like variants, etc.

annevk · 2019-05-10T07:54:26Z

Mozilla is interested in partitioning other bits as well, but for this issue I'd like to focus solely on the HTTP cache. Some of the infrastructure we might be able to reuse for the other bits mentioned above, but I don't think there's any strong reason to couple them from the start.

johnwilander · 2019-05-10T15:37:40Z

WebKit also partitions LocalStorage on eTLD+1 and used to partition cookies up until a couple of months ago (now the same cookies for third parties are just blocked instead).

In the case of partitioned LocalStorage, it is also not persisted, which makes it a slightly weird SessionStorage.

I think eTLD+1 makes a lot of sense for partitioning unless we’re seeing (or expect to see) attacks that would be fixed with origin partitioning. However, as Youenn said, we’d be willing to harmonize with other implementers for consistency.

jkarlin · 2019-05-10T16:03:39Z

I'm a proponent of origin as it's the security boundary for most aspects of the browser and easier to reason about.

/cc @sleevi

annevk · 2019-05-10T16:08:06Z

Since the cache attacks are not that involved it seems rather risky to not do origin-based as it would mean a compromise of any example.com domain could be used to attack sensitive.example.com.
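The difference between a site key and an origin key shows up directly for two subdomains of the same registrable domain. This is a simplified sketch. It assumes the registrable domain is the last two host labels, whereas real user agents consult the Public Suffix List. It also does not normalize default ports. The hostnames are illustrative.

```python
from typing import Optional, Tuple
from urllib.parse import urlsplit


def origin(url: str) -> Tuple[str, Optional[str], Optional[int]]:
    # (scheme, host, port). The port is None when not explicit; default
    # ports are not normalized in this sketch.
    parts = urlsplit(url)
    return (parts.scheme, parts.hostname, parts.port)


def naive_site(url: str) -> Tuple[str, str]:
    # Simplification: registrable domain = last two host labels.
    # Real user agents use the Public Suffix List instead.
    parts = urlsplit(url)
    labels = (parts.hostname or "").split(".")
    return (parts.scheme, ".".join(labels[-2:]))
```

Under a site key, a page on a compromised `compromised.example.com` lands in the same cache partition as `sensitive.example.com` and could probe it. An origin key keeps the two apart.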

jkarlin · 2019-05-17T16:03:19Z

Another question is how we deal with x-origin <link rel=prefetch>. Which cache key does the prefetch use? If it winds up in the prefetching page's cache, it's a waste of network. But how do we know which key should be used?

I know of two options to make x-origin prefetch still work:

1. Allow prefetched resources to be used once within 5 minutes after prefetching, regardless of the cache key. This opens a one-way communication channel between the page that prefetched the resources and the one that consumes them.
2. Allow prefetched resources to be used once regardless of cache key so long as the page loading the resource was navigated to directly (for some definition of directly) by the page that performed the prefetch. This also forms a one-way communication channel between the page that prefetched and the one that used it, but they had a channel available anyway (link decoration).
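The first option can be sketched as a separate store whose entries are handed out at most once and only within five minutes, without consulting the consuming page's cache key. `PrefetchStore` and the injectable clock are assumptions made for illustration. The second option would additionally need to check how the consuming page was navigated to.

```python
import time
from typing import Callable, Dict, Optional, Tuple

PREFETCH_WINDOW_SECONDS = 5 * 60  # the 5-minute window from option 1


class PrefetchStore:
    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._entries: Dict[str, Tuple[bytes, float]] = {}

    def add(self, url: str, body: bytes) -> None:
        # Remember when the prefetch happened.
        self._entries[url] = (body, self._clock())

    def take(self, url: str) -> Optional[bytes]:
        # Single use: the entry is removed whether or not it is still fresh.
        entry = self._entries.pop(url, None)
        if entry is None:
            return None
        body, fetched_at = entry
        if self._clock() - fetched_at > PREFETCH_WINDOW_SECONDS:
            return None
        # Note: no cache key is consulted here, which is what makes the
        # hit/miss observable across pages (the one-way channel above).
        return body
```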

/cc @yoavweiss @kinu

youennf · 2019-05-17T16:34:03Z

@jkarlin, there is an on-going implementation of prefetch with double key caches in WebKit.
So far, the implementation does not take into account which page is consuming the prefetch.
#881 is related.

jkarlin · 2019-07-08T15:32:51Z

Great, let's leave the prefetch discussion in #881 then. We're still doing the work to compare performance of double vs triple keying the network stack. Sorry for taking so long. Note that we're planning on using this key for the entire network stack (memory cache, disk cache, socket pools, etc).

shivanigithub · 2019-07-16T16:19:14Z

In terms of a spec for the double-keyed cache, I would appreciate input on where best to spec it, possibly the WHATWG Fetch spec.

annevk · 2019-07-16T16:43:42Z

@shivanigithub if you want to help with this that'd be great! My thinking here involved changes to the Fetch and HTML standards. In particular:

- Change HTML such that it ends up exposing a "first-party origin" field on its environment settings object. This will likely require changes to document object creation and worker creation to store the state, and changes to the environment settings object to expose it (similar to how we do it for other such state, such as referrer policy).
- Change Fetch such that it copies a request's client's first-party origin into a new first-party origin field on the request object.
- Change Fetch such that whenever it talks about "the HTTP cache" it instead gets an HTTP cache via the first-party origin somehow. I'm not sure how detailed we want to make this, but we probably want to at least have a couple of words describing what it means to get an HTTP cache for a particular origin. Perhaps this can borrow wording from how we deal with connections.
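Here is a rough model of how the first-party origin would flow from the environment settings object to the request and then into the cache lookup. The class and function names are hypothetical and do not match any spec algorithm names. Caches are modeled as plain dictionaries.

```python
from dataclasses import dataclass
from typing import Dict, Optional

HttpCache = Dict[str, bytes]  # request URL -> stored body


@dataclass(frozen=True)
class EnvironmentSettings:
    # HTML change: populated by the user agent when a document or worker is created.
    first_party_origin: str


@dataclass
class Request:
    url: str
    client: EnvironmentSettings
    first_party_origin: Optional[str] = None


def copy_first_party_origin(request: Request) -> Request:
    # Fetch change: copy the client's first-party origin onto the request.
    request.first_party_origin = request.client.first_party_origin
    return request


def http_cache_for(caches: Dict[str, HttpCache], request: Request) -> HttpCache:
    # Fetch change: obtain "an HTTP cache" via the first-party origin
    # instead of using a single global cache.
    if request.first_party_origin is None:
        raise ValueError("first-party origin must be set before cache lookup")
    return caches.setdefault(request.first_party_origin, {})
```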

Hope that helps!

sleevi · 2019-07-16T16:50:08Z

@annevk I suspect we'd also need to update connection pools to extend that concept. Or were you thinking of doing that separately? I wasn't sure whether #904 (comment) extended to those changes as well.

I ask because I'm wondering whether, similar to how "connection" is defined as an aggregate of both origin and credentials, it might make sense to define the concept as an aggregate (made up, for now, of a single value: the first-party origin). That would then:

- Allow you to extend the definition if there are other attribute keys (e.g. TBB uses the TOR circuit ID as part of the key, IIRC)
- Allow that concept to be reused between the "connection group" and the "HTTP cache".
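An aggregate key along these lines could be a small immutable record that both the connection-group key and the cache key embed. This is a hypothetical sketch. `PartitionKey` is not a spec term, and `circuit_id` stands in for an optional extra attribute such as the Tor circuit ID mentioned above.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class PartitionKey:
    # For now a single value: the first-party origin.
    first_party_origin: str
    # Hypothetical extension point (e.g. an anonymity-network circuit ID).
    circuit_id: Optional[str] = None


# (partition, origin, credentials), mirroring how a connection is keyed today.
ConnectionGroupKey = Tuple[PartitionKey, str, bool]
# (partition, request URL)
CacheKey = Tuple[PartitionKey, str]


def connection_group_key(p: PartitionKey, origin: str, credentials: bool) -> ConnectionGroupKey:
    return (p, origin, credentials)


def cache_key(p: PartitionKey, url: str) -> CacheKey:
    return (p, url)
```

Because both keys embed the same `PartitionKey`, adding a field later changes connection pooling and caching together.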

Incrementalism also works, I just wanted to make sure that was your goal.

annevk · 2019-07-16T17:06:46Z

@sleevi I'd like changes to connections to be a separate change, but it does make sense to me to iterate toward that. Would you mind opening an issue on that and elaborating a bit on the thinking behind it there? I understood there to be an issue with sites being able to reach the global connection limit, but I think that would not necessarily disappear with a first-party origin key on connection pools.

sleevi · 2019-07-16T17:22:24Z

Filed #917. My understanding is that UAs doing this partitioning are doing so for privacy reasons, and hopefully #917 explains why those privacy reasons aren't functionally achievable without a similar partitioning/keying of the connection pool.

shivanigithub · 2019-07-17T17:28:26Z

@annevk Thanks for the inputs. Looking into these.

shivanigithub · 2019-07-25T17:15:17Z

@annevk Regarding the spec inputs, am I correct in understanding that the proposed "first-party origin" field on the environment settings object is an output field populated by the browser to indicate the key being used, and not an input to the browser?

annevk · 2019-07-26T07:17:33Z

I'm not entirely sure what you mean, but yes, the browser (i.e., user agent) is responsible for setting it.

shivanigithub · 2019-08-09T17:11:36Z

A few clarifications about the spec changes:
Chrome's experimenting with double keying (top-frame origin, url) as well as triple-keying (top-frame origin, frame origin, url). From a spec perspective, how much detail do we want to go into about the contents of the partitioning key?
E.g. in the fetch spec, would something like this make sense:

[Already existing text]
A request has an associated cache mode, which is "default", "no-store", "reload", "no-cache", "force-cache", or "only-if-cached". Unless stated otherwise, it is "default".

[New text added following the above]
The user agent might partition the HTTP cache using a partitioning key and references to the HTTP cache in this spec refer to a partitioned cache, when applicable. If the cache is partitioned, it will be looked up using the request URL and the top-frame origin (and possibly also the frame origin) instead of just the request URL.

annevk · 2019-08-12T07:35:30Z

I think at the very least we should define top-level origin from first principles and use that as the key as all browsers plan to align on that. Allowing additional keys seems reasonable. I also think we should be more explicit and update the various lookup points to pass in the appropriate keys.

sleevi · 2019-09-26T07:44:44Z

> My current thinking is that for caches we want something analogous to https://fetch.spec.whatwg.org/#connections with a way to obtain a cache given some keying material.

+1 to this. It seems that, from an infrastructure perspective, declaring that the client has multiple caches, similar to how we do for connection pools (which would presumably be extended in #917), would work.

That is, a given request has an associated HTTP cache derived from the primary key(s) (e.g. the top-frame origin or the top-frame-origin+initiator). When performing an http-network-or-cache-fetch, the cache object for the request is used to fetch the request URL.

This would allow the rest of the infrastructure to naturally work, by conceptually stating there are multiple caches (similar to how Service Workers do with the Cache object). An implementation would be able to implement this using a single logical disk store by double-keying/triple-keying, which should be indistinguishable from the spec.
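The claim that a single double-keyed store is indistinguishable from "multiple caches" can be checked on a toy model. Both classes below are hypothetical. One keeps a cache per key, as the spec would describe it. The other keeps one flat store keyed by `(key, url)`, as an implementation might.

```python
from typing import Dict, Optional, Tuple


class MapOfCaches:
    """Spec-style model: one logical cache per partition key."""

    def __init__(self) -> None:
        self._caches: Dict[str, Dict[str, bytes]] = {}

    def store(self, key: str, url: str, body: bytes) -> None:
        self._caches.setdefault(key, {})[url] = body

    def lookup(self, key: str, url: str) -> Optional[bytes]:
        return self._caches.get(key, {}).get(url)


class SingleStore:
    """Implementation-style model: one flat store keyed by (key, url)."""

    def __init__(self) -> None:
        self._entries: Dict[Tuple[str, str], bytes] = {}

    def store(self, key: str, url: str, body: bytes) -> None:
        self._entries[(key, url)] = body

    def lookup(self, key: str, url: str) -> Optional[bytes]:
        return self._entries.get((key, url))
```

Any sequence of `store`/`lookup` calls returns the same results on either model, which is why the spec can describe separate caches while implementations use one disk store.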

shivanigithub · 2019-09-26T16:59:58Z

Thanks for the inputs! It makes sense to focus on updating the WHATWG Fetch spec for this change, and the cache issue can take care of cleaning up the relevant cache key definitions in the IETF RFC.

shivanigithub · 2019-09-26T20:08:06Z

Created a pull request for the spec change: #943

shivanigithub · 2020-01-10T15:14:54Z

The html spec change to define top-level origin is in progress at: whatwg/html#4966

jkarlin · 2020-01-10T15:23:12Z

I wanted to loop back to the earlier discussion on origin vs eTLD+1 for partitioning the cache. We've come across sites where frames are significantly impacted by triple-keying with origin but not eTLD+1. As such, we intend to proceed with scheme+eTLD+1 instead of origin and (like site isolation) hope to transition to origin at a later point.


annevk · 2020-03-24T14:29:53Z

#943 is almost ready to land. Now would be a good time to speak up if you need more time to review or some such. Otherwise it'll go in tomorrow or Thursday I suspect.

The initial version uses the top-level site (mind you, not the origin, after all) as the additional HTTP cache key.

Tests: https://github.com/web-platform-tests/wpt/blob/master/fetch/http-cache/split-cache.tentative.html (TODO: rename, iframes?) See https://github.com/shivanigithub/http-cache-partitioning. Closes whatwg#904.

rugabunda · 2020-07-10T17:16:57Z

@annevk @mozfreddyb, to address efficiency concerns, consider implementing local CDN emulation, similar to this Firefox/Chrome add-on. It is probably the best example, and it currently covers the largest number of CDNs: https://codeberg.org/nobody/LocalCDN

"A web browser extension that emulates Content Delivery Networks to improve your online privacy. It intercepts traffic, finds supported resources locally, and injects them into the environment."

re: https://groups.google.com/a/chromium.org/forum/#!topic/blink-dev/6KKXv1PqPZ0/discussion

"This month, July 2019, cdnjs served almost 190 billion requests ... Lodash (4.17.11) skyrocketed to the top of the list this month with 8.7 billion requests."[1]
I imagine the cache efficiency lost due to this change for this CDN alone (jQuery, lodash, etc) will be massive.
"Approximately 100% of the Fortune 500 already use npm to acquire approximately 97% of their JavaScript code." [2]
Pika is creating a CDN for modern npm packages that can run in the browser. The project is only a few months old today, but with ESM it becomes feasible for sites to load their npm dependencies from our CDN (or UNPKG, or another cross-origin CDN like it) in production. Basically, cdnjs for npm. In that world, every npm package would only be loaded once across all participating sites, and would then be cached and reused on future visits. Imagine if most sites never had to load React, ReactDOM, Preact, Vue, the 100 most popular npm packages, etc.

Obviously security is a huge concern, and I completely understand and appreciate the work being done here. But I'd want to make sure that an important performance story on the web isn't accidentally destroyed in the process.

If this proposal does continue to move forward, I'd at least want an opt-in proposal discussed, either via the existing Cache-Control header, a new header, or some other mechanism. I do not believe that either of the two concerns outlined above were reasonably serious: We're talking about a small number of CDN-related cookies, and in practice the "Detect if a user has visited a specific site" attack-surface would be negligible (and again, opt-in). I'm happy to contribute / get involved if time & effort is a blocker here.

Thanks again,

FKS

rugabunda · 2020-07-10T17:24:07Z

The next step would be automated local CDN emulation, kept in its own separate cache and driven by a detection mechanism that determines whether the same resource has been accessed across multiple websites. This would be similar to Privacy Badger: "If an advertiser seems to be tracking you across multiple [three] websites without your permission, Privacy Badger automatically blocks that advertiser from loading any more content in your browser."
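The detection step proposed above can be sketched as follows. A resource is copied into a separate shared store once it has been fetched under some number of distinct top-level sites. The threshold of three borrows Privacy Badger's heuristic. `CrossSiteDetector` and its methods are hypothetical. Note that any hit in the shared store is observable across sites.

```python
from collections import defaultdict
from typing import Dict, Optional, Set

SHARED_THRESHOLD = 3  # borrowed from Privacy Badger's "three websites" heuristic


class CrossSiteDetector:
    def __init__(self, threshold: int = SHARED_THRESHOLD) -> None:
        self._threshold = threshold
        # Resource URL -> set of top-level sites it has been fetched under.
        self._seen_on: Dict[str, Set[str]] = defaultdict(set)
        # Resources promoted to the separate shared (emulated CDN) store.
        self.shared: Dict[str, bytes] = {}

    def record(self, top_level_site: str, url: str, body: bytes) -> None:
        self._seen_on[url].add(top_level_site)
        if len(self._seen_on[url]) >= self._threshold:
            self.shared[url] = body

    def lookup_shared(self, url: str) -> Optional[bytes]:
        return self.shared.get(url)
```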


annevk added the security/privacy, needs tests, topic: http, and needs concrete proposal labels May 9, 2019

mayhemer mentioned this issue May 23, 2019: 2019-05-23 meeting notes mozilla-necko/meeting-notes#24 (Closed)

gterzian mentioned this issue Jun 3, 2019: Improve HTTP Cache servo/servo#23495 (Open)

sleevi mentioned this issue Jul 16, 2019: Double-keyed connection pools #917 (Open)

annevk added the topic: anti-tracking label Aug 7, 2019

shivanigithub mentioned this issue Sep 26, 2019: HTTP cache partitioning #943 (Merged)

yoavweiss mentioned this issue Nov 26, 2019: Server Timing can be used a persistent 3rd party identifier w3c/server-timing#67 (Closed)

demianrenzulli mentioned this issue Dec 9, 2019: Various problems with multi-origin-pwas page GoogleChrome/web.dev#1973 (Closed)

yoavweiss mentioned this issue Feb 10, 2020: Consider removing wildcard option of the Timing-Allow-Origin header to prevent browser history leakage w3c/resource-timing#222 (Open)

kdzwinel mentioned this issue Feb 12, 2020: Full storage partitioning / double-keying privacycg/proposals#4 (Closed)

domenic mentioned this issue Mar 3, 2020: supporting origin policy for fully offline use cases WICG/origin-policy#85 (Open)

domenic pushed a commit to whatwg/html that referenced this issue Mar 11, 2020: Define the top-level origin of an environment settings object (916a923)

This will provide the foundation for whatwg/fetch#943 and other related changes discussed in whatwg/fetch#904.

annevk mentioned this issue Mar 12, 2020: Address issues around diversity of fonts on the web w3cping/font-anti-fingerprinting#3 (Open)

youennf mentioned this issue Apr 16, 2020: Restrict background-fetch to first party contexts WICG/background-fetch#144 (Open)

annevk removed the needs concrete proposal and needs tests labels May 5, 2020

mozfreddyb mentioned this issue Jun 2, 2020: Consider shared caching w3c/webappsec-subresource-integrity#22 (Closed)

annevk closed this as completed in 8b0ac51 Jul 3, 2020

gwarser mentioned this issue Jul 10, 2020: FYI: don't use dFPI with FPI [1631676: resolved 78] arkenfox/user.js#930 (Closed)

brettz9 mentioned this issue Jul 30, 2020: Support CDNs JonasKruckenberg/rollup-plugin-sri#3 (Closed)

dgrammatiko mentioned this issue Mar 14, 2021: [4.0] Proper cache invalidation of the static assets [.js/.css] joomla/joomla-cms#32485 (Merged)

rik added a commit to rik/cloudflare-docs that referenced this issue May 20, 2022: Remove third-party caching misconception (6c9d631)

With [double-keyed cache](whatwg/fetch#904) enabled in all main browsers, cached resources will not be shared across websites.

rik mentioned this issue May 20, 2022: Remove third-party caching misconception cloudflare/cloudflare-docs#4517 (Merged)

gterzian mentioned this issue Apr 12, 2024: RFC: Font system redesign servo/servo#32033 (Open)
