8000 proposal: Fetchers refactor by sgotti · Pull Request #2964 · rkt/rkt · GitHub
This repository was archived by the owner on Feb 24, 2020. It is now read-only.

proposal: Fetchers refactor #2964

Open — wants to merge 1 commit into master

Conversation

sgotti
Contributor
@sgotti sgotti commented Jul 22, 2016

This proposal starts from the distribution concept in #2953 and provides a
fetchers refactor adapted to that concept.

Additionally, it provides a complete workflow and ideas for
transport-level caching to fit different needs:

  • Distribution handlers
  • Transport plugin handler
  • Transport plugins
  • Transport level caching
  • Complete separation between image store and transport level caching

@dgonyeo I opened this trying to define my view starting will all the various requirements that came to my mind and trying find a common denominator. This is obviously my view and I'd like to hear your thoughts on it and see how it'll fit with your proposed "fetcher" plugins in #2830.

@sgotti sgotti changed the title proposal: Fetcher refactors proposal: Fetchers refactor Jul 22, 2016
@sgotti sgotti force-pushed the fetchers_refactor branch from 6d45a2e to 239fdff on July 22, 2016 07:12
@cgonyeo
Member
cgonyeo commented Jul 22, 2016

After seeing this I'm pretty sure we're on the exact same page, I just may have not articulated it clearly enough and you seem to have done the opposite. Also by actually providing definitions for "distribution" and "transport" I can better see the separation you want. :)

If you ignore the docker portion of my plugins proposal, what I've implemented is pretty close to this; it'll just need some refactoring to have a clearer distinction between the distribution handlers and the transport handlers. When the fetch function gets called it'll either (A) directly hand it off to the transport handler (which uses plugins to perform the actual network requests) or (B) hand it to a distribution handler (only AppC discovery atm) which will do some distribution logic to produce concrete URLs (AppC discovery) and then use the transport handlers to fetch them.

A big obvious difference between this and my current PR is that my PR is geared for the old store, so I'll need to do some work to make my suggestion adhere to this.

Also a question: transport handlers cover things like https:// resources. If I provide rkt a URL corresponding to a transport handler, are the distribution handlers still used?

For what it's worth, my answer in my fetching proposal is that the distribution handlers aren't called in this case.

@cgonyeo
Member
cgonyeo commented Jul 22, 2016

Oh nevermind that question, just reread and noticed the archive distribution. As an addition to your comment on how the AppC distribution will internally use the archive distribution, I expect it to be the same if we implement an OCI distribution, since it could use the logic in the archive distribution to fetch the blobs over https it needs.

@cgonyeo
Member
cgonyeo commented Jul 22, 2016

Does this mean that the archive distribution is the only distribution that ever calls out to the transport handlers?


## Distribution Handlers plugins

With this approach the distribution handlers will live in the stage0. If there's a need to make them plugins, the main tricky point is that they need to talk to an image store, so it'll be difficult to code them in a language other than Go.
What I wanted in my plugins proposal was to enable support for alternate transports, so I'm fine keeping this logic inside the stage0.

@cgonyeo
Member
cgonyeo commented Jul 22, 2016

Aaaaand nevermind that question too. Now after reading more I see that the docker/OCI distribution may call them directly.

At first my opinion was that distributions should always call the transport handler directly, as otherwise things seemed a little murkier to me (for AppC, which distribution would do the image import?) but after dwelling on it a bit I think the way you outlined it here is best. After all the AppC distribution is the exact same as the archive distribution, just with 1-2 additional steps it needs to do first (perform discovery, maybe fetch/trust public keys).

I'd be happy to start refactoring my plugins PR to align with this, once the others here have a little more time for feedback on this.

@lucab
Member
lucab commented Jul 22, 2016

GC is the part of this proposal that I like the least. I'm not sure if I'm missing any details, but it seems avoidable.

I guess we can augment the current CAS with a cacheManifest linked to an already downloaded blob, containing all the distribution entries it has been fetched/discovered from; on removal, the cacheManifest is inspected and all its cache entries evicted. This is already backward compatible, as a blob in the CAS without a cacheManifest has no cache entries to be purged.

Am I missing some caveats?

@sgotti
Contributor Author
sgotti commented Jul 25, 2016

@lucab I see that in the case of transport caching GC could be overkill. I'll try to explain my thoughts:

  • The main intent was to keep the different layers (in this case image store and transports) decoupled (also for reusability in other projects). The image store should only save image-related information.
  • To avoid transport cache GC we can just use the transport cache to know the blob digest. An upper layer (image rm) should just clean transport entries when a blob is removed. But this won't work if the image store does blob GC as I was thinking (see below).
    • Since this isn't atomic, if something fails (error, process killed etc...) we are left with stale entries that will require a GC function to clean them.

Anyway, I don't think transport cache GC is a big issue; it could just remove entries older than N days without reading cache data, blob digests etc...

BTW, there are other places where a GC or a cleanup will be useful/required (to avoid complex logic). I will open an issue to clarify this and figure out how to expose/run GC functions.

@sgotti
Contributor Author
sgotti commented Jul 25, 2016

> As an addition to your comment on how the AppC distribution will internally use the archive distribution, I expect it to be the same if we implement an OCI distribution, since it could use the logic in the archive distribution to fetch the blobs over https it needs.

The oci-image-layout states that this layout should be a basis for distributing OCI images via an archive, http or other ways. Thinking in distribution terms, I tend to see different distribution types:

  • oci image layout archive: will fetch a single file via any transport (file, http etc...), then the distribution handler will read the archive, take the ref and use the descriptor to fetch the required blobs from the archive.
  • oci image layout via an http server: (I'm just imagining since nothing has been defined) an http server will expose a directory layout like the oci image layout and the distribution will call the transport handler multiple times to fetch the ref and the blobs.

The docker registry is not related to oci image layout but is just another distribution type that can be used to fetch oci images.

Thoughts?

> A big obvious difference between this and my current PR is that my PR is geared for the old store, so I'll need to do some work to make my suggestion adhere to this.

I think that, if the distribution idea is considered good and passes, this work on the fetchers refactor can go ahead without having to wait for the new image store.

The new image store changes will only affect the distribution handler parts that import images into the store. Meanwhile, having this in place will ease the creation of the new oci image layout archive and docker registry distributions that are needed to really start using the store with OCI images.

4 participants