Put architecture.md back into distribution repo by mdlinville · Pull Request #2282 · distribution/distribution

Merged (1 commit) on May 23, 2017
52 changes: 52 additions & 0 deletions docs/architecture.md
@@ -0,0 +1,52 @@
---
published: false
---

# Architecture

## Design
**TODO(stevvooe):** Discuss the architecture of the registry, internally and externally, in a few different deployment scenarios.

### Eventual Consistency

> **NOTE:** This section belongs somewhere, perhaps in a design document. We
> are leaving this here so the information is not lost.

Running the registry on eventually consistent backends has been part of the
design from the beginning. This section covers some of the approaches to
dealing with this reality.

There are a few classes of issues that we need to worry about when
implementing something on top of the storage drivers:

1. Read-After-Write consistency (see this [article on
s3](http://shlomoswidler.com/2009/12/read-after-write-consistency-in-amazon.html)).
2. [Write-Write Conflicts](http://en.wikipedia.org/wiki/Write%E2%80%93write_conflict).
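
A minimal sketch of coping with the first class of issue, read-after-write
lag, is shown below: the reader retries until a freshly written path becomes
visible. The `Driver` interface, `ErrNotFound`, and `getWithRetry` are
illustrative stand-ins for this example, not the registry's actual storage
driver API.

```go
package consistency

import (
	"context"
	"errors"
	"time"
)

// Driver is a minimal stand-in for an eventually consistent backend; it is
// defined only for this sketch and is not the registry's storage driver
// interface.
type Driver interface {
	GetContent(ctx context.Context, path string) ([]byte, error)
}

// ErrNotFound is what the stand-in backend returns while a freshly written
// path is not yet visible to readers.
var ErrNotFound = errors.New("path not yet visible")

// getWithRetry polls the backend until the path becomes visible or the
// attempts are exhausted, papering over read-after-write lag at the cost of
// extra latency.
func getWithRetry(ctx context.Context, d Driver, path string, backoff time.Duration, attempts int) ([]byte, error) {
	var lastErr error
	for i := 0; i < attempts; i++ {
		content, err := d.GetContent(ctx, path)
		if err == nil {
			return content, nil
		}
		if !errors.Is(err, ErrNotFound) {
			return nil, err // a real failure, not propagation lag
		}
		lastErr = err
		select {
		case <-ctx.Done():
			return nil, ctx.Err()
		case <-time.After(backoff):
		}
	}
	return nil, lastErr
}
```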

In reality, the registry must contend with these kinds of errors in the
following operations:

1. Accepting data into a temporary upload file, which may not yet contain the
   latest data block when read back (read-after-write).
2. Moving uploaded data into its blob location (write-write race).
3. Modifying the "current" manifest for a given tag (write-write race; a
   sketch of this race follows the list).
4. A whole slew of operations around deletes (read-after-write, delete-write
   races, garbage collection, etc.).
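
To make the third operation concrete, the sketch below shows the write-write
race on a tag's "current" link: whichever writer lands last wins, while the
manifests themselves are unaffected because they are stored by digest. The
`PutDriver` interface and the path produced by `tagLinkPath` are simplified
stand-ins for this example, not the registry's actual API or path
specification.

```go
package consistency

import (
	"context"
	"fmt"
)

// PutDriver is a stand-in for a backend that supports whole-file writes; it
// is not the registry's storage driver interface.
type PutDriver interface {
	PutContent(ctx context.Context, path string, content []byte) error
}

// tagLinkPath sketches a layout for a tag's "current" link; the real
// registry path specification differs.
func tagLinkPath(repo, tag string) string {
	return fmt.Sprintf("/repositories/%s/_manifests/tags/%s/current/link", repo, tag)
}

// setTag overwrites the tag's current link with the digest of the newly
// pushed manifest. If two clients race on the same tag, whichever PutContent
// lands last wins; the manifest blobs themselves are unaffected because they
// are stored by digest and never mutated.
func setTag(ctx context.Context, d PutDriver, repo, tag, digest string) error {
	return d.PutContent(ctx, tagLinkPath(repo, tag), []byte(digest))
}
```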

The backend path layout employs a few techniques to avoid these problems:

1. Large writes are done to private upload directories. This alleviates most
   of the corruption potential by ensuring each upload has a single writer
   and a private destination.
2. Constraints in storage driver implementations, such as support for writing
after the end of a file to extend it.
3. Digest verification to avoid data corruption.
4. Manifest files are stored by digest and cannot change.
5. All other non-content files (links, hashes, etc.) are written as an atomic
unit. Anything that requires additions and deletions is broken out into
separate "files". Last writer still wins.
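
Points 3 and 4 can be illustrated together: the staged upload is verified
against the client-supplied digest and only then moved into a location
derived from that digest, so corrupted data never becomes addressable and
concurrent commits of identical content converge on the same path. The
`CommitDriver` interface and `blobPath` layout below are simplified stand-ins
for this sketch, not the real storage driver API or path specification.

```go
package consistency

import (
	"context"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"path"
)

// CommitDriver is a minimal stand-in exposing only the operations the sketch
// needs; it is not the project's storage driver interface.
type CommitDriver interface {
	GetContent(ctx context.Context, path string) ([]byte, error)
	Move(ctx context.Context, sourcePath, destPath string) error
}

// blobPath derives the final, content-addressed location from the digest, so
// concurrent commits of identical content converge on the same path. The
// layout is illustrative only.
func blobPath(hexDigest string) string {
	return path.Join("/blobs/sha256", hexDigest[:2], hexDigest, "data")
}

// commitUpload verifies that the bytes staged in a private upload file match
// the digest the client claimed, then moves them into the content-addressed
// blob location. A mismatch aborts the commit, so corrupted data never
// becomes addressable. Real code would verify the digest incrementally while
// streaming rather than reading the whole upload into memory.
func commitUpload(ctx context.Context, d CommitDriver, uploadPath, expectedHexDigest string) error {
	staged, err := d.GetContent(ctx, uploadPath)
	if err != nil {
		return err
	}
	sum := sha256.Sum256(staged)
	actual := hex.EncodeToString(sum[:])
	if actual != expectedHexDigest {
		return fmt.Errorf("digest mismatch: got %s, want %s", actual, expectedHexDigest)
	}
	return d.Move(ctx, uploadPath, blobPath(actual))
}
```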

Unfortunately, one must play this game when trying to build something like
this on top of eventually consistent storage systems. If we run into serious
problems, we can wrap the storage drivers in a shared consistency layer, but
that would increase complexity and hinder registry cluster performance.
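
If such a wrapper were ever needed, it could take the shape of a decorator
around the driver interface. The sketch below only serializes writers to the
same path with an in-process lock; a real shared consistency layer would need
cluster-wide coordination (a lock service or similar), which is exactly the
added complexity and latency noted above.

```go
package consistency

import (
	"context"
	"sync"
)

// WriteDriver is a stand-in for the write half of a storage driver.
type WriteDriver interface {
	PutContent(ctx context.Context, path string, content []byte) error
}

// consistentDriver decorates a driver and serializes writes per path. This
// only shows the decorator shape; an in-process mutex cannot coordinate a
// registry cluster.
type consistentDriver struct {
	inner WriteDriver

	mu    sync.Mutex
	locks map[string]*sync.Mutex
}

func newConsistentDriver(inner WriteDriver) *consistentDriver {
	return &consistentDriver{inner: inner, locks: make(map[string]*sync.Mutex)}
}

// lockFor returns the mutex guarding a path, creating it on first use.
func (c *consistentDriver) lockFor(path string) *sync.Mutex {
	c.mu.Lock()
	defer c.mu.Unlock()
	if _, ok := c.locks[path]; !ok {
		c.locks[path] = &sync.Mutex{}
	}
	return c.locks[path]
}

// PutContent holds the per-path lock across the underlying write so that
// concurrent writers to the same path are serialized.
func (c *consistentDriver) PutContent(ctx context.Context, path string, content []byte) error {
	l := c.lockFor(path)
	l.Lock()
	defer l.Unlock()
	return c.inner.PutContent(ctx, path, content)
}
```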