Writable API proposal #2879

yifan-gu · 2016-07-07T02:57:06Z

Goals

To remove all rkt shell-outs in kubernetes.
Much cleaner integration points, which can make kubelet_wrapper easier to run rktnetes.
Make it easier for performance monitoring.

Non-goals

Implement container runtime level interface

Design

First, take a look at the rkt shell-outs we currently have:

rkt fetch
rkt prepare --pod-manifest
rkt rm
rkt image rm
rkt enter

These are the commands we want in short term, other APIs we probably want to have are:
rkt gc
rkt run
rkt trust

Besides, since we would want systemd to parent the rkt process (by rkt run-prepared today), so rkt run-prepared can be left out in the first pass as well.
Also for rkt prepare, we could just implement the --pod-manifest for now.

Here is a draft for the protobuf definition we will add.

// ImageType defines the supported image types.
enum ImageType {
    IMAGE_TYPE_UNDEFINED = 0;
    IMAGE_TYPE_APPC      = 1;
    IMAGE_TYPE_DOCKER    = 2;
    IMAGE_TYPE_OCI       = 3;
}


enum InsecureOption {
    INSECURE_OPTION_NONE = 0;
    INSECURE_OPTION_ALL = 1;
    INSECURE_OPTION_IMAGE = 2;
    INSECURE_OPTION_TLS = 4;
    INSECURE_OPTION_ON_DISK = 8;
    INSECURE_OPTION_HTTP = 16;
    INSECURE_OPTION_PUBKEY = 32;
}

enum StoreOption {
    STORE_OPTION_NONE = 0;
    STORE_OPTION_NO_STORE = 1;
    STORE_OPTION_ONLY_STORE = 2;
}

message Config {
    // Data dir.
    string data_dir = 1;

    // Auth credentials to use.
    bytes auth_credentials = 2; // Use TLS for the connection even it’s locally?
}

message FetchImageRequest {
    // Type of the image, e.g. aci, docker, oci, etc.
    ImageType type = 1;

    // Resolvable Name of the image, used for discovery, can be string, hash, URL, file path.
    string name = 2;

    // Insecure options for fetch, it’s a bitflag.
    InsecureOption insecure_option = 3;

    // Whether to fetch from the store, or skip the store.
    StoreOption store_option = 4;

    // Whether to fetch the depedencies
    bool WithDeps = 5;

    // Config to use for the command, including docker auth, stage1s, dirs.
    // If not empty, this will override the config used by API service.
    Config config = 3;
}

message FetchImageResponse {
    // ID of the fetched image, which can be used to uniquely identify the image.
    string id = 1;
}

message PreparePodRequest {
    // The pod manifest to use for preparing.
    bytes pod_manifest = 1;

    // Config to use for the command, including docker auth, stage1s, dirs.
    // If not empty, this will override the config used by API service.
    Config config = 2;
}

message PreparePodResponse {
    // ID of the prepared rkt pod, which can be used to invoke `rkt run-prepared`.
    string id = 1;
}

message RemovePodRequest {
    // ID of the pod to remove, when specified, at most one image will be removed.
    string id = 1;

    // Config to use for the command, including docker auth, stage1s, dirs.
    // If not empty, this will override the config used by API service.
    Config config = 2;
 }

message RemovePodResponse {}

message RemoveImageRequest {
    // ID of the image to remove, when specified, at most one image will be removed.
    string id = 1;

    // Name of the image to remove, if multiple images have the same name, 
    // all of them will be removed.
    string name = 2;

    // Config to use for the command, including docker auth, stage1s, dirs.
    // If not empty, this will override the config used by API service.
    Config config = 3;
}

message RemoveImageResponse {}

message EnterPodRequest {
    // ID of the pod to enter
    string id = 1;

    // Name of the app inside the pod to enter. 
    // This must be non-empty if there are multiple apps in the pod.
    string app_name = 2;

    // The command to run for enter, optional.
    repeated string command = 3;

    // The stdin to for the enter, optional
    // string stdin = 4;

    // Config to use for the command, including docker auth, stage1s, dirs.
    // If not empty, this will override the config used by API service.
    Config config = 5;
}

message EnterPodReponse {
    // The stdout to for the enter, optional
    string stdout = 1;

    // The stderr to for the enter, optional
    string stderr = 2;
}


service PublicAPI {
    // old APIs.
    …
    rpc FetchImage (FetchImageRequest) returns (FetchImageResponse) {}
    rpc PreparePod (PreparePodRequest) returns (PreparePodResponse) {}
    rpc RemovePod (RemovePodRequest) returns (RemovePodResponse) {}
    rpc RemoveImage (RemoveImageRequest) returns (RemoveImageResponse) {}
    rpc EnterPod (EnterPodRequest) returns (stream EnterPodResponse) {} // This returns a bi-directional stream, where client can send stdin and server can send stdout/stderr.
}

Consideration

Writable API will require rkt api service to run as root for those operations.
Since we are moving towards container level APIs (), in the end of the day, it seems that these API calls will be dropped in future releases.
I am not confident we will have container level interface for 1.4, so it’s still worth for us to implement the write API, especially the benefits it will bring is visible (see goals above).

Plan

Implement the currently being used rkt shell-outs.
Cleanup rktnetes’s shell-outs
Implement any other necessary writable APIs.

cc @coreos/rkt-maintainers @coreos/rkt-engineering

The text was updated successfully, but these errors were encountered:

chancez · 2016-07-07T18:30:34Z

Something like this would make implementing something like docker for mac or boot2docker but for rkt much easier!

euank · 2016-07-07T20:55:47Z

I'm not sure that a global config object should accompany each request.

For this like auth for fetch and stage1 for prepare, it does make sense to provide those optoins, but I don't think they have to be part of the same larger config struct.

Some options, like data dir, I don't think there's really a need to vary per api-service instance.

It also might be nice to have some discussion about how the service will behave with respect to privileges; could I run it with user privileges and a data-dir I own, and then only be able to fetch, but not run?

sgotti · 2016-07-07T21:01:28Z

// Type of the image, e.g. aci, docker, oci, etc.
ImageType type = 1;

This will probably deserve its own issue but I'd like to use this as a starting point.

I think this will actually map to the current rkt command line in this way:

aci: accept an image name string, aci hash, URL, file
docker: use docker:// scheme with docker image string
oci: same as the above

I think that in future the current semantic will change (how to handle backward compatibility?), some examples:

docker://: currently this means use docker2aci to convert a v1, v2.1, v2.2 and also oci (lib/repository2: add support for docker v2.2 and OCI appc/docker2aci#176) manifest and related layers (and configs in v2.2 and oci) to a squashed aci. So docker:// covers two things together: image format and distribution (docker registry)

In future the idea will be to directly fetch oci refs and blobs (and docker v2.1 and v2.2 formats can be converted to this) in a dedicated oci local store (that's is quite different than the current aci store given the differences between the two specs), render to an oci runtime bundle and execute (or as a first step convert it to a rendered aci converting the oci config to an aci manifest) (#2541), this won't probably work with v1 registries (so docker2aci will be needed for these) since they aren't content addressable but uses imageID for every layer.

The main question is that oci image spec doesn't cover distribution but only image format since in future there may be different kind of distributions (not only the docker registry) for oci images.

I'm not sure if using specific shemes (docker:// oci://) covering both image type and distribution concepts is correct or there's the need for two distinct options (or a scheme like format+distribution:// like oci+registry://)

This choice will reflect on the rkt command line and this API.
Additionally, probably the k8s image spec definition (https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/container/runtime.go#L45 now just a string) looks like the point where a similar logic should be implemented.

Since we are moving towards container level APIs (), in the end of the day, it seems that these API calls will be dropped in future releases.

That's a good point. I can't say if the required work is worth the effort but at least it can be useful to clarify some questions like the above.

lucab · 2016-07-08T09:19:10Z

Writable API will require rkt api service to run as root for those operations.

This looks like a major design point, departing quite a bit from the current status quo.

Where do you plan to expose this interface? I see a comment related to TLS, so I guess some kind of network socket; if so how do you plan to do authn/authz there? And how should we map it to the current POSIX permissions model we are using?

jonboulle · 2016-07-08T10:11:39Z

Since we are moving towards container level APIs (), in the end of the day, it seems that these API calls will be dropped in future releases.

This really troubles me - I don't understand the win here. We shouldn't design and implement an API knowing it's imminently going to be deprecated.

To remove all rkt shell-outs in kubernetes.

Is there evidence that this is causing issues today?

Much cleaner integration points, which can make kubelet_wrapper easier to run rktnetes.

Can you explain how this would simplify kubelet_wrapper?

Make it easier for performance monitoring.

How does a writeable API help with this?

yifan-gu · 2016-07-08T21:12:32Z

This really troubles me - I don't understand the win here. We shouldn't design and implement an API knowing it's imminently going to be deprecated.

I agree, the argument is those APIs can be usable by other systems as well (true?), although in this proposal, I suggest we implement the onces used by k8s first.

Is there evidence that this is causing issues today?

See below.
Also as I mentioned, it causes profiling harder IMO.

Can you explain how this would simplify kubelet_wrapper?

See #2878

Can you explain how this would simplify kubelet_wrapper?

If all the commands go through API service, then we can just profile the API service to get the CPU / memory metrics. Otherwise, say 100 pods is launched, then it will at least generate 100 profiles.

yifan-gu · 2016-07-08T21:27:46Z

Where do you plan to expose this interface? I see a comment related to TLS, so I guess some kind of network socket; if so how do you plan to do authn/authz there? And how should we map it to the current POSIX permissions model we are using?

This is a good concern, we will need to face it anyway when a writable API (either pod level or container level) is going to be implemented.

On the realistic side, the kubelet is running as root today anyway, so in practice we don't lose much, and I am not sure we should dig into the authn/authz rabbit hole at this point?

yifan-gu · 2016-07-08T21:30:59Z

@sgotti Thanks for the input! So my understand of your concern is that only ImageType is not sufficient, right? I can replace that with ImageFormat which is another protobuf message itself, and we can add things we need later, like distribution, etc.

sgotti · 2016-07-11T14:42:50Z

@sgotti Thanks for the input! So my understand of your concern is that only ImageType is not sufficient, right? I can replace that with ImageFormat which is another protobuf message itself, and we can add things we need later, like distribution, etc.

Yeah, basically I'm not sure which is the best way to define what image format/type and distribution to use, so I'm not sure if just a string (with specific scheme) will be enough for both. Will open a new issue on this.

IMHO ImageFormat sound strange to define both format and distribution but I'm not sure about a better name.

yifan-gu · 2016-07-11T17:37:14Z

@sgotti Or ImageSpec. But that will break current API.

euank · 2016-07-11T18:19:09Z

Since we are moving towards container level APIs (), in the end of the day, it seems that these API calls will be dropped in future releases.

If we expect this API to be used by anything other than rktnetes, we shouldn't add and deprecate things based on k8s needs.

However, if we expect this API to be a rktnetes specific api, we should develop it in a separate repository (probably a k8s one) and feel free to do whatever adding/removing as is needed in the k8s code, and tie it closely to k8s releases.

Are we trying to make:

A general purpose api for rkt (usable by arbitrary other projects as well; rktnetes as the first user)
An api explicitly to support rktnetes

Based on the goals, it looks more like 2, and it's possible this should live as part of the k8s project instead.
If it's instead 1, that's fine as well, but we should be clear about that.

I'd personally prefer to do 1 if we can, though 1 I think also implies we need to consider what this API looks like a little bit more.

Opinions?

alban · 2016-07-20T14:18:10Z

This needs more discussions.
@euank @yifan-gu @tmrts : could you reassess the milestone?

philips · 2016-07-20T17:36:34Z

For more context: The goal of having this API is to implement the Kubernetes CRI. Essentially the Kubelet wants a way of separating itself from the details of the container fetching and execution and instead focus on a handful of things like: log curation, pod cgroup setup, managing volumes, and managing networking (maybe). This is in-line with the overall goal of making Kubernetes less of a monolith and easier to hack on.

Some alternative ways of tackling this that could be considered:

Creating some sort of rkt library for the Kubelet to call
Breaking rkt into smaller pieces that could map to the CRI

Overall, rkt is doing two things:

It is working very well as the system level container engine for applications like the kubelet itself, docker engine, etc
It is working as a Kubernetes container engine but Kubernetes node team is trying to improve parts of the abstraction and add new features like init containers, dynamic pods, etc which necessitates this refactor.

Hope that helps set the stage.

yifan-gu · 2016-07-21T22:39:39Z

@philips Sorry for not being clear. The writable API proposed here (which is still on pod level that aligns with the rkt cli) is not targeted for the k8s CRI (operated on container level).
And this is part of the reason we are arguing whether we should put effort implement and maintain it, given that we are moving towards container level APIs.
The other reason is the rpc style writable API seems to break the POSIX permission model as @lucab mentioned #2879 (comment) , but we maybe could solve this by using unix socket.

The benefits for this pod level API would be:

Standardized integration point (everything goes through gRPC)
- Easier for testing.
- Easier for running rkt in rkt fly.
Make other people's life easier if they want to integrate rkt.

yifan-gu · 2016-07-21T23:03:10Z

IMHO ImageFormat sound strange to define both format and distribution but I'm not sure about a better name.

Can we add some field in the FetchImageRequest later for describing the discovery scheme? @sgotti

lucab · 2016-07-22T00:49:06Z

@yifan-gu regarding format/distribution, there are some sketches for a unified scheme at #2953

yifan-gu added component/api server area/api labels Jul 7, 2016

yifan-gu added this to the v1.11.0 milestone Jul 7, 2016

yifan-gu assigned euank, yifan-gu and tmrts Jul 7, 2016

yifan-gu mentioned this issue Jul 7, 2016

kubelet-wrapper / rkt gc: 'device or resource busy' #2878

Closed

alban removed this from the v1.11.0 milestone Jul 20, 2016

yifan-gu mentioned this issue Jul 22, 2016

rkt: Add write-only API definition/stubs. #2963

Closed

lucab unassigned euank and tmrts Apr 5, 2017

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Writable API proposal #2879

Writable API proposal #2879

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Writable API proposal #2879

Writable API proposal #2879

Comments

Uh oh!

Goals

Non-goals

Design

Consideration

Plan

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!