rkt: experimental support for pod sandbox by s-urbaniak · Pull Request #3318 · rkt/rkt · GitHub
This repository was archived by the owner on Feb 24, 2020. It is now read-only.

rkt: experimental support for pod sandbox #3318

Merged
merged 35 commits into from
Nov 7, 2016
a22557a
stop: Don't treat 'rkt stop already-stopped-pods' as errors.
Sep 19, 2016
e15b710
Allow `rkt app rm` on a stopped pod.
Sep 19, 2016
4e4e662
rkt status: Add '--format=json' flag to print json format for pods.
Sep 20, 2016
f2c70f0
cri: Prepare isolators
krnowak Sep 16, 2016
fe396ff
stage1: mount cgroup knobs RW for new app
alban Sep 19, 2016
d48c521
stage1: add mount/umount in all flavors
alban Sep 20, 2016
3d1c6e1
cgroup: resolve merge conflict
Sep 21, 2016
c37e6b8
CRI: Allow the pod sandbox to accept port forwards
squeed Sep 21, 2016
70d5ed8
run: allow port forwards *from* a specific IP
squeed Sep 21, 2016
db21057
kvm/init: remove condition for kvm mutable pods
Sep 22, 2016
463fdbf
cri: don't remount cgroup knobs RW with cgroup2
alban Sep 21, 2016
6982a73
CRI: Add '--annotation' and '--label' flag for 'rkt app sandbox'.
Sep 19, 2016
348facd
CRI: Add '--name', '--annotation', '--label', '--environment' for 'rk…
Sep 16, 2016
04a4077
cri/sandbox: autodect mutable stage1 capabilities
Sep 23, 2016
1468df8
CRI: Add '--working-dir', '--supplementary-gids', '--readonly-rootfs'…
Sep 23, 2016
1ae5d19
common,tests: refactor GetExitStatus
iaguis Sep 21, 2016
178c338
stage0,stage1: handle exit status from stop entrypoint
iaguis Sep 21, 2016
c517764
CRI: add oom_score_adj isolator
squeed Sep 22, 2016
4969981
run/repare: Support mutating the app for 'rkt run/prepare' as well.
Sep 22, 2016
4d735db
Documentation: Update docs for rkt run/prepare.
Sep 23, 2016
62dd49d
app-start: set up unit files/cgroups during app-add
Sep 26, 2016
5eab251
app: remove code duplication for preparing the stage1 image
Sep 27, 2016
1babb4f
app-rm: introduce RmConfig
Sep 29, 2016
22517ac
Documentation: specify app subcommands synchronization
Oct 4, 2016
953282c
app: implement synchronization of pod mutation operations
Oct 4, 2016
5b3f4db
CRI: Support volume creation at app add time
squeed Oct 7, 2016
57036ca
CRI: pick up appc annotation rename
squeed Oct 13, 2016
b59fa4b
CRI: add cpu-shares isolator
Oct 12, 2016
93e178f
Merge remote-tracking branch 'origin/master' into crisync
Nov 1, 2016
4f5f81d
cri: add app subcommands
Nov 1, 2016
d071d34
stage0/app: mark app subcommands as hidden, gate behind app experiment.
Nov 1, 2016
2f5de7f
lib: use nanoseconds for app state
Nov 3, 2016
2e9d62c
Merge remote-tracking branch 'origin/master' into crisync
Nov 4, 2016
7dd24cf
stage1: document experimental interface v5
lucab Nov 6, 2016
3fb6b55
Merge pull request #4 from lucab/to-sur/stage1-interface-v5-exp
Nov 7, 2016
19 changes: 19 additions & 0 deletions Documentation/devel/pod-lifecycle.md
@@ -37,6 +37,25 @@ To prevent the period between first creating a pod's directory and acquiring its
| ExitedGarbage | "$var/exited-garbage/$uuid" | exited+deleting | exited+gc-marked |
| Garbage | "$var/garbage/$uuid" | prepare-failed+deleting | prepare-failed+gc-marked |

## App

The `rkt app` family of subcommands allows mutating operations on a running pod, namely adding, starting, stopping, and removing applications.
The `rkt app sandbox` subcommand transitions to the Run phase as described above, whereas the remaining subcommands mutate the pod while staying in the Run phase.
To synchronize operations inside the Run phase, an additional advisory lock `$var/run/$uuid/pod.lck` is introduced.
Locking the `$var/run/$uuid/pod` manifest itself won't work because changes to it need to be atomic, which is realized by overwriting the original manifest.
If `pod.lck` is locked, the pod is undergoing a mutation. Note that only the `rkt app add/rm` operations are synchronized.
To retain consistency for all other operations (e.g. `rkt list`) that need to read the `$var/run/$uuid/pod` manifest, all mutating operations are atomic.

The `app add/start/stop/rm` subcommands all run within the Run phase where the exclusive advisory lock on the `$var/run/$uuid` directory is held by the systemd-nspawn process.
The following table gives an overview of the states when a lock on `$var/run/$uuid/pod.lck` is being held:

| Phase | Locked exclusively | Unlocked |
|--------|--------------------|----------|
| Add | adding | added |
| Start | - | - |
| Stop | - | - |
| Remove | removing | removed |

These phases, their function, and how they proceed through their respective states are explained in more detail below.

## Embryo
121 changes: 118 additions & 3 deletions Documentation/devel/stage1-implementors-guide.md
@@ -69,6 +69,7 @@ Any stage1 that supports and expects machined registration to occur will likely
* `--interactive` to run a pod interactively, that is, pass standard input to the application (only for pods with one application)
* `--local-config=$PATH` to override the local configuration directory
* `--private-users=$SHIFT` to define a UID/GID shift when using user namespaces. SHIFT is a two-value colon-separated parameter, the first value is the host UID to assign to the container and the second one is the number of host UIDs to assign.
* `--mutable` activates a mutable environment in stage1. If the stage1 image manifest has no `app` entrypoint annotations declared, this flag will be unset to retain backwards compatibility.

#### Arguments added in interface version 2

@@ -89,6 +90,12 @@ Any stage1 that supports and expects machined registration to occur will likely
`resolv.conf` is to create /etc/rkt-resolv.conf iff a CNI plugin specifies it, and for `hosts` is to create
a fallback if the app does not provide it.

#### Arguments added in interface version 5 (experimental)

This interface version is not yet finalized, thus marked as experimental.

* `--mutable` to run a mutable pod

### rkt enter

`coreos.com/rkt/stage1/enter`
@@ -138,13 +145,97 @@ In the bundled rkt stage 1, the entrypoint is sending SIGTERM signal to systemd-
* `--force` to force the stopping of the pod. E.g. in the bundled rkt stage 1, stop sends SIGKILL
* UUID of the pod

## Versioning
## Crossing Entrypoints

Some entrypoints need to perform actions in the context of stage1 or stage2. As such, they need to cross stage boundaries (thus the name) and depend on the existence of the `enter` entrypoint. All crossing entrypoints receive additional options for entering via the following environment variables:

* `RKT_STAGE1_ENTERCMD` specifies the command to be called to enter a stage1 or a stage2 environment
* `RKT_STAGE1_ENTERPID` specifies the PID of the stage1 to enter
* `RKT_STAGE1_ENTERAPP` optionally specifies the application name of the stage2 to enter

### rkt app add

(Experimental, to be stabilized in version 5)

`coreos.com/rkt/stage1/app/add`

This is a crossing entrypoint.

#### Arguments

* `--app` application name
* `--debug` to activate debugging
* `--uuid` UUID of the pod
* `--disable-capabilities-restriction` gives all capabilities to apps (overrides `retain-set` and `remove-set`)
* `--disable-paths` disables inaccessible and read-only paths (such as `/proc/sysrq-trigger`)
* `--disable-seccomp` disables seccomp (overrides `retain-set` and `remove-set`)
* `--private-users=$SHIFT` to define a UID/GID shift when using user namespaces. SHIFT is a two-value colon-separated parameter, the first value is the host UID to assign to the container and the second one is the number of host UIDs to assign.

### rkt app start

(Experimental, to be stabilized in version 5)

`coreos.com/rkt/stage1/app/start`

This is a crossing entrypoint.

#### Arguments

* `--app` application name
* `--debug` to activate debugging

### rkt app stop

(Experimental, to be stabilized in version 5)

`coreos.com/rkt/stage1/app/stop`

This is a crossing entrypoint.

#### Arguments

* `--app` application name
* `--debug` to activate debugging

### rkt app rm

(Experimental, to be stabilized in version 5)

`coreos.com/rkt/stage1/app/rm`

This is a crossing entrypoint.

#### Arguments

* `--app` application name
* `--debug` to activate debugging

### rkt attach

(Experimental, to be stabilized in version 5)

`coreos.com/rkt/stage1/attach`

This is a crossing entrypoint.

#### Arguments

* `--action` action to perform (`auto-attach`, `custom-attach` or `list`)
* `--app` application name
* `--debug` to activate debugging
* `--tty-in` whether to attach TTY input (`true` or `false`)
* `--tty-out` whether to attach TTY output (`true` or `false`)
* `--stdin` whether to attach stdin (`true` or `false`)
* `--stdout` whether to attach stdout (`true` or `false`)
* `--stderr` whether to attach stderr (`true` or `false`)

## Stage1 Metadata

### Versioning
Review comment (Member): Version interface should also be bumped once we agree on entrypoints interface.


The stage1 command line interface is versioned using an annotation with the name `coreos.com/rkt/stage1/interface-version`.
If the annotation is not present, rkt assumes the version is 1.

The current version of the stage1 interface is 3.
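
The defaulting rule for the interface version can be sketched in Go as below. This is only an illustration: it assumes the manifest annotations are available as a plain `map[string]string`, and the function name `interfaceVersion` is hypothetical.

```go
package main

import (
	"fmt"
	"strconv"
)

const interfaceVersionAnnotation = "coreos.com/rkt/stage1/interface-version"

// interfaceVersion (hypothetical helper) returns the stage1 interface
// version declared in the annotations, or 1 when the annotation is absent.
func interfaceVersion(annotations map[string]string) (int, error) {
	v, ok := annotations[interfaceVersionAnnotation]
	if !ok {
		return 1, nil
	}
	n, err := strconv.Atoi(v)
	if err != nil {
		return 0, fmt.Errorf("invalid %s %q: %v", interfaceVersionAnnotation, v, err)
	}
	return n, nil
}

func main() {
	missing, _ := interfaceVersion(map[string]string{})
	declared, _ := interfaceVersion(map[string]string{interfaceVersionAnnotation: "3"})
	fmt.Println(missing, declared)
}
```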

## Examples

### Stage1 ACI manifest
@@ -193,6 +284,24 @@ The current version of the stage1 interface is 3.
}
```

## Runtime Metadata

Pods and applications can be annotated at runtime to signal support for specific features.

### Mutable pods (experimental v5)

Stage1 images can support mutable pod environments, where, once a pod has been started, applications can be added/started/stopped/removed while the actual pod is running. This information is persisted at runtime in the pod manifest using the `coreos.com/rkt/stage1/mutable` annotation.

If the annotation is not present, `false` is assumed.

### Attachable applications (experimental v5)

Stage1 images can support attachable applications, where I/O and TTY from each application can be dynamically redirected and attached to.
In that case, this information is persisted at runtime in each application manifest using the following annotations:
- `coreos.com/rkt/stage2/stdin`
- `coreos.com/rkt/stage2/stdout`
- `coreos.com/rkt/stage2/stderr`

## Filesystem Layout Assumptions

The following paths are reserved for the stage1 image, and they will be created during stage0.
@@ -222,5 +331,11 @@ Later the exit status can be retrieved and shown by `rkt status $uuid`.
This directory path is used for passing environment variables to each app.
For example, environment variables for an app named `foo` will be stored in `rkt/env/foo`.

### iottymux (experimental v5)

`rkt/iottymux`

This directory path is used by the TTY and streaming attach helper.
When attach mode is enabled, each application will have a `rkt/iottymux/$appname/` directory, used by the I/O and TTY mux sidecar.

[rkt-networking]: ../networking/overview.md
4 changes: 4 additions & 0 deletions Documentation/subcommands/prepare.md
@@ -52,12 +52,16 @@ c9fad0e6-8236-4fc2-ad17-55d0a4c7d742

| Flag | Default | Options | Description |
| --- | --- | --- | --- |
| `--user-annotation` | none | annotation to add to the app's UserAnnotations field | Set the app's annotations (example: '--user-annotation=foo=bar'). |
| `--caps-remove` | none | capability to remove (example: '--caps-remove=CAP\_SYS\_CHROOT,CAP\_MKNOD') | Capabilities to remove from the process's capabilities bounding set, all others from the default set will be included |
| `--caps-retain` | none | capability to retain (example: '--caps-retain=CAP\_SYS\_ADMIN,CAP\_NET\_ADMIN') | Capabilities to retain in the process's capabilities bounding set, all others will be removed |
| `--environment` | none | environment variables to add to the app's environment | Set the app's environment variables (example: '--environment=foo=bar'). |
| `--exec` | none | Path to executable | Override the exec command for the preceding image. |
| `--group` | root | gid, groupname or file path | Group override for the preceding image (example: '--group=group') |
| `--inherit-env` | `false` | `true` or `false` | Inherit all environment variables not set by apps. |
| `--user-label` | none | label to add to the app's UserLabels field | Set the app's labels (example: '--user-label=foo=bar'). |
| `--mount` | none | Mount syntax (ex. `--mount volume=NAME,target=PATH`) | Mount point binding a volume to a path within an app. See [Mounting Volumes without Mount Points][vol-no-mount]. |
| `--name` | none | Name of the app | Set the name of the app (example: '--name=foo'). If not set, the app name defaults to the image's name. |
| `--no-overlay` | `false` | `true` or `false` | Disable the overlay filesystem. |
| `--no-store` | `false` | `true` or `false` | Fetch images, ignoring the local store. See [image fetching behavior][img-fetch] |
| `--pod-manifest` | none | A path | The path to the pod manifest. If it's non-empty, then only `--net`, `--no-overlay` and `--interactive` will have effect. |
46 changes: 41 additions & 5 deletions Documentation/subcommands/run.md
@@ -32,6 +32,16 @@ Multiple applications can be run in a pod by passing multiple images to the run
# rkt run example.com/app1 example.com/app2
```

## Overriding the app's name

By default, the image's name will be used as the app's name.
It can be overridden using the `--name` flag.
This comes in handy when we want to run multiple apps using the same image:

```
# rkt --insecure-options=image run docker://busybox --name=busybox1 docker://busybox --name=busybox2
```

## Overriding Executable to launch

Application images include an `exec` field that specifies the executable to launch.
@@ -74,19 +84,34 @@ This can be combined with overridden executables:
# rkt run example.com/worker --exec /bin/ov -- --loglevel verbose --- example.com/syncer --exec /bin/syncer2 -- --interval 30s
```

## Adding user annotations and user labels

Additional annotations and labels can be added to the app by using the `--user-annotation` and `--user-label` flags.
The annotations and labels will appear in the app's `UserAnnotations` and `UserLabels` fields.

```
# rkt run example.com/example --user-annotation=foo=bar --user-label=hello=world
```

## Influencing Environment Variables

To inherit all environment variables from the parent use the `--inherit-env` flag.
To inherit all environment variables from the parent, use the `--inherit-env` flag.

To explicitly set environment variables for all apps, use the `--set-env` flag.

To explicitly set individual environment variables use the `--set-env` flag.
To explicitly set environment variables for all apps from a file, use the `--set-env-file` flag.
Variables are expected to be in the format `VAR_NAME=VALUE` separated by the new line character `\n`.
Lines starting with `#` or `;` and empty ones will be ignored.

To explicitly set environment variables for each app individually, use the `--environment` flag.

To explicitly set environment variables from a file use the `--set-env-file` flag. Variables are expected to be in the format `VAR_NAME=VALUE` separated by the new line character `\n`. Lines starting with `#` or `;` and empty ones will be ignored.
The precedence is as follows with the last item replacing previous environment entries:

- Parent environment
- App image environment
- Explicitly set environment variables from file (`--set-env-file`)
- Explicitly set environment variables on command line (`--set-env`)
- Explicitly set environment variables for all apps from file (`--set-env-file`)
- Explicitly set environment variables for all apps on command line (`--set-env`)
- Explicitly set environment variables for each app on command line (`--environment`)

```
# export EXAMPLE_ENV=hello
@@ -95,6 +120,13 @@ The precedence is as follows with the last item replacing previous environment e
EXAMPLE_ENV=hello
FOO=bar
EXAMPLE_OVERRIDE=over

# export EXAMPLE_ENV=hello
# export EXAMPLE_OVERRIDE=under
# rkt run --inherit-env --set-env=FOO=bar --set-env=EXAMPLE_OVERRIDE=over example.com/env-printer --environment=EXAMPLE_OVERRIDE=ride
EXAMPLE_ENV=hello
FOO=bar
EXAMPLE_OVERRIDE=ride
```

## Disable Signature Verification
@@ -355,20 +387,24 @@ This feature will be disabled automatically if the underlying filesystem does no

| Flag | Default | Options | Description |
| --- | --- | --- | --- |
| `--user-annotation` | none | annotation to add to the app's UserAnnotations field | Set the app's annotations (example: '--user-annotation=foo=bar'). |
| `--caps-remove` | none | capability to remove (e.g. `--caps-remove=CAP_SYS_CHROOT,CAP_MKNOD`) | Capabilities to remove from the process's capabilities bounding set; all others from the default set will be included. |
| `--caps-retain` | none | capability to retain (e.g. `--caps-retain=CAP_SYS_ADMIN,CAP_NET_ADMIN`) | Capabilities to retain in the process's capabilities bounding set; all others will be removed. |
| `--cpu` | none | CPU units (e.g. `--cpu=500m`) | CPU limit for the preceding image in [Kubernetes resource model][k8s-resources] format. |
| `--dns` | none | IP Address | Name server to write in `/etc/resolv.conf`. It can be specified several times. |
| `--dns-opt` | none | DNS option | DNS option from resolv.conf(5) to write in `/etc/resolv.conf`. It can be specified several times. |
| `--dns-search` | none | Domain name | DNS search domain to write in `/etc/resolv.conf`. It can be specified several times. |
| `--environment` | none | environment variables to add to the app's environment | Set the app's environment variables (example: '--environment=foo=bar'). |
| `--exec` | none | Path to executable | Override the exec command for the preceding image. |
| `--group` | root | gid, groupname or file path (e.g. `--group=core`) | Group override for the preceding image. |
| `--hostname` | `rkt-$PODUUID` | A host name | Set pod's host name. |
| `--inherit-env` | `false` | `true` or `false` | Inherit all environment variables not set by apps. |
| `--interactive` | `false` | `true` or `false` | Run pod interactively. If true, only one image may be supplied. |
| `--user-label` | none | label to add to the app's UserLabels field | Set the app's labels (example: '--user-label=foo=bar'). |
| `--mds-register` | `false` | `true` or `false` | Register pod with metadata service. It needs network connectivity to the host (`--net` as `default`, `default-restricted`, or `host`). |
| `--memory` | none | Memory units (e.g. `--memory=50M`) | Memory limit for the preceding image in [Kubernetes resource model][k8s-resources] format. |
| `--mount` | none | Mount syntax (e.g. `--mount volume=NAME,target=PATH`) | Mount point binding a volume to a path within an app. See [Mounting Volumes without Mount Points](#mounting-volumes-without-mount-points). |
| `--name` | none | Name of the app | Set the name of the app (example: '--name=foo'). If not set, the app name defaults to the image's name. |
| `--net` | `default` | A comma-separated list of networks. (e.g. `--net[=n[:args], ...]`) | Configure the pod's networking. Optionally, pass a list of user-configured networks to load and set arguments to pass to each network, respectively. |
| `--no-overlay` | `false` | `true` or `false` | Disable the overlay filesystem. |
| `--no-store` | `false` | `true` or `false` | Fetch images, ignoring the local store. See [image fetching behavior][img-fetch]. |
36 changes: 24 additions & 12 deletions common/apps/apps.go
@@ -45,18 +45,27 @@ const (
)

type App struct {
Image string // the image reference as supplied by the user on the cli
ImType AppImageType // the type of the image reference (to be guessed, url, path or hash)
Args []string // any arguments the user supplied for this app
Asc string // signature file override for image verification (if fetching occurs)
Exec string // exec override for image
Mounts []schema.Mount // mounts for this app (superseding any mounts in rktApps.mounts of same MountPoint)
MemoryLimit *types.ResourceMemory // memory isolator override
CPULimit *types.ResourceCPU // cpu isolator override
User, Group string // user, group overrides
CapsRetain *types.LinuxCapabilitiesRetainSet // os/linux/capabilities-retain-set overrides
CapsRemove *types.LinuxCapabilitiesRevokeSet // os/linux/capabilities-remove-set overrides
SeccompFilter string // seccomp CLI overrides
Name string // the name of the app. If not set, the image's name will be used.
Image string // the image reference as supplied by the user on the cli
ImType AppImageType // the type of the image reference (to be guessed, url, path or hash)
Args []string // any arguments the user supplied for this app
Asc string // signature file override for image verification (if fetching occurs)
Exec string // exec override for image
WorkingDir string // working directory override for image
ReadOnlyRootFS bool // read-only rootfs override.
Mounts []schema.Mount // mounts for this app (superseding any mounts in rktApps.mounts of same MountPoint)
MemoryLimit *types.ResourceMemory // memory isolator override
CPULimit *types.ResourceCPU // cpu isolator override
CPUShares *types.LinuxCPUShares // cpu-shares isolator override
User, Group string // user, group overrides
SupplementaryGIDs []int // supplementary gids override
CapsRetain *types.LinuxCapabilitiesRetainSet // os/linux/capabilities-retain-set overrides
CapsRemove *types.LinuxCapabilitiesRevokeSet // os/linux/capabilities-remove-set overrides
SeccompFilter string // seccomp CLI overrides
OOMScoreAdj *types.LinuxOOMScoreAdj // oom-score-adj isolator override
UserAnnotations map[string]string // the user annotations of the app.
UserLabels map[string]string // the user labels of the app.
Environments map[string]string // the environments of the app.

// TODO(jonboulle): These images are partially-populated hashes, this should be clarified.
ImageID types.Hash // resolved image identifier
@@ -127,6 +136,9 @@ func (al *Apps) Validate() error {

f := func(mnts []schema.Mount) error {
for _, m := range mnts {
if m.AppVolume != nil { // allow app-specific volumes
continue
}
if _, ok := vs[m.Volume]; !ok {
return fmt.Errorf("dangling mount point %q: volume %q not found", m.Path, m.Volume)
}