Update to Linux 4.14.46 by tianon · Pull Request #1322 · boot2docker/boot2docker · GitHub
This repository was archived by the owner on Jan 1, 2021. It is now read-only.

Update to Linux 4.14.46 #1322

Closed
wants to merge 1 commit into from


tianon
Contributor
@tianon tianon commented May 30, 2018

The latest 4.9.x release doesn't patch cleanly with AUFS and 4.14.x is currently better supported there (http://aufs.sourceforge.net/).

cc @legal90 @frapposelli @phusl -- anything that this might cause issues with that you're aware of?

@tianon
Contributor Author
tianon commented May 30, 2018

I've done a rough smoke test by building this and running it in QEMU, but that doesn't test anything to do with VMware, Parallels, Xen, etc. 😅

@legal90
Contributor
legal90 commented Jun 1, 2018

Hi @tianon ,
Thank you for notifying!
I've built boot2docker.iso from this branch (revision 2f2049f) and tested it with the parallels driver. Unfortunately, I ran into the exact same issue as we have seen before: Parallels/docker-machine-parallels#72 (comment)

To recap: for some reason the Docker daemon starts with a big delay, so the provisioning step times out. That happens only on VM restart, not the first time the VM is created and started.

$ docker-machine restart test-vm-parallels
Restarting "test-vm-parallels"...
(test-vm-parallels) Waiting for VM to start...
Waiting for SSH to be available...
Detecting the provisioner...
Unable to verify the Docker daemon is listening: Maximum number of retries (10) exceeded

This doesn't happen with the current upstream version of boot2docker - v18.05.0-ce

@romankulikov, do you know whether this could be caused by a compatibility issue between the current version of Parallels Tools (13.3.0-43321) and Linux kernel v4.14.46?

The boot2docker ISO which I built from this branch and used in my tests can be downloaded here: https://www.dropbox.com/s/24txhj8e47ys9rt/boot2docker_4-14-46.iso

@tianon
Contributor Author
tianon commented Jun 8, 2018

Friendly ping @romankulikov ❤️

@romankulikov
Contributor

Hi there!

As I understand it, the timeouts are caused by the dockerd process hanging inside the virtual machine. The daemon hangs on syscall_318 which is... getrandom(2), right? I'm not sure about decoding the syscall arguments, but it seems that it reads from /dev/random. The thing is that the entropy pool is pretty empty at the moment dockerd starts, which makes the process block. In my case it takes something like half a minute for the kernel to reach the needed entropy level and return a result from getrandom(2). Of course, if you want to make things go faster: open the virtual machine window, focus the VM and move the mouse – this should generate enough interrupts to fill up the entropy pool :-]).

Is the above not complete nonsense? If it isn't, we need to think about doing something on the virtualisation side to make the guest kernel gather entropy faster and/or modify the boot2docker start-up procedure so it doesn't eat entropy so fast.
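
A rough way to observe the delay described here is to poll the kernel's entropy estimate inside the guest. The sketch below is only an illustration: the function name `wait_for_entropy` and the `ENTROPY_FILE` override are made up for this example, and the estimate in procfs is just a proxy for whatever condition getrandom(2) actually waits on.

```sh
#!/bin/sh
# Standard procfs location of the kernel's entropy estimate (in bits).
# The override exists only so the function can be exercised against a plain file.
ENTROPY_FILE="${ENTROPY_FILE:-/proc/sys/kernel/random/entropy_avail}"

# wait_for_entropy TARGET TIMEOUT
# Poll once per second until the estimate reaches TARGET bits or TIMEOUT
# seconds have passed. Prints the number of seconds waited and returns 1
# on timeout. (Hypothetical helper, not part of boot2docker.)
wait_for_entropy() {
	target="$1"
	timeout="$2"
	waited=0
	while [ "$(cat "$ENTROPY_FILE")" -lt "$target" ]; do
		if [ "$waited" -ge "$timeout" ]; then
			echo "$waited"
			return 1
		fi
		sleep 1
		waited=$((waited + 1))
	done
	echo "$waited"
}
```

Calling something like `wait_for_entropy 256 60` right after boot would give a number comparable to the "half a minute" mentioned above; the 256-bit target is an arbitrary value chosen for the example.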

@tianon
Contributor Author

Does prltoolsd have a way to feed entropy to the kernel? (similar to QEMU's virtio-rng-pci)

@romankulikov
Contributor
romankulikov commented Jun 11, 2018

No, Parallels Desktop doesn't virtualise any RNG devices at the moment.

@legal90
Contributor
legal90 commented Jun 12, 2018

@romankulikov Thanks for your input! Now it's clear what's actually happening there.
I'm still wondering why it happens only with certain versions of the Linux kernel, and why the problem doesn't exist on VirtualBox and VMware Fusion/Workstation (maybe their tools daemons generate enough entropy on their own?)

But anyway, I just want to note: there is a lightweight daemon for generating entropy: haveged. It should be easy to build it from source.
@tianon I realize this may sound weird, but what do you think about adding it to the boot2docker ISO? I think that might also make sense for other entropy-related stuff (such as generating certificates).

@romankulikov
Contributor

I'm still wondering why it happens only with certain versions of the Linux kernel

I doubt it happens only on certain versions. It is probably incidental and may occur on any kernel version.

and why the problem doesn't exist on VirtualBox and VMware Fusion/Workstation (maybe their tools daemons generate enough entropy on their own?)

I don't know whether there's something related to RNG in their tools, but different virtualisation engines may produce, for example, more virtual hardware interrupts during guest OS boot. This may result in more entropy being gathered in the guest kernel.

But anyway, I just want to note: there is a lightweight daemon for generating entropy: haveged. It should be easy to build it from source.

I guess haveged will not work: it is based on the rdtsc instruction, which is specifically virtualised in Parallels Desktop, and this cannot be disabled.

@romankulikov
Contributor
romankulikov commented Jun 12, 2018

I've run a test using rngd from rng-tools as a tool to "inject" entropy into the kernel. It works pretty well. I've used /dev/urandom as the source for rngd as the simplest solution. But from a security point of view it's better to pass a bunch of bytes from the host's /dev/random, of course.
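
For readers who want to reproduce this kind of experiment, a minimal sketch follows. It assumes the test amounted to pointing rngd's input device at /dev/urandom with the `-r` (`--rng-device`) option; the exact invocation used is not shown in this thread. The `RNGD` and `RNG_SOURCE` variables and the `start_rngd` function are hypothetical names introduced for this example.

```sh
#!/bin/sh
# Assumption: rngd from rng-tools is installed and this runs as root.
# Feeding /dev/urandom back into the pool only unblocks getrandom(2);
# it does not add any real randomness.
RNGD="${RNGD:-rngd}"
RNG_SOURCE="${RNG_SOURCE:-/dev/urandom}"

# Start rngd with RNG_SOURCE as its input device, or fail with a message
# if the binary is not available. (Hypothetical helper.)
start_rngd() {
	command -v "$RNGD" >/dev/null 2>&1 || {
		echo "$RNGD not found" >&2
		return 1
	}
	# -r selects the device rngd reads random bytes from.
	"$RNGD" -r "$RNG_SOURCE"
}
```

Keeping the source in a variable makes it easy to swap /dev/urandom for a better one later, such as bytes passed in from the host.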

romankulikov added a commit to romankulikov/boot2docker that referenced this pull request Jun 12, 2018
When running boot2docker on Parallels Desktop for Mac, dockerd hangs
during system start-up in the `getrandom(2)` syscall because there's not much
entropy in `/dev/random` at that moment.

This commit is a dirty "proof of concept" fix that uses `rngd` from the
`rng-tools` suite to add entropy to `/dev/random` from `/dev/urandom`.
A final solution should surely use a better source of randomness, for
example a bunch of bytes from the host's `/dev/random`.

In scope of boot2docker#1322.
@kaosagnt

Kernel change https://lkml.org/lkml/2018/4/12/711

The crng_init variable has three states:

0: The CRNG is not initialized at all
1: The CRNG has a small amount of entropy, hopefully good enough for
early-boot, non-cryptographical use cases
2: The CRNG is fully initialized and we are sure it is safe for
cryptographic use cases.

The crng_ready() function should only return true once we are in the
last state.


This affects just about everything requiring crypto and strong entropy during boot.

The behaviour is also exhibited when running on VirtualBox and VMware Fusion.

The addition of rng-tools fixes this.

The 4.9.x branch was changed here as well (4.4.96)
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-4.9.y&id=4dfb3442bb7e1fb80515df4a199ca5a7a8edf900

So all Linux kernels released after April 12th 2018 will exhibit the behaviour once the change has been backported.
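
The three crng_init states quoted above can be roughly inferred from the kernel log. The sketch below assumes that kernels of this era log "random: fast init done" when reaching state 1 and "random: crng init done" when reaching state 2; the function name `crng_state_from_log` and its output labels are invented for this example.

```sh
#!/bin/sh
# Read kernel log text on stdin and print a rough CRNG state:
#   ready          -> "crng init done" seen (state 2)
#   early          -> only "fast init done" seen (state 1)
#   uninitialized  -> neither message seen (state 0, or the log has rotated)
# (Hypothetical helper; the message strings are an assumption.)
crng_state_from_log() {
	log="$(cat)"
	case "$log" in
		*'random: crng init done'*) echo ready ;;
		*'random: fast init done'*) echo early ;;
		*) echo uninitialized ;;
	esac
}
```

Typical usage would be `dmesg | crng_state_from_log`. Comparing the timestamp of the "crng init done" line with the time dockerd was started should show whether the daemon was waiting on the CRNG.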

@tianon
Contributor Author
tianon commented Jul 30, 2018

Excellent detective work! 👍 ❤️

That's a real shame -- is there a reliable way we can determine whether a system has enough entropy by itself (like QEMU's virtio-rng-pci device) so that we don't use an entropy-generator on systems where it isn't necessary (because it seems like kind of a bad idea to use generally)?

@kaosagnt
kaosagnt commented Jul 31, 2018

Some interesting discussion:

https://news.ycombinator.com/item?id=16972827
https://fedoraproject.org/wiki/Common_F28_bugs#Boot_process_is_very_slow_or_appears_to_hang_with_kernel_4.16.4_onwards

Built a new image with this change:

ianm-centos7:~/development/docker/images/iso/boot2docker>cat rootfs/rootfs/opt/gen-entropy.sh 
#!/bin/sh

COUNTER=0
while [  ${COUNTER} -lt 1000 ]; do
	find / -name "*.sh" -xdev > /dev/null 2>&1
	let COUNTER=COUNTER+1
done
ianm-centos7:~/development/docker/images/iso/boot2docker>head rootfs/rootfs/opt/bootscript.sh 
#!/bin/sh

# Try and generate entropy...crude...
/opt/gen-entropy.sh &

So generating activity while the boot process is happening seems to work. The VM didn't block as long when Docker started, and there was no docker-machine timeout at "Detecting the provisioner". Only lightly tested under VirtualBox 5.2.16; I'm running kernel 4.14.55-boot2docker, Boot2Docker version 18.06.0-ce, DockerToolbox-18.06.0-ce on Windows.

As far as I can tell other hypervisors don't have the equivalent of a virtio-rng-pci device. I'll keep digging.

@tianon
Contributor Author
tianon commented Jul 31, 2018

As far as I can tell other hypervisors don't have the equivalent of a virtio-rng-pci device. I'll keep digging.

Interesting! So perhaps it would be sufficient to check for that one device, and launch an entropy generator otherwise? Or perhaps there's a particular level of entropy we could check for and assume that if it's high enough at the time of our check that starting up a generator is unnecessary?

Perhaps some of our other Hypervisor liaisons could chime in regarding entropy? 😇
cc @frapposelli @phusl

@kaosagnt
kaosagnt commented Aug 1, 2018

From what I can tell from reading various articles, if we see /dev/hwrng we launch rngd from rng-tools and use it to feed entropy from the /dev/hwrng device. These are the kernel config options I'm currently using:

CONFIG_HW_RANDOM=y
CONFIG_HW_RANDOM_TIMERIOMEM=m
CONFIG_HW_RANDOM_INTEL=m
CONFIG_HW_RANDOM_AMD=m
CONFIG_HW_RANDOM_VIA=m
CONFIG_HW_RANDOM_VIRTIO=m

So whether it's Virtio or an Intel / AMD / VIA hardware random device, they all show up as /dev/hwrng.

rngd --list shows this under VirtualBox on a Windows 7 host with an Intel CPU:

Entropy sources that are available but disabled
4: NIST Network Entropy Beacon
Available and enabled entropy sources:
2: Intel RDRAND Instruction RNG

and we have a
root@kernel-414:/home/docker# ls -al /dev/hwrng
crw------- 1 root root 10, 183 Aug 1 01:16 /dev/hwrng

and things work.

Under VMware Fusion on a macOS 10.13.x host with an Intel CPU, my CentOS 7 guest also sees:

[ianm@ianm-centos7 ianm]# ls -al /dev/hwrng
crw------- 1 root root 10, 183 Aug 1 09:23 /dev/hwrng

I think installing rng-tools at this point is the way to move forward, and we can evaluate other options as we discover them.

@tianon
Contributor Author
tianon commented Aug 2, 2018

Well, my host doesn't have /dev/hwrng, but maybe gating this on both that file not existing and us being in a VM makes enough sense. Thanks for all the digging! 👍 ❤️
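
The gating idea in the first paragraph (no /dev/hwrng and running in a VM) could look roughly like the sketch below. It assumes an x86 guest, where the CPU flags in /proc/cpuinfo normally include `hypervisor` under virtualisation. The function names and the `CPUINFO_FILE` / `HWRNG_DEVICE` overrides are hypothetical.

```sh
#!/bin/sh
CPUINFO_FILE="${CPUINFO_FILE:-/proc/cpuinfo}"
HWRNG_DEVICE="${HWRNG_DEVICE:-/dev/hwrng}"

# Succeed when the CPU flags advertise a hypervisor (x86-specific assumption).
in_vm() {
	grep -E '^flags[[:space:]]*:' "$CPUINFO_FILE" 2>/dev/null | grep -qw hypervisor
}

# Succeed when an entropy generator looks warranted under this heuristic:
# we are a guest and there is no hardware RNG device node at all.
should_start_generator() {
	in_vm && [ ! -e "$HWRNG_DEVICE" ]
}
```

A boot script could then run `should_start_generator && start-some-generator`, where `start-some-generator` is a placeholder for whichever entropy daemon ends up being shipped.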

I think if we've got /dev/hwrng, we don't need to do anything with rngd though -- at least in QEMU (where I've got such a device), the kernel appears to use it appropriately already out-of-the-box.

I was thinking maybe we should use haveged instead because there's a Tiny Core tcz package of it already, but it appears to only be available for x86 (not 64), so we'll be building from source anyhow and rng-tools seems more well-known.

@kaosagnt
kaosagnt commented Aug 7, 2018

Anybody wanting to play with what I'm working on can find it here:

https://github.com/kaosagnt/boot2docker/tree/kernel-4.14

Kernel 4.14.60, aufs4.14.56+, Virtualbox 5.2.16, VMWare tools stable-10.3.0, rng-tools v6.3.1, Parallels 13.3.2-43368, Xen v7.10.0, qemu tools, Docker v18.06.0-ce

@tianon
Contributor Author
tianon commented Aug 7, 2018

See #1326 for a relevant discussion / decision that makes this slightly simpler for future releases (removing the AUFS part of the problem entirely).

@tianon
Contributor Author
tianon commented Aug 7, 2018

(I'm still not 100% sold on the QEMU guest agent -- can we please keep that discussion / implementation separate? #1319)

@kaosagnt

Anybody interested in TCL 9.x or XFS filesystem support can find it as part of

https://github.com/kaosagnt/boot2docker/tree/kernel-4.14

@tianon
Contributor Author
tianon commented Sep 10, 2018

I've done a little more testing on this today and it turns out gating on /dev/hwrng isn't sufficient -- when QEMU is not provided the virtio RNG device, it still has /dev/hwrng (and dockerd still hangs on startup), but again my host does not have /dev/hwrng and it works fine.

I think we'll also need to test /proc/sys/kernel/random/entropy_avail to see if it's a low value -- I can't think of any better solutions. It's either that, or we get to just embrace poor randomness sources in boot2docker, which doesn't sound great. 😅
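
One possible shape for such a check is sketched below. It rests on two assumptions that are not confirmed in this thread: that /sys/class/misc/hw_random/rng_current reads as empty, `none` or `<none>` when /dev/hwrng has no backing device, and that an entropy_avail value below 256 counts as "low". All function and variable names are hypothetical.

```sh
#!/bin/sh
RNG_CURRENT_FILE="${RNG_CURRENT_FILE:-/sys/class/misc/hw_random/rng_current}"
ENTROPY_FILE="${ENTROPY_FILE:-/proc/sys/kernel/random/entropy_avail}"
ENTROPY_MIN="${ENTROPY_MIN:-256}"	# arbitrary threshold chosen for this sketch

# Succeed only when the hw_random core reports an actual backing device.
# /dev/hwrng can exist without one, so the device node alone proves nothing.
hwrng_backed() {
	[ -r "$RNG_CURRENT_FILE" ] || return 1
	current="$(cat "$RNG_CURRENT_FILE" 2>/dev/null)"
	case "$current" in
		''|none|'<none>') return 1 ;;
		*) return 0 ;;
	esac
}

# Succeed when starting an entropy generator looks warranted:
# no backed hardware RNG and a low (or unreadable) entropy estimate.
needs_entropy_daemon() {
	hwrng_backed && return 1
	avail="$(cat "$ENTROPY_FILE" 2>/dev/null)" || return 0
	[ "${avail:-0}" -lt "$ENTROPY_MIN" ]
}
```

Under this heuristic a QEMU guest with the virtio RNG attached would skip the generator, a guest with an unbacked /dev/hwrng and a starved pool would start it, and a host with a healthy pool would be left alone.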

I'm leaning more towards haveged right now over rng-tools simply because it's much easier to compile in a way that drops cleanly into the odd boot2docker environment (and has very minimal compilation dependencies). I want to finally put this to bed this week. 👍

@tianon
Contributor Author
tianon commented Sep 10, 2018

Ok, this is being replaced by #1332. 👍

4 participants