Message ID | 20190612163144.18486-2-roman.stratiienko@globallogic.com (mailing list archive)
---|---
State | New, archived |
Series | [1/2] nbd: make sock_xmit() and nbd_add_socket() more generic
On 6/12/19 11:31 AM, roman.stratiienko@globallogic.com wrote:
> From: Roman Stratiienko <roman.stratiienko@globallogic.com>
>
> Adding support to nbd to use it as a root device. This code essentially
> provides a minimal nbd-client implementation within the kernel. It opens
> a socket and makes the negotiation with the server. Afterwards it passes
> the socket to the normal nbd code to handle the connection.
>
> The arguments for the server are passed via the kernel command line.
> The kernel command line has the format
> 'nbdroot=[<SERVER_IP>:]<SERVER_PORT>/<EXPORT_NAME>'.
> SERVER_IP is optional. If it is not available it will use the
> root_server_addr transmitted through DHCP.
>
> Based on those arguments, the connection to the server is established
> and is connected to the nbd0 device. The root device therefore is
> root=/dev/nbd0.
>
> The patch was initially posted by Markus Pargmann <mpa@pengutronix.de>
> and can be found at https://lore.kernel.org/patchwork/patch/532556/
>
> Change-Id: I78f7313918bf31b9dc01a74a42f0f068bede312c
> Signed-off-by: Roman Stratiienko <roman.stratiienko@globallogic.com>
> Reviewed-by: Aleksandr Bulyshchenko <A.Bulyshchenko@globallogic.com>

> +static int nbd_connection_negotiate(struct socket *sock, char *export_name,
> +				    size_t *rsize, u16 *nflags)
> +{
> +
> +	ret = sock_xmit_buf(sock, 0, &flags, sizeof(flags));
> +	if (ret < 0)
> +		return ret;
> +
> +	*nflags = ntohs(flags);
> +
> +	client_flags = 0;

Why not reply with NBD_FLAG_C_FIXED_NEWSTYLE? Which in turn would be
necessary...

> +
> +	ret = sock_xmit_buf(sock, 1, &client_flags, sizeof(client_flags));
> +	if (ret < 0)
> +		return ret;
> +
> +	magic = cpu_to_be64(nbd_opts_magic);
> +	ret = sock_xmit_buf(sock, 1, &magic, sizeof(magic));
> +	if (ret < 0)
> +		return ret;
> +
> +	opt = htonl(NBD_OPT_EXPORT_NAME);

Why not use NBD_OPT_GO? That's a better interface (better error recovery,
and some servers can even use it to advertise preferred block sizing -
particularly if a server insists on a 4k minimum block instead of 512).

...using NBD_OPT_GO would require checking that the server advertised
FIXED_NEWSTYLE, as well as replying that we intend to rely on it.

Otherwise the idea looks interesting to me.
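[Editor's note: for readers following the review, here is a minimal sketch of what Eric's two suggestions could look like, written against the helpers added by this patch (sock_xmit_buf() with send = 1 / receive = 0, and the nbd_opts_magic constant). The NBD_FLAG_*, NBD_OPT_GO, NBD_REP_* and NBD_INFO_EXPORT values are copied from the NBD protocol document because the driver does not define them; the function name nbd_negotiate_go() is hypothetical and error handling is abbreviated — this is not part of the posted patch.]

```c
/*
 * Sketch only: advertise fixed newstyle and request the export with
 * NBD_OPT_GO instead of NBD_OPT_EXPORT_NAME.  Constants are taken from
 * the NBD protocol document (proto.md); sock_xmit_buf() and
 * nbd_opts_magic come from the patch under review.
 */
#define NBD_FLAG_FIXED_NEWSTYLE		(1 << 0)   /* server handshake flag */
#define NBD_FLAG_C_FIXED_NEWSTYLE	(1 << 0)   /* client reply flag */
#define NBD_OPT_GO			7
#define NBD_REP_ACK			1
#define NBD_REP_INFO			3
#define NBD_INFO_EXPORT			0
#define NBD_REP_MAGIC			0x3e889045565a9ULL

static int nbd_negotiate_go(struct socket *sock, const char *export_name,
			    u16 handshake_flags, u64 *size, u16 *tflags)
{
	u32 client_flags, opt, len, name_len = strlen(export_name);
	u16 n_info = 0;		/* request no additional info items */
	u64 magic;
	int ret;

	/* NBD_OPT_GO is only defined once both sides speak fixed newstyle. */
	if (!(handshake_flags & NBD_FLAG_FIXED_NEWSTYLE))
		return -EOPNOTSUPP;

	client_flags = htonl(NBD_FLAG_C_FIXED_NEWSTYLE);
	ret = sock_xmit_buf(sock, 1, &client_flags, sizeof(client_flags));
	if (ret < 0)
		return ret;

	/* Option header: IHAVEOPT magic, option id, length of option data. */
	magic = cpu_to_be64(nbd_opts_magic);
	opt = htonl(NBD_OPT_GO);
	len = htonl(sizeof(u32) + name_len + sizeof(u16));
	if ((ret = sock_xmit_buf(sock, 1, &magic, sizeof(magic))) < 0 ||
	    (ret = sock_xmit_buf(sock, 1, &opt, sizeof(opt))) < 0 ||
	    (ret = sock_xmit_buf(sock, 1, &len, sizeof(len))) < 0)
		return ret;

	/* Option data: name length, name (not NUL-terminated), info count. */
	len = htonl(name_len);
	if ((ret = sock_xmit_buf(sock, 1, &len, sizeof(len))) < 0 ||
	    (ret = sock_xmit_buf(sock, 1, (void *)export_name, name_len)) < 0 ||
	    (ret = sock_xmit_buf(sock, 1, &n_info, sizeof(n_info))) < 0)
		return ret;

	/* Read option replies until NBD_REP_ACK ends negotiation. */
	for (;;) {
		u32 rep_opt, rep_type, rep_len;

		if ((ret = sock_xmit_buf(sock, 0, &magic, sizeof(magic))) < 0 ||
		    (ret = sock_xmit_buf(sock, 0, &rep_opt, sizeof(rep_opt))) < 0 ||
		    (ret = sock_xmit_buf(sock, 0, &rep_type, sizeof(rep_type))) < 0 ||
		    (ret = sock_xmit_buf(sock, 0, &rep_len, sizeof(rep_len))) < 0)
			return ret;
		if (be64_to_cpu(magic) != NBD_REP_MAGIC)
			return -EINVAL;
		rep_type = ntohl(rep_type);
		rep_len = ntohl(rep_len);

		if (rep_type == NBD_REP_ACK && rep_len == 0)
			return 0;	/* done; transmission phase starts */

		/* NBD_INFO_EXPORT carries the size and transmission flags. */
		if (rep_type == NBD_REP_INFO && rep_len == 12) {
			u16 info_type, be_flags;
			u64 be_size;

			if ((ret = sock_xmit_buf(sock, 0, &info_type, sizeof(info_type))) < 0 ||
			    (ret = sock_xmit_buf(sock, 0, &be_size, sizeof(be_size))) < 0 ||
			    (ret = sock_xmit_buf(sock, 0, &be_flags, sizeof(be_flags))) < 0)
				return ret;
			if (ntohs(info_type) == NBD_INFO_EXPORT) {
				*size = be64_to_cpu(be_size);
				*tflags = ntohs(be_flags);
			}
			continue;
		}

		/*
		 * Anything else (including NBD_REP_ERR_UNSUP from an older
		 * server) is where a full client would drain rep_len bytes
		 * and consider falling back to NBD_OPT_EXPORT_NAME.
		 */
		return -EPROTO;
	}
}
```

Written this way, both review comments are addressed at once, since NBD_OPT_GO is only valid after the client has advertised NBD_FLAG_C_FIXED_NEWSTYLE in response to a server that set NBD_FLAG_FIXED_NEWSTYLE.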
On 6/12/19 11:31 AM, roman.stratiienko@globallogic.com wrote:
> From: Roman Stratiienko <roman.stratiienko@globallogic.com>
>
> Adding support to nbd to use it as a root device. This code essentially
> provides a minimal nbd-client implementation within the kernel. It opens
> a socket and makes the negotiation with the server. Afterwards it passes
> the socket to the normal nbd code to handle the connection.
>
> The arguments for the server are passed via the kernel command line.
> The kernel command line has the format
> 'nbdroot=[<SERVER_IP>:]<SERVER_PORT>/<EXPORT_NAME>'.

Did you intend for nbdroot=1234 to connect to port 1234 or to server 1234
port 10809? Is an export name mandatory even when it is the empty string,
in which case, is the / character mandatory? Maybe this would be better
written as:

[<SERVER_IP>[:<SERVER_PORT>]][/<EXPORT_NAME>]

although that would allow nbdroot= using all defaults (will that still do
the right thing?).

Should we support nbdroot=URI, and tie this in to Rich's proposal [1] on
standardizing the set of URIs that refer to an NBD export? It seems like
you are still limited to a TCP socket (not Unix) with no encryption, so
this would be equivalent to the URI:

nbd://[server[:port]][/export]

[1] https://lists.debian.org/nbd/2019/06/msg00011.html
On 6/13/19 8:02 AM, Eric Blake wrote:
> On 6/12/19 11:31 AM, roman.stratiienko@globallogic.com wrote:
>> From: Roman Stratiienko <roman.stratiienko@globallogic.com>
>>
>> Adding support to nbd to use it as a root device. This code essentially
>> provides a minimal nbd-client implementation within the kernel. It opens
>> a socket and makes the negotiation with the server. Afterwards it passes
>> the socket to the normal nbd code to handle the connection.
>>
>> The arguments for the server are passed via the kernel command line.
>> The kernel command line has the format
>> 'nbdroot=[<SERVER_IP>:]<SERVER_PORT>/<EXPORT_NAME>'.
>
> Did you intend for nbdroot=1234 to connect to port 1234 or to server
> 1234 port 10809? Is an export name mandatory even when it is the empty
> string, in which case, is the / character mandatory? Maybe this would
> be better written as:
>
> [<SERVER_IP>[:<SERVER_PORT>]][/<EXPORT_NAME>]

Make that:

[[<SERVER_IP>][:<SERVER_PORT>]][/[<EXPORT_NAME>]]

as well as a blurb that IPv6 requires use of [] around SERVER_IP to avoid
ambiguity between the IP address and the port designator. That would allow
variations such as:

nbdroot=1.2.3.4          # port 10809 and export name '' defaulted
nbdroot=1.2.3.4:10809/   # fully explicit, export name ''
nbdroot=:10810/export    # host defaulted by DHCP, rest is explicit
nbdroot=/                # host and port are defaulted, export is ''
nbdroot=[::1]:10810      # IPv6 localhost and port 10810
nbdroot=::1              # IPv6 localhost, default port 10809

[I'm aware that localhost is probably not going to work based on how early
this is encountered during the boot; rather, consider ::1 as a placeholder
for a more realistic IPv6 address]

> although that would allow nbdroot= using all defaults (will that still
> do the right thing?).

nbdroot=                 # host from DHCP, port 10809, export ''

> Should we support nbdroot=URI, and tie this in to Rich's proposal [1] on
> standardizing the set of URIs that refer to an NBD export? It seems
> like you are still limited to a TCP socket (not Unix) with no
> encryption, so this would be equivalent to the URI:
>
> nbd://[server[:port]][/export]
>
> [1] https://lists.debian.org/nbd/2019/06/msg00011.html
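[Editor's note: a sketch of how this more permissive grammar could be parsed against the patch's struct nbdroot and defaults. Everything here is illustrative: nbdroot_parse() and NBD_DEFAULT_PORT are hypothetical names (10809 is the IANA-registered NBD port), and the bracketed IPv6 form is left out because the posted struct only stores a __be32 address.]

```c
/*
 * Sketch of parsing "[[<SERVER_IP>][:<SERVER_PORT>]][/[<EXPORT_NAME>]]"
 * with defaults, against the struct nbdroot from the patch.  Not part
 * of the posted patch.
 */
#define NBD_DEFAULT_PORT 10809		/* IANA-registered NBD port */

static int __init nbdroot_parse(char *line, struct nbdroot *nr)
{
	char *export, *colon;
	u16 port = NBD_DEFAULT_PORT;
	int ret;

	/* Export name: everything after the first '/', default "". */
	export = strchr(line, '/');
	if (export) {
		*export++ = '\0';
		strlcpy(nr->server_export, export, sizeof(nr->server_export));
	}

	/* Port: digits after ':', default 10809. */
	colon = strchr(line, ':');
	if (colon) {
		*colon++ = '\0';
		ret = kstrtou16(colon, 10, &port);
		if (ret)
			return ret;
	}
	nr->server_port = htons(port);

	/* Address: empty means "use root_server_addr from DHCP" later on. */
	if (*line == '\0')
		nr->server_addr = ADDR_NONE;
	else
		nr->server_addr = in_aton(line);

	return 0;
}
```

Such a parser accepts all of the IPv4 spellings listed above, including a bare nbdroot= (empty host defers to root_server_addr from DHCP, port and export name take their defaults).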
On Wed, Jun 12, 2019 at 07:31:44PM +0300, roman.stratiienko@globallogic.com wrote:
> From: Roman Stratiienko <roman.stratiienko@globallogic.com>
>
> Adding support to nbd to use it as a root device. This code essentially
> provides a minimal nbd-client implementation within the kernel. It opens
> a socket and makes the negotiation with the server. Afterwards it passes
> the socket to the normal nbd code to handle the connection.
>
> [...]

Just throw nbd-client in your initramfs. Every nbd server has its own
handshake protocol; embedding one particular server's handshake protocol
into the kernel isn't the answer here. Thanks,

Josef
On Thu, Jun 13, 2019 at 4:52 PM Josef Bacik <josef@toxicpanda.com> wrote:
>
> On Wed, Jun 12, 2019 at 07:31:44PM +0300, roman.stratiienko@globallogic.com wrote:
> > From: Roman Stratiienko <roman.stratiienko@globallogic.com>
> >
> > Adding support to nbd to use it as a root device. This code essentially
> > provides a minimal nbd-client implementation within the kernel. [...]
>
> Just throw nbd-client in your initramfs. Every nbd server has its own
> handshake protocol; embedding one particular server's handshake protocol
> into the kernel isn't the answer here. Thanks,
>
> Josef

Hello Josef,

Let me share some of the thoughts that motivated this solution:

We chose NBD as a tool to run CI tests on our platforms. We have a wide
range of different BSPs with different kinds of images where using NFSROOT
is hard or even impossible. Most of these BSPs are not using an initramfs
and some of them are Android-based.

Taking all this into account, we would have to put significant effort into
implementing and testing a custom initramfs, and it would not cover all
our needs.

A much easier way is to embed a small client into the kernel and just
enable the configuration when needed. I believe such a solution will be
very useful for a wide range of kernel users.

Also, as far as I know the mainline nbd-server daemon has only two
handshake protocols, so-called OLD-STYLE and NEW-STYLE, and OLD-STYLE is
no longer supported. So it should not be a problem - please correct me if
I'm wrong.

Regards,
Roman
On Thu, Jun 13, 2019 at 05:45:13PM +0300, Roman Stratiienko wrote:
> On Thu, Jun 13, 2019 at 4:52 PM Josef Bacik <josef@toxicpanda.com> wrote:
> >
> > Just throw nbd-client in your initramfs. Every nbd server has its own
> > handshake protocol; embedding one particular server's handshake protocol
> > into the kernel isn't the answer here. Thanks,
> >
> > Josef
>
> Hello Josef,
>
> Let me share some of the thoughts that motivated this solution:
>
> We chose NBD as a tool to run CI tests on our platforms. We have a wide
> range of different BSPs with different kinds of images where using NFSROOT
> is hard or even impossible. Most of these BSPs are not using an initramfs
> and some of them are Android-based.
>
> Taking all this into account, we would have to put significant effort into
> implementing and testing a custom initramfs, and it would not cover all
> our needs.
>
> A much easier way is to embed a small client into the kernel and just
> enable the configuration when needed. I believe such a solution will be
> very useful for a wide range of kernel users.
>
> Also, as far as I know the mainline nbd-server daemon has only two
> handshake protocols, so-called OLD-STYLE and NEW-STYLE, and OLD-STYLE is
> no longer supported. So it should not be a problem - please correct me if
> I'm wrong.

I don't doubt you have a good reason to want it, I'm just not clear on why
an initramfs isn't an option. You have this special kernel with your
special option, and you manage to get these things to boot your special
kernel, right? So why is an initramfs with a tiny nbd-client binary in
there not an option?

Also I mean that there are a bunch of different nbd servers out there. We
have our own here at Facebook, qemu has one, IIRC there's a ceph one. They
all have their own connection protocols. The beauty of NBD is that it
doesn't have to know about that part, it just does the block device part,
and I'd really rather leave it that way. Thanks,

Josef
On 6/13/19 9:45 AM, Roman Stratiienko wrote:

>> Just throw nbd-client in your initramfs. Every nbd server has its own
>> handshake protocol; embedding one particular server's handshake protocol
>> into the kernel isn't the answer here. Thanks,

The handshake protocol is well-specified:
https://github.com/NetworkBlockDevice/nbd/blob/cdb0bc57f3faefd7a5562d57ad57cd990781c415/doc/proto.md

All servers implement various subsets of that document for the handshake.

> Also, as far as I know the mainline nbd-server daemon has only two
> handshake protocols, so-called OLD-STYLE and NEW-STYLE, and OLD-STYLE is
> no longer supported. So it should not be a problem - please correct me if
> I'm wrong.

You are correct that oldstyle is no longer recommended. However, the
current NBD specification states that newstyle has two different flavors,
NBD_OPT_EXPORT_NAME (which you used, but is also old) and NBD_OPT_GO
(which is newer, but is more likely to encounter differences where not all
servers support it).

The NBD specification includes a compatibility baseline:
https://github.com/NetworkBlockDevice/nbd/blob/cdb0bc57f3faefd7a5562d57ad57cd990781c415/doc/proto.md#compatibility-and-interoperability

and right now, NBD_OPT_GO (and _not_ NBD_OPT_EXPORT_NAME) is the preferred
way forward. As long as your handshake implementation complies with the
baseline documented there, you'll have maximum portability to the largest
number of servers that also support the baseline - but not all servers are
up to that baseline yet.

So, this becomes a question of how much you are reinventing baseline
portability handshake concerns in the kernel, vs. in the initramfs.
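[Editor's note: to illustrate the baseline concern in code, a client that prefers NBD_OPT_GO but still interoperates with servers below the baseline could retry with NBD_OPT_EXPORT_NAME when the server answers NBD_REP_ERR_UNSUP. The sketch below assumes the hypothetical nbd_negotiate_go() shown earlier and a hypothetical nbd_negotiate_export_name() standing for the NBD_OPT_EXPORT_NAME tail of the patch's nbd_connection_negotiate(); in a combined implementation the 32-bit client-flags word would have to be sent exactly once, before either path.]

```c
/*
 * Sketch: prefer NBD_OPT_GO, fall back to NBD_OPT_EXPORT_NAME when the
 * server never advertised fixed newstyle or rejected the option
 * (NBD_REP_ERR_UNSUP).  Not part of the posted patch.
 */
static int nbd_negotiate(struct socket *sock, char *export_name,
			 u16 handshake_flags, u64 *size, u16 *tflags)
{
	int ret;

	if (handshake_flags & NBD_FLAG_FIXED_NEWSTYLE) {
		ret = nbd_negotiate_go(sock, export_name, handshake_flags,
				       size, tflags);
		/* Only fall through on "option not supported"-style errors. */
		if (ret != -EOPNOTSUPP && ret != -EPROTO)
			return ret;
	}

	/* Pre-baseline server: use the flow the patch already implements. */
	return nbd_negotiate_export_name(sock, export_name, size, tflags);
}
```

Falling back on the same connection is possible because a fixed-newstyle server keeps the option phase open after rejecting an option.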
On Thu, Jun 13, 2019 at 6:14 PM Eric Blake <eblake@redhat.com> wrote:
>
> [...]
>
> So, this becomes a question of how much you are reinventing baseline
> portability handshake concerns in the kernel, vs. in the initramfs.
>
> --
> Eric Blake, Principal Software Engineer
> Red Hat, Inc.           +1-919-301-3226
> Virtualization:  qemu.org | libvirt.org

Thank you for the review comments, I will address them in v2.

--
Regards,
Roman
> I don't doubt you have a good reason to want it, I'm just not clear on why
> an initramfs isn't an option. You have this special kernel with your
> special option, and you manage to get these things to boot your special
> kernel, right? So why is an initramfs with a tiny nbd-client binary in
> there not an option?
>
> Also I mean that there are a bunch of different nbd servers out there. We
> have our own here at Facebook, qemu has one, IIRC there's a ceph one. They
> all have their own connection protocols. The beauty of NBD is that it
> doesn't have to know about that part, it just does the block device part,
> and I'd really rather leave it that way. Thanks,
>
> Josef

The only reason I prefer embedding the client into the kernel is to save
valuable engineering time creating and supporting a custom initramfs,
which is not so easy, especially on Android-based systems.

Taking into account that using NBD and creating a custom initramfs would
be required only for automated testing, many companies would choose manual
deployment instead, which is bad for progress.

I believe that all users of non-standard NBD handshake protocols could
continue to use custom nbd-clients.

Whether you accept this patch or not, I would like it to pass review from
the maintainers and other people who were involved in NBD development,
thus making a step closer to getting this mainlined at some point in the
future.

--
Regards,
Roman
On Thu, Jun 13, 2019 at 07:21:43PM +0300, Roman Stratiienko wrote:
> The only reason I prefer embedding the client into the kernel is to save
> valuable engineering time creating and supporting a custom initramfs,
> which is not so easy, especially on Android-based systems.

I'm unconvinced that creating an initramfs is any harder than building
your own kernel with this patch and providing the configuration
information to the device, especially for Android, where we inside
Facebook provision Android devices with a custom initramfs all the time,
and have done so for many, many years.

> Taking into account that using NBD and creating a custom initramfs would
> be required only for automated testing, many companies would choose manual
> deployment instead, which is bad for progress.
>
> I believe that all users of non-standard NBD handshake protocols could
> continue to use custom nbd-clients.

There is no "standard NBD handshake protocol" is my point. That part
exists outside of the NBD spec that the kernel is concerned with. The
client is happy to do whatever it pleases. Now there's no doubt that the
standard nbd client and server that are shipped are the reference spec,
but my point is that it is still outside the kernel and so able to change
as it sees fit.

> Whether you accept this patch or not, I would like it to pass review from
> the maintainers and other people who were involved in NBD development,
> thus making a step closer to getting this mainlined at some point in the
> future.

...I am the maintainer, but feel free to try to get Jens to take a patch
that I think is unnecessary. Thanks,

Josef
On Thu, Jun 13, 2019 at 10:55:36AM -0400, Josef Bacik wrote:
> Also I mean that there are a bunch of different nbd servers out there. We
> have our own here at Facebook, qemu has one, IIRC there's a ceph one.

I can't claim to know about the Facebook one of course, but the qemu one
uses the same handshake protocol as anyone else. The ceph ones that I've
seen do too (but there are various implementations of that, so...).

> They all have their own connection protocols. The beauty of NBD is
> that it doesn't have to know about that part, it just does the block
> device part, and I'd really rather leave it that way. Thanks,

Sure.

OTOH, there is definitely also a benefit to using the same handshake
protocol everywhere, for interoperability reasons.
On Fri, Jun 14, 2019 at 12:33:43PM +0200, Wouter Verhelst wrote:
> On Thu, Jun 13, 2019 at 10:55:36AM -0400, Josef Bacik wrote:
> > Also I mean that there are a bunch of different nbd servers out there. We
> > have our own here at Facebook, qemu has one, IIRC there's a ceph one.
>
> I can't claim to know about the Facebook one of course, but the qemu one
> uses the same handshake protocol as anyone else. The ceph ones that I've
> seen do too (but there are various implementations of that, so...).

Ah, for some reason I remembered Qemu's being distinctly different.

I suppose if most of the main ones people use are using the same handshake
protocol, that makes it more compelling. But there'd have to be a really
good reason why an initramfs isn't viable, and so far I haven't heard a
solid reason why it's not an option other than "it's hard and we don't
want to do it."

> > They all have their own connection protocols. The beauty of NBD is
> > that it doesn't have to know about that part, it just does the block
> > device part, and I'd really rather leave it that way. Thanks,
>
> Sure.
>
> OTOH, there is definitely also a benefit to using the same handshake
> protocol everywhere, for interoperability reasons.

Sure, Facebook's isn't different because we hate the protocol; we just use
Thrift for all of our services, and thus it makes sense for us to use
Thrift for the client connection stuff to make it easy on all the apps
that use disagg. Thanks,

Josef
On Fri, Jun 14, 2019 at 09:38:04AM -0400, Josef Bacik wrote:
> On Fri, Jun 14, 2019 at 12:33:43PM +0200, Wouter Verhelst wrote:
> > I can't claim to know about the Facebook one of course, but the qemu one
> > uses the same handshake protocol as anyone else. The ceph ones that I've
> > seen do too (but there are various implementations of that, so...).
>
> Ah, for some reason I remembered Qemu's being distinctly different.
>
> I suppose if most of the main ones people use are using the same handshake
> protocol, that makes it more compelling. But there'd have to be a really
> good reason why an initramfs isn't viable, and so far I haven't heard a
> solid reason why it's not an option other than "it's hard and we don't
> want to do it."

Oh, I agree with that. It's not hard at all; I'm aware of two
implementations of doing that (I've written an nbd initramfs hook for
Debian's initramfs infrastructure, and others have done it for dracut).
I'm assuming that buildroot will have an initramfs framework too (and if
it doesn't, then that probably just means someone should write it), and
then writing an nbd initramfs hook for that framework should be reasonably
easy.

Also, the proposed initramfs configuration protocol uses the same
"nbdroot" name as my Debian initramfs hooks, but in an incompatible way.
I'd really like to see that be done differently, before/if it's accepted
at all ;-)
```diff
diff --git a/drivers/block/Kconfig b/drivers/block/Kconfig
index 20bb4bfa4be6..e17f2376de60 100644
--- a/drivers/block/Kconfig
+++ b/drivers/block/Kconfig
@@ -273,6 +273,25 @@ config BLK_DEV_NBD
 
 	  If unsure, say N.
 
+config BLK_DEV_NBDROOT
+	bool "Early network block device client support"
+	depends on BLK_DEV_NBD=y
+	---help---
+	  Saying yes will enable kernel NBD client support. This allows to
+	  connect entire disk with multiple partitions before mounting rootfs.
+
+	  The arguments for the server are passed via kernel command line.
+	  The kernel command line has the format
+	  'nbdroot=[<SERVER_IP>:]<SERVER_PORT>/<EXPORT_NAME>'.
+	  SERVER_IP is optional. If it is not available it will use the
+	  root_server_addr transmitted through DHCP.
+
+	  Based on those arguments, the connection to the server is established
+	  and is connected to the nbd0 device. The rootdevice therefore is
+	  root=/dev/nbd0.
+
+	  If unsure, say N.
+
 config BLK_DEV_SKD
 	tristate "STEC S1120 Block Driver"
 	depends on PCI
diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
index 63fcfb38e640..cb5e60419e07 100644
--- a/drivers/block/nbd.c
+++ b/drivers/block/nbd.c
@@ -46,6 +46,35 @@
 #define CREATE_TRACE_POINTS
 #include <trace/events/nbd.h>
 
+#include <net/ipconfig.h>
+#include <linux/in.h>
+#include <linux/tcp.h>
+#include <linux/nfs_fs.h>
+#include <linux/nfs.h>
+
+#define ADDR_NONE cpu_to_be32(INADDR_NONE)
+
+static const char nbd_magic[] = "NBDMAGIC";
+static const u64 nbd_opts_magic = 0x49484156454F5054LL;
+
+/* Options used for the kernel driver */
+#define NBD_OPT_EXPORT_NAME	1
+
+#define NBD_DEFAULT_BLOCKSIZE	512	/* bytes */
+
+#define NBD_DEFAULT_TIMEOUT	2	/* seconds */
+
+#define NBD_MAXPATHLEN	NFS_MAXPATHLEN
+
+struct nbdroot {
+	const char *bdev;
+	__be32 server_addr;
+	__be32 server_port;
+	loff_t block_size;
+	int timeout;
+	char server_export[NBD_MAXPATHLEN + 1];
+};
+
 static DEFINE_IDR(nbd_index_idr);
 static DEFINE_MUTEX(nbd_index_mutex);
 static int nbd_total_devices = 0;
@@ -441,6 +470,16 @@ static int sock_xmit(struct socket *sock, int send,
 	return result;
 }
 
+static int sock_xmit_buf(struct socket *sock, int send,
+			 void *buf, size_t size)
+{
+	struct iov_iter iter;
+	struct kvec iov = {.iov_base = buf, .iov_len = size};
+
+	iov_iter_kvec(&iter, WRITE | ITER_KVEC, &iov, 1, size);
+	return sock_xmit(sock, send, &iter, 0, 0);
+}
+
 static int nbd_xmit(struct nbd_device *nbd, int index, int send,
 		    struct iov_iter *iter, int msg_flags, int *sent)
 {
@@ -2301,6 +2340,261 @@ static void __exit nbd_cleanup(void)
 	unregister_blkdev(NBD_MAJOR, "nbd");
 }
 
+#ifdef CONFIG_BLK_DEV_NBDROOT
+
+struct nbdroot nbdroot_0 = {.bdev = "nbd0",
+			    .server_export = "",
+			    .server_addr = ADDR_NONE,
+			    .timeout = NBD_DEFAULT_TIMEOUT,
+			    .block_size = NBD_DEFAULT_BLOCKSIZE};
+
+static int nbd_connect(struct nbdroot *nbdroot, struct socket **socket)
+{
+	struct socket *sock;
+	struct sockaddr_in sockaddr;
+	int err;
+	char val;
+
+	err = sock_create_kern(&init_net, AF_INET, SOCK_STREAM,
+			       IPPROTO_TCP, &sock);
+	if (err < 0)
+		return err;
+
+	sockaddr.sin_family = AF_INET;
+	sockaddr.sin_addr.s_addr = nbdroot->server_addr;
+	sockaddr.sin_port = nbdroot->server_port;
+
+	val = 1;
+	sock->ops->setsockopt(sock, IPPROTO_TCP, TCP_NODELAY, &val,
+			      sizeof(val));
+
+	err = sock->ops->connect(sock, (struct sockaddr *)&sockaddr,
+				 sizeof(sockaddr), 0);
+	if (err < 0)
+		return err;
+
+	*socket = sock;
+
+	return 0;
+}
+
+static int nbd_connection_negotiate(struct socket *sock, char *export_name,
+				    size_t *rsize, u16 *nflags)
+{
+	char buf[256];
+	int ret;
+	u64 magic;
+	u16 flags;
+	u32 client_flags;
+	u32 opt;
+	u32 name_len;
+	u64 nbd_size;
+
+	ret = sock_xmit_buf(sock, 0, buf, 8);
+	if (ret < 0)
+		return ret;
+
+	if (strncmp(buf, nbd_magic, 8))
+		return -EINVAL;
+
+	ret = sock_xmit_buf(sock, 0, &magic, sizeof(magic));
+	if (ret < 0)
+		return ret;
+	magic = be64_to_cpu(magic);
+
+	if (magic != nbd_opts_magic)
+		return -EINVAL;
+
+	ret = sock_xmit_buf(sock, 0, &flags, sizeof(flags));
+	if (ret < 0)
+		return ret;
+
+	*nflags = ntohs(flags);
+
+	client_flags = 0;
+
+	ret = sock_xmit_buf(sock, 1, &client_flags, sizeof(client_flags));
+	if (ret < 0)
+		return ret;
+
+	magic = cpu_to_be64(nbd_opts_magic);
+	ret = sock_xmit_buf(sock, 1, &magic, sizeof(magic));
+	if (ret < 0)
+		return ret;
+
+	opt = htonl(NBD_OPT_EXPORT_NAME);
+	ret = sock_xmit_buf(sock, 1, &opt, sizeof(opt));
+	if (ret < 0)
+		return ret;
+
+	name_len = strlen(export_name) + 1;
+	name_len = htonl(name_len);
+	ret = sock_xmit_buf(sock, 1, &name_len, sizeof(name_len));
+	if (ret < 0)
+		return ret;
+
+	ret = sock_xmit_buf(sock, 1, export_name, strlen(export_name) + 1);
+	if (ret < 0)
+		return ret;
+
+	ret = sock_xmit_buf(sock, 0, &nbd_size, sizeof(nbd_size));
+	if (ret < 0)
+		return ret;
+	nbd_size = be64_to_cpu(nbd_size);
+
+	ret = sock_xmit_buf(sock, 0, &flags, sizeof(flags));
+	if (ret < 0)
+		return ret;
+	*nflags = ntohs(flags);
+
+	ret = sock_xmit_buf(sock, 0, buf, 124);
+	if (ret < 0)
+		return ret;
+
+	*rsize = nbd_size;
+
+	return 0;
+}
+
+static int nbd_bind_connection(struct nbdroot *nbdroot, struct nbd_device *nbd,
+			       struct socket *sock, size_t rsize, u32 flags)
+{
+	int conn, ret;
+	struct block_device *bdev = blkdev_get_by_dev(disk_devt(nbd->disk),
+						      FMODE_READ | FMODE_WRITE, 0);
+
+	if (IS_ERR(bdev)) {
+		pr_err("nbdroot: blkdev_get_by_dev failed %ld\n",
+		       PTR_ERR(bdev));
+		return PTR_ERR(bdev);
+	}
+
+	conn = nbd->config->num_connections;
+	ret = nbd_add_socket(nbd, sock, false);
+	if (ret) {
+		pr_err("nbdroot: add socket failed %d\n", ret);
+		return ret;
+	}
+
+	mutex_lock(&nbd->config->socks[conn]->tx_lock);
+
+	nbd->config->flags = flags;
+
+	nbd_size_set(nbd, nbdroot->block_size,
+		     div_s64(rsize, nbdroot->block_size));
+
+	nbd->tag_set.timeout = nbdroot->timeout * HZ;
+	blk_queue_rq_timeout(nbd->disk->queue, nbdroot->timeout * HZ);
+
+	mutex_unlock(&nbd->config->socks[conn]->tx_lock);
+
+	ret = nbd_start_device_ioctl(nbd, bdev);
+	if (ret) {
+		pr_err("nbdroot: start device ioctl failed %d\n", ret);
+		return ret;
+	}
+
+	return 0;
+}
+
+static int nbdroot_thread(void *arg)
+{
+	struct nbdroot *nbdroot = (struct nbdroot *)arg;
+	struct socket *sock = 0;
+	size_t rsize;
+	u16 nflags;
+	int ret;
+	dev_t devt = blk_lookup_devt(nbdroot->bdev, 0);
+	struct gendisk *disk = get_gendisk(devt, &ret);
+	struct nbd_device *nbd = (struct nbd_device *)disk->private_data;
+
+	ret = nbd_connect(nbdroot, &sock);
+	if (ret) {
+		pr_err("nbdroot: connect failed %d\n", ret);
+		goto err;
+	}
+
+	ret = nbd_connection_negotiate(sock, nbdroot->server_export,
+				       &rsize, &nflags);
+	if (ret) {
+		pr_err("nbdroot: negotiation failed %d\n", ret);
+		goto err;
+	}
+
+	ret = nbd_bind_connection(nbdroot, nbd, sock, rsize, nflags);
+	if (ret) {
+		pr_err("nbdroot: nbd_bind_connection failed %d\n", ret);
+		goto err;
+	}
+	return 0;
+
+err:
+	pr_err("nbdroot: %s init failed, IP: %pI4, port: %i, export: %s\n",
+	       nbdroot->bdev, &nbdroot->server_addr,
+	       ntohs(nbdroot->server_port), nbdroot->server_export);
+
+	if (sock)
+		sock_release(sock);
+
+	return ret;
+}
+
+static int __init nbdroot_init(void)
+{
+	if (nbdroot_0.server_port != 0)
+		kthread_run(nbdroot_thread, &nbdroot_0, "nbdroot_0");
+
+	return 0;
+}
+
+/* We need this in late_initcall_sync to be sure that the network is setup */
+late_initcall_sync(nbdroot_init);
+
+/*
+ * Parse format "[<SERVER_IP>:]<SERVER_PORT>/<EXPORT_NAME>"
+ */
+static int __init nbdroot_setup(char *line)
+{
+	struct nbdroot *nbdroot = &nbdroot_0;
+	char *export;
+	u16 port;
+	int ret;
+	char buf[NBD_MAXPATHLEN + 1];
+
+	strlcpy(buf, line, sizeof(buf) - 1);
+
+	nbdroot->server_addr = root_nfs_parse_addr(buf);
+
+	if (*buf == '\0')
+		return -EINVAL;
+
+	if (nbdroot->server_addr == ADDR_NONE) {
+		if (root_server_addr == ADDR_NONE) {
+			pr_err("nbdroot: Failed to find server address\n");
+			return -EINVAL;
+		}
+		nbdroot->server_addr = root_server_addr;
+	}
+
+	export = strchr(buf, '/');
+	*export = '\0';
+	++export;
+
+	ret = kstrtou16(buf, 10, &port);
+	if (ret)
+		return ret;
+
+	nbdroot->server_port = htons(port);
+	strlcpy(nbdroot->server_export, export,
+		sizeof(nbdroot->server_export) - 1);
+
+	return 0;
+}
+
+__setup("nbdroot=", nbdroot_setup);
+
+#endif /* CONFIG_BLK_DEV_NBDROOT */
+
 module_init(nbd_init);
 module_exit(nbd_cleanup);
```
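[Editor's note: for completeness, a kernel command line using this option might look like the following. The addresses, port, and export name are placeholders, and ip=dhcp (CONFIG_IP_PNP) is what brings the interface up and populates root_server_addr when the server address is omitted from nbdroot=.]

```
# explicit server address, per the patch's [<SERVER_IP>:]<SERVER_PORT>/<EXPORT_NAME> format
root=/dev/nbd0 ip=dhcp nbdroot=192.168.1.10:10809/rootfs

# server address taken from the DHCP response (root_server_addr)
root=/dev/nbd0 ip=dhcp nbdroot=10809/rootfs
```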