LSFMM: Copy offload
Copy offload is a feature that allows filesystems or storage devices to be instructed to copy files without requiring involvement of the local CPU. In an unfortunately post-lunch session at the 2013 LSFMM Summit, Zach Brown, Martin Petersen, and Roland Dreier led a discussion of the feature. The unfortunate part was that I nearly succumbed to a food coma, so my notes are rather weak—apologies to readers and participants.
There are three kinds of users for copy offload, Brown said: local filesystems like Btrfs, the NFS filesystem (so copies can be done on the server without involving the network), and SCSI-attached storage arrays, which could do a copy on the array itself. Trond Myklebust mentioned that he had an intern implement the functionality for NFS, which resulted in some "nice performance improvements" because the data was not copied down to the client. A big win for this feature is "copying giant files" like virtual machine images, as Ric Wheeler pointed out.
Brown said that they want the "cleanest possible interface" for copy offload. It would be relatively straightforward to add the feature into the block layer stack, but "it wants to be asynchronous". That means adding a new system call that would return a cookie, which applications could then use either to poll for completion or to block until the copy finishes.
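To make the cookie idea concrete, here is a minimal userspace sketch in Python. It is not the system call discussed in the session: the `CopyOffload` class and its `submit()`, `poll()`, and `wait()` methods are hypothetical names invented for illustration, and the "offload" is simply a background thread running `shutil.copyfile()`. It only models the shape of the interface, in which submission returns immediately with an opaque cookie that the caller can later poll or block on.

```python
import shutil
import tempfile
from concurrent.futures import ThreadPoolExecutor
from itertools import count
from pathlib import Path


class CopyOffload:
    """Hypothetical userspace model of a cookie-based asynchronous copy."""

    def __init__(self):
        self._pool = ThreadPoolExecutor(max_workers=2)
        self._cookies = count(1)   # source of opaque integer cookies
        self._pending = {}         # cookie -> Future for the running copy

    def submit(self, src, dst):
        """Start the copy and return at once with a cookie."""
        cookie = next(self._cookies)
        self._pending[cookie] = self._pool.submit(shutil.copyfile, src, dst)
        return cookie

    def poll(self, cookie):
        """Non-blocking check: True once the copy has finished."""
        return self._pending[cookie].done()

    def wait(self, cookie):
        """Block until the copy completes, re-raising any error it hit."""
        self._pending.pop(cookie).result()

    def close(self):
        """Wait for outstanding copies and release the worker threads."""
        self._pool.shutdown(wait=True)


def demo():
    """Copy a 1 MB file through the cookie interface and verify the result."""
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp, "image.raw")
        dst = Path(tmp, "clone.raw")
        src.write_bytes(b"x" * 1_000_000)

        offload = CopyOffload()
        cookie = offload.submit(src, dst)   # returns before the copy is done
        offload.wait(cookie)                # block on the cookie
        offload.close()
        return dst.read_bytes() == src.read_bytes()
```

A caller that prefers not to block would call `poll(cookie)` periodically instead of `wait(cookie)`; which of the two a real interface would favor was not settled in the discussion.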
Dreier said that there are two operating system vendors who already ship support for the feature. VMware uses the "EXTENDED COPY" SCSI command, while Windows Server 2012 uses a different set of SCSI commands in its ODX (Offloaded Data Transfer) feature.
There are some atomicity questions that need to be answered as well, Brown said. For example, if a user creates a new file with the name of the destination of an ongoing copy offload, it is unclear what the right semantics should be. Joel Becker noted that getting an EEXIST a day after issuing a copy offload would be rather painful.
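One possible answer to that question, offered here only as an illustration and not as anything proposed in the session, is to claim the destination name at submission time so that any conflict is reported immediately rather than at completion. The sketch below uses the standard `O_CREAT | O_EXCL` open flags for that; the `reserve_destination()` helper and the file name are hypothetical.

```python
import os
import tempfile


def reserve_destination(path):
    """Create the destination up front; fail at once if the name is taken."""
    # O_EXCL makes the creation atomic: exactly one caller can win the name.
    return os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o644)


def demo():
    """Show that a second exclusive creator fails while the name is reserved."""
    with tempfile.TemporaryDirectory() as tmp:
        dst = os.path.join(tmp, "clone.raw")
        fd = reserve_destination(dst)   # copy "submitted": name is now taken
        try:
            other = reserve_destination(dst)   # second creator arrives mid-copy
        except FileExistsError:
            outcome = "second creator gets EEXIST immediately"
        else:
            os.close(other)
            outcome = "name was not reserved"
        finally:
            os.close(fd)
        return outcome
```

Under this assumption, the error lands on whoever tries to create the file exclusively during the copy, and it lands right away, so the issuer of the offload never sees a surprise EEXIST long after the fact. A creator that opens the name without `O_EXCL` would still succeed, so this covers only part of the problem.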
Brown concluded by noting that patches would be forthcoming and that further discussion could be done on the mailing lists.
Index entries for this article | |
---|---|
Kernel | Block layer |
Conference | Storage, Filesystem, and Memory-Management Summit/2013 |
Posted Apr 25, 2013 7:17 UTC (Thu)
by Homer512 (subscriber, #85295)
[Link]
Posted Apr 25, 2013 8:52 UTC (Thu)
by bourbaki (guest, #84259)
[Link] (1 responses)
Posted May 13, 2013 22:40 UTC (Mon)
by drdabbles (guest, #48755)
[Link]
NFS servers know of the filesystem they share, which means they can literally execute a file copy. Asking the NFS server to do this saves bandwidth as well as CPU, because the file does not need to be copied to the client machine.
Some SAN or block storage devices can actually peer into filesystems they host as well. Their approach to accomplish this varies quite a bit, but they can do some basic file maintenance (intelligent file defrag, file based dedupe, etc.). Presumably a copy operation would be far simpler.
Posted Apr 25, 2013 9:25 UTC (Thu)
by bergwolf (guest, #55931)
[Link]
Even though the interface is asynchronous, it would be surprising if the copy offload API returned before at least part of the destination file's metadata had been created, which would prevent a subsequent creat() of the same file name from succeeding.
Of course, I am assuming that both source and destination are filesystems, given that the error mentioned is EEXIST (can the block layer ever return EEXIST? I don't think so...).
Posted Oct 2, 2013 14:14 UTC (Wed)
by lack (guest, #93109)
[Link]
When I first read the article, I was a bit surprised that the storage arrays were able to "copy files" (as is said in the first line), because I could not see how they could be aware of filesystem structures.
However, it turns out copy offload is only about copying "ranges of blocks", as explained in your article about the 2012 LSFMM summit, which makes much more sense (to me, at least).
(And thank you for the coverage of this summit, which is a pleasure to read!)