LWN: Comments on "NFS: the new millennium" https://lwn.net/Articles/898262/ This is a special feed containing comments posted to the individual LWN article titled "NFS: the new millennium". Thu, 09 Jan 2025 21:22:19 +0000
NFS: the new millennium https://lwn.net/Articles/904511/ https://lwn.net/Articles/904511/ mathstuf <div class="FormattedComment"> See the GP for the full thing; LWN will elide links and the link quoted is indeed the LWN truncation.<br> </div> Fri, 12 Aug 2022 02:44:29 +0000
NFS: the new millennium https://lwn.net/Articles/904307/ https://lwn.net/Articles/904307/ seamus <div class="FormattedComment"> Broken/incomplete(?) link<br> </div> Wed, 10 Aug 2022 00:05:44 +0000
NFS: the new millennium https://lwn.net/Articles/900617/ https://lwn.net/Articles/900617/ Wol <div class="FormattedComment"> I don&#x27;t think fsync was lying. It just assumed disks were telling the truth ...<br> <p> Cheers,<br> Wol<br> </div> Sun, 10 Jul 2022 12:04:51 +0000
NFS: the new millennium https://lwn.net/Articles/900610/ https://lwn.net/Articles/900610/ ssmith32 <div class="FormattedComment"> Wasn&#x27;t fsync caught &quot;lying&quot; a while ago? And disks as well, to inflate performance numbers? <br> <p> Or have I not kept up? I guess SSDs can probably hit their numbers &amp; still guarantee data is truly flushed from every volatile cache...<br> </div> Sun, 10 Jul 2022 08:58:56 +0000
NFS: the new millennium https://lwn.net/Articles/899421/ https://lwn.net/Articles/899421/ lathiat <div class="FormattedComment"> Yep, that&#x27;s what I thought too. It was quite a famous issue where it would re-assemble fragments incorrectly but then pass the checksum and corrupt the UDP data in flight. Partly to do with the NFS/RPC packet size versus MTU.<br> <p> All good... just a misunderstanding :D I just happened to discuss the issue with a colleague the day the first post came out :)<br> </div> Thu, 30 Jun 2022 04:13:31 +0000
NFS: the new millennium https://lwn.net/Articles/899313/ https://lwn.net/Articles/899313/ neilbrown <div class="FormattedComment"> <font class="QuotedText">&gt; I assumed you were talking about UDP fragmentation!</font><br> <p> You aren&#x27;t the only one (I think). See <a href="https://lwn.net/Articles/898842/">https://lwn.net/Articles/898842/</a><br> <p> </div> Tue, 28 Jun 2022 23:12:26 +0000
NFS: the new millennium https://lwn.net/Articles/899217/ https://lwn.net/Articles/899217/ Sesse <div class="FormattedComment"> I assumed you were talking about UDP fragmentation!<br> </div> Tue, 28 Jun 2022 08:47:52 +0000
NFS: the new millennium https://lwn.net/Articles/899208/ https://lwn.net/Articles/899208/ neilbrown <div class="FormattedComment"> <font class="QuotedText">&gt; You hit diminishing returns pretty quickly above 1MB on any storage device I&#x27;ve ever worked on. </font><br> <p> I suspect you are correct.<br> How about a database update that needs to perform lots of random writes before a sync is needed? Any single NFS write must be contiguous, so there must be several. Without the v3 COMMIT, every one of those random writes would need to be committed before the write could return.<br> <p> Or what about writing lots of small files? The NFS client doesn&#x27;t need to COMMIT until the writeback timer fires, or memory reclaim wants the cache back, or a sync() is requested. Would it not be more efficient to commit when there are lots of dirty files, rather than once for each file?<br> <p> But the real point is that EVERY other layer in the storage stack has the two phases: write then sync/flush/commit. Why are you so sure that NFS doesn&#x27;t benefit from also having the same two phases?<br> </div> Mon, 27 Jun 2022 23:31:03 +0000
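As a concrete illustration of the "database update" case in the comment above: many random writes need to be durable only at the final sync point, so an NFSv3 client is free to send them as UNSTABLE WRITE calls and issue COMMIT when the application finally syncs, whereas NFSv2 semantics would have forced each write to stable storage before the server could reply. A minimal sketch in C, assuming a Linux client; the path /mnt/nfs/db.img and the sizes are invented for illustration.

```c
/* Sketch only: many random writes to one file, durability needed only at the
 * sync point.  On an NFSv3 mount the client can send these as UNSTABLE
 * WRITEs and issue a single COMMIT when fsync() runs; NFSv2 semantics would
 * require each write to reach stable storage before the server replied.
 * The path and sizes are invented for illustration. */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

int main(void)
{
    char block[4096];
    memset(block, 'x', sizeof(block));

    int fd = open("/mnt/nfs/db.img", O_WRONLY);       /* hypothetical file on an NFS mount */
    if (fd < 0) { perror("open"); return 1; }

    for (int i = 0; i < 1000; i++) {
        /* pick a random 4KB block somewhere in the first ~100MB */
        off_t where = (off_t)(rand() % 25600) * (off_t)sizeof(block);
        if (pwrite(fd, block, sizeof(block), where) < 0) { perror("pwrite"); return 1; }
    }

    if (fsync(fd) < 0) { perror("fsync"); return 1; }  /* the one point where a commit is required */
    close(fd);
    return 0;
}
```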
NFS: the new millennium https://lwn.net/Articles/899207/ https://lwn.net/Articles/899207/ neilbrown <div class="FormattedComment"> <font class="QuotedText">&gt; Neil, seems you forgot about the dangerous fragmentation alluded to in the last installment!</font><br> <p> I didn&#x27;t forget - but there wasn&#x27;t really anything to say. The fact that fragmentation (of the community, or of the protocol) might happen when separate people have separate needs should be obvious. A possible example of this is JMAP, which risks fragmenting the IMAP community. Beyond the fact that I didn&#x27;t like JMAP when I reviewed it 6 years ago, I don&#x27;t know any details of this or whether it is a real fragmentation.<br> <p> But NFS didn&#x27;t fragment (to my knowledge). The community had regular Connectathon gatherings to ensure interoperability and to discuss issues. And importantly, the IETF process was started that allowed anyone to contribute to NFSv4. So the IETF process, which I did mention, likely headed off any possible risk of protocol fragmentation.<br> <br> </div> Mon, 27 Jun 2022 23:24:45 +0000
NFS: the new millennium https://lwn.net/Articles/899193/ https://lwn.net/Articles/899193/ jra <div class="FormattedComment"> <font class="QuotedText">&gt; SMB will tell you it has received a lump of data whereas NFS will tell you when it has been committed to storage. Is this still true?</font><br> <p> No. SMB2 write has a flag that tells it not to return success until the write has hit stable storage.<br> <p> </div> Mon, 27 Jun 2022 19:03:05 +0000
NFS: the new millennium https://lwn.net/Articles/899111/ https://lwn.net/Articles/899111/ lathiat <div class="FormattedComment"> Neil, seems you forgot about the dangerous fragmentation alluded to in the last installment!<br> <p> Great review though, loved it, thanks! Long-time NFS user from web hosting environments since circa 2006. <br> </div> Mon, 27 Jun 2022 03:55:54 +0000
NFS: the new millennium https://lwn.net/Articles/899104/ https://lwn.net/Articles/899104/ janfrode <div class="FormattedComment"> I&#x27;ve seen large throughput improvements by increasing rsize from 1 MB to 16 MB with the Oracle dNFS client, towards NFS-Ganesha on top of GPFS (ESS). The ESS does 8+2p erasure coding, with 1 MB strip size, so 16 MB IOs is the optimal size. I don&#x27;t think wsize was as important, since it could buffer up and do full block size writes on the server side -- but this large rsize was needed to ensure full block (16MB) reads.<br> <p> I don&#x27;t think other NFS clients support larger than 1 MB rsize/wsize, which seems unfortunate for this kind of storage backend. <br> </div> Sun, 26 Jun 2022 21:05:59 +0000
NFS: the new millennium https://lwn.net/Articles/899079/ https://lwn.net/Articles/899079/ willy <div class="FormattedComment"> I guess that&#x27;s going to depend on the backend storage device. You hit diminishing returns pretty quickly above 1MB on any storage device I&#x27;ve ever worked on. Even a 5-disc RAID-5 with 256kB stripe width would handle 1MB writes with aplomb.<br> </div> Sun, 26 Jun 2022 12:15:48 +0000
NFS: the new millennium https://lwn.net/Articles/899066/ https://lwn.net/Articles/899066/ neilbrown <div class="FormattedComment"> <font class="QuotedText">&gt; I am very surprised that you skipped over .....</font><br> <p> I skipped over a lot of things. If it didn&#x27;t relate to state management, then I felt free to skip it unless it helped tell the story.<br> <p> <font class="QuotedText">&gt; (<a href="https://www.xkyle.com/solving-the-nfs-16-group-limit-prob">https://www.xkyle.com/solving-the-nfs-16-group-limit-prob</a>...)</font><br> <p> That link contains a section &quot;The Best Solution Ever!: A New Option for the NFS Server&quot;. I&#x27;m glad my contribution there was appreciated :-)<br> <p> <font class="QuotedText">&gt; deferring permissions to an external server with the sec mount option seems to be the only way.</font><br> <p> Yes. At least you do need to have the NFS server determine the list of gids. You don&#x27;t need to use the sec= mount option. Using the default sec=sys works fine with the Linux NFS server if you use --manage-gids. The NetApp server has similar functionality. Others might too.<br> <p> <font class="QuotedText">&gt; Is this a protocol or implementation limit?</font><br> <p> It is a limit in the RPC protocol.<br> <p> <font class="QuotedText">&gt; I’d love more discussion on the evolution of various security functionality</font><br> <p> Not really my area of expertise, sorry.<br> <p> </div> Sun, 26 Jun 2022 01:51:57 +0000
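For reference on the point above that the 16-group ceiling is a protocol limit: the cap sits in the AUTH_SYS (AUTH_UNIX) credential that accompanies every RPC call when sec=sys is in use. Below is a rough C paraphrase of the XDR definition in RFC 5531; the field layout is only illustrative, but the bound of 16 supplementary gids is the protocol's own, which is why options such as --manage-gids work around it by resolving the caller's groups on the server instead.

```c
/* Rough C paraphrase of the AUTH_SYS credential from the XDR in RFC 5531.
 * The 16-entry bound on supplementary gids belongs to the RPC protocol
 * itself, not to any particular NFS implementation; server-side options
 * such as --manage-gids avoid it by ignoring this list and looking the
 * caller's groups up on the server. */
#include <stdint.h>

#define AUTH_SYS_MAX_MACHINENAME 255
#define AUTH_SYS_MAX_GIDS        16   /* the protocol-level "16 group" limit */

struct authsys_parms {
    uint32_t stamp;                                      /* arbitrary id chosen by the caller */
    char     machinename[AUTH_SYS_MAX_MACHINENAME + 1];  /* caller's hostname */
    uint32_t uid;                                        /* caller's uid */
    uint32_t gid;                                        /* caller's primary gid */
    uint32_t gids_len;                                   /* number of entries actually sent */
    uint32_t gids[AUTH_SYS_MAX_GIDS];                    /* at most 16 supplementary gids */
};
```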
NFS: the new millennium https://lwn.net/Articles/899056/ https://lwn.net/Articles/899056/ zerolagtime <div class="FormattedComment"> Thanks for a great look at the internals. <br> I am very surprised that you skipped over a still-standing limit on the number of gids that will be compared on a complex network. <br> 16 gids is the limit (<a href="https://www.xkyle.com/solving-the-nfs-16-group-limit-problem/">https://www.xkyle.com/solving-the-nfs-16-group-limit-prob...</a>), and deferring permissions to an external server with the sec mount option seems to be the only way. Is this a protocol or implementation limit?<br> I’d love more discussion on the evolution of various security functionality, like xattrs, encryption algorithms, and the impact felt by stateful firewalls trying to accommodate port ranges. <br> </div> Sat, 25 Jun 2022 23:40:57 +0000
NFS: the new millennium https://lwn.net/Articles/899054/ https://lwn.net/Articles/899054/ neilbrown <div class="FormattedComment"> NFS limits the size of a single write request in that the server specifies a maximum that the client must honour.<br> Typically 1MB on Linux. There are costs in making it bigger, so just setting it to 1GB wouldn&#x27;t work. When NFSv3 was first described, UDP was still common, and 64KB is a hard limit there.<br> Is there no value in merging multiple 1MB writes on the server?<br> <p> <p> </div> Sat, 25 Jun 2022 23:10:04 +0000
NFS: the new millennium https://lwn.net/Articles/899053/ https://lwn.net/Articles/899053/ willy <div class="FormattedComment"> I suppose by &quot;a competent local cache&quot;, I include the ability for the client to merge writes, which Linux will do.<br> </div> Sat, 25 Jun 2022 22:57:47 +0000
NFS: the new millennium https://lwn.net/Articles/899051/ https://lwn.net/Articles/899051/ neilbrown <div class="FormattedComment"> <font class="QuotedText">&gt; the NFSv2 semantics were just fine for this usage.</font><br> <p> Not really. The NFSv2 semantics require the server to sync each request individually, though maybe it could merge concurrent requests by delaying the sync for the first.<br> The NFSv3 semantics make it easy for the server to gather lots of writes in its cache and sync them all together.<br> <p> </div> Sat, 25 Jun 2022 22:29:29 +0000
NFS: the new millennium https://lwn.net/Articles/899050/ https://lwn.net/Articles/899050/ willy <div class="FormattedComment"> On a system with a competent local cache, the kernel returns success to the application after the write is copied to the local cache. NFSv3 offers no improvement here because we weren&#x27;t even calling WRITE in this path. WRITE was called on writeback and on fsync, and the NFSv2 semantics were just fine for this usage.<br> </div> Sat, 25 Jun 2022 21:55:05 +0000
NFS: the new millennium https://lwn.net/Articles/899036/ https://lwn.net/Articles/899036/ pbonzini <div class="FormattedComment"> The write is only durable after fsync; the write alone is not enough. So even on POSIX systems it&#x27;s useful to know the moment when the write has been received.<br> </div> Sat, 25 Jun 2022 16:14:39 +0000
NFS: the new millennium https://lwn.net/Articles/899026/ https://lwn.net/Articles/899026/ willy <div class="FormattedComment"> <font class="QuotedText">&gt; SMB will tell you it has received a lump of data whereas NFS will tell you when it has been committed to storage.</font><br> <p> NFS v2 only allowed the WRITE command to be acknowledged when the data was on stable storage. NFS v3 turned it into a two-phase commit, allowing the server to tell the client both &quot;I have received it&quot; and &quot;It is now stable&quot;.<br> <p> Two-phase commit isn&#x27;t particularly useful to clients with a competent local cache. The latency of a write is almost irrelevant; you need to know when the write is durable, not just when it&#x27;s visible to others. It was of great interest to non-Unix clients, though.<br> </div> Sat, 25 Jun 2022 11:42:02 +0000
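The two phases that NFSv3 added are visible in the WRITE arguments themselves. Below is a C paraphrase of the stable_how enumeration from RFC 1813; NFSv2 behaved as though every write were FILE_SYNC, while NFSv3 lets a client send UNSTABLE writes and make them durable later with a separate COMMIT.

```c
/* C paraphrase of the NFSv3 WRITE "stable_how" argument from RFC 1813.
 * NFSv2 acted as if every write were FILE_SYNC; NFSv3 added the UNSTABLE
 * option plus a separate COMMIT call, which is the two-phase behaviour
 * discussed above. */
enum stable_how {
    UNSTABLE  = 0,  /* server may reply before the data reaches stable storage */
    DATA_SYNC = 1,  /* data (and enough metadata to retrieve it) is stable */
    FILE_SYNC = 2   /* data and all relevant metadata are on stable storage */
};
```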
NFS: the new millennium https://lwn.net/Articles/899025/ https://lwn.net/Articles/899025/ willy <div class="FormattedComment"> A protocol which came between NFSv3 and v4, and which certainly informed the design of v4, was WebNFS. It also integrated several protocols into one and discarded the use of Portmap.<br> </div> Sat, 25 Jun 2022 11:36:56 +0000
NFS: the new millennium https://lwn.net/Articles/899021/ https://lwn.net/Articles/899021/ wtarreau <div class="FormattedComment"> Thanks for this detailed explanation and for the numerous links, Neil!<br> </div> Sat, 25 Jun 2022 10:12:11 +0000
NFS: the new millennium https://lwn.net/Articles/899004/ https://lwn.net/Articles/899004/ gerdesj <div class="FormattedComment"> Great write-up about something that I generally take for granted but by &#x27;eck, NFS has shifted some bytes from A to B for me alone. It&#x27;s nice to get some real insights into the background of these things from someone who knows what they are on about.<br> <p> NFS for me has a killer feature when compared to SMB, and I was only made aware of it by Veeam. This may or may not still be true: SMB will tell you it has received a lump of data whereas NFS will tell you when it has been committed to storage. Is this still true? When you are doing backups to NAS, which involves some pretty monstrous files, this is rather important. <br> <p> Your 5TB backup is pretty useless with a hole in the middle of it due to a transient error that allowed a block or two to wander off and have a smoke behind the bikesheds and then sidle off for a night out in town instead of resting on disc like good data. OK, a decent backup app will have lots of failsafes available, such as reading backups back against the source, but ideally data should flow from A to B safely out of the box.<br> <p> Given the sheer complexity of file access these days - pick a protocol, pick a medium, wedge in a VPN or two and shake it up and see what happens! It&#x27;s a wonder that files seem to turn up as requested, in one piece.<br> </div> Sat, 25 Jun 2022 00:39:26 +0000