Removing ext2 and/or ext3
The question, as it turns out, came sooner - February 3, to be exact - when Jan Kara suggested that removing ext2 and ext3 could be discussed at the upcoming storage, filesystems, and memory management summit. Jan asked:
One might protest that there will be existing filesystems in the ext3 (and even ext2) formats for the indefinite future. Removing support for those formats is clearly not something that can be done. But removing the ext2 and/or ext3 code is not the same as removing support: ext4 has been very carefully written to be able to work with the older formats without breaking compatibility. One can mount an ext3 filesystem using the ext4 code and make changes; it will still be possible to mount that filesystem with the ext3 code in the future.
So it is possible to remove ext2 and ext3 without breaking existing users or preventing them from going back to older implementations. Beyond that, mounting an ext2/3 filesystem under ext4 allows the system to use a number of performance enhancing techniques - like delayed allocation - which do not exist in the older implementations. In other words, ext4 can replace ext2 and ext3, maintain compatibility, and make things faster at the same time. Given that, one might wonder why removing the older code even requires discussion.
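To make the compatibility point concrete, here is a minimal sketch (in C, via the mount(2) system call) of mounting an ext2- or ext3-formatted volume with the ext4 driver; the device and mount point are hypothetical, and the usual "mount -t ext4" command does the same job. Mounting read-only first is a cautious way to confirm that the ext4 code accepts the older format before allowing any writes.

    /* Sketch: mount an ext2/ext3-formatted volume using the ext4 driver.
     * /dev/sdb1 and /mnt/old are hypothetical; run as root.
     * Build: cc -o mount_as_ext4 mount_as_ext4.c */
    #include <stdio.h>
    #include <sys/mount.h>

    int main(void)
    {
        /* Read-only first: verify that the ext4 code is happy with the
         * older on-disk format before risking any modification. */
        if (mount("/dev/sdb1", "/mnt/old", "ext4", MS_RDONLY, "") != 0) {
            perror("mount");
            return 1;
        }
        printf("ext2/ext3-format filesystem mounted via the ext4 driver\n");
        return 0;
    }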
There appear to be a couple of reasons not to hurry into this change, both of which have to do with testing. As Eric Sandeen noted, some of the more ext3-like options are not tested as heavily as the native modes of operation:
There is also concern that ext4, which is still seeing much more change than its predecessors, is more likely to introduce instabilities. That's a bit of a disturbing idea; there are enough production users of ext4 now that the introduction of serious bugs would not be pleasant. But, again, the backward-compatible operating modes of ext4 may not be as heavily tested as the native mode, so one might argue that operation with older filesystems is more likely to break regardless of how careful the developers are.
So, clearly, any move to get rid of ext2 and ext3 would have to be preceded by the introduction of better testing for the less-exercised corners of ext4. The developers involved understand that clearly, so there is no need to be worried that the older code could be removed too quickly.
Meanwhile, there are also concerns that the older code, which is not seeing much developer attention, could give birth to bugs of its own. As Jan put it:
Developers have also expressed concern that new filesystem authors might copy code from ext2, which, at this point, does not serve as a good example for how Linux filesystems should be written.
The end result is that, once the testing concerns have been addressed, everybody involved might be made better off by the removal of ext2 and ext3. Users with older filesystems would get better performance and a code base which is seeing more active development and maintenance. Developers would be able to shed an older maintenance burden and focus their efforts on a single filesystem going forward. Thanks to the careful compatibility work which has been done over the years, it may be possible to safely make this move in the relatively near future.
Index entries for this article
Kernel: Filesystems/ext4
Posted Feb 10, 2011 1:41 UTC (Thu)
by hisdad (subscriber, #5375)
[Link] (2 responses)
Posted Feb 10, 2011 2:43 UTC (Thu)
by smithj (guest, #38034)
[Link]
I imagine someone will eventually add support to grub for the new ext4-specific filesystem options.
Posted Feb 10, 2011 5:44 UTC (Thu)
by jthill (subscriber, #56558)
[Link]
grub2 boots off ext4 for me, extents and everything. Just did a tune2fs -l to be sure.
Posted Feb 10, 2011 2:41 UTC (Thu)
by smithj (guest, #38034)
[Link] (1 responses)
Maybe this is a weird use case, but my university has many systems running RHEL 4/5 which get upgraded (and sometimes downgraded again) without changing the filesystems.
Posted Feb 10, 2011 15:22 UTC (Thu)
by ricwheeler (subscriber, #4980)
[Link]
A system that gets upgraded and downgraded will most likely stay with ext3, so going up and down versions is not an issue.
Posted Feb 10, 2011 15:46 UTC (Thu)
by rfunk (subscriber, #4054)
[Link] (26 responses)
"Beyond that, mounting an ext2/3 filesystem under ext4 allows the system to use a number of performance enhancing techniques - like delayed allocation - which do not exist in the older implementations. In other words, ext4 can replace ext2 and ext3, maintain compatibility, and make things faster at the same time."
Isn't delayed allocation still a rather controversial aspect of ext4? Or has it been decided that all the applications will be rewritten to fsync all the time?
I realize that my information is likely out of date, but I'm sure I'm not the only one sticking with ext3 due to this issue.
Posted Feb 10, 2011 23:00 UTC (Thu)
by rahulsundaram (subscriber, #21946)
[Link] (20 responses)
Posted Feb 10, 2011 23:19 UTC (Thu)
by rfunk (subscriber, #4054)
[Link] (19 responses)
As for defaults.... I just did a Kubuntu 10.10 install not too long ago, and was asked to choose from a number of different filesystems (including all the ext[234] variants). I don't remember a specific default, though it's possible that ext4 was pre-checked and I simply don't remember it.
Posted Feb 11, 2011 13:17 UTC (Fri)
by rahulsundaram (subscriber, #21946)
[Link] (18 responses)
Posted Feb 11, 2011 13:54 UTC (Fri)
by rfunk (subscriber, #4054)
[Link] (16 responses)
I already linked to the LWN article discussing the 2009 controversy. You can follow that link and the links within that article, or Google "ext4 delayed allocation". (The third link there is "Linus Torvalds upset over ext3 and ext4"! The Wikipedia article currently has a whole section about "Delayed allocation and potential data loss".)
I'm aware that XFS implemented delayed allocation before ext4. I'm also aware that XFS became notorious for badly messing up files and filesystems when there are power failures or similar; I'm especially aware of that since it happened to me multiple times, but any discussion of Linux filesystem reliability inevitably includes mention of XFS's problems. I investigated and learned that XFS had been explicitly designed for server room situations where the power *never* fails. Before that I was a big fan of XFS (having actually used it on SGIs in the 90s), and after that I went back to ext3 and have not had any problems with it.
I'd really like to hear from someone who acknowledges the 2009 controversy (and that those wary of delayed allocation at the time had a point), and who can explain how it's improved since then to the point that it's considered safe for ext3 users.
Posted Feb 11, 2011 18:47 UTC (Fri)
by zlynx (guest, #2285)
[Link] (7 responses)
Since almost all user-space file operations write files "atomically" by writing a new copy and renaming over the old copy, this works well.
This rename hack is also much faster than doing fsync on each file because the flush/rename combination may be delayed indefinitely. As long as the rename is always done after the file contents are on disk, the filesystem view will be consistent on reboot.
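For illustration, a minimal sketch of the replace-by-rename idiom described above, exactly as stated: no fsync anywhere. The file names are made up. Whether the new data reaches disk before the rename does is left entirely to the filesystem, which is precisely the point disputed in the replies below.

    /* Sketch of the "write a new copy, rename over the old one" idiom,
     * with NO fsync (the contested part).  File names are hypothetical. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    int main(void)
    {
        const char *tmp = "config.txt.tmp", *target = "config.txt";
        const char *data = "new contents\n";

        int fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0) { perror("open"); return 1; }
        if (write(fd, data, strlen(data)) < 0) { perror("write"); return 1; }
        close(fd);

        /* Atomically swap the new copy into place.  Note: no fsync; whether
         * the data blocks hit disk before this rename is up to the fs. */
        if (rename(tmp, target) != 0) { perror("rename"); return 1; }
        return 0;
    }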
Posted Feb 11, 2011 18:50 UTC (Fri)
by dlang (guest, #313)
[Link] (5 responses)
if you want your rename to be safe across a crash/power failure you need to do a fsync.
there have been some hacks added to some filesystems to try and detect this to make it safer, but safer != safe
yes, ext3 let you get away with things like this (at least in the most common case), but no other filesystem on any *nix OS does.
the Unix spec says that renames are atomic, but that's only talking about a running system, not across a crash
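A sketch of the sequence dlang is arguing for, with the same made-up file names: fsync the new file before renaming it over the old one, and fsync the containing directory afterward so that the rename itself survives a crash.

    /* Sketch: crash-safe replace-by-rename with explicit fsync, as argued
     * for above.  Paths are hypothetical. */
    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    int main(void)
    {
        const char *tmp = "config.txt.tmp", *target = "config.txt";
        const char *data = "new contents\n";

        int fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0) { perror("open"); return 1; }
        if (write(fd, data, strlen(data)) < 0) { perror("write"); return 1; }

        /* 1. Push the new file's data to stable storage first. */
        if (fsync(fd) != 0) { perror("fsync"); return 1; }
        close(fd);

        /* 2. Only now rename: after a crash the target is either the
         *    complete old contents or the complete new contents. */
        if (rename(tmp, target) != 0) { perror("rename"); return 1; }

        /* 3. fsync the directory so the rename itself is durable too. */
        int dir = open(".", O_RDONLY | O_DIRECTORY);
        if (dir >= 0) {
            fsync(dir);
            close(dir);
        }
        return 0;
    }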
Posted Feb 11, 2011 21:09 UTC (Fri)
by zlynx (guest, #2285)
[Link] (4 responses)
Either the file named X will contain new contents or old contents. It will never be blank or half written.
Again this only applies to ext3 and ext4, and only in ext4 after kernel 2.6.30.
Posted Feb 11, 2011 21:20 UTC (Fri)
by jrn (subscriber, #64214)
[Link] (3 responses)
That's simply not true, sadly. If I understand correctly, the patch[1] makes the race window shorter but does not eliminate it[2].
[1] v2.6.30-rc1~416^2~15 (ext4: Automatically allocate delay allocated blocks on rename, 2009-02-23)
[2] https://bugzilla.kernel.org/show_bug.cgi?id=15910
Posted Feb 11, 2011 21:28 UTC (Fri)
by zlynx (guest, #2285)
[Link] (2 responses)
Allocate on rename is different from write on rename. All the discussions I followed claimed it would write the data before writing the rename.
I wonder why they thought allocate would be sufficient? Seems like they didn't listen to the users after all.
Posted Feb 11, 2011 23:28 UTC (Fri)
by jrn (subscriber, #64214)
[Link] (1 responses)
I think they did. There is nodelalloc for those who expect frequent crashes or do not want delayed allocation for some other reason. There is that hack to make 0-length files rare. And updating files using the common rename idiom does not force a painfully slow journal commit like it did in ext3 with data=ordered.
Meanwhile there is more awareness among application developers about the need to use fsync or fdatasync for data updates that need to persist and not to use those functions for updates that are not so crucial. So apps are finally doing the right thing on ubifs and hfs+.
So at least this ext4 user wouldn't have it any other way.
Posted Feb 11, 2011 23:43 UTC (Fri)
by zlynx (guest, #2285)
[Link]
So you end up with:
1. Space allocated for the new file.
2. Directory written to disk with new filename.
---- CRASH HAPPENS HERE
3. New file contents written to disk.
The sequence of events above is hardly better than it was before the fix. Just don't allow step 2 to happen before step 3 and everyone would have been happy.
Did I miss something in the sequence?
Posted Feb 11, 2011 19:35 UTC (Fri)
by rfunk (subscriber, #4054)
[Link]
Posted Feb 14, 2011 11:44 UTC (Mon)
by rahulsundaram (subscriber, #21946)
[Link] (7 responses)
"I investigated and learned that XFS had been explicitly designed for server room situations where the power *never* fails."
I heard of this myth several times before but have never actually seen a citation. Since you have claimed that you did some research and investigation, pointers would be helpful.
Posted Feb 14, 2011 14:56 UTC (Mon)
by rfunk (subscriber, #4054)
[Link] (6 responses)
Meanwhile, I don't care how "robust and scalable" and tested XFS is or claims to be; my experience shows that it's not reliable enough for my purposes, and others have similar experiences. (Again, I was once a big fan of XFS; then I discovered some of its failure modes, and found them unacceptable.)
I'd love to give you the citations about XFS's history, but since the last time I looked into that aspect in depth was around five years ago (and the first time was more than eleven years ago), I no longer have them anywhere near handy.
Posted Feb 14, 2011 16:34 UTC (Mon)
by rahulsundaram (subscriber, #21946)
[Link] (5 responses)
None of the links are new to me, but the way you phrase it suggests that you are equating the idea and its implementation. The idea is too old and too widely implemented in other filesystems to be controversial, and it is pretty much a required feature to get better performance. The implementation in Ext4 had some rough edges initially, but that isn't a current problem.
As far as the robustness of XFS is concerned, personal anecdotes are just not interesting at all since they are not independently verifiable. I can claim that I have used XFS in a number of places and found it very robust, but it doesn't really prove anything. What is interesting is where and how it is getting used, and so far the deployments don't suggest that it is not worth trusting. Unless you can find a reference to the story of how XFS was designed to be only used in environments where power never fails, I just don't buy it.
Posted Feb 14, 2011 16:42 UTC (Mon)
by dlang (guest, #313)
[Link] (4 responses)
when it was initially merged it had a _lot_ of SGI baggage (shim layers between the XFS code and the rest of the kernel). it has had a lot of cleanup and maintenance, including a lot of testing (and the development of a filesystem test suite that other filesystems are starting to adopt since they don't have anything as comprehensive)
so while I have been using XFS for about 7 years, I would not be surprised to hear that people had problems about 5 years ago. I would be surprised if those problems persisted to today.
personally, I don't trust Ext4 yet, it's just too new, and it's still finding corner cases that have problems. It also is not being tested against multi-disk arrays very much (the developers don't have that sort of hardware, so they test against what they have)
Posted Feb 14, 2011 17:02 UTC (Mon)
by rahulsundaram (subscriber, #21946)
[Link] (3 responses)
"It also is not being tested against multi-disk arrays very much (the developers don't have that sort of hardware, so they test against what they have)"
IIRC, this was tested by Red Hat before making it default in RHEL 6. That however is not the very latest Ext4 code.
Posted Feb 14, 2011 17:12 UTC (Mon)
by dlang (guest, #313)
[Link] (2 responses)
yes, redhat did testing, but I'll bet that their testing was of the 'does it blow up' type of thing rather than performance testing.
In any case, the fact that the developers are not testing against that type of disk subsystem means that they are not looking for, or achieving the best performance when used with those subsystems (this was also confirmed by the Ext4 devs on the kernel mailing list)
I'm not saying that the Ext4 devs are incompetent or not doing the best that they can with what they have, just that the fact that they are not working with such large systems means that they are not running into the same stresses in their testing and profiling that people will run into in the real world with large systems.
the current XFS devs may or may not have access to such large arrays nowadays, but historically SGI was dealing with such arrays and did spend a lot of time researching how to make the filesystem as fast as it could be on such arrays, and that knowledge is part of the design of XFS. the current maintainers could destroy this as they are updating it, but this is not very likely.
Posted Feb 14, 2011 17:51 UTC (Mon)
by rahulsundaram (subscriber, #21946)
[Link] (1 responses)
I wouldn't bet on that. Red Hat has a fairly large filesystem team and performance team and runs performance tests routinely, for public benchmarks (useful to convince customers) and otherwise. All the major Ext4 and XFS developers work for large vendors (Google, IBM, Red Hat etc.) and I would have expected them to have access to enterprise hardware. XFS is known to scale better on big hardware, at least historically, because of its legacy, but the gap has been reduced considerably in recent kernel versions.
Posted Feb 14, 2011 17:53 UTC (Mon)
by dlang (guest, #313)
[Link]
so this is still pretty recent info.
Posted Feb 11, 2011 19:31 UTC (Fri)
by rfunk (subscriber, #4054)
[Link]
More information from 2009: http://lwn.net/Articles/328363/
Posted Feb 11, 2011 18:19 UTC (Fri)
by jrn (subscriber, #64214)
[Link] (1 responses)
Wouldn't ext4 with nodelalloc be just as safe?
Posted Feb 11, 2011 18:32 UTC (Fri)
by rfunk (subscriber, #4054)
[Link]
Posted Feb 11, 2011 21:10 UTC (Fri)
by rfunk (subscriber, #4054)
[Link] (2 responses)
So if I understand things correctly, anyone using ext3's (upstream) defaults since 2.6.31 shouldn't see any less reliability by switching to ext4.....
On the other hand, distributions have not necessarily followed the upstream defaults. A quick check of my Ubuntu 10.10 kernel config shows that Ubuntu chose to stick with data=ordered in ext3, rather than moving to the new upstream default of data=writeback. I'm sure they're not the only one.
Posted Feb 18, 2011 14:45 UTC (Fri)
by dpotapov (guest, #46495)
[Link]
However, most distributions have decided to stay with the ordered mode. Therefore, in 2.6.31, the words against EXT3_DEFAULTS_TO_ORDERED=y were replaced by Ted Ts'o with some more neutral language describing the trade-offs. In 2.6.36, the default mode was changed back to 'data=ordered' by Dave Chinner from Red Hat: "because we should be caring far more about avoiding stale data exposure than performance."
AFAIK, there is no significant difference between ext3 and ext4 when it comes to exposing stale data. So, it is not exactly clear to me why this reason applies in one case but not in the other.
As to Ubuntu, Karmic was released with data=writeback, and remained so until the end of April 2010, but then it was suddenly changed to data=ordered. Both changes happened without any warning to users.
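For what it's worth, the journaling mode can always be requested explicitly at mount time, so a system that cares about data=ordered need not depend on whichever default its distribution compiled in. A minimal sketch via mount(2), with a hypothetical device and mount point (equivalent to passing -o data=ordered to the mount command):

    /* Sketch: request data=ordered explicitly rather than relying on the
     * kernel's compiled-in default.  Device and mount point are
     * hypothetical; must run as root. */
    #include <stdio.h>
    #include <sys/mount.h>

    int main(void)
    {
        if (mount("/dev/sdb1", "/mnt/data", "ext3", 0, "data=ordered") != 0) {
            perror("mount");
            return 1;
        }
        return 0;
    }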
Posted Feb 12, 2011 17:57 UTC (Sat)
by pr1268 (subscriber, #24648)
[Link] (1 responses)
This seems like a serious issue - I have a couple of external-hard-drive-in-enclosures backing up my music directory (lots of mp3|ogg|flac files). Since I only make backups once every several weeks or so (my computer's music directory doesn't change all that often), I chose the non-journaling Ext2 for each. So, if the proposed removal of Ext2/3 were to be realized, then my questions are: Thanks in advance for answers/commentary/discussion!
Posted Feb 12, 2011 20:18 UTC (Sat)
by ABCD (subscriber, #53650)
[Link]
In order, the answers are: yes, yes, and yes. I have, in the past, had a number of ext2 and ext3 filesystems that I have attempted to mount with an ext4-only kernel. So far as I can remember, in every case it worked just fine. I also have used mkfs.ext2/mkfs.ext3 to create ext2/3 filesystems on that same ext4-only kernel, mounted them on that system, copied files onto the filesystem, then mounted them on older systems that did not have ext4 support at all. Basically, everything worked perfectly transparently, without my having to change much of anything (I even left the "ext2" or "ext3" bit in /etc/fstab, and the kernel used the ext4 driver to mount the filesystems because I set CONFIG_EXT4_USE_FOR_EXT23=y).
Posted Feb 12, 2011 19:46 UTC (Sat)
by sethml (guest, #8471)
[Link]
1. Release a kernel in which "mount -t ext3" and "mount -t ext2" actually use the ext4 driver to mount the filesystem, translating mount options so they behave as expected. Rename the old ext2/ext3 drivers to ext2old/ext3old. Make the ext4 driver, when mounting an ext2 or ext3 filesystem, print to the kernel log a message mentioning that in the case of problems the user should try "mount -t ext2old/ext3old".
2. On some later major kernel release, once people are confident that ext4's backwards compatibility is good, delete the ext2old/ext3old code.
Posted Feb 13, 2011 18:49 UTC (Sun)
by ribbo (subscriber, #2400)
[Link]
embedded devices with support for booting ext2