Solid-state storage devices and the block layer

Posted Oct 5, 2010 20:38 UTC (Tue) by jzbiciak (guest, #5246)
In reply to: Solid-state storage devices and the block layer by dlang
Parent article: Solid-state storage devices and the block layer

You can do random writes to random empty sectors. Again, that's nothing like how a hard disk works. I'm still strenuously disagreeing with your earlier statement that flash's properties make it more like a disk than like RAM. It's really an entirely different beast worthy of separate consideration, which is why I think wrapping it up in an SSD limits its potential.

With flash, you need entirely new strategies that apply neither to disks nor RAM to get the full benefit from the technology. Much of the effort spent on disks revolves (no pun intended) around eliminating seeks. No such effort is required with RAM or with flash. Flash does require you to think about how you pool your free sectors, though, and how you schedule writing versus erasing. I won't deny that. Rather, I say that requirement only further undermines your original conjecture that flash is more like a disk. (I will agree it makes flash less like RAM, though.)

Because seeks are "free", I could totally see load balancing algorithms of the form "write this block to the youngest free sector on the first available flash device", so that a new write doesn't get held up by devices busy with block erases. That looks nothing like what you'd want to do with a disk. It takes advantage of the "free seek" property of the flash while helping to hide the block erase penalty it imposes. Neither property is a property of a disk drive. Of course, neither property is a property of RAM, either.
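
A minimal sketch of that policy in Python, purely for illustration. The FlashDevice class, its fields, and the reading of "youngest" as "fewest erase cycles so far" are all assumptions made for this example, not anything a real driver exposes.

```python
from dataclasses import dataclass, field


@dataclass
class FlashDevice:
    # Hypothetical model of one flash device.
    name: str
    busy_until_us: int = 0  # time at which a pending erase finishes
    free_sectors: dict = field(default_factory=dict)  # sector -> erase count

    def is_idle(self, now_us):
        return now_us >= self.busy_until_us


def pick_target(devices, now_us):
    """Return (device, sector) for the next write, or None if none is free."""
    for dev in devices:  # "first available flash device"
        if dev.is_idle(now_us) and dev.free_sectors:
            # "youngest" is taken to mean the fewest erase cycles (assumption)
            sector = min(dev.free_sectors, key=dev.free_sectors.get)
            return dev, sector
    return None


devices = [
    FlashDevice("flash0", busy_until_us=2000, free_sectors={0: 5, 1: 2}),
    FlashDevice("flash1", free_sectors={7: 9, 8: 3, 9: 4}),
]

# flash0 is still erasing at t=100us, so the write goes to flash1.
dev, sector = pick_target(devices, now_us=100)
print(dev.name, sector)  # prints: flash1 8
```

Nothing here orders requests by position, which is the whole point: the choice depends only on which device is idle and how worn its free sectors are.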

Am I splitting hairs over semantics here? Let me step back and summarize, and see if you agree: Raw flash's random access capability and relatively low access time can make it much more like RAM than disk, especially in terms of bandwidth and latency. Raw flash's limitations on writes, however, require the OS to have flash-specific write strategies. They prevent the OS from treating flash identically to RAM, and will require careful thought to be handled correctly. This is similar to how we had to put careful thought into disk scheduling algorithms, even if flash requires entirely different algorithms to address its unique properties.

Solid-state storage devices and the block layer

Posted Oct 9, 2010 14:10 UTC (Sat) by joern (guest, #22392)

> Flash does require you to think about how you pool your free sectors, though, and how you schedule writing versus erasing.

Intriguing. Can you elaborate a bit? What difference does it make vs. the naïve approach of erasing before writing?

Solid-state storage devices and the block layer

Posted Oct 9, 2010 14:55 UTC (Sat) by dlang (guest, #313)

the issue is that you have to erase large chunks (on the order of 128K bytes); if you are then writing in small chunks (say the 512-byte sectors that are the default, or even the 4K-byte filesystem blocks), you can't simply erase right before each write.

you also have the problem that erasing takes a significant amount of time and power, so you don't want to wait until you need an erased block before you start erasing, and you don't want to erase unnecessarily while on battery
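
To put numbers on the size mismatch, here is a small worked calculation using the sizes mentioned above. The exact 128K figure is only "on the order of" what real parts use, so treat the results as illustrative.

```python
ERASE_BLOCK = 128 * 1024  # bytes; "on the order of 128K"
SECTOR = 512              # bytes; default sector size
FS_BLOCK = 4 * 1024       # bytes; typical filesystem block

sectors_per_erase_block = ERASE_BLOCK // SECTOR
fs_blocks_per_erase_block = ERASE_BLOCK // FS_BLOCK

print(sectors_per_erase_block)    # prints: 256
print(fs_blocks_per_erase_block)  # prints: 32

# Erasing in order to rewrite one unit would also wipe all of its neighbours.
print(sectors_per_erase_block - 1)    # prints: 255
print(fs_blocks_per_erase_block - 1)  # prints: 31
```

So an erase done on behalf of a single 512-byte sector takes 255 other sectors with it, which is why erase-right-before-write does not work without first moving live data elsewhere.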

Solid-state storage devices and the block layer

Posted Oct 9, 2010 15:03 UTC (Sat) by jzbiciak (guest, #5246)

Note: I'm not an expert. Please do not mistake me for one. :-) Here are my observations, though, along with things I've read elsewhere.

Flash requires wear leveling in order to maximize its life. For the greatest effect, you want to wear level across the entire device, which means picking up and moving otherwise quiescent data so that each sector sees approximately the same number of erasures. That's one aspect.
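
A rough sketch of the two decisions that paragraph implies: where to put new writes, and when to move quiescent data. The function names, the per-block erase counters, and the spread threshold are hypothetical choices for this example; real flash translation layers differ in the details.

```python
def pick_block_to_write(erase_counts, free_blocks):
    """New data goes to the least-worn free block."""
    return min(free_blocks, key=lambda b: erase_counts[b])


def needs_static_wear_leveling(erase_counts, threshold):
    """True when wear has become too uneven across the whole device."""
    counts = erase_counts.values()
    return max(counts) - min(counts) > threshold


def coldest_block(erase_counts, in_use_blocks):
    """The in-use block with the fewest erases probably holds quiescent data."""
    return min(in_use_blocks, key=lambda b: erase_counts[b])


# Block 1 holds data that never changes, so it has barely been erased.
erase_counts = {0: 120, 1: 3, 2: 118, 3: 119}
free_blocks = {0, 3}
in_use_blocks = {1, 2}

print(pick_block_to_write(erase_counts, free_blocks))   # prints: 3
print(needs_static_wear_leveling(erase_counts, 50))     # prints: True
print(coldest_block(erase_counts, in_use_blocks))       # prints: 1
```

When the spread check fires, the data in block 1 would be copied into a well-worn free block so that block 1 can rejoin the pool and absorb its share of erasures.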

Another aspect is that erase blocks are generally much larger than write sectors. So, when you do erase, you end up erasing quite a lot. Furthermore, erasure is about an order of magnitude slower than writing, and writing is about an order of magnitude slower than reading. For a random flash device whose data sheet I just pulled up, a random read takes 25us, page program takes 300us, and block erase takes 2ms. Pages are 2K bytes, whereas erase blocks are 128K bytes.
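
Plugging those data sheet figures into a quick calculation shows the ratios and what a naive in-place update would cost. The read-everything, erase, reprogram-everything sequence below is an assumed worst case for illustration; it ignores bus transfer time and any overlap between operations.

```python
READ_US = 25              # random read of one page
PROGRAM_US = 300          # program one page
ERASE_US = 2000           # erase one block (2 ms)
PAGE_BYTES = 2 * 1024
BLOCK_BYTES = 128 * 1024

pages_per_block = BLOCK_BYTES // PAGE_BYTES
print(pages_per_block)  # prints: 64

# Relative cost of each operation.
print(PROGRAM_US / READ_US)   # program vs. read
print(ERASE_US / PROGRAM_US)  # erase vs. program
print(ERASE_US / READ_US)     # erase vs. read

# Naive in-place update of one page: save the other pages, erase the
# block, then program every page again (assumed sequence, no overlap).
naive_update_us = ((pages_per_block - 1) * READ_US
                   + ERASE_US
                   + pages_per_block * PROGRAM_US)
print(naive_update_us)  # prints: 22775
```

Compared with the 300us it takes to program a page that is already erased, the naive update is dominated by reprogramming the 63 pages that did not change.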

(Warning: This is where I get speculative!) And finally, if you have multiple flash devices (or multiple independent zones on the same flash device), you can take advantage of that fact and the fact that "seeks are free" by redirecting writes to idle flash units if others are busy. That's probably the most interesting area to explore algorithmically, IMO. Given that an erase operation can take a device out of commission for 2ms, picking which device to start an erase operation on and when to do it can have a pretty big impact on performance. If you can do background erase on idle devices, for example, then you can hide the cost.
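
A toy model of that idea, equally speculative. It compares one flash unit that must stall for every erase against two units where writes go to one while the other erases in the background. The timings come from the data sheet figures above; the 64-page block, the two-unit layout, and the assumption that a full block's old contents are already obsolete are simplifications made for this sketch.

```python
PROGRAM_US = 300
ERASE_US = 2000
PAGES_PER_BLOCK = 64


def foreground_erase(num_writes):
    """One unit: the writer waits for an erase whenever erased pages run out."""
    now = 0
    erased = PAGES_PER_BLOCK
    for _ in range(num_writes):
        if erased == 0:
            now += ERASE_US  # writer stalls for the full erase
            erased = PAGES_PER_BLOCK
        now += PROGRAM_US
        erased -= 1
    return now


def background_erase(num_writes):
    """Two units: write to one while the other erases in the background."""
    now = 0
    erased = PAGES_PER_BLOCK
    spare_ready_at = 0  # the spare unit starts out erased
    for _ in range(num_writes):
        if erased == 0:
            # Switch units; wait only if the spare has not finished erasing.
            now = max(now, spare_ready_at)
            erased = PAGES_PER_BLOCK
            spare_ready_at = now + ERASE_US  # the full unit starts erasing
        now += PROGRAM_US
        erased -= 1
    return now


print(foreground_erase(256))  # prints: 82800
print(background_erase(256))  # prints: 76800
```

In this model, filling a block takes 64 * 300us = 19.2ms, far longer than the 2ms erase, so the background erase always finishes before the spare unit is needed and its cost disappears from the write path.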