The Kernel Hacker's Bookshelf: Ultimate Physical Limits of Computation

June 18, 2008

This article was contributed by Valerie Aurora

Moore's Law - we all know it (or at least think we do). To be annoyingly exact, Moore's Law is a prediction that the number of components per integrated circuit (for minimum cost per component) will double every 24 months (revised up from every 12 months in the original 1965 prediction). In slightly more useful form, Moore's Law is often used as a shorthand for the continuing exponential growth of computing technology in many areas - disk capacity, clock speed, random access memory. Every time we approach the limit of some key computer manufacturing technology, the same debate rages: Is this the end of Moore's Law? So far, the answer has always been no.

But Moore's Law is inherently a statement about human ingenuity, market forces, and physics. Whenever exponential growth falters in one area - clock speed, or a particular mask technique - engineers find some new area or new technique to improve at an exponential pace. No individual technique experiences exponential growth for long, instead migration to new techniques occurs fast enough that the overall growth rate continues to be exponential.

The discovery and improvement of manufacturing techniques is driven on one end by demand for computation and limited on the other end by physics. In between is a morass of politics, science, and plain old engineering. It's hard to understand the myriad forces driving demand and the many factors affect innovation including economies of scale, cultural attitudes towards new ideas, vast marketing campaigns, and the strange events that occur during the death throes of megacorporations. By comparison, understanding the limits of computation is easy, as long as you have a working knowledge of quantum physics, information theory, and the properties of black holes.

The "Ultimate Laptop"

In a paper published in Nature in 2000, Ultimate Physical Limits of Computation (free arXiv preprint [PDF] here), Dr. Seth Lloyd calculates (and explains) the limits of computing given our current knowledge of physics. Of course, we don't know everything about physics yet - far from it - but just as in other areas of engineering, we know enough to make some extremely interesting predictions about the future of computation. This paper wraps up existing work on the physical limits of computing and introduces several novel results, most notably the ultimate speed limit to computation. Most interesting in my mind is the calculation of a surprisingly specific upper bound on how many years a generalized Moore's Law can remain in effect (keep reading to find out exactly how long!).

Dr. Lloyd begins by assuming that we have no idea what future computer manufacturing technology will look like. Many discussions of the future of Moore's Law center around physical limits on particular manufacturing techniques, such as the limit on feature size in optical masks imposed by the wavelength of light. Instead, he ignores manufacturing entirely and uses several key physical constants: the speed of light c, Planck's reduced constant h (normally written as h-bar, a symbol not available in standard HTML, so you'll have to just imagine the bar), the gravitational constant g, and Boltzmann's constant kB. These constants and our current limited understanding of general relativity and quantum physics are enough to derive many important limits on computing. Thus, these results don't depend on particular manufacturing techniques.

The paper uses the device of the "Ultimate Laptop" to help make the calculations concrete. The ultimate laptop is one kilogram in mass and has a volume of one liter (coincidentally almost exactly the same specs as a 2008 Eee PC), and operates at the maximum physical limits of computing. Applying the limits to the ultimate laptop gives you a feel for the kind of computing power you can get in luggable format - disregarding battery life, of course.

Energy limits speed

So, what are the limits? The paper begins with deriving the ultimate limit on the number of computations per second. This depends on the total energy, E, of the system, which can be calculated using Einstein's famous equation relating mass and energy, E = mc². (Told you we'd need to know the speed of light.) Given the total energy of the system, we then need to know how quickly the system can change from one distinguishable state to another - i.e., flip bits. This turns out to be limited by the Heisenberg uncertainty principle. Lloyd has this to say about the Heisenberg uncertainty principle:

In particular, the correct interpretation of the time-energy Heisenberg uncertainty principle ΔEΔt ≥ h is not that it takes time Δt to measure energy to an accuracy ΔE (a fallacy that was put to rest by Aharonov and Bohm) but rather that that a quantum state with spread in energy ΔE takes time at least Δt = πh/2ΔE to evolve to an orthogonal (and hence distinguishable) state. More recently, Margolus and Levitin extended this result to show that a quantum system with average energy E takes time at least Δt = πh/2E to evolve to an orthogonal state.

In other words, the Heisenberg uncertainty principle implies that a system will take a minimum amount of time to change in some observable way, and that the time is related to the total energy of the system. The result is that a system of energy E can perform 2E/πh logical operations per second (a logical operation is, for example, performing the AND operation on two bits of input - think of it as single bit operations, roughly). Since the ultimate laptop has a mass of 1 kilo, it has energy E = mc² = 8.9874 x 10¹⁶ joules. The ultimate laptop can perform a maximum of 5.4258 x 10⁵⁰ operations per second.

How close are we to the 5 x 10⁵⁰ operations per second today? Each of these operations is basically a single-bit operation, so we have to convert current measurements of performance to their single-bit operations per second equivalents. The most commonly available measure of operations per seconds is FLOPS (floating point operations per second) as measured by LINPACK (see the Wikipedia page on FLOPS). Estimating the exact number of actual physical single-bit operations involved in a single 32-bit floating point operation would require proprietary knowledge of the FPU implementation. The number of FLOPS as reported by LINPACK varies wildly depending on compiler optimization level as well. For this article, we'll make a wild estimate of 1000 single-bit operations per second (SBOPS) per FLOPS, and ask anyone with a better estimate to please post it in a comment.

With our FLOPS to SBOPS conversion factor of 1000, the current LINPACK record holder, the Roadrunner supercomputer (near my home town, Albuquerque, New Mexico), reaches speeds of one petaflop, or 1000 x 10¹⁵ = 1 x 10¹⁸ SBOPS. But that's for an entire supercomputer - the ultimate laptop is only one kilo in mass and one liter in volume. Current laptop-friendly CPUs are around one gigaflop, or 10¹² SBOPS, leaving us about 39 orders of magnitude to go before hitting the theoretical physical limit of computational speed. Finally, existing quantum computers have already attained the ultimate limit on computational speed - on a very small number of bits and in a research setting, but attained it nonetheless.

Entropy limits memory

What we really want to know about the ultimate laptop is how many legally purchased DVDs we can store on it. The amount of data a system can store is a function of the number of distinguishable physical states it can take on - each distinct configuration of memory requires a distinct physical state. According to Lloyd, we have "known for more than a century that the number of accessible states of a physical system, W, is related to its thermodynamic entropy by the formula: S = kB ln W" (S is the thermodynamic entropy of the system). This means we can calculate the number of bits the ultimate laptop can store if we know what its total entropy is.

Calculating the exact entropy of a system turns out to be hard. From the paper:

To calculate exactly the maximum entropy for a kilogram of matter in a liter volume would require complete knowledge of the dynamics of elementary particles, quantum gravity, etc. We do not possess such knowledge. However, the maximum entropy can readily be estimated by a method reminiscent of that used to calculate thermodynamic quantities in the early universe. The idea is simple: model the volume occupied by the computer as a collection of modes of elementary particles with total average energy E.

The following discussion is pretty heavy going; for example, it includes a note that baryon number may not be conserved in the case of black hole computing, something I'll have to take Lloyd's word on. But the end result is that the ultimate laptop, operating at maximum entropy, could store at least 2.13 x 10³¹ bits. Of course, maximum entropy means that all of the laptop's matter is converted to energy - basically, the equivalent of a thermonuclear explosion. As Lloyd notes, "Clearly, packaging issues alone make it unlikely that this limit can be obtained." Perhaps a follow-on paper can discuss the Ultimate Laptop Bag...

How close are modern computers to this limit? A modern laptop in 2008 can store up to 250GB - about 2 x 10¹² bits. We're about 19 orders of magnitude away from maximum storage capacity, or about 64 more doublings in capacity. Disk capacity as measured in bits per square inch has doubled about 30 times between 1956 and 2005, and at this historical rate, 64 more doublings will only take about 50 - 100 years. This isn't the overall limit on Moore's law as applied to computing, but it suggests the possibility of an end to Moore's law as applied to storage within some of our lifetimes. I guess we file system developers should think about second careers...

Redundancy and error correction

Existing computers don't approach the physical limits of computing for many good reasons. As Lloyd wryly observes, "Most of the energy [of existing computers] is locked up in the mass of the particles of which the computer is constructed, leaving only an infinitesimal fraction for performing logic." Storage of a single bit in DRAM uses "billions and billions of degrees of freedom" - electrons, for example - instead of just one degree of freedom. Existing computers tend to conduct computation at temperatures at which matter remains in the form of atoms instead of plasma.

Another fascinating practical limit on computation is the error rate of operations, which is bounded by the rate at which the computer can shed heat to the environment. As it turns out, logical operations don't inherently require the dissipation of energy, as von Neumann originally theorized. Reversible operations (such as NOT) which do not destroy information do not inherently require the dissipation of energy, only irreversible operations (such as AND). This makes some sense intuitively; the only way to destroy (erase) a bit is to turn that information into heat, otherwise the bit has just been moved somewhere else and the information it represents is still there. Reversible computation has been implemented and shown to have extremely low power dissipation.

Of course, some energy will always be dissipated, whether or not the computation is reversible. However, the erasure of bits - in particular, errors - requires a minimum expenditure of energy. The rate at which the system can "reject errors to the environment" in the form of heat limits the rate of bit errors in the system; or, conversely, the rate of bit errors combined with the rate of heat transfer out of the system limits the rate of bit operations. Lloyd estimates the rate at which the system can reject error bits to the environment, relative to the surface area and assuming black-body radiation, as 7.195 x 10⁴² bits per meter² per second.

Computational limits of "smart dust"

Right around the same time that I read the "Ultimate Limits" paper, I also read A Deepness in the Sky by Vernor Vinge, one of many science fiction books featuring some form of "smart dust." Smart dust is the concept of tiny computing elements scattered around the environment which operate as a sort of low-powered distributed computer. The smart dust in Vinge's book had enough storage for an entire systems manual, which initially struck me as a ludicrously large amount of storage for something the size of a grain of dust. So I sat down and calculated the limits of storage and computation for a computer one μm³ in size, under the constraint that its matter remain in the form of atoms (rather than plasma).

Lloyd calculates that, under these conditions, the ultimate laptop (one kilogram in one liter) can store about 10²⁵ bits and conduct 10⁴⁰ single-bit operations per second. The ultimate laptop is one liter and there are 10¹⁵ μm³ in a liter. Dividing the total storage and operations per second by 10¹⁵ gives us 10¹⁰ bits and 10²⁵ operations per second - about 1 gigabyte in data storage and so many FLOPS that the prefixes are meaningless. Basically, the computing potential of a piece of dust far exceeds the biggest supercomputer on the planet - sci-fi authors, go wild! Of course, none of these calculations take into account power delivery or I/O bandwidth, which may well turn out to be far more important limits on computation.

Implications of the ultimate laptop

Calculating the limits of the ultimate laptop has been a lot of fun, but what does it mean for computer science today? We know enough now to derive a theoretical upper bound for how long a generalized Moore's Law can remain in effect. Current laptops store 10¹² bits and conduct 10¹² single-bit operations per second. The ultimate laptop can store 10³¹ bits and conduct 10⁵¹ single-bit operations per second, a gap of a factor of 10¹⁹ and 10³⁹ respectively. Lloyd estimates the rate of Moore's Law as 10⁸ factor of improvement in areal bit density over the past 50 years. Assuming that both storage density and computational speed will improve by a factor of 10⁸ per 50 years, the limit will be reached in about 125 years for storage and about 250 years for operations per second. One imagines the final 125 years being spent frantically developing better compression algorithms - or advanced theoretical physics research.

Once Moore's Law comes to a halt, the only way to increase computing power will be to increase the mass and volume of the computer, which will also encounter fundamental limits. An unpublished paper entitled Universal Limits on Computation estimates that the entire computing capacity of the universe would be exhausted after only 600 years under Moore's Law.

250 years is a fascinating in-between length of time. It's too far away to be relevant to anyone alive today, but it's close enough that we can't entirely ignore it. Typical planning horizons for long-term human endeavors (like managing ecosystems) tend to max out around 300 years, so perhaps it's not unthinkable to begin planning for the end of Moore's Law. Me, I'm going to start work on the LZVH compression algorithm, tomorrow.

One thing is clear: we live in the Golden Age of computing. Let's make the most of it.

Valerie Henson is a Linux consultant specializing in file systems and owns a one kilo, one liter laptop.

Index entries for this article
Kernel	Kernel Hacker's Bookshelf
GuestArticles	Aurora (Henson), Valerie

The Kernel Hacker's Bookshelf: Ultimate Physical Limits of Computation

Posted Jun 19, 2008 2:52 UTC (Thu) by dskoll (subscriber, #1630) [Link] (7 responses)

Ray Kurzweil expands on this topic in his interesting (if somewhat repetitive) book The Singularity is Near. Worth reading.

The Kernel Hacker's Bookshelf: Ultimate Physical Limits of Computation

Posted Jun 19, 2008 10:12 UTC (Thu) by Hanno (guest, #41730) [Link] (5 responses)

The singularity is not near. To be blunt, the singularity phenomenon is the quasi-religious expectation that a brighter tomorrow through technology is "certain" and universal deliverance of mankind is just around the corner.

The pretty good (though equally repetitive) book "Future-Hype" by Bob Seidensticker makes a good counter-argument:

Humans always made the mistake to extrapolate the future based on present development. When a rush of development in one technology leads to a jump in innovation, people would expect "exponential growth" in that area.

But that growth is not "exponential", it is only a significant jump compared to history, and it ends. Only because a jump in a different technology niche follows and futurists stop thinking about the previous jump anymore - that new tech is by now commenplace - doesn't mean that technology as such is developing at exponential rate and leading to a singularity.

Look at previous jumps in innovation: Steam, atomic power, plastics - each time when we were in their golden ages, people expected a superbright future based on these innovations.

> We live in the Golden Age of computing.

Indeed. And in history, each of those golden ages ended.

Kurzweil

Posted Jun 19, 2008 11:03 UTC (Thu) by corbet (editor, #1) [Link] (2 responses)

You missed the book suggestion - it is a worthwhile read - at least much of the first half of it is. One of the things he asserts is that the exponential growth we are seeing now is not "present development"; it is, instead, a trend which has been going on for a good two billion years. Contemporary electronics is just the current substrate upon which the growth of complexity is being carried out.

The notion of the singularity, incidentally, is not necessarily as utopian as you suggest here.

I'm not saying that arguments like Kurzweil's are necessarily correct - no doubt there's plenty of ways to poke holes in them. But it's best to understand them first.

Kurzweil

Posted Jun 19, 2008 11:17 UTC (Thu) by Hanno (guest, #41730) [Link] (1 responses)

You missed the book suggestion

I read it and found it interesting and tiring.

the exponential growth we are seeing now

...would only be "exponential" if it continued. Previously in history, humanity experienced a short period of exponential growth in steam, plastics, atomic power. Where is that exponential innovation now?

Painting this as a trend for two billion years is the result of cherrypicking datapoints.

Kurzweil

Posted Jun 21, 2008 0:25 UTC (Sat) by wahern (subscriber, #37304) [Link]

I haven't read the book, but it seems to me that the point is that certain advancements in
computation are universal, and previous industries can be considered as serial advancements in
(or facilitators of) general computational capabilities. So it doesn't matter that
advancements in steam engines were finite; rather that subsequent advancements made possible
by steam engines, no matter the material industry, had the effect of in kind furthering
[exponential] growth in computational capabilities in general.

Theoretical computational advancement can continually progress as long as materials science,
no matter how disjoint, continually provides sufficient capabilities for the realization of
the next rung on the theoretical ladder.

The overall argument is not particularly persuasive, I agree, but not obviously fallacious.

The Kernel Hacker's Bookshelf: Ultimate Physical Limits of Computation

Posted Jun 19, 2008 13:12 UTC (Thu) by dskoll (subscriber, #1630) [Link] (1 responses)

Yes, Kurzweil's book does have the tone of religious fervour.  I'm not saying I agree with
everything he says, just that the book was interesting.

I'd read it more as a science-fiction extrapolation that non-fiction.

The Kernel Hacker's Bookshelf: Ultimate Physical Limits of Computation

Posted Jun 19, 2008 13:35 UTC (Thu) by nix (subscriber, #2304) [Link]

I find Vinge's original concept of the technological singularity to be much more interesting
than Kurzweil's accelerating-progress thing, which suffers from severe selection bias: had he
written that book in 1920, it would have been obvious that an avionics singularity was
approaching by, say, 1990. It didn't happen, because progress in single technological fields
follows S-shaped curves, not exponential ones.

The Kernel Hacker's Bookshelf: Ultimate Physical Limits of Computation

Posted Jul 13, 2008 16:13 UTC (Sun) by aigarius (guest, #7329) [Link]

I like the narrative style of the (CreativeCommons licensed) Accelerando more.
http://www.accelerando.org/book/

Hmm... H-bar? What's the problem with it?

Posted Jun 19, 2008 5:15 UTC (Thu) by khim (subscriber, #9252) [Link] (5 responses)

Why to do you claim ℏ does not exist in HTML? Not all browsers support it, true, but most do. Or you can use ħ - it has slightly wider support...

Hmm... H-bar? What's the problem with it?

Posted Jun 19, 2008 17:18 UTC (Thu) by vaurora (subscriber, #38407) [Link] (4 responses)

Thanks for the pointer!  Due to the browser compatibility issues, I'll stick with
well-supported symbols.  HTML is not the ideal medium for physics.

Hmm... H-bar? What's the problem with it?

Posted Jun 19, 2008 21:13 UTC (Thu) by kraney (guest, #52619) [Link]

That's funny, given that HTML was invented by a physicist for the purpose of sharing physics 
papers. Everyone else is a squatter.

Hmm... H-bar? What's the problem with it?

Posted Jun 19, 2008 22:33 UTC (Thu) by leoc (guest, #39773) [Link] (1 responses)

Why not render the equations with tex as god intended.

Hmm... H-bar? What's the problem with it?

Posted Jun 26, 2008 16:52 UTC (Thu) by SEMW (guest, #52697) [Link]

I'll second the motion. If anyone's interested, my favourite way of embeding Tex in web pages is jsmath, which has excellent browser support and uses the proper TeX fonts if you have them installed (images and unicode fonts if you don't). MathML is the other option, but very few browsers support it at the moment (I think Opera 9.5 is the only one to fully support it out-of-the-box, though Firefox is on its way).

Hmm... H-bar? What's the problem with it?

Posted Jun 20, 2008 22:49 UTC (Fri) by dvdeug (subscriber, #10998) [Link]

The last browser that had any problem with that was Netscape 4. (No, lynx handles it just
fine.) Basic Latin characters used by major European languages (like Maltese) are supported by
everyone and have been for a long time.

The Kernel Hacker's Bookshelf: Ultimate Physical Limits of Computation

Posted Jun 19, 2008 5:18 UTC (Thu) by pr1268 (subscriber, #24648) [Link]

Neat stuff, thanks for the article. I am fascinated by physics and quantum mechanics, and it's interesting and fun to ponder and analyze computational limits due to quantum effects.

On a slightly related topic, with regards to computing speed, Bjarne Stroustrup said it best when he exclaimed that the "next generation of Intel chips can do an infinite loop in five minutes!" :-D

Compression

Posted Jun 19, 2008 6:28 UTC (Thu) by smurf (subscriber, #17840) [Link] (2 responses)

Nice!

I've already invented the ultimate compression, or rather compression-avoidance, algorithm. It
involves a time machine which sends the file names of all the data that hasn't been accessed
at all in the last 100 years, back to the present.

Somebody else can please invent the time machine. Receiver only, for now; we've got another
100 years for the transmitter.

Not just the ultimate compression algorithm

Posted Jun 19, 2008 11:54 UTC (Thu) by emk (subscriber, #1128) [Link] (1 responses)

Such a time machine also allows you to send bits back in time. Let's say you want to know whether the Cubs win the World Series sometime between now and 2107. Simply create a file named "yes-they-won.txt". If the Chicago Cubs win, access that file. Presto! Your "compression" algorithm can now be used to tell whether or not the victory occurred / will occur.

If you want to place bets, create 100 files named "2008", "2009", "2010", etc. Then, at the beginning of each year the Cubs win the World Series, max out your credit cards and stake your entire fortune on the Cubs winning.

But this turns out to be an amazing waste of your "compression" algorithm. As it turns out, once you can send bits back in time, you can build a negative time delay element, a circuit component that shifts its inputs back in time. This, in turn, allows you solve NP-complete problems in polynomial time, as described in the Moravec article I just linked.

And if NP is equal to P, then you've built something much more powerful than a quantum computer. Specifically, you can answer virtually any question you can ask, provided the correctness of that answer can be checked in polynomial time. And this, in turn, has further surprising consequences:

But another reason we believe P≠NP is that otherwise mathematical creativity could be automated! You'd simply ask your computer: "Is there a proof of the Riemann hypothesis with at most a million symbols?" And if there is such a proof, the computer will find it in some way dramatically faster than searching through every possibility. If P=NP, you wouldn't merely have solved one of the seven million-dollar problems from the Clay Math Institute -- presumably you could also solve the other six.

(See also the section titled "Linearity" in another of Scott Aaronson's lectures, which links to a paper analyzing a number of related problems in physics.)

Not just the ultimate compression algorithm

Posted Jun 19, 2008 13:21 UTC (Thu) by alankila (guest, #47141) [Link]

Fascinating article. Thanks a lot for writing it.

The Kernel Hacker's Bookshelf: Ultimate Physical Limits of Computation

Posted Jun 19, 2008 6:57 UTC (Thu) by AlexHudson (guest, #41828) [Link] (6 responses)

"entire computing capacity of the university" - while no doubt academic institutions carry a
great deal of computing capacity these days, I suspect that should read "universe"?

The Kernel Hacker's Bookshelf: Ultimate Physical Limits of Computation

Posted Jun 19, 2008 10:08 UTC (Thu) by NRArnot (subscriber, #3033) [Link]

Maybe, but Pratchett fans will be delighted if it's not a typo!

Typo

Posted Jun 19, 2008 11:05 UTC (Thu) by corbet (editor, #1) [Link] (4 responses)

What if it's a really big university?

Clearly a typo, I'm not quite sure how we missed it. Fixed now.

Accademic compression

Posted Jun 19, 2008 15:20 UTC (Thu) by utoddl (guest, #1232) [Link] (3 responses)

This nicely ties back to the compression discussion above, as the accademic compression technique can squeeze an entire semester's work down to a single letter and an optional "+" or "-" qualifier.

Academic compression

Posted Jun 20, 2008 14:40 UTC (Fri) by mtorni (guest, #3618) [Link] (2 responses)

If you drop the optional qualifier you can increase the (already lossy) compression and
achieve a notable compression with a tiny loss in useful resolution.

Academic compression

Posted Jun 23, 2008 17:17 UTC (Mon) by salimma (subscriber, #34460) [Link] (1 responses)

Yes; from 4 bits to 3 bits; a saving of 25% !

Academic compression

Posted Jun 26, 2008 8:27 UTC (Thu) by forthy (guest, #1525) [Link]

But most of the mark details are useless, all you require is a single bit "pass" or "fail". Nobody is going to read your detailed marks for each semester exam later in your career. For all reasonable degrees, it's basically a two-bit information: Dropout, Bachelor, Master, PhD. Don't think "dropout" is a career limiter: The world's richest man is a dropout.

I'm quite sure I'll see several limits to the exponential growth in my lifetime (probably the next 50 years), at least I already experience one (probably temporal) limit: clock frequency didn't go up much the last 5 years. There however is a way to get more power out of computers: write better software. It is amazing what vintage computer fans get out of ancient designs. We are used to write software for computers that get faster and have more memory, so we write a lot of slow, bloated software.

Due to the fact that single-threaded CPUs don't get (significantly) faster anymore, we probably should start right now: Use better algorithms to make single-threaded programs faster.

Storage increase in laptops:

Posted Jun 19, 2008 8:04 UTC (Thu) by deleteme (guest, #49633) [Link] (3 responses)

name: storage, released
TRS-80 Model 100: 8KB, 1983
Mini Linux Laptop: 2GB, 2008

That's 6 orders of magnitude in 25 years.. Not that I can buy a Model 100 for £99.

Storage increase in laptops:

Posted Jun 24, 2008 7:29 UTC (Tue) by ekj (guest, #1524) [Link] (2 responses)

The scary thing, or atleast the mindboggling one is trying to extrapolate this and imagine
what we'll be DOING with it.

Okay, so you say 6 OOM in 25 years. I think if you compare similarily-priced computers its
actually more than that. The TRS-80 cost a lot more (inflation-corrected) in 1983 than a 2GB
laptop cost today. I think the real number is more like 7OOM.

So, what does that mean if present trends continue for the NEXT 25 years ?

2GB * 10^7 ram.

It's a -gargantuan- number, it means your laptop by then will have more RAM than the
googlecluster has today. Infact it'll have RAM comparable to the sum total of ALL laptops in
USA today, give or take a OOM.

Storage increase in laptops:

Posted Jun 26, 2008 11:36 UTC (Thu) by Duncan (guest, #6647) [Link] (1 responses)

However, there's another dimension that you've failed to figure in, that 
of overall computer size.  It turns out that at least for the past 25-50 
years (the 50 years earlier measured, or the 25 here) at least, we've 
found the practical benefits of overall computer miniaturization 
beneficial as well, such that in practice they've absorbed some of those 
OOMs you mention.  It's the oft pointed out main-frame (room size) > 
mini-computer (large appliance size) > desktop (medium appliance size) > 
laptop (small appliance size) > handheld/umpc > cell-phone > watch... 
trend.  As storage and computation increases, we've chosen to trade off an 
OOM every decade or so to graduate to the next smaller sized unit.

If the per-decade size generation trend continues, normal people won't be 
using laptops anymore in 25 or even, really, 10 years, as we'll downshift 
a size or two or three instead.  Just as trends indicate people are now 
switching to laptops instead of desktops and UMPCs instead of laptops, 
because the smaller size now has more power than the larger size did a few 
years previously and it's all the power really needed at that usage point, 
a decade from now, computers the size (and likely cost as well) of today's 
remotes/MP3-players/cell-phones will be the norm (low/high-end), while 
packing the computing power of today's dual-quad-core servers.

OTOH, we're up against the wall of the human body's I/O limitations 
already, probably the reason we didn't migrate smaller several years ago, 
when the computing power of a desktop first exceeded that really necessary 
for office applications and the like.  Some people just like a full sized 
keyboard and a nice sized display, and that human interface is *NOT* 
shrinking with Moore's law, unfortunately.

For decades, Science Fiction's answer has been change the interface, voice 
recognition and eye-glasses displays, with direct neural tap interfaces 
predicted beyond that, but the required AI and materials science hasn't 
really made that first leap practical as yet, tho it /is/ tantalizingly 
close... but we've thought that for over a decade, as well.

Still, the keyboard and large external display as human I/O method is 
simply going to have to give if we're to graduate down below the UMPC 
level.  Or maybe we'll end up with ubiquitous built-in 
keyboard/display/Internet units everywhere, and plugin/wireless-in our 
multi-terabyte-storage-multi-cored-USB-thumb-drive-sized "personal 
computer" everywhere we go, much as folks are doing with the thumb-drives 
themselves today?

Or, just perhaps, the just-a-few-years long trend of actually shrinking 
cost will become the dominant factor going forward, and those now $400 
UMPCs like the Eee and friends will be $30-50 or even <$10, while 
containing the power and storage of today's big-drive quad-core 
desktop/servers, but with the permanent data stored "in the cloud" and 
with I/O to ubiquitous permanent displays/keyboards where needed as 
mentioned above, so the individual units become disposable, like the 
digital watches one can now buy in the dollar store.

I really do think that the average person's usage really is being met now, 
thus the focus on smaller but more important CHEAPER we are seeing, and 
that that fact is not going to change -- UNLESS some "killer app" like 
truly practical general purpose (not limited purpose/vocab as we see now) 
voice recognition and hidef spectacles displays suddenly appear.  
If /that/ happens, then we'll see the drive to smaller (but with whatever 
resources are necessary to drive the voice recognition AI) reassert itself 
over cheaper, down to the point they can be embedded in the eye-glasses 
themselves.  Since such a practical general purpose voice recognition AI, 
should it appear, is likely to be fairly resource intensive for even 
today's multi-core desktops, the process of miniaturizing the hardware 
(and power requirements) to the watch-battery size point, for embedding in  
those spectacles, is going to take a fair bit of that 25 years, anyway.  
Beyond that... well, we'll just have to see where that 
neuromanceresqe "jacking in" tech is, at that point.

Duncan

Storage increase in laptops:

Posted Jun 26, 2008 13:44 UTC (Thu) by ekj (guest, #1524) [Link]

That's true. You get a 7oom quicker computer for $5000 now, compared to a similar sum of money
(inflation discounted) 25 years ago.

But people don't BUY those. They buy $500 - $1000 computers instead, and spend some extra on
getting small rather than powerful at that. (laptop-hds are much more expensive pro GB than
desktop-hds)

It's already all in the IO. I've got one 17" laptop and one 12", the difference isn't in the
power (it's there, but I rarely care) but in the fact that the 17" is just better if I need a
lot of screen pixels or a lot of physical screen-size.

Obligatory...

Posted Jun 19, 2008 8:18 UTC (Thu) by yodermk (guest, #3803) [Link]

10^31 bits should be enough for anyone ...


Interesting stuff, thanks!

The Kernel Hacker's Bookshelf: Ultimate Physical Limits of Computation

Posted Jun 19, 2008 8:25 UTC (Thu) by eskild (guest, #1556) [Link]

Lovely! It was nice to get my head out of obscure bugs in obscure networking protocols for a
while.

Power usage?

Posted Jun 19, 2008 8:37 UTC (Thu) by epa (subscriber, #39769) [Link] (3 responses)

As you mentioned, these numbers don't take any account of power usage or heat dissipation.  It
would be interesting to know the theoretical limits of a one kilogram, one litre computer
drawing one watt of power.

Power usage?

Posted Jun 19, 2008 12:02 UTC (Thu) by emk (subscriber, #1128) [Link] (2 responses)

Basically, if you're worried about power and heat, you want to know the fundamental limits of reversible computation. A reversible computation is, of course, a reversible machine, and therefore uses no energy (except for output bits). Richard Feynman actually wrote a nice essay on this problem. And as it turns out, you can still do an insane amount of computation per unit time, under the laws of physics.

Of course, it's probably impossible to actually construct a true reversible machine, although certain small-scale quantum phenomenon, such as superconductivity, come surprisingly close.

Power usage?

Posted Jun 20, 2008 8:39 UTC (Fri) by epa (subscriber, #39769) [Link] (1 responses)

I was just thinking of normal, non-reversible computation, constrained to dissipate at most
one watt of heat.

Do any reversible computing devices currently exist?  (Yes, I know a NOT gate exists, but I
mean something computationally more powerful than that, preferably Turing-machine-equivalent,
and specially designed to use the properties of reversible computation to minimize power
consumption.)

Power usage?

Posted Jun 23, 2008 6:00 UTC (Mon) by jzbiciak (guest, #5246) [Link]

There sure are. One of the papers apparently describes a simple 8-bit reversible CPU.

The Kernel Hacker's Bookshelf: Ultimate Physical Limits of Computation

Posted Jun 19, 2008 8:50 UTC (Thu) by __alex (guest, #38036) [Link]

"Of course, none of these calculations take into account power delivery or I/O bandwidth,
which 
may well turn out to be far more important limits on computation."

I was under the impression that both of these things pretty much already are the biggest
limits on 
computation?

The Kernel Hacker's Bookshelf: Ultimate Physical Limits of Computation

Posted Jun 19, 2008 9:45 UTC (Thu) by flewellyn (subscriber, #5047) [Link]

Really glad we're having more of these Val Henson articles.  She always has something
fascinating to write about.

The Kernel Hacker's Bookshelf: Ultimate Physical Limits of Computation

Posted Jun 19, 2008 10:07 UTC (Thu) by produit (subscriber, #35640) [Link] (1 responses)

Your number of micron cube in the ultimate laptop is wrong.
There are 10^18 micron cube in a m^3 so 10^15 in the laptop.
So the memory you can store in a grain of dust is not so impressive

The Kernel Hacker's Bookshelf: Ultimate Physical Limits of Computation

Posted Jun 19, 2008 17:16 UTC (Thu) by vaurora (subscriber, #38407) [Link]

Yes, that's an error - thank you for catching it!  The correction is on the way.

The Kernel Hacker's Bookshelf: Ultimate Physical Limits of Computation

Posted Jun 19, 2008 10:44 UTC (Thu) by enodev (guest, #14913) [Link] (1 responses)

Why doesn't the author take the volume of a modern cpu or hard disk into account when
calculating the storage/calculation power of a current laptop? I mean the data storing volume
of a 2.5" hard disk platter is about 
(3.14*6cm^2*50um) = 6e-4 dm^2 ~= 1e-3 dm^3. So only 16 orders of magnitude to go... . Same
applies for the volume of a cpu die. Or do I miss something?

The Kernel Hacker's Bookshelf: Ultimate Physical Limits of Computation

Posted Jun 19, 2008 17:22 UTC (Thu) by vaurora (subscriber, #38407) [Link]

In the original article, the author makes it much more clear that we are assuming nothing
about what the technology of computers in the future looks like.  In particular, we're not
assuming that the storage of the computer is compartmentalized into a small box containing
spinning rust-covered disks. :) All the calculations are based solely on the mass and volume
of the computer, with no assumptions about distribution.

The Kernel Hacker's Bookshelf: Ultimate Physical Limits of Computation

Posted Jun 19, 2008 13:32 UTC (Thu) by nix (subscriber, #2304) [Link] (1 responses)

The ultimate laptop looks like a thermonuclear explosion? If only! There's lots of actual
organized matter in that (e.g. it's not all monatomic hydrogen plasma so there's order in the
nuclei).

It's more like the aftermath of a total annihilation reaction: a blast of elementary particles
and wide-spectrum (mostly high-energy) radiation, all, of course, precisely positioned so that
it interacts neatly.

The engineering challenges involved in building this thing are significant :) and how you do
I/O I have not the least idea. I imagine you'd need smaller computers to mediate between the
ultimate laptop and anything else. This is post-Transcend stuff ;}

(Oh, and, er, why does this need to be on a kernel hacker's bookshelf exactly? I know the life
of a Linux kernel hacker is an exciting and never-ending whirl, but I didn't think it was
*this* exciting.)

The Kernel Hacker's Bookshelf: Ultimate Physical Limits of Computation

Posted Jun 19, 2008 13:40 UTC (Thu) by nix (subscriber, #2304) [Link]

As an aside, if you want to consider even higher physical limits by making the computer as
dense as you possibly can in order to push up its mass and energy density, you start getting
into really interesting and bizarre stuff applying to black-hole-density computational
devices, like the holographic principle and the Bekenstein bound.

The Kernel Hacker's Bookshelf: Ultimate Physical Limits of Computation

Posted Jun 19, 2008 17:02 UTC (Thu) by bobort (guest, #5019) [Link] (1 responses)

The number of SBOPs per FLOP has not been constant over time, and it isn't really obvious how
that's going to change in the future.  Remember you have to account for all of the control
infrastructure, which dwarfs the arithmetical logic on a modern CPU - thousands of
multiplexers, comparators, buffers, lookup tables, signal repeaters, etc. just to get the data
from memory to the ALU and back - and the trip gets farther every year.  I suspect there are
more like 100k SBOPs per FLOP in a typical CPU, but that's a guess.

Will things get more or less "efficient" over time?  It's very hard to say, there are powerful
forces pulling in both directions.  I'd say the biggest undetermined aspect is whether we can
weasel around Amdahl's law, or how much.  How parallelizable will the software running on the
ultimate laptop be?  If everything ends up being highly parallelized, microarchitecture is
likely to evolve towards simplicity and SBOPs per FLOP will probably stay roughly the same or
even come down a bit.  If nobody comes up with a way around Amdahl (which is a rigorous
theorem unlike Moore's "law"), single-thread performance will need to be continually
increased, and that will increase SBOPs per FLOP over time.

The Kernel Hacker's Bookshelf: Ultimate Physical Limits of Computation

Posted Jun 23, 2008 6:05 UTC (Mon) by jzbiciak (guest, #5246) [Link]

I imagine an ultimate laptop would look more like a dataflow machine of some sort where
functional units are connected directly to neighboring functional units rather than time
multiplexing instructions through a smaller set of functional units with all the overhead you
describe.

After all, with that many compute elements, why wouldn't you?  I imagine data storage would
largely return to a delay element model too, just for its compactness.  Since compute elements
would be in contact with all the data elements at any given time, you could easily access all
of your data in parallel and do *something* with it, even if it's just moving it along.  The
sheer dynamics of such a dense system would mean that the data has to keep moving though.

gravitational constant is G not g

Posted Jun 20, 2008 3:27 UTC (Fri) by stuart_hc (guest, #9737) [Link]

A small point, but the gravitational constant by convention is written with a capital G not a lowercase g.

1000 single bit operations per FLOP

Posted Jun 20, 2008 17:42 UTC (Fri) by mikov (guest, #33179) [Link] (4 responses)

I wonder how accurate the 1000 single-bit operations estimation is. Mind you, I am not a
electronic design expert, so this is just rambling. 

I've been thinking that a 1-bit adder has three inputs, so it has a 8-entry truth table. (Each
entry looks like something like "o=i1 & ~i2 & i3").
After optimizations, I am counting 18 single bit operations ("ands" and "nots"). It has two
outputs, I guess that is just double the whole thing  to 36.

So for a 32-bit integer adder we have 32*36=1152 single-bit operations operations. The
straight-forward multiplier, which they thought us in school, will do an addition per set bit,
so  if we assume 16 set bits on average, we get 18432 single bit ops, ignoring the shifts.

So, when using the most simplistic algorithms, the 1000 operations number seems far too
optimistic. 

Can someone do a similar estimation for the super-clever algorithms they actually use the CPUs
? (There is a detailed description in "Computer Architecture: A Quantitative Approach", but I
never understood it well enough to actually keep it in my head, and the book is not with me.
Oh yes, and I am lazy)

1000 single bit operations per FLOP

Posted Jun 23, 2008 6:56 UTC (Mon) by jzbiciak (guest, #5246) [Link]

First, let's assume that each computation is done in the minimum number of steps possible. Many of the transistors spent in functional units today are spent on making a single copy of the operation go faster. When computing at an atomic scale, this probably no longer makes sense. It probably makes more sense to make each operation as simple and economical as possible so you can put down as many copies as you like precisely where they're needed.

With volume computing you'll have so much more connectivity to neighbors that economical implementations would likely win out. For instance, a carry-select adder, which computes two versions of the result speculatively and throws away one, wouldn't make sense since it's much, much larger than a ripple-carry adder. (IIRC, a state of the art 32-bit adder is around 3000 transistors, whereas a 32-bit ripple carry adder should be around 700 or so.)

A 1 bit full-adder—3 inputs, 2 outputs—consists of two pieces: The sum computation, which is two two-input XOR gates, and the carry computation, which is 3 two-input AND gates and 2 two-input OR gates. The logic equations are:

sum_out = a XOR b XOR carry_in
carry_out = (a AND b) OR (a AND c) OR (b AND c)

If you assume a ripple carry implementation, then that's it. 7 operations per bit if you allow the full complement of boolean operations. So, a 32-bit add will be a mere 224 boolean operations. Even if you penalize XOR and count them as 3 bit operations each, that's still only 352 boolean operations. Of course, IEEE-754 floating point isn't built around 32-bit adds. You go through the following major steps for a single-precision floating point addition:

(For now I assume a parallel implementation since it doesn't really affect the number of SBOPs. Also, I assume the more conservative 11 SBOP number for a single bit full adder.)

Subtract the mantissas to get the "alignment difference" between the two numbers. This is an 8 bit subtract, IIRC, so would cost 88 SBOPs.
Shift the numbers to align them. To be conservative, I'll say this comprises the following steps in a simplistic implementation:

Swap the two numbers based on the sign of the difference of the magnitudes. This requires 24 2:1 muxes, and each 2:1 mux costs 3 SBOPs, for a cost of 2 × 24 × 3 = 144 SBOPs.
Shifting the smaller magnitude number to the right by up to 24 positions. If you structure this as 5 layers of 2:1 muxes (which is overkill, since some of the inputs will be zero and so this could be optimized), you get 5 × 24 × 3 = 360 SBOPs.

Apply the sign to both numbers. IEEE-754 is stored in sign-magnitude form. To apply the sign, you merely need to XOR the number with the sign and inject an appropriate carry when you do the addition. Thus, we can estimate this as 48 XORs, which, if you count XOR as 3 SBOPs, is 144 SBOPs.
Actually add the two 24 bit mantissas. If we go with 11 SBOPs per bit, this is 264 SBOPs.
Strip the sign from the result. Again, you could simply XOR with the result's sign bit (which should be the carry out of the last adder). 3 × 24 = 72 SBOPs.
Count the number of leading zeros so that we can renormalize the result. I don't know off the top of my head how many SBOPs this ought to take.
Shift the final result to normalize it. Conservatively, let's use the crummy shifter above which is 5 × 24 × 3 = 360 SBOPs
Update the mantissa. This is another 8-bit add. 8 × 11 = 88 SBOPs.

So where does that put us in this really basic, if perhaps naive implementation? Around 1520 SBOPs plus the cost of the priority encoder used for normalization.

Sure, we haven't handled overflow, underflow, denormals, NaNs and so on. But, I also have assumed clumsy implementations for some of the more expensive bits, such as the shifters. If the shifters were half the cost, for example, the total drops to 1160. Overall, I'd say this quick and dirty analysis validates that Val's SWAG (Scientific Wild Ass Guess) is in the right ball park, and certainly well within an order of magnitude. Since nearly all the rest of the numbers are expressed in orders of magnitude, being off by 50% ain't so bad. The log10 of 1.5 is pretty small.

1000 single bit operations per FLOP

Posted Jun 23, 2008 7:07 UTC (Mon) by jzbiciak (guest, #5246) [Link] (2 responses)

Regarding the multiplier... It turns out that floating point multipliers can get rid of most of the alignment steps that the adder needs to do. (See my other post.) You also don't need to apply the sign to your input or your output. So, you're just left with the addition steps.

A naive 24 × 24 multiplier would effectively do 24 24-bit adds. That's 6336 SBOPs. Assume for a moment, though, that we can eliminate about half of those because only 24 output bits are needed. This brings us down to 3168 SBOPs for the multiplier. Remaining steps:

Adding the mantissas: 88 SBOPs.
Aligning the result (shift by up to 3 positions, IIRC, implemented as 2 levels of 2:1 mux): 2 × 24 × 3 = 144 SBOPs

This gives a total of ~3400 SBOPs. About 2× to 3× the cost of an adder.

Oh, and that reminds me, you can eliminate one of the argument inversions in my adder estimate above. That squeezes another 60-70 SBOPs out. (You do add some other bit inversions here and there on the sign bits, which is why you don't get all 72 back.)

1000 single bit operations per FLOP

Posted Jun 24, 2008 22:49 UTC (Tue) by nix (subscriber, #2304) [Link]

That came off the top of your head?!

Allow me to briefly say 'wow', if so. :)

1000 single bit operations per FLOP

Posted Jun 25, 2008 0:06 UTC (Wed) by mikov (guest, #33179) [Link]

Wow, thanks! I have nothing to say except to retract my idiotic suggestion of implementing the
adder with a truth table :-)

I have to agree with "nix" that this is very impressive. Your post deserves an article of its
own.

The Kernel Hacker's Bookshelf: Ultimate Physical Limits of Computation

Posted Jun 23, 2008 8:47 UTC (Mon) by ekj (guest, #1524) [Link]

We're closer than this to universal limits on computation, if it's not reversible computing.

With non-reversible computing, flipping a single bit, as you say, costs energy. There's a
minimum energy it MUST cost, dependant on operating-temperature.

So, how much can a laptop do if it's only allowed to consume 10W, operate non-reversibly, and
operate at room-temperature ? I did the math for rec.arts.sf.science some time back, and may
very well have done it wrong. But I got as a result that we're only ~5 OOM away from the
theoretical limits.

Factoring in I/O

Posted Jul 1, 2008 23:08 UTC (Tue) by jd (guest, #26381) [Link]

I/O is a relatively simple thing to factor in, provided we assume something analogous to the
Transputer, which communicated to "adjacent" nodes, where "adjacent" would either mean the
physically adjacent Transputers or the ones that would be logically adjacent if you used a
hypercube topology. The latter was generally faster, but here we don't have wires and can't
organize the atoms that way.

Basically, you need to consider a cluster of atoms, such that the cluster supports your
logical operation plus the ability to receive input and the ability to deliver output. The
Transputer used four serial connectors. Could we have more than that, here? Well, that depends
on the maximum density you could pack the atoms and the arrangement they took. The tetrahedron
is a nice, packed structure, and if I remember correctly, gives you 6 bonds per atom. (One
bond to the other three atoms in the tetrahedron, plus one bond to each of the three adjacent
tetrahedrons.) Each bond is capable of conveying data, so each bond can be regarded as a
connector. As with the Transputer, data can flow in either direction but can only flow in one
direction at a given time.

So, your processing atom has 6 lines for I/O. Assuming it can only handle two input lines and
one output line (a very basic gate), you have sufficient lines to process at double the I/O
bandwidth if necessary without running out of data. Assuming an I/O transaction to be simply
the delivery of information to a processing atom used for I/O, a gate operation, and then the
delivery of the result, each I/O operation consumes the time for a gate transaction plus twice
the time it takes for an electron to travel the distance between atoms in this structure.

(Not sending to a specific atom is equal to ANDing with zero, so this drives the heat output
up considerably. We are assuming gate transactions are based on instantaneous input values and
do not need to be actually measured, per se. Otherwise, we need to add in the time for two
independent states to stabilize and be measured, where the two state changes do not take place
simultaneously from the perspective of the observing atom. IIRC, the correct value to use is
1.5x the time it takes to stabilize for a single state.)

So, the total time for a local transaction is now the time for three gate transactions, plus
the time for four electron hops, plus optionally 3x the time it takes for a state to
stabilize. An electron hop is limited by the speed of light, but the exact distance depends on
the forces involved. For simplicity, I will disregard it, but if you want a more accurate
value, I suggest picking the carbon atom and maybe diamond as the structure, then figure out
the correct values from that. The other values are in the article.

However, transactions are multicast. Each I/O atom can deliver to anywhere between zero and
two target processing atoms (assuming it cannot deliver to the originating atom), plus zero to
three other I/O atoms, which can in turn cascade the data to processing atoms and other I/O
atoms. The exact number depends on what happens to be adjacent.

This means a signal can go between any given processing atom and any given set of processing
atoms (excluding any set including itself). We use the per-hop calculation to calculate the
average time for an actual transaction by calculating the average volume you would need to
transmit over. The maximum time is the time for a signal to sweep the entire structure. The
maximum time assuming any-to-any communication will be for the signal to sweep half the
distance.

Because 2/3 of the atoms are used for I/O, and because it takes quite some time for data to
reach a given processing atom, the maximum theoretical speed should be reduced accordingly.

This has assumed we are using electrons to convey data, but we don't have to. An electron, at
suitable speed, striking a nucleus, will release an X-Ray. An X-Ray, at suitable energy,
striking a nucleus, will release an electron. This is known as X-Ray fluorescence and is a
widely-used technique.

This reduces (but does not eliminate) the need to use processing atoms as I/O switches. If
there's line-of-sight, it's possible for an X-Ray to deliver data to a target cluster of
atoms. You would use the cluster to absorb the X-Rays and then deliver the data as electrons
to the processing atom. The processing atom would need to (somehow) fire an electron fast
enough to convert the result back into an X-Ray.

This I/O is point-to-point, not multicast, but has significantly less latency, allowing for
far larger "neighbourhoods".

The Kernel Hacker's Bookshelf: Ultimate Physical Limits of Computation

Posted Jul 3, 2008 18:03 UTC (Thu) by mcortese (guest, #52099) [Link]

So what? It only proves the limits of a 1-kg block of matter, assuming that it will be a reasonable piece of hardware for a laptop.

But by the time we reach the storage limit (to take one), we may have a different concept of what a "laptop" is. For example it could be the union of a 1-kg device with no storage at all, linked to a physically large storage located somewhere far away, with a terabitpersecond wireless connection.

What Dr. Lloyd calls his "Ultimate Laptop", is indeed bound to the current understanding of what a laptop is, which reduces his work back to a mere discussion of the applicability of Moore's Law to a particular manufacturing assumption.

Whether you believe or not that humankind will always find ways to perpetuate Moore's Law through a shift to new techniques, for sure it won't be this paper to give you any proof.

The Kernel Hacker's Bookshelf: Ultimate Physical Limits of Computation

Posted Aug 12, 2008 9:32 UTC (Tue) by muwlgr (guest, #35359) [Link]

A pair of nanotech/reverscomp articles that amazed me some years ago :
http://www.cise.ufl.edu/research/revcomp/physlim/PhysLim-...
http://www.zyvex.com/nanotech/mechano.html (like, Babbage engine 
returns!:>)