The fabs propped up the corpse of Moore's Law by throwing mountains of cash at expanding transistors into the third dimension: FinFET, GAA, CFET, etc. That has kept the party going a little while longer than it would have lasted otherwise, but it's a one-time deal since there are no more dimensions to expand into.
…but that’s how it’s always worked. Moore’s law is dead, we’re at the limit of everything, oh hey, Moore’s lawn limps by again because someone did something clever.
This is the kind of comment that will keep me laughing for weeks. Moore's lawn is in fact dead. We need to go back to treating what Moore said as the observational insight it was.
That's never how it worked. Moore's law was never dead. People are just endlessly confused about what Moore's law is.
What ended was Dennard scaling, around 2006: roughly, the expectation that frequency would keep going up as feature size went down. But because so many people are confused about what is what, you get a crappy, muddled message.
Moore's law has been going strong. It must end eventually; current predictions are that it will be in a decade or two.
It's starting to get a bit old that whenever I see Moore's law mentioned, I'll usually also run into a spiel about how people have the wrong idea about what it actually refers to, and that it's holding up just fine. This is despite the gen-on-gen and year-on-year performance improvements of computer hardware very clearly tapering off in recent memory.
Maybe debating what always-somehow-wrong law to cite should not be the focus? Like it's very clear to me that being technically correct about what Moore's law or the Dennard scaling refers to is leaps and bounds less important than the actual, practical computing performance trends that have been observable in the market.
What we see in the market is caused by software bloat. Chips are gaining performance faster than ever in absolute terms.
I think Moore’s law should be avoided altogether when discussing progress in this area, because it’s hard to understand the effects of doubling intuitively. Rice grains on chessboards and all that.
One might think “Moore’s law is slowing down” means progress was faster before and slower now, when in fact it is completely the opposite.
If you consider the roughly 20 years between the Intel 286 and the Pentium 3, transistor count went from about 150 thousand to 10 million.
Today (using the Ryzen 5950X and 7950X as examples), we got 5 billion more transistors in just 2 years.
So in 2 years we added 500 times more transistors to our CPUs than the first 20 years of “prime Moore’s law” did.
This enormous acceleration of progress goes increasingly unnoticed due to even faster increases in software bloat, and the fact that most users aren’t doing things with their computers where they would notice any improvements in performance.
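To make that arithmetic concrete, here is a minimal sketch in Python using the rough transistor counts quoted above (all figures approximate and rounded):

    # Rough comparison of absolute transistor gains, using the approximate
    # figures quoted above (all counts rounded).
    i286 = 150_000                 # Intel 286
    pentium3 = 10_000_000          # Pentium 3, roughly 20 years later
    ryzen_5950x = 8_300_000_000    # Ryzen 9 5950X (2020)
    ryzen_7950x = 13_200_000_000   # Ryzen 9 7950X (2022)

    early_gain = pentium3 - i286              # added over ~20 years
    recent_gain = ryzen_7950x - ryzen_5950x   # added over ~2 years

    print(f"286 -> P3 added:      {early_gain:,} transistors")
    print(f"5950X -> 7950X added: {recent_gain:,} transistors")
    print(f"Ratio: ~{recent_gain / early_gain:.0f}x more transistors added, in a tenth of the time")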
> Chips are gaining performance faster than ever in absolute terms.
But this is not what I as a consumer end up seeing at all. Consider the RTX 5090. Gen-on-gen (so, compared to the 4090), for 20-30% more money, using 20-30% more power, you get 20-30% more raster performance. Meaning the generational improvement is effectively zero, software notwithstanding.
> If you consider the roughly 20 years between the Intel 286 and the Pentium 3, transistor count went from about 150 thousand to 10 million. Today (using the Ryzen 5950X and 7950X as examples), we got 5 billion more transistors in just 2 years.
Why would you bring absolute values into comparison with a relative value? Why compare the 286 and the P3 and span 20 years when you can match the 2 year timespan of your Ryzen comparison, and pit the P2 ('97) against the P3 ('99) instead? Mind you, that would reveal a generational improvement of 7.5M -> 28M transistors, a relative difference of +273%! Those Ryzens went from 8.3B to 13.2B, a +59% difference. But even this is misleading, because we're not considering die area or any other parameter.
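A quick sketch of those relative changes, using the same rough transistor counts quoted here (approximate figures, for illustration only):

    # Same kind of comparison, but in relative (percentage) terms.
    def rel_change(old, new):
        return (new - old) / old * 100

    pentium2 = 7_500_000           # Pentium 2 ('97), as quoted above
    pentium3 = 28_000_000          # Pentium 3 ('99), as quoted above
    ryzen_5950x = 8_300_000_000    # Ryzen 9 5950X (2020)
    ryzen_7950x = 13_200_000_000   # Ryzen 9 7950X (2022)

    print(f"P2 -> P3:       {rel_change(pentium2, pentium3):+.0f}%")        # ~ +273%
    print(f"5950X -> 7950X: {rel_change(ryzen_5950x, ryzen_7950x):+.0f}%")  # ~ +59%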
>But this is not what I as a consumer end up seeing at all. Consider the RTX 5090. Gen-on-gen (so, compared to the 4090), for 20-30% more money, using 20-30% more power, you get 20-30% more raster performance. Meaning the generational improvement is effectively zero, software notwithstanding.
The 4090 and 5090 are the same generation in reality, using the same process node. The 5090 is a bit larger but only has about 20% more transistors of the same type compared to the 4090. Which of course explains the modest performance boosts.
Nvidia could have made the 5090 on a more advanced node but they are in a market position where they can keep making the best products on an older (cheaper) node this time.
>Why would you bring absolute values into comparison with a relative value? Why compare the 286 and the P3 and span 20 years when you can match the 2 year timespan of your Ryzen comparison, and pit the P2 ('97) against the P3 ('99) instead? Mind you, that would reveal a generational improvement of 7.5M -> 28M transistors, a relative difference of +273%!
That was my point though, to highlight how relative differences in percentages represent vastly different actual performance jumps. It quickly becomes meaningless since it's not the percentages that matter, it is the actual number of transistors.
To put it another way: if you take the first Pentium, with about 3 million transistors, as a baseline, you can express performance increases in "how many Pentiums are we adding" instead of using percentages, and note that we are adding orders of magnitude more "Pentiums of performance" per generation now than we did 10 years ago.
> That was my point though, to highlight how relative differences in percentages represent vastly different actual performance jumps. It quickly becomes meaningless since it's not the percentages that matter, it is the actual number of transistors.
But it is the percentage that matters? If I have 10B transistors and I add 1B to it, the speedup I can expect is 10%. If I have 1B transistors and add 1B to it, the speedup I can expect is 100%. Tremendous difference. Why would I ever care about the absolute number of transistors added?
>But it is the percentage that matters? If I have 10B transistors and I add 1B to it, the speedup I can expect is 10%. If I have 1B transistors and add 1B to it, the speedup I can expect is 100%. Tremendous difference. Why would I ever care about the absolute number of transistors added?
Because it's the transistors that actually matter for performance.
If a unit of work (number of files compiled, number of triangles generated or whatever else) takes 1 billion transistors to complete in 1 second, you have gained the same amount of work per second by adding 10% to the 10-billion-transistor chip as you gained by adding 100% to the 1-billion-transistor one.
How much performance you need to feel a difference in a given workload is a separate point, and note that usually the workload changes with the hardware. Compiling the linux kernel in 2025 is a different workload than it was in 2005, for example, and running quake 3 is a different workload than cyberpunk 2077.
If you play a modern game, you don't notice a difference between a 10-year-old GPU and a 12-year-old GPU - even though one might be twice as fast, they might both be in the single digits of FPS, which feels equally useless.
So we gain more from the hardware than we used to, but since the software is doing more work we're not noticing it as much.
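To illustrate that work-per-second point, here is a toy sketch that assumes, purely for illustration, that throughput scales linearly with transistor count:

    # Toy model: assume every 1 billion transistors completes one "unit of
    # work" per second (a deliberate oversimplification, just to illustrate
    # absolute vs. relative gains).
    TRANSISTORS_PER_UNIT = 1_000_000_000

    def work_per_second(transistors):
        return transistors / TRANSISTORS_PER_UNIT

    # +100% on a small chip vs. +10% on a big chip, both adding 1B transistors.
    small_old, small_new = 1_000_000_000, 2_000_000_000
    big_old, big_new = 10_000_000_000, 11_000_000_000

    print(work_per_second(small_new) - work_per_second(small_old))  # 1.0 unit/s gained
    print(work_per_second(big_new) - work_per_second(big_old))      # 1.0 unit/s gained

Under this (very simplified) model, both upgrades buy you exactly the same extra work per second, despite the wildly different percentages.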
I legitimately just do not see the utility of framing the topic this way.
Benchmarking efforts usually involve taking the same amount or type of work, and comparing the runtime durations or throughputs in turn for different hardware.
Rasterization performance benchmarks for the 5090 revealed exactly the same +20% difference we see in transistor count. This is why I do not see the utility in remarking that in absolute terms we're adding more transistors than ever, because this is basically never what matters in practice. I have a set workload and I want it to go some amount faster.
Software sprawl is an issue no doubt, but that on its own is a separate discussion. It bears only a loose relation to the absolute vs. relative differences discussion we're having here.
> How much performance you need to feel a difference in a given workload is a separate point
It was exactly the point I said at the start should be the point of focus. Maybe we're talking past one another, I don't know.
>It was exactly the point I said at the start should be the point of focus. Maybe we're talking past one another, I don't know.
I think we do - we agree that the symptom is that we don't experience the same gains now as we used to, and that is a problem.
My issue is the notion that this is caused by a slowdown in performance gains from the hardware side, when this is clearly not the case. A common complaint is along the lines of "we only got 30% when last time we got 50%", which completely ignores that the latter 30% is way more actual new performance than the previous 50%.
>I legitimately just do not see the utility of framing the topic this way.
IMO it's always useful to identify the actual reason for a problem and think about the fundamentals.
If the problem is that we're not experiencing the performance gains, we should be asking ourselves "Why does software feel slower today despite the hardware having 10x more performance?"
Instead we complain about the hardware for not managing to add the equivalent of all previous performance gains every 2 years, because Moore's law observed that it did so at the beginning of the chessboard (so to speak).
Instead of wondering whether Moore's law is dying or not, we should question why Wirth's law seems to be immortal! ;)
You seem to be completely confused about why absolute vs relative matters. Moore's law literally states: "[Moore's law] is the observation that the number of transistors in an integrated circuit (IC) doubles about every two years".
This is literally a relative measurement. You cannot reason about Moore's Law in absolute changes.
The other poster has laid it out for you in the simplest terms: a 1 billion transistor increase could mean anything. It could be a 1000% improvement - which is absolutely humongous - or a 10% improvement, which is basically irrelevant. If you want to measure how impactful an increase is, you have to look at relative change. 1 billion transistors on its own means nothing. It is only interesting with respect to the number of transistors in the previous generation - which is a relative change measurement.
Say we are at Generation 1 with 100 billion transistors. By your reasoning, if we add 1 billion more transistors to this, that's big. 1 billion transistors are a lot. But this is absolutely incorrect! Because we started out with 100 billion transistors, the change is actually irrelevant.
>You seem to be completely confused about why absolute vs relative matters.
[..]
>This is literally a relative measurement. You cannot reason about Moore's Law in absolute changes.
This is exactly my point. I said exactly the same thing as you: "I think Moore’s law should be avoided altogether when discussing progress in this area"!
Performance is an absolute thing, not a relative thing. Amount of work per second, not percentage-over-previous!
Doubling means that each step encompasses the sum of all previous steps. Every step gives vastly more performance than any preceding step.
If a generation gives 60% more performance and all previous generations gave 100%, the latest jump will still be the one that added the most performance.
I think this is often overlooked and worth mentioning. People are actually complaining about performance jumps that are the biggest performance jumps ever seen because they are thinking in percentages. It makes no sense.
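A minimal numerical sketch of that point, with an arbitrary baseline and purely hypothetical growth rates:

    # Under doubling, each generation adds more absolute performance than all
    # previous generations combined, and even a "disappointing" +60% step can
    # add more absolute performance than any earlier +100% step.
    # (Numbers are arbitrary, not real benchmarks.)
    perf = [100.0]                    # arbitrary baseline in "performance units"
    for _ in range(4):
        perf.append(perf[-1] * 2.0)   # four generations of doubling
    perf.append(perf[-1] * 1.6)       # then one +60% generation

    for i in range(1, len(perf)):
        gained = perf[i] - perf[i - 1]
        pct = 100 * gained / perf[i - 1]
        print(f"gen {i}: +{pct:.0f}% relative, +{gained:.0f} absolute units")

With these made-up numbers, the final +60% generation adds more absolute units than any of the +100% generations before it.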
I disagree on this: "Performance is an absolute thing"
I think the core misunderstanding is what is being discussed. You say "Performance is an absolute thing, not a relative thing. Amount of work per second, not percentage-over-previous!"
But nobody here is concerned with the absolute performance number. Absolute performance is only about what we can do right now with what we have. The discussion is about the performance improvements and how they evolve over time.
Performance improvements are inherently a relative metric. The entire industry, society and economy - everything is built on geometric growth expectations. You cannot then go back to the root metric and say "ah, but what really matters is that we can now do X TFlops more"; when that number is a mere 10% improvement, nobody will care.
Built into everything around computing is the potential for geometric growth. Startups, AI, video games - it's all predicated on the expectation that our capabilities will grow in geometric fashion. The only way to benchmark geometric growth is by using % improvements. Again, it is wholly irrelevant to everyone that the 5090 can do 20 tflops more than the 4090. 20 TFlops is some number, but whether it is a big improvement or not - ie is it expanding our capabilities at the expected geometric rate or not - is not possible to divine from this number.
And when we look it up, it turns out that we went from 80 tflops to 100 tflops, which is a paltry 25% increase, well short of the expectations of performance growth from Nvidia, which were in the 50% region.
20 tflops might or might not be a lot. The only way to judge is to compare how much of an improvement it is relative to what we could have before.
>I think the core misunderstanding is what is being discussed.
I see your point and it's valid for sure, and certainly the prevailing one, but I think it's worth thinking about the absolute values from time to time even in this context.
In other contexts it's more obvious; let's use video.
Going from full HD to 4K is roughly a doubling of resolution in each direction. It means going from about 2 megapixels to 8, a difference of 6.
The next step up is 8k, which is going from 8 megapixels to 33. A difference of 25.
This is a huge jump, and crosses a relevant threshold - many cameras can't even capture 33mpx, and many projectors can't display them.
If you want to capture 8k video, it's not relevant if your camera has twice the megapixel count of the previous one - it needs to have 33mpx. If it has 16 instead of 8 it doesn't matter (also it doesn't really matter if it has 100 instead of 40).
On the other hand, if you want to capture 4K, you need only 8 megapixels. Whether you have 12, 16, 24 or 40 doesn't really matter.
If we return to CPUs, absolute performance matters there too - if you want to play back video without frame drops, you need a certain absolute performance. If your computer is twice as fast and still can't do it, the doubling didn't matter. Similarly if your current computer can do it, doubling it again won't be noticeable.
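The resolution arithmetic, spelled out as a small sketch using the standard 16:9 frame sizes:

    # Pixel counts for the standard 16:9 resolutions, showing how the absolute
    # jump grows even though each step is "only" about 2x per axis.
    resolutions = {
        "Full HD": (1920, 1080),   # ~2.1 MP
        "4K UHD":  (3840, 2160),   # ~8.3 MP
        "8K UHD":  (7680, 4320),   # ~33.2 MP
    }

    prev = None
    for name, (w, h) in resolutions.items():
        mp = w * h / 1e6
        delta = "" if prev is None else f"  (+{mp - prev:.1f} MP over the previous step)"
        print(f"{name}: {mp:.1f} MP{delta}")
        prev = mp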
Take transistor density per mm^2: going from 122.9 to 125.3 is a 2% increase in a little over 2 years, which does not bode well for a number that is supposed to double in the same period.
The spirit of Moore’s law is alive and well. It’s just that it doesn’t cover the whole story; it ignores efficiency, because that wasn’t a huge concern back then.
Sure, it’s not literally true of transistor density anymore, but it feels arrogant to invent new names for the overall phenomenon that Moore observed. Like the way we still talk about Newton’s laws despite their very valid “actually, technically” incompleteness.
It is ultimately a market effect; the technical specifics are not really important and are even conflated by industry insiders. See my sibling comment.
Also, chip fabs keep getting more expensive and taping out a chip design for those new fabs keeps getting more expensive. That makes today's semiconductor industry work quite differently from how things worked 30 years ago and means some segments see reduced or delayed benefits from the continued progression of Moore's Law. See eg. the surprisingly long-lasting commercial relevance of 28nm, 14nm, and 7nm nodes, all of which were used in leading products like desktop GPUs and CPUs for more years than Moore's Law would lead you to expect.
Some view it as a doubling every 18 months, or as a statement about cost per transistor (which has gone up with the smallest nodes).
It is roughly an exponential curve in the number of transistors we can use to make a "thing".
It is both a question of capability (can we make things with a certain number of transistors?) and of whether it is economically viable to build things of that size.
You could stay at the current node size and halve the cost of the wafer every 18 months and you would still be on the same curve. But it is easier in our economic system to decrease the node size, keeping the rest of the fixed wafer costs the same, and get 2x or 4x the density on the same lines.
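As a back-of-the-envelope sketch of that equivalence (the wafer cost and density figures below are made up, purely for illustration):

    # Cost per transistor falls either by shrinking (more transistors per
    # wafer) or by making the wafer cheaper. Halving wafer cost at constant
    # density lands on the same cost-per-transistor curve as doubling density
    # at constant wafer cost. (All figures below are made up.)
    def cost_per_transistor(wafer_cost_usd, transistors_per_wafer):
        return wafer_cost_usd / transistors_per_wafer

    baseline      = cost_per_transistor(10_000, 1e15)  # current node, current cost
    cheaper_wafer = cost_per_transistor(5_000, 1e15)   # same node, half the wafer cost
    denser_node   = cost_per_transistor(10_000, 2e15)  # next node, same wafer cost

    print(baseline, cheaper_wafer, denser_node)  # the last two are identical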
If I get nerd-sniped, I'll find the two video presentations, one by Krste and another by Jim Keller, where they unambiguously explain Dennard scaling and Moore's Law in a way that is congruent with what I just said.
> Moore's law ends when the whole universe is a computer (which it already is).
I find "Moore's Second Law" interesting. At least the version I'm familiar with says that the cost of a semiconductor chip fabrication plant doubles every four years. See https://en.wikipedia.org/wiki/Moore%27s_second_law
It's interesting to contrast that trajectory with global GDP. At some point, either global economic growth has to accelerate dramatically to even produce one fab, or we have to leave the 'globe', i.e. go into space (but that's still universal GDP exploding), or that law has to break down.
It would be exceedingly funny (to me) if one of the first two possibilities held true, and the law accurately predicted either an AI singularity or some golden space age.
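To put rough numbers on that extrapolation, a naive sketch assuming a ~$20B leading-edge fab today, ~$100T gross world product, and ~3% real growth (all three are ballpark assumptions):

    # Naive extrapolation: fab cost doubles every 4 years (Moore's second law),
    # world GDP grows ~3% per year. Starting figures are rough assumptions.
    fab_cost = 20e9        # USD, leading-edge fab today (assumed)
    world_gdp = 100e12     # USD per year, gross world product (assumed)

    years = 0
    while fab_cost < world_gdp:
        years += 4
        fab_cost *= 2              # doubling every four years
        world_gdp *= 1.03 ** 4     # ~3% annual growth over the same span

    print(f"One fab would cost more than a year of world GDP after ~{years} years")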
Does this actually work? At some point, and this has been the case for a while, you're limited by thermals. You can't stack more layers without adding more cooling.
He's talking about how they've moved from planar transistors, where layers are just deposited on top of each other, to transistors with increasingly complex 3D structures[1] such as FinFET and Gate-All-Around, the latter having multiple nanowires passing through the gate like an ordered marble cake[2].
There are also cooling and conduction paths to take into account. It was discussed in the design of the Xeon version of the i9, which had me considering clocking down the performance-core communication while throttling up the performance cores themselves.
Your sources are excellent. (Thank you so much for the links.)
Moore's law doesn't say anything about you having to power all your transistors for them to count.
I'm only half-joking: the brain gets a lot of its energy efficiency out of most of its parts not working all that hard most of the time; and we are seeing some glimpses of that in mobile processors, too.
Assuming those extra dimensions really exist (it is unproven), I think we are centuries or even millennia away from being able to make technological use of them, if we ever will be at all.
Expand into the time dimension, evaluate infinite execution paths by reversing the pipeline, rerunning, and storing the results in a future accumulator.
Apart from the very niche application of factoring integers into primes, there's scarcely any application known where quantum computers would even theoretically outperform classical computers. And even integer factoring is only remotely useful until people completely switch away from cryptography that relies on it.
The one useful application of quantum computing that I know of is simulating quantum systems. That's less useless than it sounds: a quantum computer can simulate not just itself (trivially), but also other quantum systems. In any case, the real-world use case for that is accelerating progress in materials science, not something you or I would use every day.
This isn’t an elaboration on anything I said. Quantum computers are immensely useful across a whole slew of domains. Not just cryptanalysis, but also secure encryption links, chemistry simulations, weather predictions, machine learning, search, finance, logistics, classical simulations (e.g. fluid flow) and basically anywhere you have linear algebra or NP problems.
Do you have any sources that give good evidence that quantum computers are useful for 'weather predictions, machine learning, search, finance, logistics, classical simulations (e.g. fluid flow) and basically anywhere you have linear algebra or NP problems'? I'm basing my skepticism mostly on the likes of Scott Aaronson.
I can believe that quantum computers might be useful for chemistry simulations. (Quantum computers aren't really useful for encryption. You could theoretically use them, but they don't really give you any advantage over running a quantum-resistant algorithm on a classical computer.)
I'm especially doubtful that quantum computer would be useful for arbitrary NP problems or even arbitrary linear algebra problems.
There are specialized algorithms for every part of that list, especially search. But demonstrating that quantum computers are good for linear algebra should be enough to show that they are generally useful, I hope.
The encryption I was referring to was quantum link encryption (technically not a quantum computer, but we are splitting hairs here; it uses the same set of underlying mechanisms). Quantum link encryption gives you a communications channel where, if someone tries to man-in-the-middle it, all they accomplish is breaking the link. Both you and the attacker only see gibberish. It’s like a one-time pad that doesn’t require first exchanging pads.
Scott tends to be a bit of a Debbie Downer on quantum timelines, and you need to be able to separate that from actual criticisms of the underlying technology. If you look past the way he phrased it, his criticisms of quantum machine learning basically boil down to: there are still some things to be worked out. Not that we have no expectation of how to work those things out, just that there are still unsolved challenges to be tackled.
That’s not really a takedown of the idea.
The more critical challenge is that there is a massive, massive constant-factor difference between classical and quantum computing using foreseeable technology. Even in the best case (like factoring), where a quantum computer gives you a polynomial-time algorithm for a problem whose best classical algorithms are superpolynomial, it happens to throw in a slowdown factor of something like a trillion. Oops.
But still, even with large constant factors, algorithmic improvements eventually win out asymptotically. It’s just a matter of building a quantum computer big enough and reliable enough to handle problems of that size.
We are already hitting the limits of what we can train on GPUs at reasonable cost. I expect that there will be many advances in the years to come from improved training algorithms. But at some point that will run dry, and further advances will come from quantum computing. Scott is right to point out that this is far, far beyond any existing company’s planning horizon. But that doesn’t mean quantum technologies won’t work, eventually.
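A toy crossover calculation along those lines; the constant factor, the polynomial exponent, and the classical cost curve below are all illustrative guesses, not measurements:

    # Toy model of the constant-factor argument: a quantum algorithm that is
    # asymptotically cheaper but pays a huge per-step slowdown vs. a classical
    # algorithm that is superpolynomial but fast per step. All constants and
    # exponents are illustrative guesses.
    import math

    def classical_steps(n):
        # stand-in for a superpolynomial classical cost in the input size n
        return math.exp(1.9 * n ** (1 / 3) * math.log(n) ** (2 / 3))

    def quantum_steps(n, slowdown=1e12):
        # polynomial quantum cost, times a trillion-scale constant-factor penalty
        return slowdown * n ** 3

    for n in (128, 256, 512, 1024, 2048):
        c, q = classical_steps(n), quantum_steps(n)
        print(f"n={n:5d}  classical ~{c:.1e}  quantum ~{q:.1e}  quantum wins: {c > q}")

Under these made-up numbers the quantum side only pulls ahead once the problem gets large enough, which is exactly the "big enough and reliable enough" point above.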