> The so-called TDP of the Ryzen 9950X is 170W. The used heat sinks are specified to dissipate 165W, so that seems tight.
TDP numbers are completely made up. They don’t correspond to watts of heat, or of anything at all! They’re just a marketing number. You can't use them to choose the right cooling system at all.
When I see the term TDP, I remember what I read in the "Thermal Design Document" for the Intel Core 2 Quad Q6600 and the family it belongs to:
> The thermal solution bundled with the CPUs is not designed to handle the thermal output when all the cores are utilized 100%. For that kind of load, a different thermal solution is strongly recommended (paraphrased).
I never used the stock cooler bundled with the processor, but what kind of dark joke is this?
Most states of “100% utilization” as you’d see in `top` are not 100% thermal output or even close. Cores waiting for memory accesses count as utilized in the former sense but will not produce as much heat as one that is actually using the ALU etc. That’s why special make-work like Prime95 is used for stress testing overclocking/thermals: it will saturate the cores with enough unblocked arithmetic work to generate more heat than having 1000 browser tabs open does.
This is more how I think too: using a cooler that supports your CPU TDP is generally fine because most people will not run a CPU 100% for an extended amount of time. But in this case they seem to be running the CPU 100% for an extended amount of time AND are using an under-spec'ed cooler (even if it is just by 5W).
You don't even need to change the actual cooler, since for AMD CPUs you can pretty much customize the TDP however you want, and by default they run well past the efficient part of their voltage/frequency curve. For example, my 7600X has a default TDP of 105W but I run it in Eco Mode (65W) with an undervolt and I barely lose any performance. Even without the undervolt, running the CPU in Eco Mode is generally preferable since the performance loss is still negligible (~5%).
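Back-of-the-envelope, those figures imply a big perf-per-watt win. This is just arithmetic on the numbers quoted above (105 W stock, 65 W Eco Mode, ~5% performance loss), not a measurement:

```python
# Rough perf-per-watt comparison for a 7600X at stock vs Eco Mode,
# using the figures quoted above (105 W default, 65 W Eco, ~5% slower).
stock_tdp_w = 105.0
eco_tdp_w = 65.0
eco_relative_perf = 0.95  # ~5% performance loss in Eco Mode

perf_per_watt_gain = (eco_relative_perf / eco_tdp_w) / (1.0 / stock_tdp_w)
print(f"Eco Mode perf/W vs stock: {perf_per_watt_gain:.2f}x")  # ~1.53x
```

So even taking the 5% loss at face value, Eco Mode comes out roughly 50% ahead in performance per watt.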
For a general purpose system, this line of thinking makes sense. However, the desktop system in question was built to be daily driven and support some high performance code research, so it had to endure some serious loads for a desktop computer.
I went the other way: I overspecced the CPU cooler and added some silent but high-CFM fans to the system. The motherboard I got could adjust all fans based on system temps, so under load it scaled automatically from a very silent desktop to a low-key space heater.
Instead of undervolting the processor, I was using a tweaked on-demand governor on the system which stuck to lower power levels more than usual, so unless I was doing software development and testing things, it stayed cool and silent.
BTW, by 100%, I'm talking about completely saturating the CPU pipeline, not pseudo-100% where the CPU reports saturation but most of the load is iowait.
That was such a fun time to be into hardware. For years Intel had the money and relationships to keep the Pentium 4 everywhere even though AMD had the better product. The P4 might edge ahead in video rendering but the Athlon would win overall and use less power.
AND those chips overclocked to the moon. I got my E6420 to 3.2 GHz (from 2.133 GHz) just by upping the front-side bus (the multiplier was locked). A quick search makes me think my chip wasn't even that great.
Absolutely. Intel was also keeping up the tick-tock cadence. I could be misremembering, but it seemed like every tock Intel was getting something like 20% improvements over the last tock. It really wasn't until ~Haswell that that slowed down, and it continued to slow to basically nothing. Kaby Lake, IIRC, was the last major performance jump from Intel. Everything since has just been incremental changes.
One of the reasons that Intel only shipped 5% incremental updates was that AMD was basically non-existent, due both to Intel pressuring them and to AMD making a massive mistake with the Bulldozer/Piledriver architecture.

They vastly underestimated how much of a bottleneck a single FPU (shared between each pair of cores in a module) would be on a multicore/SMP processor.
Then AMD took it personally and architected Zen/EPYC. The rest is history.
Certainly, and by that time Intel just sort of dropped all the balls. They were already struggling to do die shrinks and it seems like they simply lost all their ability to develop the architecture.
That had maybe happened years earlier. The thing about Conroe is that, IIRC, its ancestry came from the P3 and Intel's mobile CPU designs, while the P4 was a steady evolution of the NetBurst architecture. The years of improvements to Conroe were mostly incremental changes and features ported over from NetBurst (such as hyperthreading). Once that all played out, Intel really didn't have anywhere else to go, or plans for how to evolve the architecture. They fell back on the same old "let's just add wider SIMD instructions (AVX)".
I also seem to recall that Intel made fab bets that ultimately didn't pay off. Again, IIRC, I believe they were trying to stretch the same deep-UV lithography (193 nm light) rather than moving to EUV. That caused them to dump a fair bit of money into fabrication efforts that never really paid off.
Buying parts for that particular desktop was quite fun:
- Me: Can I get a Q6600?
- Seller: But, that's... Quad core?
- Me: Yes, I'll have it.
- Seller: OK. RAM?
- Me: I'll get OCZ Flex-XLC Hybrids. 1GB.
- Seller: *Gives one*
- Me: I'll get four.
- Seller: ?
- Me: Yes, four please.
You are correct. In fact, these guys measured a maximum socket power consumption of 240 W on a 9950X at stock settings, running Prime95. That's far above the "170 W" TDP:
I don’t understand this argument. If the CPU dissipated an equal number of watts of heat energy as it consumed from the wall, there wouldn’t be any energy left to do actual useful work. Isn’t the extra 100W accounted for by things like changing the state of flip-flops? In other words, mustn’t one consider the entropy reduction of the system as an energy sink?
Clocking and changing register states requires charging and discharging the gate capacitance of a bunch of MOSFET transistors. The current that results from moving all that charge around encounters resistance, which converts it to heat. Silicon is only a "semi" conductor after all.
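The textbook first-order model for this is dynamic power P ≈ α·C·V²·f: activity factor times switched capacitance times voltage squared times clock frequency. A toy estimate, with every number below invented purely for illustration:

```python
# First-order CMOS dynamic power: P = alpha * C * V^2 * f
# All values below are invented round numbers, not any real chip's specs.
alpha = 0.2      # activity factor: fraction of gate capacitance switching per cycle
c_farads = 1e-7  # total switched capacitance (100 nF, made up)
v_volts = 1.2    # core voltage
f_hz = 4e9       # 4 GHz clock

p_watts = alpha * c_farads * v_volts**2 * f_hz
print(f"dynamic power ≈ {p_watts:.1f} W")  # ≈ 115.2 W
```

The V² term is why undervolting pays off so disproportionately: a small voltage drop cuts power quadratically, while frequency only has to give up a little.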
You are correct that there is energy bound in the information stored in the chip. But last I checked, our most efficient chips (e.g., using reversible computing to avoid wasting that energy) are still orders of magnitude less efficient than those theoretical limits.
Thank you for encouraging me to go on this educational adventure. I have now heard of Landauer’s principle, which says that erasing a bit of information must dissipate at least ~2.9e-21 joules at room temperature: https://en.wikipedia.org/wiki/Landauer%27s_principle
I think the numbers are more like <1W used in actual information processing, >239W lost to heat. Information and the transformation of it does have some inherent energy cost. But it is very, very small. And you end up getting that back as heat somewhere else down the line anyways.
Nope. Remember that you cannot destroy energy. The energy you use to flip the flip flop still exists, only now it’s just disordered waste heat instead of electricity.
Energy cannot be created or destroyed, but it can enter and leave an open system. When I lift a 10kg box 1 meter in the air, I don’t raise its temperature at all, and I only raise mine a tiny bit, yet I have still done work on the box and therefore have imparted it energy. The energy came from food I ate earlier, and was ultimately stored in the box as gravitational potential energy.
Is this not analogous to storing energy in the EM fields within the CPU?
CPUs don't store nontrivial amounts of energy, and even if storing a 1 was a significantly higher energy level than a 0 (or vice versa) there's no plausible workload that would be causing the CPU to switch significantly more 0s to 1s than 1s to 0s (or vice versa).
Yes, but only briefly. When you study the thermodynamics of information you’ll discover that it’s actually erasing information that has a cost. Every time the CPU stores a value in a register it erases the previous value, using up energy. In fact, every individual transistor has to erase the previous state on basically every clock cycle.
Curiously there is a minimum cost to erase a single bit that no system can go below. It’s extremely small, billions of times smaller than the amount of energy our CPUs use every time they erase a bit, but it exists. Look up Landauer’s Limit. There is a similar limit on the maximum amount of information that can be stored in a system, proportional to the surface area of the sphere the information fits inside (the Bekenstein bound). Exceed that limit and you’ll form a black hole. We’re nowhere near that limit yet either.
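The limit itself is just k_B·T·ln 2 per erased bit; at room temperature that works out to the ~2.9e-21 J figure mentioned upthread:

```python
import math

k_B = 1.380649e-23  # Boltzmann constant, J/K (exact in the 2019 SI)
T = 300.0           # room temperature, K

landauer_j_per_bit = k_B * T * math.log(2)
print(f"Landauer limit at {T:.0f} K: {landauer_j_per_bit:.3g} J/bit")

# For scale: at this limit, one watt could pay for erasing ~3.5e20 bits/s.
bits_per_second_at_1w = 1.0 / landauer_j_per_bit
```

Real CPUs flip maybe 1e19-1e20 bits per second while drawing hundreds of watts, which is the "billions of times" gap the comment above refers to.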
>In fact, every individual transistor has to erase the previous state on basically every clock cycle.
This is incorrect in both directions.
Only transistors whose inputs are changing have to discharge their capacitance.
This means that if the inputs don't change nothing happens, but if the inputs change then the changes propagate through the circuit to the next flip flop, possibly creating a cascade of changes.
Consider this pathological scenario: The first input changes, then a delay happens, then the second input changes so that the output remains the same. This is known as a "glitch". Even though the output hasn't changed, the downstream transistors see their input switch twice. Glitches propagate through transistors and not only that, if another unfortunate timing event happens, you can end up with accumulating multiple glitches. A single transistor may switch multiple times in a clock cycle.
Switching transistors costs energy, which means you end up with "parasitic" power consumption that doesn't contribute to the calculated output.
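The pathological case above can be sketched as a tiny event-driven toy. The scenario is invented for illustration, not any real circuit: the output is A XOR B, A flips first, and B flips after a delay so the steady-state output is unchanged, yet the output still toggles twice.

```python
# Toy glitch demo: out = A XOR B. A flips at t=0, B flips at t=5 (the delay),
# so the final output equals the initial one, but it transitioned twice.
def transitions(events):
    """Count output transitions given (time, a, b) samples in time order."""
    count = 0
    prev = events[0][1] ^ events[0][2]
    for _, a, b in events[1:]:
        out = a ^ b
        if out != prev:
            count += 1
        prev = out
    return count

# t<0: A=0, B=0 (out 0); t=0: A flips (out 1); t=5: B flips (out 0 again)
timeline = [(-1, 0, 0), (0, 1, 0), (5, 1, 1)]
print(transitions(timeline))  # 2
```

Every one of those "wasted" transitions downstream of the glitch charges and discharges gate capacitance, which is exactly the parasitic power consumption described above.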
My apologies if I wasn’t clear enough. I was only intending to make a statistical statement that the number of erasures is of similar order to the number of transistors, not that every single transistor changes its state exactly once per cycle. Some don't change their state this cycle, others end up changing multiple times before settling. In fact, some are completely powered off! (Because you’re not using the built-in GPU right now, or you’re not doing AVX512 right now, etc, etc.)
Note also that discharging the internal capacitance of a transistor, and the heat generated by current through the transistor’s internal resistance, are both costs over and above the fundamental cost of erasing a bit. Transistors can be made more efficient by reducing those additional costs, but Landauer discovered that nothing can reduce the fundamental cost of erasing a bit.
I have a 65W TDP CPU, and the difference in power draw (measured at the outlet) from idle to full CPU load is over 100W; it seems to just raise the clock until it hits 95 °C, so if I limit the CPU fan's top speed, the power draw goes down.
Yep. Modern CPUs continually adjust their clock multiplier based on what their temperature is doing, plus a few timers. If you have a better cooler then you’ll get more performance out of the same CPU, but at the cost of drawing more power and producing more heat.
Wow, I can't believe how BS this TDP is! I feel like a total idiot! I've always assumed it's sorta-kinda a tight upper bound on power consumption, perhaps with some allowance for "imperfections" in the dissipation properties of the CPU, and that I shouldn't sweat the details.
Couldn't this count as false/misleading advertising though?
No, they don’t design the chip with these numbers in mind. The marketing department picks the number they want based on how they want customers to think about the chip, and which competitors they want you to compare it against. They just plug in whatever numbers are needed into the formula so that the number comes out how they want it.
That seems a little too cynical. It matters how a customer might use a chip, such as the type of cooling that would be expected in a typical system using that model, and that's informed by the advertised specifications. Base clocks and the amount of SRAM also figure into TDP. No doubt there are completely arbitrary aspects to TDP driven purely by profit-focused market segmentation, but it's not just that.
That said, it's definitely very frustrating as someone who does the occasional server build. Not only does TDP not reflect minimum or maximum power draw for a CPU package itself, but it's also completely divorced from power draw for the chipset(s), NICs, BMCs (ugh), etc, not to mention how the vendor BIOS/firmware throttles everything, and so TDP can be wildly different from power draw at the outlet. The past 5 years have kind of sucked for homelab builders. The Xeon E3 years were probably peak CPU and full-system power efficiency when accounting for long idle times. Can you get there with modern AMD and Intel chips? Maybe. Depends on who you ask and when. Even with identical CPUs, differences in motherboard vendor, BIOS settings, and even kernel can result in drastically different (as in 2-3x) reported idle power draw.
No, clock speed and cache have nothing to do with TDP. AMD uses a simple formula to calculate TDP. It is the temperature of the IHS minus the air temperature measured at the CPU cooler’s intake fan, divided by a conversion factor in °C/W.
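Spelled out, that formula is just the following. The three input values here are invented for illustration; they are not AMD's actual per-SKU numbers:

```python
# AMD-style TDP formula: TDP (W) = (T_case - T_ambient) / theta_ca
# where theta_ca is the cooler's case-to-ambient thermal resistance in °C/W.
# All three inputs below are made-up illustrative values, not AMD's.
t_case_c = 61.8     # allowed IHS temperature, °C
t_ambient_c = 42.0  # air temperature at the cooler intake fan, °C
theta_ca = 0.189    # assumed cooler thermal resistance, °C/W

tdp_w = (t_case_c - t_ambient_c) / theta_ca
print(f"TDP = {tdp_w:.0f} W")  # ~105 W with these inputs
```

Note that all three inputs are chosen by the vendor, which is exactly the degree of freedom the rest of this thread is complaining about.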
But they don’t use real temperatures from real systems. They just make up a different set of temperatures for each CPU that they sell, so that the TDP comes out to the number that they want. The formula doesn’t even mean anything, in real physical terms.
I agree that predicting power usage is far more difficult than it should be. The real power usage of the CPU is dependent on the temperature too, since the colder you can make the CPU the more power it will voluntarily use (it just raises the clock multiplier until it measures the temperature of the CPU rising without leveling off). And as you said there are a bunch of other factors as well.
> The formula doesn’t even mean anything, in real physical terms.
From your description the formula is how you would calculate the power for which a certain heatsink at a given ambient temperature would result in the specified IHS temperature.
The °C/W number is not a conversion factor but the thermal resistance[1] of the heatsink & paste, that is a physical property.
So unless I misunderstood you it's very much something real in physical terms.
It might be a useful formula _if_ the numbers were real. Note that when AMD tells you that a 9900X CPU has a 120W TDP, that's because they picked three numbers to plug into that formula that result in 120 popping out. They picked the result of 120 first, and then found numbers to put into the formula so that it gives you that result.
But the reason I say that it’s physically meaningless is that real heat dissipation is strongly temperature dependent. A heatsink sheds heat into the air more effectively as the temperature difference between it and the ambient air grows, so its effective °C/W isn't a single fixed number.
>The marketing department picks the number they want based on how they want customers to think about the chip, and which competitors they want you to compare it against. They just plug in whatever numbers are needed into the formula so that the number comes out how they want it.
Are you just describing product segmentation? ie. how the ryzen 5700x and 5800x are basically the same chip, down to the number of enabled cores, except for clocks and power limit ("TDP")?
Yep. The 5800X is a higher bin specifically because it can clock higher than the ones in the 5700X bin. That certainly makes them draw more power, so they get a higher TDP number too. But the TDP doesn’t have anything to do with how much power the CPU will draw or how much heat it will generate in practice. Those numbers vary quite a lot; the CPU continuously adjusts its own frequency multiplier based on its own measured temperature, meaning it’ll draw more power if you cool it better.
>But the TDP doesn’t have anything to do with how much power the CPU will draw or how much heat it will generate in practice. Those numbers vary quite a lot; the CPU continuously adjusts its own frequency multiplier based on its own measured temperature, meaning it’ll draw more power if you cool it better.
I don't get it, are you referring to the phenomenon that different workloads have different power consumption (eg. a bunch of AVX512 floating point operations vs a bunch of NOPs), therefore TDP is totally made up? I agree that there's a lot of factors that impact power usage, and CPUs aren't like a space heater where if you let it run at full blast it'll always consume the TDP specified, but that doesn't mean TDP numbers are made up. They still vaguely approximate power usage under some synthetic test conditions, or at the very least is vaguely correlated to some limit of the CPU (eg. PPT limit on AMD platforms).
No, the TDP number doesn’t even vaguely approximate anything. You can’t use the number to predict anything, or to plan, or to estimate your electric bill, or anything like that.
Isn't TDP supposed to be an upper bound of how much power budget there is for a chip when it's running under maximum IPC (which implies AVX512 workload spread across all cores with all the test data in all the L1 caches)? I guess that power budget can vary due to process imperfections and/or CPU bugs but saying that it doesn't approximate anything is hard to believe. How about the PSU then, e.g. is 800W PSU a made-up number as well?
No, TDP is only supposed to be a marketing number. It would be nice if it were a real number that meant something, but CPU manufacturers don’t want to include really complicated information in their marketing. When they want to emphasize that a processor is powerful, they increase the TDP number! When they want you to buy an efficient laptop, they just lower the number! Same cpu, same number of transistors, same number of cores and PCIe lanes, different model number, different TDP number.
The power ratings of power supplies, on the other hand, are perfectly valid. Try to draw more than that and they will blow a fuse. Note however that a power supply’s efficiency is nonlinear. If your computer is really drawing 800W from the power supply, then the power supply is probably drawing 1000W from the wall, or maybe more. The difference is converted into heat during the conversion from 120V AC to 12V DC (and 5V DC and 3.3V DC, etc, etc). That’s an efficiency of 80%. But if your PC was drawing 400W from the same power supply then maybe the efficiency would be 92% instead, and the supply would only draw 435W from the wall. The right power supply for your computer is the cheapest one that is most efficient at the level of power that your computer actually needs. The Bronze/Gold/Platinum efficiency ratings are almost BS made-up marketing things though, because all that tells you is that it hits a certain efficiency rating at _some_ power level, not that it does so at the power level you’ll typically run your computer at.
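The wall-draw arithmetic in that paragraph, written out (the efficiency figures are the illustrative ones from the comment, not measurements of any real PSU):

```python
# Wall draw = DC load / conversion efficiency.
# The efficiencies here are the comment's illustrative numbers.
def wall_draw(dc_load_w, efficiency):
    """Watts pulled from the outlet to deliver dc_load_w to the system."""
    return dc_load_w / efficiency

print(wall_draw(800, 0.80))         # 1000.0 W from the wall at 80% efficiency
print(round(wall_draw(400, 0.92)))  # 435 W at 92% efficiency
```

The 200 W and 35 W differences, respectively, are what the PSU itself dumps into the room as heat.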
There is a similar but more extreme set of nonlinearities when talking about the power drawn by a CPU (or a GPU). The CPU monitors its own temperature and then raises or lowers its own frequency multiplier in response to those temperature changes. This means that the same CPU will draw more power and run faster when you cool it better, and will run more slowly and generate less heat when the ambient temperature is too high. There are also timers involved. Because so many of the tasks we actually give to our CPUs are bursty, CPU performance is also bursty. The CPU will run at a high speed for a short period of time, then automatically scale back after a few seconds. The exact length of that timer can be adjusted by the BIOS, so laptop motherboards turn the timer down really short (because cooling in a laptop is terrible), while gamer motherboards turn them way up (because gamers buy overbuilt Noctua coolers, or water cooling, or whatever). Intel and AMD cannot even tell you a single number that encompasses all of these factors. Thus TDP became entirely meaningless and subject to the whims of marketing.
https://www.usenix.org/system/files/conference/cooldc16/cool... uses an 84W TDP-rated Haswell-architecture Intel i7-4770, and the authors constructed synthetic microbenchmarks with 1.67 IPC FPU and 3.86 IPC integer workloads. Then they used RAPL (Running Average Power Limit), something I learned exists as of today, to measure the power usage at the level of the whole chip package. Reported numbers are ~22W.
Considering that the microbenchmark is utilizing only one core, and considering that this chip has 4 cores in total, could it really be that they would measure ~84-88W if they had designed the microbenchmark so that it utilizes all of the cores? This would then match the declared TDP.
They didn’t measure 22W, they measured 6W + 22.1W + 4.9W + 1.8W + 4.8W + 11.2W = 50.8W. Add in 66.3W for the other three cores and that would be 117.1W. Benchmark #2 measured a few watts less than that.
But they don’t give the IHS temperature so you could repeat the exact same experiment using the same hardware and get different numbers simply because your cooling setup was better or worse than theirs.
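The arithmetic behind those totals, for anyone checking (the per-component watts are the ones quoted above from the paper's table; the 4-core extrapolation is the naive one from the parent comment):

```python
# Per-component power figures quoted above for one active core (W)
components_w = [6.0, 22.1, 4.9, 1.8, 4.8, 11.2]
one_core_total = sum(components_w)
print(round(one_core_total, 1))         # 50.8 W for one active core

# Naively extrapolating the core-local portion to 4 cores, as above:
four_core_total = one_core_total + 66.3
print(round(four_core_total, 1))        # 117.1 W
```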
My understanding, per Intel documentation, is that RAPL gives them power consumption for the whole package, so I believe the 22W figure for Cores (W) in their figure is correct. Other figures, such as the instruction decoder, they seem to extrapolate from that figure, since RAPL doesn't and can't give information at that level of granularity. I could be wrong, but that's how I interpret their data, and why I think the figures are not meant to be summed.
As for the cooling setup, I think I agree. That's something I didn't know, but it makes sense.
Right, RAPL just reports a total power usage figure for the whole CPU. The authors then develop a model which they believe splits that total into multiple components that correspond to parts of the CPU. This is possible because CPUs provide performance counters that measure what the CPU is actually doing. For example if you write programs that are very similar but have different ratios of cache hits and misses then they’ll draw different amounts of power. You can use those differences to devise a formula for the amount of power used by the cache.
And indeed, they give their formula in section 4.2:
You can see that the power used by the whole package is the sum of six terms. The values they calculated for those six terms for each of their benchmarks are given in table 4. The 22W figure for the core(s) is based just on the frequency the CPU is running at.
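A sketch of how that kind of decomposition can be fitted. The counters and coefficients here are entirely hypothetical (not the paper's actual model): record total package power alongside performance-counter rates across many runs, then solve a least-squares system for the per-event costs.

```python
import numpy as np

# Hypothetical data: each row is one run's performance-counter rates
# (e.g. instructions/s, cache misses/s, ...); power_w is total package watts.
rng = np.random.default_rng(0)
true_coefs = np.array([5.0, 2.0, 0.5])       # invented per-event energy costs
counters = rng.uniform(0, 10, size=(50, 3))  # 50 runs, 3 counters
power_w = counters @ true_coefs + 12.0       # 12 W invented static power

# Fit power ≈ sum(coef_i * counter_i) + static via least squares
A = np.column_stack([counters, np.ones(len(counters))])
coefs, *_ = np.linalg.lstsq(A, power_w, rcond=None)
print(coefs.round(3))  # recovers the invented coefficients, ~[5, 2, 0.5, 12]
```

With real hardware the measurements are noisy and the counters are correlated, so the real work in a paper like this is choosing counters that make the fit identifiable, not the regression itself.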
It's pretty insane to see someone say something like: “TDP is about thermal watts, not electrical watts. These are not the same.” Watts are watts.
But yeah, TDP means nothing. If you stick on plenty of cooling and run the right motherboard revision, your "TDP" can be whatever you want it to be, until the thing melts.
But in the end that's still not actually true in many modern desktop chips. You can take a 65W part, and with a "stock" motherboard firmware, good cooling, and the right workload end up averaging way more than 65W. Or if you have it in a hot room it just might end up using less than 65W.
TDP is more of a rough idea of how much power the manufacturer wanted to classify the part as. It ultimately only loosely relates to the actual heat or electrical usage in practice.
> Couldn't this count as false/misleading advertizing though?
For what, exactly? TDP stands for "thermal design power" - nothing in that means peak power or most power. It stopped being meaningful when CPUs learned to vary clock speeds and turbo boost - what is the thermal design target at that point, exactly? Sustained power virus load?
> The chip is not designed for this rate of power dissipation
Says who? AMD advertises the chip as having a base clock of 4.3 GHz over all cores. The 9950X pulls somewhere around 220W at 5 GHz all-core, and with how power scales, 170W at the advertised 4.3 GHz seems more than plausible. Seems perfectly within reason that the advertised frequency and the advertised TDP are aligned.
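Rough cubic scaling (P ∝ f·V², with V scaling roughly linearly with f in this range) is consistent with that. The 220 W / 5.0 GHz figure is the one from this comment, and the cubic model is a common rule of thumb, not a datasheet number:

```python
# Crude P ∝ f^3 scaling from the ~220 W @ 5.0 GHz all-core figure above.
p_ref_w, f_ref_ghz = 220.0, 5.0
f_base_ghz = 4.3  # advertised all-core base clock

p_base_w = p_ref_w * (f_base_ghz / f_ref_ghz) ** 3
print(f"~{p_base_w:.0f} W at {f_base_ghz} GHz")  # ~140 W, under the 170 W TDP
```

So under this crude model the chip would sit comfortably under 170 W at its advertised base clock, which supports the point above.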
I wish Anandtech was still around as iirc they did have charts for all this, which nobody else seems to do :/
> and it is not the rate of power dissipation that you can expect to get from the chip.
Again, says who? Whose expectations? This is a consumer chip, and the expectation for a consumer chip is not that it spends 100% of its time running Prime95 or a similar "power virus" workload. I expect that if I buy this chip, while I would have intervals of >170W, I'd also have long periods of much less than 170W. If I have a cooler designed to sustain 170W of cooling, that's going to work out on average just fine, as there's thermal mass in the system.
> TDP numbers are completely made up. They don’t correspond to watts of heat, or of anything at all! They’re just a marketing number. You can't use them to choose the right cooling system at all.
https://gamersnexus.net/guides/3525-amd-ryzen-tdp-explained-...