Message boards : Number crunching : Temperature spike in beginning of Rosetta WU
Author | Message |
---|---|
Mark Send message Joined: 1 Dec 12 Posts: 10 Credit: 20,184 RAC: 0 |
Normally, when my computer is crunching at full speed on either SETI or Rosetta, the CPU temperature measures about 60C. About 10-15 minutes into a Rosetta WU, the CPU temperature jumps to 83C for 5-10 seconds. Is this normal? Better yet, can someone explain what's happening? Thanks |
Polian Send message Joined: 21 Sep 05 Posts: 152 Credit: 10,141,266 RAC: 0 |
|
Mark Send message Joined: 1 Dec 12 Posts: 10 Credit: 20,184 RAC: 0 |
Except that it's not a drop but a spike in temperature. It happens repeatably near the beginning of Rosetta units. Does Rosetta do something in setting up that uses more of the processor? The universe is not only stranger than you imagine, it is stranger than you can imagine. |
dcdc Send message Joined: 3 Nov 05 Posts: 1831 Credit: 119,617,765 RAC: 11,361 |
I think Polian's point is that the CPU might be throttled and then hit full speed during the spike before throttling again. It could be a few other things, like one thread working the CPU harder than two because of a memory bottleneck or something like that. Try downloading CPU-Z to see what's happening, but cooling is a good starting point to fix it. You might need to redo the thermal compound if the heatsink is clean. |
Mark Send message Joined: 1 Dec 12 Posts: 10 Credit: 20,184 RAC: 0 |
Except that it "hits full speed" repeatably in the beginning of Rosetta units and nowhere else. The odds of this happening by chance are astronomical. |
Polian Send message Joined: 21 Sep 05 Posts: 152 Credit: 10,141,266 RAC: 0 |
dcdc wrote: "I think Polian's point is that it might be throttled and then hit full speed during the spike before throttling again. It could be a few other things like one thread working the cpu harder than two because of a memory bottleneck or something like that. Try downloading cpu-z to see what's happening, but cooling is a good starting point to fix it. Might need to redo the thermal compound if the heatsink is clean." Yes, that's what I was trying to convey, thanks! CPU-Z would tell you for sure, as dcdc says. Mark wrote: "Except that it 'hits full speed' repeatably in the beginning of Rosetta units and nowhere else. The odds of this happening by chance are astronomical." Cooling is still the most plausible explanation. Check your fan/water loop and thermal interface material. Stock heat sink grease or TIM pads have a shorter lifespan than, say, Arctic Silver. Assuming that you're running at stock clock speeds, 83C is far too hot. A generic explanation (since I'm not really in the know here) is that Rosetta may be more taxing on the CPU than SETI; I don't know. I get higher temps when doing stability testing with Prime95 (and higher still with IntelBurnTest) than I do when running Rosetta. |
Sid Celery Send message Joined: 11 Feb 08 Posts: 2124 Credit: 41,224,342 RAC: 11,119 |
I'm experimenting with overclocking at the moment and have found SpeedFan to be pretty vital in seeing what's happening with CPU and motherboard temperatures and fan-speed responses. This followed an incident where the power connector to the motherboard melted into its socket(!). Thankfully I saved the motherboard and have upgraded both the fan and the power supply. |
dcdc Send message Joined: 3 Nov 05 Posts: 1831 Credit: 119,617,765 RAC: 11,361 |
Mark wrote: "Except that it 'hits full speed' repeatably in the beginning of Rosetta units and nowhere else. The odds of this happening by chance are astronomical." Rosetta takes a little while to get going on the CPU, which is why I suggested that one thread might be taxing the CPU more than two threads, or that one thread might run fine and, when the second thread kicks in, it drags the temp up briefly before throttling kicks in. Either way, those scenarios would point to the cooling not being adequate, as Polian says. |
Mark Send message Joined: 1 Dec 12 Posts: 10 Credit: 20,184 RAC: 0 |
CPU-Z tells me I am getting my full processor speed and am not throttling. I was hoping someone here might watch temps as closely as I do and might have noticed this behavior. Or better yet, there might be someone familiar with the code who could tell me what the application is doing during those spikes. It is stock cooling, and it is a few years old, so it probably doesn't have the best thermal solution. But nothing else takes it over about 60C (at the current ambient temp). I may look for another backup project rather than take my heatsink apart to remove the stock thermal grease or pad and apply Arctic Silver. Thanks for the help. |
Mod.Sense Volunteer moderator Send message Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
Well, in general terms, R@h is a very intense application. I've heard some say it can put more stress on the overall system than many benchmarks and stress tests. This is because it uses a lot of memory and has intense floating point operations going on. It will make full use of L2 cache too. Many benchmarks and tests do some things and not others, but you don't get all of it happening at the same time. Having said that, I wouldn't think a 5-10 second temp. spike would be of too much concern. Some other ways of addressing it would be to reduce the % of CPU used by BOINC (although that would reduce your throughput all day long, not just during the temp. spike); or to bring in a throttle mechanism that reduces CPU speed briefly during periods of high temp.; or to have more than one backup project, to reduce the likelihood of having a large number of tasks with the same runtime properties running at the same time. Rosetta Moderator: Mod.Sense |
Link Send message Joined: 4 May 07 Posts: 356 Credit: 382,349 RAC: 0 |
Mod.Sense wrote: "Well, in general terms, R@h is a very intense application." Not really; on all my systems, both Intel and AMD, I always see 3-4°C less with Rosetta than with SETI optimized applications. Mod.Sense wrote: "or to bring in a throttle mechanism that reduces CPU speed briefly during periods of high temp." Any CPU that doesn't fall into the category "ancient" has such a mechanism. But as you say, a few seconds of slightly too high a temperature is not an issue, especially since the CPU will throttle itself if it really gets too hot. Another thing that's possible (and actually very likely): a simple read error. A CPU can't suddenly get 20-30° warmer and then suddenly cool down again (unless the cooler falls off and then magically comes back into its place). This would also explain the 5-10 seconds, since the temperature is checked every few seconds. At least on my AMD system I often get garbage data from the sensors, so suddenly I have 99° as a max. temperature (for you it seems to be 83°). |
Mark Send message Joined: 1 Dec 12 Posts: 10 Credit: 20,184 RAC: 0 |
Link wrote: "Another thing, that's possible (and actually very likely): simple read error. A CPU actually can't get suddenly 20-30° warmer and then suddenly cold again (unless the cooler falls off and then magically comes back on its place). This would also explain the 5-10 seconds, the temperature is checked every few seconds. At least on my AMD system I often get garbage data from the sensors, so suddenly I have there 99° as a max. temperature (for you it seems to be 83°)." I use SpeedFan to graph temperatures. It updates the temps every second. I typically see 1-2 degrees of jitter in the readings. The only times I see this 20-degree jump, which lasts 5-10 seconds, are near the beginning of an R@H unit and during a backup (Acronis True Image). Sometimes I see it twice, close together, in an R@H unit. It happens 3 times during a backup. I pause BOINC during backups. In all cases the max temp is different; 83C was an average. I have one temp sensor that reads -128C. I assume this is disconnected. (Note: SpeedFan reports Temp1, Temp2, Temp3, HD0, and Core. Temp1 is the highest temp in the system. On the boards at SpeedFan's homepage I found info that the highest temp read is probably really the core. When Temp1 is 60C, Core is about 45C. I am assuming Temp1 is really the core temperature.) Thanks for the insights. |
Polian Send message Joined: 21 Sep 05 Posts: 152 Credit: 10,141,266 RAC: 0 |
Sorry that I couldn't be of more specific help. I've never seen this behavior on any of my computers in all the time I've been a Rosetta cruncher. Considering that no one else has reported anything similar to the issue you're seeing, I've got nothing better to offer than that it's an aberration of some sort with your PC. |
Mark Send message Joined: 1 Dec 12 Posts: 10 Credit: 20,184 RAC: 0 |
Polian wrote: "Sorry that I couldn't be of more specific help. I've never seen this behavior on any of my computers as long as I've been a Rosetta cruncher. Considering that no one else has reported any specific similarities to the issue you're seeing I've got nothing else better to offer than it's an aberration of some sort with your PC." No worries. Thanks for trying :) |
Link Send message Joined: 4 May 07 Posts: 356 Credit: 382,349 RAC: 0 |
Mark wrote: "I use speedfan to graph temperatures. It updates the temps every second. I typically see 1-2 degrees of jitter in the readings. The only time I see this 20 degree jump is near the beginning of a R@H unit and during a backup (Acronis True Image) lasting 5-10 seconds. Sometimes I see this twice, near each other, in a R@H unit. It happens 3 times during a backup. I pause Boinc during backups. In all cases the max temp is different, 83C was an average." Are you crunching on all cores when "crunching at full speed"? |
sgaboinc Send message Joined: 2 Apr 14 Posts: 282 Credit: 208,966 RAC: 0 |
Mod.Sense wrote: "Well, in general terms, R@h is a very intense application. I've heard some say it can put more stress on the overall system than many benchmarks and stress tests. This is because it uses a lot of memory, and has intense floating point operations going on. It will make full use of L2 cache too. Many benchmarks and tests do some things and not others, but you don't get all of it happening at the same time." Agreed, R@h is comparable to (or more intense than) some of those benchmark and stress apps in its heavy 'weight'-ness, and above all it is a *real* workload compared to those synthetic benchmarks lol. As I'm running Linux, I used 'cpupower frequency-set' to set the max frequency the CPU runs at when R@h is running. I'm not too sure what an equivalent utility in MS Windows is (SpeedFan?). In a way I'm throttling it; I'd guess one could use a similar utility to manage that. One of the factors which I think may lead to higher initial temperatures / frequencies is Intel's 'Turbo Boost' or equivalent features. 'Turbo Boost' is basically the CPU's internal overclocking mechanism, which sets internal limits based on TDP power. If I leave 'Turbo Boost' on, I've seen R@h push the envelope to some 75-85 deg C. I'm not sure whether, in the original post's case, the high temperature may be caused by a similar feature followed by automatic throttling in the CPU. (I think 'Turbo Boost' can be disabled in the BIOS setup screens.) To avoid this situation, I set the max CPU frequency/speed such that it runs in the norm of 60-65 deg C. It still produces pretty good throughput for the R@h jobs running on all cores. The other thing to check is the heatsink, fan, etc.: could it be that the fan is running at low speed or even *stopped* when R@h launches? That may point to the CPU PWM fan control app (BIOS?) or perhaps some of its parameter settings. |
Chilean Send message Joined: 16 Oct 05 Posts: 711 Credit: 26,694,507 RAC: 0 |
modsense: My gaming laptop (ASUS ROG) runs @ 95 C during summer (drops down to around 85 C during winter). It's been running like this for almost 3 years now... Tj. max is 105C, so yeah... you could let your CPU run as fast as it wants. Worst thing that could happen is throttling. |
Mod.Sense Volunteer moderator Send message Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
Always keep in mind that the temp. of the CPU is one thing, but the temp. of the disk drive is another. So if having such a hot CPU starts raising the temp. of the disk, then that probably shortens the life of the disk. But for a few seconds at a time, I don't think the disk will see any change. FWIW, I'm thinking that what you must be observing is the unzipping of some of the data that is used to process a task. I believe that's one of the first things a task does as it starts, and it would be rather intensive if the task gets to run flat out (i.e. no higher priority tasks preempting it). But I'd double check for dust bunnies, esp. between the fan and the heat sink. |
Sid Celery Send message Joined: 11 Feb 08 Posts: 2124 Credit: 41,224,342 RAC: 11,119 |
As I continue my overclocking experiments I've been watching temperatures at start-up pretty closely, on an AMD FX8120 if that makes a difference. Just to say I'm not seeing any spike at the start and restart of tasks. I upgraded my CPU fan some months ago and, while temperatures improved, they didn't improve as much as I expected tbh. By the nature of these things I boosted my clock speed a little more and was back where I started on temperatures, up to the point I had the PSU incident I mentioned earlier. In a brief chat, the guy who does my hardware upgrades casually mentioned that case fans (rather than CPU fans) make a difference too. I thought that'd only be marginal, but they're cheap so I thought I'd cover that angle. All temps dropped 10C! I dropped their speed to take advantage of their quiet options and was still 8C better off. I think I found out why I wasn't getting the expected benefit from the CPU fan earlier in the year. Overall this year, I've increased the multiplier from 16.5 (already OC'd from 15.5 stock) to 18.0 at much the same voltages as before, with temps around 49C, anything from 12-15C lower than when I started. The lesson being, there's a lot that can be done cheaply to balance up cooling rather than obsessing over symptoms. Using: Arctic Cooling Freezer A30 AMD CPU Cooler - £27 / $42; Cooler Master JetFlo 120 - £10 / $16 |
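
The thread leaves open whether the 20-degree jumps are real heating or sensor read errors. One way to separate the two in a once-per-second temperature log (the interval Mark describes for SpeedFan) is to ignore isolated outliers and keep only runs that stay hot for several samples. The following is a minimal sketch, not anything SpeedFan or BOINC provides: the function name `find_spikes`, the 15-degree threshold, the 3-sample minimum, and the sample log are all assumptions made for illustration.

```python
from statistics import median

def find_spikes(samples, threshold=15.0, min_len=3):
    """Return (start_index, end_index, peak) for each sustained spike.

    samples:   temperatures in deg C, assumed to be one reading per second.
    threshold: degrees above the median baseline that count as "hot"
               (hypothetical value, tune per machine).
    min_len:   minimum number of consecutive hot samples; shorter runs
               are treated as sensor glitches and ignored.
    """
    baseline = median(samples)
    spikes = []
    start = None
    for i, temp in enumerate(samples):
        hot = temp - baseline >= threshold
        if hot and start is None:
            start = i                      # a hot run begins here
        elif not hot and start is not None:
            if i - start >= min_len:       # keep only sustained runs
                spikes.append((start, i - 1, max(samples[start:i])))
            start = None
    # Handle a hot run that lasts until the end of the log.
    if start is not None and len(samples) - start >= min_len:
        spikes.append((start, len(samples) - 1, max(samples[start:])))
    return spikes

# Hypothetical log: ~60 C baseline, one single-sample glitch (99 C),
# then a six-second excursion into the low 80s.
log = [60, 61, 59, 99, 60, 60, 82, 84, 83, 85, 82, 81, 60, 61, 60, 59]
print(find_spikes(log))
```

With this sample log the lone 99C reading is dropped as a glitch, while the six-second run is reported with its start index, end index, and peak. A spike that survives this kind of filter and lines up with task start times would point away from a simple read error.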
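
sgaboinc's approach of capping the maximum CPU frequency on Linux can be sketched as below. The `cpupower frequency-set --max` option is part of the standard cpupower tool, but the helper name `cpupower_cap_command` and the 2.4GHz value are illustrative assumptions; the right cap depends on the CPU and on the temperatures you are aiming for. The sketch only builds and prints the command, since applying it needs root privileges.

```python
import shlex

def cpupower_cap_command(max_freq):
    """Build the cpupower command that caps the maximum CPU frequency.

    max_freq: a frequency string with a unit suffix, e.g. "2.4GHz".
              The value used below is hypothetical.
    """
    return ["cpupower", "frequency-set", "--max", max_freq]

cmd = cpupower_cap_command("2.4GHz")
print(shlex.join(cmd))

# To actually apply the cap (requires root), one could run something like:
#   import subprocess
#   subprocess.run(["sudo", *cmd], check=True)
```

Capping the frequency this way lowers throughput for every task all the time, so it trades some credit for lower temperatures, much like reducing the % of CPU used by BOINC.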
©2024 University of Washington
https://www.bakerlab.org