DC id's hidden hardware problems in many PCs ?

The_Bad_Penguin
Joined: 5 Jun 06
Posts: 2751
Credit: 4,271,025
RAC: 0
Message 32209 - Posted: 7 Dec 2006, 12:12:05 UTC

As usual, wandering 'round the net for info on Distributed Computing. Came across:

"Historically, searching for Mersenne primes has been used as a test for computer hardware. The free GIMPS program used by CMSU has identified hidden hardware problems in many PCs."

Interesting. Can anyone point me to more info on this? Is this only in regards to oc'ing, or actual hardware design problems?

Does Rosetta "stress" computer hardware in such a way as to be useful for detecting hardware problems?

dcdc
Joined: 3 Nov 05
Posts: 1831
Credit: 119,627,225
RAC: 11,586
Message 32210 - Posted: 7 Dec 2006, 12:28:07 UTC

As DC apps tend to run the CPU at near-max temperatures, it's usually heat issues that are uncovered, ones that might not normally be noticed (until you're doing something really important in Excel and the CPU heats up, crashes the computer, and you lose your work!)

Rosetta keeps the CPU nice and warm, although I believe it's mainly the FPU that's being maxed out so there is room for more heat to be generated.

Prime95 is the standard app for stress-testing the CPU, as it has a dedicated stress-test option that runs calculations whose correct answers are already known and checks the results against them. If the computer starts making mistakes, it fails the test.
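To make the "known answer" idea concrete, here is a minimal sketch in Python. It uses the textbook Lucas-Lehmer test for Mersenne numbers and a small table of well-known results. It is not Prime95's actual code (which is far more heavily optimised), and the names `lucas_lehmer`, `KNOWN_ANSWERS` and `self_test` are made up for illustration.

```python
def lucas_lehmer(p: int) -> bool:
    """Return True if 2**p - 1 is prime. Valid for odd prime exponents p."""
    m = (1 << p) - 1
    s = 4
    for _ in range(p - 2):
        s = (s * s - 2) % m
    return s == 0


# Exponent -> whether 2**p - 1 is prime (well-known values).
KNOWN_ANSWERS = {
    3: True, 5: True, 7: True, 11: False, 13: True, 17: True,
    19: True, 23: False, 29: False, 31: True, 61: True,
    89: True, 107: True, 127: True,
}


def self_test(rounds: int = 1) -> list:
    """Recompute every known case; return the exponents that came out wrong."""
    failures = []
    for _ in range(rounds):
        for p, expected in KNOWN_ANSWERS.items():
            if lucas_lehmer(p) != expected:
                failures.append(p)
    return failures


if __name__ == "__main__":
    bad = self_test(rounds=100)
    print("PASS" if not bad else f"FAIL on exponents {bad}")
```

On healthy hardware `self_test()` should always return an empty list; any entry in it means the machine computed something it should not have. A toy like this won't heat the CPU much, so treat it as an illustration of the checking step only.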

Basically, any computer should pass a CPU stress test, and if it can't, then its cooling needs looking at or its clock rate or voltage needs reducing.

HTH
Danny

Mats Petersson
Joined: 29 Sep 05
Posts: 225
Credit: 951,788
RAC: 0
Message 32212 - Posted: 7 Dec 2006, 13:33:49 UTC

Whilst Rosetta may be good at "pushing the processor to its limit", I'd also say that it's not a good "test" for the processor, as it's not actually checking the result against a known good value.

A good TEST should, as dcdc says, test for a known value and check that it's correct, after using as much as possible of the entire processor. I've never used Prime95, but I guess it's a good one.

Obviously, any product on the market that isn't marked "Engineering sample" or some such is supposed to have been tested (verified) to make sure that it always does the right thing. Part of that testing is to run tests with known results and compare them with the actual results. There are plenty of math-intensive calculations that can be used this way, where the result is always the same (which Rosetta work-units are not).

Some of these tests (selected to cover as much of the processor in as short a time as possible) are used for "production" testing, which is the testing that ALL processors go through as part of production. Other, lengthier or more complex tests are used for "design testing", which is only performed on "new" designs before they are released to the general public.

Some tests are generally available software, and other tests are specially created by the people who design that part of the processor (say, code to exercise certain parts of the math unit). The generic applications are used because they contain "real world" code and generally cover a wider range of the processor's functionality. The specific tests are written to cover areas found to be lacking in the generic applications: say, most applications don't use fsin with values greater than 2pi, whilst the designer would like to test those conditions too. "Error paths" are particularly important for the dedicated tests, as they are most often not exercised by "real" code. It's rare that someone really does divide by zero in their math code unless by mistake, right?
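As a rough illustration of what such a targeted test might look like, here is a small Python sketch. It checks sine for arguments well beyond 2pi against the value for the reduced argument, and it exercises one error path (division by zero). Note the assumptions: this goes through Python's `math.sin`, not the raw fsin instruction, and the function names and tolerance are invented for this example. Real design-verification suites look nothing like this.

```python
import math


def check_large_argument_sine(tolerance: float = 1e-9) -> list:
    """Compare sin(x + 2*pi*k) with sin(x); return the cases that disagree."""
    failures = []
    for x in (0.1, 0.5, 1.0, 2.0, 3.0):
        reference = math.sin(x)
        for k in (1, 10, 100, 1000):
            got = math.sin(x + 2.0 * math.pi * k)
            if abs(got - reference) > tolerance:
                failures.append((x, k, got))
    return failures


def check_divide_by_zero_path() -> bool:
    """Error-path check: dividing by zero must be reported, not silently ignored."""
    try:
        1.0 / 0.0
    except ZeroDivisionError:
        return True
    return False


if __name__ == "__main__":
    print("sine failures:", check_large_argument_sine())
    print("divide-by-zero reported:", check_divide_by_zero_path())
```

The tolerance is deliberately loose, since adding multiples of 2pi in floating point introduces small rounding errors of its own; the point is only that the expected answer is known in advance.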

You can use DC projects such as SETI or Einstein to verify the design too, as those projects compare the actual result between multiple machines, so if your machine is doing something wrong, it will show up as an "error". Since Rosetta doesn't, it's not useful in this sense.
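A hedged sketch of that cross-checking idea in Python: the same work unit goes to several hosts, and any host that disagrees with the majority is flagged. The function name, host names and result strings are hypothetical, and real projects typically compare results with more care (for example, allowing small floating-point differences) than the exact match used here.

```python
from collections import Counter


def validate_work_unit(results: dict, quorum: int = 2):
    """results maps host name -> returned result for one work unit.

    Returns (agreed_result, suspect_hosts). If no result reaches the
    quorum, returns (None, []) because nobody can be singled out yet.
    """
    if not results:
        return None, []
    counts = Counter(results.values())
    agreed, votes = counts.most_common(1)[0]
    if votes < quorum:
        return None, []
    suspects = sorted(host for host, r in results.items() if r != agreed)
    return agreed, suspects


if __name__ == "__main__":
    returned = {"host_a": "1f3c", "host_b": "1f3c", "host_c": "9e07"}
    print(validate_work_unit(returned))
```

A host that keeps turning up in the suspect list is a good candidate for a proper stress test.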

--
Mats

dcdc
Joined: 3 Nov 05
Posts: 1831
Credit: 119,627,225
RAC: 11,586
Message 32214 - Posted: 7 Dec 2006, 14:02:40 UTC - in response to Message 32212.  

Whilst Rosetta may be good at "pushing the processor to its limit", I'd also say that it's not a good "test" for the processor, as it's not actually checking the result against a known good value.


Oh yeah - would definitely second that! The results of Rosetta and other production projects aren't the place to be testing, and waiting for a crash to determine whether a computer is running properly is like checking that your car engine starts and assuming it's roadworthy...

©2024 University of Washington
https://www.bakerlab.org