Problems and Technical Issues with Rosetta@home

Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 11,839,945
RAC: 11,375
Message 107150 - Posted: 8 Oct 2022, 13:45:01 UTC - in response to Message 107146.  

Meanwhile, the validator and the assimilator seem to be frozen
Does it matter? The tasks are coming through and we can return them. The little job of validating will take place very quickly once someone comes into the office on Monday and kicks the server. Are you that desperate for your credits?

[VENETO] boboviz
Joined: 1 Dec 05
Posts: 1994
Credit: 9,623,704
RAC: 8,387
Message 107152 - Posted: 8 Oct 2022, 15:10:20 UTC - in response to Message 107150.  

Are you that desperate for your credits?

If you knew me, you'd know that credits are the last thing on my mind.
But if the validator has problems, the work could be lost: no science, and wasted work.

Tomcat雄猫
Joined: 20 Dec 14
Posts: 180
Credit: 5,386,173
RAC: 0
Message 107153 - Posted: 8 Oct 2022, 15:12:46 UTC

Bad batch of RB tasks, perhaps?
rb_10_07_420543_416291_ab_t000__robetta_cstwt_5.0_FT_IGNORE_THE_REST_03_06_2919833_26_0
<core_client_version>7.20.2</core_client_version>
<![CDATA[
<message>
Incorrect function.
 (0x1) - exit code 1 (0x1)</message>
<stderr_txt>
command: projects/boinc.bakerlab.org_rosetta/rosetta_4.20_windows_x86_64.exe @rb_10_07_420543_416291_ab_t000__robetta_FLAGS -in::file::fasta t000_.fasta -jumps:pairing_file t000_.fasta.bbcontacts.jumps -jumps:random_sheets 1 -constraints::cst_file t000_.fasta.CB.cst -constraints:cst_weight 5.0 -constraints::cst_fa_file t000_.fasta.MIN.cst -constraints:cst_fa_weight 5.0 -in:file:boinc_wu_zip rb_10_07_420543_416291_ab_t000__robetta.zip -frag3 rb_10_07_420543_416291_ab_t000__robetta.200.3mers.index.gz -fragA rb_10_07_420543_416291_ab_t000__robetta.200.6mers.index.gz -fragB rb_10_07_420543_416291_ab_t000__robetta.200.3mers.index.gz -nstruct 10000 -cpu_run_time 28800 -boinc:max_nstruct 20000 -checkpoint_interval 120 -mute all -database minirosetta_database -in::file::zip minirosetta_database.zip -boinc::watchdog -boinc::cpu_run_timeout 36000 -run::rng mt19937 -constant_seed -jran 2145106
Using database: database_357d5d93529_n_methylminirosetta_database

[ ERROR ]: Caught exception:


File: C:cygwin64homeboinc4.17Rosettamainsourcesrccore/pack/dunbrack/SingleResidueDunbrackLibrary.hh:306
chi angle must be between -180 and 180: -nan(ind)
 ------------------------ Begin developer's backtrace ------------------------- 
BACKTRACE:
 ------------------------- End developer's backtrace -------------------------- 


AN INTERNAL ERROR HAS OCCURED. PLEASE SEE THE CONTENTS OF ROSETTA_CRASH.log FOR DETAILS.



</stderr_txt>
]]>
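
A note on the message above: "-nan(ind)" is how the Microsoft C runtime prints an indeterminate NaN, so the chi angle was not just outside the range, it was not a number at all. Below is a minimal Python sketch of why any such range check rejects NaN. The function name is hypothetical and this is not Rosetta's actual code, only an illustration of the comparison behaviour.

```python
import math


def chi_in_range(chi_degrees: float) -> bool:
    """Hypothetical check: True only when -180 <= chi <= 180.

    Every ordered comparison involving NaN is False, so NaN can never pass.
    """
    return -180.0 <= chi_degrees <= 180.0


nan_chi = float("nan")

print(chi_in_range(179.5))    # True: an ordinary angle passes
print(chi_in_range(nan_chi))  # False: NaN fails both comparisons
print(math.isnan(nan_chi))    # True: the value is NaN, not merely "too large"
```

A NaN at this point usually means an earlier calculation already went wrong, which would fit the "bad batch" theory better than a fault on the host.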

rb_09_03_406077_401241__t000__0_C1_SAVE_ALL_OUT_IGNORE_THE_REST_2917054_4409_1
<core_client_version>7.20.2</core_client_version>
<![CDATA[
<message>
(unknown error) - exit code 3221225477 (0xc0000005)</message>
<stderr_txt>
command: projects/boinc.bakerlab.org_rosetta/rosetta_4.20_windows_x86_64.exe -run:protocol jd2_scripting @flags_rb_09_03_406077_401241__t000__0_C1_robetta -silent_gz -mute all -out:file:silent default.out -in:file:boinc_wu_zip input_rb_09_03_406077_401241__t000__0_C1_robetta.zip -max_registry_shift 3 -nstruct 10000 -cpu_run_time 28800 -boinc:max_nstruct 20000 -checkpoint_interval 120 -database minirosetta_database -in::file::zip minirosetta_database.zip -boinc::watchdog -boinc::cpu_run_timeout 36000 -run::rng mt19937 -constant_seed -jran 1826313
Using database: database_357d5d93529_n_methylminirosetta_database


Unhandled Exception Detected...

- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x0000000000000000 

Engaging BOINC Windows Runtime Debugger...



********************


BOINC Windows Runtime Debugger Version 7.9.0


Dump Timestamp    : 10/08/22 10:00:20
Install Directory : C:\Program Files\BOINC
Data Directory    : B:\BOINC
Project Symstore  : https://boinc.bakerlab.org/rosetta/symstore
LoadLibraryA( B:\BOINC\dbghelp.dll ): GetLastError = 126
Loaded Library    : dbghelp.dll
LoadLibraryA( B:\BOINC\symsrv.dll ): GetLastError = 126
LoadLibraryA( symsrv.dll ): GetLastError = 126
LoadLibraryA( B:\BOINC\srcsrv.dll ): GetLastError = 126
LoadLibraryA( srcsrv.dll ): GetLastError = 126
LoadLibraryA( B:\BOINC\version.dll ): GetLastError = 126
Loaded Library    : version.dll
Debugger Engine   : 4.0.5.0
Symbol Search Path: B:\BOINC\slots\5;B:\BOINC\projects\boinc.bakerlab.org_rosetta;srv*B:\BOINC\projects\boinc.bakerlab.org_rosetta\symbols*http://msdl.microsoft.com/download/symbols;srv*B:\BOINC\projects\boinc.bakerlab.org_rosetta\symbols*https://boinc.bakerlab.org/rosetta/symstore
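
For anyone decoding these logs: the exit code 3221225477 on the second task is the unsigned form of 0xc0000005 (access violation). GetLastError = 126 is the Win32 "module not found" error, which here most likely only means the debugger could not find its optional helper DLLs. A minimal Python sketch of the conversion follows. The helper name and the small lookup table are my own illustration and are not part of BOINC or Rosetta.

```python
# Hypothetical lookup table: only the codes that appear in this post.
KNOWN_CODES = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
    126: "ERROR_MOD_NOT_FOUND",
    1: "ERROR_INVALID_FUNCTION",
}


def describe(code: int) -> str:
    """Normalise a Windows exit code to unsigned 32-bit and name it if known."""
    unsigned = code & 0xFFFFFFFF  # also handles the signed form some tools print
    name = KNOWN_CODES.get(unsigned, "unknown")
    return f"{unsigned} = 0x{unsigned:08x} ({name})"


print(describe(3221225477))   # 3221225477 = 0xc0000005 (STATUS_ACCESS_VIOLATION)
print(describe(-1073741819))  # same code, shown as a signed 32-bit value
print(describe(126))          # 126 = 0x0000007e (ERROR_MOD_NOT_FOUND)
```

The "Incorrect function." text on the first task looks like the same kind of lookup applied to a plain exit code of 1, so it probably says nothing about the real cause.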

Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 11,839,945
RAC: 11,375
Message 107154 - Posted: 8 Oct 2022, 15:19:47 UTC - in response to Message 107152.  

Are you that desperate for your credits?

If you knew me, you'd know that credits are the last thing on my mind.
But if the validator has problems, the work could be lost: no science, and wasted work.
Work doesn't tend to get lost unless there's some catastrophic disk error. If you look at the specs of Rosetta's Servers, I very much doubt that would happen. Look, it's just crashed; someone restarts the validator program on Monday and it goes through the backlog in 20 minutes.

[VENETO] boboviz
Joined: 1 Dec 05
Posts: 1994
Credit: 9,623,704
RAC: 8,387
Message 107155 - Posted: 8 Oct 2022, 15:23:48 UTC - in response to Message 107154.  

Work doesn't tend to get lost unless there's some catastrophic disk error. If you look at the specs of Rosetta's Servers, I very much doubt that would happen.


Don't talk to me about the R@H servers.
They are VERY old, not updated, etc.

Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 11,839,945
RAC: 11,375
Message 107156 - Posted: 8 Oct 2022, 15:28:08 UTC - in response to Message 107155.  

Work doesn't tend to get lost unless there's some catastrophic disk error. If you look at the specs of Rosetta's Servers, I very much doubt that would happen.


Don't talk to me about the R@H servers.
They are VERY old, not updated, etc.
WTF? Their server status page lists 72TB of SSD; show me another project with that. I've never seen Rosetta slow or overloaded, unlike almost every other project out there that can't keep up:


Web servers:
Rack mounted 1U SuperMicro server
Specs: Intel Xeon CPU E3-1270 v5 @ 3.60GHz, 32 GB RAM, X11SSH-TF, 10 GbE
Storage: 256GB SSD for OS
File system: ZFS on Linux v0.7
OS: Ubuntu Server 16.04
Primary file server:
Rack mounted 4U SuperMicro server
Specs: Dual Intel Xeon E5-2640 v4 @ 2.40GHz, 256 GB RAM, X10DRD-IT, 2 x 10 GbE
Storage: 72 x 1TB SSD via LSI SAS 9207-8i
File system: ZFS on Linux v0.7 (raidz2, 9 vdevs with 8 disks) served via NFSv4
OS: Ubuntu Server 16.04
Backup file server:
Rack mounted 4U SuperMicro server
Specs: Intel Xeon E5-1650 v4 @ 3.60GHz, 128 GB RAM, X10SRM-TF, 2 x 10 GbE
Storage: 24 x 10TB HGST drives via LSI SAS 9300-8i
File system: ZFS on Linux v0.7 (raidz2, 2 vdevs with 12 disks)
OS: Ubuntu Server 16.04
Database servers:
Rack mounted 1U SuperMicro server
Specs: Intel Xeon CPU E3-1270 v6 @ 3.80GHz, 64 GB RAM, X11SSH-TF, 10 GbE
Storage: 2 x 1TB SSD
File system: ZFS on Linux v0.7 (mirror)
OS: Ubuntu Server 16.04
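
As a rough sanity check on those numbers, here is a back-of-the-envelope sketch in Python. It assumes each raidz2 vdev gives up two disks' worth of space to parity, and it ignores ZFS metadata, padding and the TB vs TiB difference, so real usable space will be lower. The function name is made up for illustration.

```python
def raidz_usable_tb(vdevs: int, disks_per_vdev: int, disk_tb: int, parity: int = 2) -> int:
    """Approximate usable capacity: data disks per vdev, times vdevs, times disk size.

    Hypothetical helper; parity=2 corresponds to raidz2.
    """
    return vdevs * (disks_per_vdev - parity) * disk_tb


primary_raw = 72 * 1                     # 72 x 1TB SSD
primary = raidz_usable_tb(9, 8, 1)       # 9 vdevs with 8 disks each
backup = raidz_usable_tb(2, 12, 10)      # 2 vdevs with 12 disks each

print(primary_raw)  # 72
print(primary)      # 54
print(backup)       # 200
```

So the "72TB of SSD" is the raw figure; under these assumptions the primary pool holds roughly 54TB and the backup pool roughly 200TB.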

kotenok2000
Joined: 22 Feb 11
Posts: 259
Credit: 497,274
RAC: 997
Message 107157 - Posted: 8 Oct 2022, 15:30:43 UTC - in response to Message 107156.  
Last modified: 8 Oct 2022, 15:30:57 UTC

Ubuntu 16.04 will receive Extended Security Maintenance until 2026
https://ubuntu.com/about/release-cycle

[VENETO] boboviz
Joined: 1 Dec 05
Posts: 1994
Credit: 9,623,704
RAC: 8,387
Message 107158 - Posted: 8 Oct 2022, 15:32:28 UTC - in response to Message 107157.  

Ubuntu 16.04 will receive Extended Security Maintenance until 2026
https://ubuntu.com/about/release-cycle


For a fee.

[VENETO] boboviz
Joined: 1 Dec 05
Posts: 1994
Credit: 9,623,704
RAC: 8,387
Message 107159 - Posted: 8 Oct 2022, 15:35:32 UTC - in response to Message 107156.  

WTF? Their server status page lists 72TB of SSD; show me another project with that. I've never seen Rosetta slow or overloaded, unlike almost every other project out there that can't keep up:

Web servers:
Rack mounted 1U SuperMicro server
.................


All this HW and SW is almost 10 years old.
Hey, it's not a problem; I have customers with servers over 12 years old. But you can't say it's all up to date.

kotenok2000
Joined: 22 Feb 11
Posts: 259
Credit: 497,274
RAC: 997
Message 107160 - Posted: 8 Oct 2022, 15:37:17 UTC

For me, 10 years old is from 2010.

Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 11,839,945
RAC: 11,375
Message 107161 - Posted: 8 Oct 2022, 15:45:21 UTC - in response to Message 107158.  
Last modified: 8 Oct 2022, 15:45:39 UTC

Ubuntu 16.04 will receive Extended Security Maintenance until 2026
https://ubuntu.com/about/release-cycle


For a fee.
I thought the point of Linux was it was free? Otherwise everyone would be on Windows.

Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 11,839,945
RAC: 11,375
Message 107162 - Posted: 8 Oct 2022, 15:46:20 UTC - in response to Message 107160.  

For me, 10 years old is from 2010.
You are speaking to us from the past? Cool.

Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 11,839,945
RAC: 11,375
Message 107163 - Posted: 8 Oct 2022, 15:47:13 UTC - in response to Message 107159.  

WTF? Their server status page lists 72TB of SSD; show me another project with that. I've never seen Rosetta slow or overloaded, unlike almost every other project out there that can't keep up:

Web servers:
Rack mounted 1U SuperMicro server
.................


All this HW and SW is almost 10 years old.
Hey, it's not a problem; I have customers with servers over 12 years old. But you can't say it's all up to date.
I doubt the 72TB of SSD is 10 years old. My equipment is a mix of old and new stuff; you upgrade the bits that are the bottleneck.

kotenok2000
Joined: 22 Feb 11
Posts: 259
Credit: 497,274
RAC: 997
Message 107164 - Posted: 8 Oct 2022, 15:53:38 UTC

Look at RHEL. You pay for packages built on their hardware and for professional tech support.

Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 11,839,945
RAC: 11,375
Message 107165 - Posted: 8 Oct 2022, 15:55:44 UTC - in response to Message 107164.  

Look at RHEL. You pay for packages built on their hardware and for professional tech support.
Ouch. I've never paid for Linux.

kotenok2000
Joined: 22 Feb 11
Posts: 259
Credit: 497,274
RAC: 997
Message 107166 - Posted: 8 Oct 2022, 15:57:50 UTC

There are rebuilds from the source code, for example Rocky Linux.

Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 11,839,945
RAC: 11,375
Message 107167 - Posted: 8 Oct 2022, 16:02:33 UTC - in response to Message 107166.  
Last modified: 8 Oct 2022, 16:03:06 UTC

There are rebuilds from the source code, for example Rocky Linux.
If you have to pay for it you might as well get Windows. Assuming you don't use Piratebay ROFL! Hands up who thinks I paid MS for 8 Windows 11 licenses.

kotenok2000
Joined: 22 Feb 11
Posts: 259
Credit: 497,274
RAC: 997
Message 107168 - Posted: 8 Oct 2022, 16:03:34 UTC

KMS?

Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 11,839,945
RAC: 11,375
Message 107169 - Posted: 8 Oct 2022, 16:11:58 UTC - in response to Message 107168.  
Last modified: 8 Oct 2022, 16:13:22 UTC

KMS?
Correct, tick, one mark.

I know someone who would have paid the full $200 each; that would be $1600 for some BOINC machines. I don't think so.

Greg_BE
Joined: 30 May 06
Posts: 5691
Credit: 5,859,226
RAC: 0
Message 107170 - Posted: 8 Oct 2022, 21:54:52 UTC - in response to Message 107145.  

Dropped WCG. Seems the GPU server can't do its job handing out tasks properly.
Everything stalls with transient errors.
Oh well...something else then.

Not sure why you need to drop it. You're not getting runnable tasks anyway...
But I sympathise. I'm getting the same thing and it's driving me nuts.

On the plus side, while Rosetta has some tasks, hits on WCG have reduced and it's not <quite> as bad as it has been.
Still terrible though, I accept



I don't get it: their rah-rah text is all over the place, and yet they can't send any work?
Maybe CPU fares better, but GPU is nuts.



©2024 University of Washington
https://www.bakerlab.org