Rosetta 4.1+ and 4.2+

Message boards : Number crunching : Rosetta 4.1+ and 4.2+



AuthorMessage
Bryn Mawr

Joined: 26 Dec 18
Posts: 390
Credit: 12,073,013
RAC: 4,827
Message 93659 - Posted: 6 Apr 2020, 18:49:11 UTC - in response to Message 93655.  

Thanks

Issue appears to be with Rosetta v4.12 i686-pc-linux-gnu
No issues with Rosetta v4.12 x86_64-pc-linux-gnu

After a project reset I'm getting the proper 64-bit app tasks, where before I would almost exclusively get tasks for the potentially faulty 32-bit app.


Thanks for this information. Will test it.

Looked at the other machines, all 64-bit machines got the 32 Bit app.

Why is Rosetta delivering 32-Bit apps to 64-bit machines?


I've just tried it and it's downloaded six 4.12-i686 WUs estimated at 11 hours each :-(
Bas

Joined: 19 Mar 20
Posts: 2
Credit: 323,889
RAC: 0
Message 93680 - Posted: 6 Apr 2020, 21:17:28 UTC - in response to Message 93629.  


Issue appears to be with Rosetta v4.12 i686-pc-linux-gnu
No issues with Rosetta v4.12 x86_64-pc-linux-gnu

After a project reset I'm getting the proper 64-bit app tasks, where before I would almost exclusively get tasks for the potentially faulty 32-bit app.


I got both 64-bit and 32-bit WUs after a reset. I have removed 32-bit support from my Arch Linux machines, and it looks like the BOINC clients have noticed this:

[Rosetta@home] App version has unsupported platform i686-pc-linux-gnu; changing to x86_64-pc-linux-gnu

Unfortunately, communication is deferred by an hour, so I don't know if this will help. I'll let you know in an hour or so. ;) For people who need 32bit support, this is obviously not a solution.
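For anyone who wants to check their own client log for the same substitution, here is a minimal Python sketch. It assumes the message wording is exactly as in the line quoted above; the function name and the sample log lines are made up for illustration.

```python
import re
from collections import Counter

# Pattern for the client message quoted above (wording assumed to match exactly).
PLATFORM_SWAP = re.compile(
    r"App version has unsupported platform ([^\s;]+); changing to (\S+)"
)

def count_platform_swaps(log_lines):
    """Count (old_platform, new_platform) substitutions found in the log lines."""
    swaps = Counter()
    for line in log_lines:
        match = PLATFORM_SWAP.search(line)
        if match:
            swaps[(match.group(1), match.group(2))] += 1
    return swaps

# Hypothetical sample: the first line is the message from this post,
# the second is an unrelated placeholder.
sample = [
    "[Rosetta@home] App version has unsupported platform i686-pc-linux-gnu; changing to x86_64-pc-linux-gnu",
    "[Rosetta@home] Some unrelated message",
]
print(count_platform_swaps(sample))
```

If the counter stays empty, the client either never received i686 app versions or words the message differently from what I saw.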
Bas

Joined: 19 Mar 20
Posts: 2
Credit: 323,889
RAC: 0
Message 93685 - Posted: 6 Apr 2020, 21:49:46 UTC - in response to Message 93680.  

Unfortunately, communication is deferred by an hour, so I don't know if this will help. I'll let you know in an hour or so. ;) For people who need 32bit support, this is obviously not a solution.


Looks like it did the trick: all downloaded WUs are now x86_64.
CIA

Joined: 3 May 07
Posts: 100
Credit: 21,059,812
RAC: 0
Message 93696 - Posted: 6 Apr 2020, 23:21:39 UTC
Last modified: 6 Apr 2020, 23:24:04 UTC

I posted over on Ralph but figured I should ask here as well. New 4.15 seems to be working better on older MacOS machines, but now after crunching for 8 hours on some I'm getting processing errors at the end with "finish file present too long".

Not all 4.15 WUs fail, but about 20% do.

Are these WU errors or the result of a new bug in 4.15?

Example: https://boinc.bakerlab.org/rosetta/result.php?resultid=1141395048

If you click on tasks under my profile, all my failed 4.15 units across all machines say the same thing, so I know it's not just one machine being sketchy. If it's just a WU thing, well ok. But still it's a bummer to crunch for 8 hours only to have it fail.
MarkJ

Joined: 28 Mar 20
Posts: 72
Credit: 25,238,680
RAC: 0
Message 93697 - Posted: 6 Apr 2020, 23:54:30 UTC - in response to Message 93696.  
Last modified: 6 Apr 2020, 23:55:24 UTC

New 4.15 seems to be working better on older MacOS machines, but now after crunching for 8 hours on some I'm getting processing errors at the end with "finish file present too long".

That's a BOINC "feature", supposedly fixed in 7.16.

The machine hasn't been able to move the files out of the slot directory fast enough. That means the disk is overloaded. Freeing up a thread for BOINC might help, or if possible use a faster storage medium (an SSD or a faster disk). I'm not sure if you want to try a beta-test version of BOINC (i.e. the 7.16 series) on that machine, or even if there is one. The BOINC developers are concentrating on x64 machines only, in line with Apple dropping support for 32-bit apps in current OSX versions.
BOINC blog
Grant (SSSF)

Joined: 28 Mar 20
Posts: 1677
Credit: 17,760,387
RAC: 22,853
Message 93698 - Posted: 7 Apr 2020, 0:02:33 UTC - in response to Message 93696.  

I posted over on Ralph but figured I should ask here as well. New 4.15 seems to be working better on older MacOS machines, but now after crunching for 8 hours on some I'm getting processing errors at the end with "finish file present too long".
The "finish file present too long" issue has been around for years. But why it would start popping up now, en masse, is a bit of a mystery.

From the BOINC forums: "In looking through the client code it looks like this condition occurs when the client finds that the boinc finish file has been written to disk but the science application process is still running." When a Task completes, the BOINC Manager expects the Application to finish its housekeeping & exit within a certain time frame. If it doesn't, the Manager just clobbers it, and you get the "finish file present too long" error.

Generally it happens during periods of heavy disk I/O & heavy CPU usage. The more cores you have, and the more things (including things other than Rosetta) the system is doing at the time, the more likely the error. So the more threads, the larger the result files, and the more applications finishing or checkpointing at the same time (e.g. when exiting BOINC & rebooting a system), the more likely the problem is.


A fix was proposed (AFAIK) years ago. I did ask in another project when the fix was rolled out & the answer I got was "There is currently a v7.16.5 Beta available - that should fix it."
Grant
Darwin NT
CIA

Joined: 3 May 07
Posts: 100
Credit: 21,059,812
RAC: 0
Message 93705 - Posted: 7 Apr 2020, 3:48:27 UTC - in response to Message 93698.  

I posted over on Ralph but figured I should ask here as well. New 4.15 seems to be working better on older MacOS machines, but now after crunching for 8 hours on some I'm getting processing errors at the end with "finish file present too long".
The "finish file present too long" issue has been around for years. But why it would start popping up now, en masse, is a bit of a mystery.

From the BOINC forums: "In looking through the client code it looks like this condition occurs when the client finds that the boinc finish file has been written to disk but the science application process is still running." When a Task completes, the BOINC Manager expects the Application to finish its housekeeping & exit within a certain time frame. If it doesn't, the Manager just clobbers it, and you get the "finish file present too long" error.

Generally it happens during periods of heavy disk I/O & heavy CPU usage. The more cores you have, and the more things (including things other than Rosetta) the system is doing at the time, the more likely the error. So the more threads, the larger the result files, and the more applications finishing or checkpointing at the same time (e.g. when exiting BOINC & rebooting a system), the more likely the problem is.


A fix was proposed (AFAIK) years ago. I did ask in another project when the fix was rolled out & the answer I got was "There is currently a v7.16.5 Beta available - that should fix it."


I've only done some minor "testing" and it seems like it happens more often when a lot of tasks end almost simultaneously. If I can pause some threads and have them finish more spaced out it seems to not be an issue. When the last WU drop happened I had 24 tasks start at the same time, and mostly finish at the same time. That's when I saw it happen more often. Once they space out a bit they (at least from my observations) don't seem to fail.
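To put rough numbers on the "spaced out" idea, here is a small Python sketch. It only counts how many tasks would finish inside the same short window; the 60-second window and the 5-minute stagger are hypothetical values picked for illustration, and real tasks won't all run for exactly the same time.

```python
def max_finishing_together(finish_times_s, window_s):
    """Largest number of tasks whose finish times fall within one window of window_s seconds."""
    times = sorted(finish_times_s)
    best = 0
    left = 0
    for right in range(len(times)):
        # Shrink the window from the left until it spans at most window_s seconds.
        while times[right] - times[left] > window_s:
            left += 1
        best = max(best, right - left + 1)
    return best

RUNTIME_S = 8 * 3600   # roughly 8 hours per task, as in my case
N_TASKS = 24           # tasks that started together after the WU drop
WINDOW_S = 60          # hypothetical "busy" window around task completion

all_at_once = [RUNTIME_S for _ in range(N_TASKS)]
staggered = [RUNTIME_S + i * 300 for i in range(N_TASKS)]  # hypothetical 5-minute stagger

print(max_finishing_together(all_at_once, WINDOW_S))  # 24
print(max_finishing_together(staggered, WINDOW_S))    # 1
```

This says nothing about why the error happens, only that staggering removes the pile-up at the end.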
Grant (SSSF)

Joined: 28 Mar 20
Posts: 1677
Credit: 17,760,387
RAC: 22,853
Message 93706 - Posted: 7 Apr 2020, 4:01:54 UTC - in response to Message 93705.  

I've only done some minor "testing" and it seems like it happens more often when a lot of tasks end almost simultaneously. If I can pause some threads and have them finish more spaced out it seems to not be an issue. When the last WU drop happened I had 24 tasks start at the same time, and mostly finish at the same time. That's when I saw it happen more often. Once they space out a bit they (at least from my observations) don't seem to fail.
Yep, makes sense. Reduce the disk I/O & CPU requirements by not having everything happening all at the same time.

But as CPUs get more cores & threads, and GPUs (for the projects that use them) get more & more powerful, it's just going to be a bigger & bigger issue.
So hopefully the next BOINC Manager version, when it's finally released, will fix the problem once and for all.
*fingers crossed*
Grant
Darwin NT
CIA

Joined: 3 May 07
Posts: 100
Credit: 21,059,812
RAC: 0
Message 93707 - Posted: 7 Apr 2020, 4:03:56 UTC - in response to Message 93705.  

A fix was proposed (AFAIK) years ago. I did ask in another project when the fix was rolled out & the answer I got was "There is currently a v7.16.5 Beta available - that should fix it."


7.16.6 is the most recent beta for OSX BOINC (dated April 3rd, 2020). If I run that (and it seems to work) will my work still be counted as valid for Rosetta? Or will it just be similar to Ralph, where it's more proof-of-concept work?
Grant (SSSF)

Joined: 28 Mar 20
Posts: 1677
Credit: 17,760,387
RAC: 22,853
Message 93710 - Posted: 7 Apr 2020, 4:55:33 UTC - in response to Message 93707.  
Last modified: 7 Apr 2020, 5:03:38 UTC

7.16.6 is the most recent beta for OSX BOINC (dated April 3rd, 2020). If I run that (and it seems to work) will my work still be counted as valid for Rosetta? Or will it just be similar to Ralph, where it's more proof-of-concept work?
You process work for a Project, you get credit from that project, e.g. you process work for Ralph, you get Credit from Ralph. You process work for Rosetta, you get Credit from Rosetta. You process work for Einstein, you get Credit from Einstein, since they are the projects that you are doing work for.

All the manager does is schedule when work is run between different projects & do the job of requesting, & reporting completed work. It's not responsible for any of the processing; that's what the applications it downloads to process the work do. BOINC is what the projects use to run & manage their projects, the BOINC Manager is the client for computers that lets people do work for those projects.
So it doesn't matter what BOINC Manager you use to join & process work for a project (unless there is some particular issue that affects a project you wish to join).


I'd just check the release notes to see what (if any) the known issues are, and decide if you're prepared to put up with them. If things get too annoying, just re-install your current Manager.
Grant
Darwin NT
Mod.Sense
Volunteer moderator

Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 93711 - Posted: 7 Apr 2020, 5:26:29 UTC - in response to Message 93705.  

In general, I believe the maximum memory consumption of a task is as it approaches the end of a model. In your case, you are describing the end of the WU... which would always coincide with the end of a model as well. So, it is possible there was some swap contention, checkpointing, WU completion disk activity all going on at the same time.
Rosetta Moderator: Mod.Sense
Keith Myers

Joined: 29 Mar 20
Posts: 97
Credit: 332,619
RAC: 425
Message 93717 - Posted: 7 Apr 2020, 8:18:41 UTC

The timeout for the finish file check was lengthened from 10 seconds to 300 seconds in #3019 and incorporated in the 7.16 client branches.
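As a rough illustration of what that change means in practice, here is a simplified Python model of the check. This is not the actual BOINC client code; the function name and the 45-second example are assumptions, and only the 10 and 300 second values come from the change described above.

```python
OLD_TIMEOUT_S = 10    # timeout before the change
NEW_TIMEOUT_S = 300   # timeout after the change, as in the 7.16 branches

def finish_file_present_too_long(finish_file_age_s, app_still_running, timeout_s):
    """Simplified model: the task is failed if the finish file has existed
    longer than timeout_s while the science app process is still running."""
    return app_still_running and finish_file_age_s > timeout_s

# Hypothetical case: an app that needs 45 seconds to exit after writing its finish file.
print(finish_file_present_too_long(45, True, OLD_TIMEOUT_S))  # True
print(finish_file_present_too_long(45, True, NEW_TIMEOUT_S))  # False
```

So a task that is merely slow to shut down on a busy machine would be clobbered by the old limit but left alone by the new one.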

There is no reason to be scared of a "test" BOINC version. Perfectly usable.
Blackbird

Joined: 16 Jan 07
Posts: 5
Credit: 733,433
RAC: 872
Message 93730 - Posted: 7 Apr 2020, 12:10:45 UTC

Whatever issues I had appear to be resolved with rosetta 4.15. I have successfully run and validated a number of tasks now.
Plomos

Joined: 4 Mar 11
Posts: 11
Credit: 439,043
RAC: 0
Message 93774 - Posted: 7 Apr 2020, 20:36:07 UTC

I have had several 4.12 WUs that completed just fine and earned plenty of credits, such as https://boinc.bakerlab.org/rosetta/result.php?resultid=1142060959 but two that just finished, https://boinc.bakerlab.org/rosetta/result.php?resultid=1142041210 and https://boinc.bakerlab.org/rosetta/result.php?resultid=1142041257 only ran 1 decoy in 8 hours, which seems to be a problem with the 32-bit version of the app. I am running a 64-bit OS, so I would hope that all the tasks pulled to my machine are 64-bit, but this does not seem to be the case.
csbyseti

Joined: 24 Dec 05
Posts: 11
Credit: 4,989,744
RAC: 12,733
Message 93849 - Posted: 8 Apr 2020, 10:53:11 UTC - in response to Message 93774.  

The 64-bit app for Linux works fine on my machine; I think the 32-bit app for Linux has a bug.

Perhaps the Rosetta Admins should think about removing 32-bit apps for x86 CPUs. Machines running 32-bit must be really old, with low-speed CPUs.
It makes no sense to run these machines for BOINC.
Andreas Kübrich

Joined: 24 Mar 20
Posts: 1
Credit: 243,141
RAC: 0
Message 93851 - Posted: 8 Apr 2020, 11:12:40 UTC - in response to Message 93849.  

Machines running 32-bit must be really old, with low-speed CPUs.

Or they’re reasonably recent systems running a 32-bit operating system.
Grant (SSSF)

Joined: 28 Mar 20
Posts: 1677
Credit: 17,760,387
RAC: 22,853
Message 93853 - Posted: 8 Apr 2020, 11:21:01 UTC - in response to Message 93849.  

Perhaps the Rosetta Admins should think about removing 32-bit apps for x86 CPUs.
Why? They're no slower than the equivalent 64bit application.
Grant
Darwin NT
csbyseti

Joined: 24 Dec 05
Posts: 11
Credit: 4,989,744
RAC: 12,733
Message 93881 - Posted: 8 Apr 2020, 16:42:05 UTC - in response to Message 93851.  
Last modified: 8 Apr 2020, 16:43:12 UTC

Machines running 32-bit must be really old, with low-speed CPUs.

Or they’re reasonably recent systems running a 32-bit operating system.


That makes no sense because of the memory limitation. There is no reason to use a 32-bit OS on a modern CPU,
unless you want to use old software which doesn't work on a modern OS. But why must this machine run BOINC?

Perhaps they can count the number of 32-bit systems, add up the TFLOPS they generate, and then decide whether or not to stop the 32-bit app.

Perhaps the Rosetta Admins should think about removing 32-bit apps for x86 CPUs.
Why? They're no slower than the equivalent 64bit application.


The project developers have to support two more app versions that are not really needed anymore.
Grant (SSSF)

Joined: 28 Mar 20
Posts: 1677
Credit: 17,760,387
RAC: 22,853
Message 93928 - Posted: 8 Apr 2020, 22:41:24 UTC - in response to Message 93881.  
Last modified: 8 Apr 2020, 22:41:55 UTC

Perhaps the Rosetta Admins should think about removing 32-bit apps for x86 CPUs.
Why? They're no slower than the equivalent 64bit application.

The project developers have to support two more app versions that are not really needed anymore.
My point exactly. There is no need for the 64bit applications, so why produce them?
Grant
Darwin NT
James W

Joined: 25 Nov 12
Posts: 130
Credit: 1,766,254
RAC: 0
Message 94045 - Posted: 10 Apr 2020, 9:00:46 UTC

Name: rb_04_06_17111_20290_ab_t000__h002_robetta_IGNORE_THE_REST_09_18_905299_13_0
Application: Rosetta v4.12 windows_intelx86
Device: 1759960, Task: 1141280631, and WU: 1027346991.
Status: Validate error.
Exit status: 0 (0x00000000)
Errors: Too many total results.
canonical result: 1141280631
granted credit: 257.16

======================================================
DONE :: 1 starting structures 28653 cpu seconds
This process generated 107 decoys from 107 attempts
======================================================

Just appears strange that the app ran to completion and credit was even figured. Also this was considered a "canonical result." However, apparently something made this an "invalid" result, which is not obviously explained in the Stderr output.
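For reference, the two summary lines above can be turned into an average time per decoy with a few lines of Python. This is just a sketch that assumes the stderr wording is exactly as pasted; the variable names are my own.

```python
import re

# The two summary lines from the stderr output above.
stderr_tail = """\
DONE :: 1 starting structures 28653 cpu seconds
This process generated 107 decoys from 107 attempts
"""

cpu_seconds = int(re.search(r"(\d+) cpu seconds", stderr_tail).group(1))
decoys = int(re.search(r"generated (\d+) decoys", stderr_tail).group(1))

seconds_per_decoy = cpu_seconds / decoys
print(f"{seconds_per_decoy:.1f} CPU seconds per decoy")  # 267.8 CPU seconds per decoy
```

That is a bit under 4.5 CPU minutes per decoy, so the task itself looks like it ran normally.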



©2024 University of Washington
https://www.bakerlab.org