junior halfroid is a cuckoo's egg


Peter Bennett
Joined: 15 Apr 20
Posts: 11
Credit: 265,146
RAC: 13
Message 95794 - Posted: 2 May 2020, 12:56:40 UTC

Junior halfroid is a cuckoo's egg

I've had a number of work units labelled 'junior halfroid ...'.

They look innocent enough to begin with, just like any other job. But once they get past 90% complete, they seem to want more and more time. (Sounds like the IT projects I used to work on.) Then this morning I found that a whole partition had been filled up with boinc/rosetta stuff.

I am now aborting any junior halfroid work as soon as I see it.

Has anyone else noticed this problem?
Mod.Sense
Volunteer moderator
Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 95826 - Posted: 2 May 2020, 17:47:28 UTC - in response to Message 95794.  

What did you see that was filled? Disk space? How much space was consumed by how many active tasks?

Don't put too much stock in the estimated runtime to completion. The best predictor of your completion time is the runtime preference you have set in your Rosetta profile for the venue of the machine. There are several posts about how and why runtime estimates are wrong, and about the issue where BOINC Manager requests more work than can be completed within the 3-day deadline. See the Number Crunching board.
Rosetta Moderator: Mod.Sense
Peter Bennett
Joined: 15 Apr 20
Posts: 11
Credit: 265,146
RAC: 13
Message 95895 - Posted: 3 May 2020, 3:53:24 UTC - in response to Message 95826.  

It was the partition that contained /var which filled up. I deleted the offending file without making a note of its name and reinstalled boinc, which has got things running again. At the moment one processor is idle and I have not worked out why. Perhaps it is just waiting for more work.
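
For anyone hitting the same full partition, a rough sketch of how to see what is taking the space before deleting anything. The data directory path below is an assumption (a common location for Linux package installs); it is not stated in this thread, so adjust it for your system.

```python
import os
import shutil


def dir_size_bytes(path):
    """Sum the sizes of regular files under path (symlinks are skipped)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            try:
                if not os.path.islink(full):
                    total += os.path.getsize(full)
            except OSError:
                # File vanished or is unreadable; ignore it.
                pass
    return total


def largest_entries(path, count=5):
    """Return (size, name) for the biggest immediate children of path."""
    sizes = []
    for entry in os.scandir(path):
        if entry.is_symlink():
            continue
        if entry.is_dir():
            size = dir_size_bytes(entry.path)
        else:
            size = entry.stat().st_size
        sizes.append((size, entry.name))
    return sorted(sizes, reverse=True)[:count]


if __name__ == "__main__":
    # ASSUMPTION: hypothetical BOINC data directory; change to match your install.
    data_dir = "/var/lib/boinc-client"
    if os.path.isdir(data_dir):
        try:
            free_gb = shutil.disk_usage(data_dir).free / 1e9
            print(f"Free space on this partition: {free_gb:.1f} GB")
            for size, name in largest_entries(data_dir):
                print(f"{size / 1e6:10.1f} MB  {name}")
        except PermissionError:
            print("Not permitted to read the directory; try running as root.")
```

Noting the name and size of the largest entry before removing it would make it easier to answer the question about what filled the disk.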
Peter Bennett
Joined: 15 Apr 20
Posts: 11
Credit: 265,146
RAC: 13
Message 96160 - Posted: 6 May 2020, 10:41:33 UTC - in response to Message 95895.  

I'm now attempting to run Junior Halfroid design 5 jobs, but I am getting very long run times - often over 24 hours.

I have more memory on order - an upgrade for my laptop from 4 to 16 GB, and an SD card for my smartphone - and this might help a bit.
Grant (SSSF)
Joined: 28 Mar 20
Posts: 1681
Credit: 17,854,150
RAC: 22,647
Message 96163 - Posted: 6 May 2020, 11:11:26 UTC - in response to Message 96160.  
Last modified: 6 May 2020, 11:14:32 UTC

"I'm now attempting to run Junior Halfroid design 5 jobs, but I am getting very long run times - often over 24 hours."

None of your systems show that in the work they have returned. There is one Task that you aborted after 3hrs 10min, one that ran for 19hrs 40min and then errored out, and another that ran for 8hrs 40min and finished OK. There are also 2 systems where work has been downloaded and never returned.

The system that returned the Valid Task and the one that errored out shows signs of heavy CPU use by something other than Rosetta: a large difference between CPU time and Runtime.


The default Target CPU Runtime is 8 hours, but Tasks can run for up to another 10 hours before the Watchdog timer ends them. On a heavily used system the actual Runtime will be much longer than the CPU time, and one of your systems is showing signs of that.
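
As a worked example of that arithmetic, here is a small sketch using only the two figures quoted above (8-hour default target, 10-hour watchdog allowance). It assumes the limit is counted in CPU time; the function names are made up for illustration and are not part of BOINC or Rosetta.

```python
from datetime import timedelta

DEFAULT_TARGET_CPU_HOURS = 8   # default target runtime quoted in this thread
WATCHDOG_GRACE_HOURS = 10      # extra allowance quoted in this thread


def watchdog_cutoff(target_cpu_hours=DEFAULT_TARGET_CPU_HOURS):
    """CPU time after which the watchdog would be expected to end a task."""
    return timedelta(hours=target_cpu_hours + WATCHDOG_GRACE_HOURS)


def cpu_time_left(cpu_used, target_cpu_hours=DEFAULT_TARGET_CPU_HOURS):
    """CPU time remaining before the cutoff (never negative)."""
    remaining = watchdog_cutoff(target_cpu_hours) - cpu_used
    return max(remaining, timedelta(0))


if __name__ == "__main__":
    print("Cutoff with default target:", watchdog_cutoff())
    print("Left after 12h of CPU time:", cpu_time_left(timedelta(hours=12)))
```

With the default preference, that puts the expected upper bound at 18 hours of CPU time, so a task showing more than 24 hours is most likely accumulating elapsed time rather than CPU time.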
Grant
Darwin NT
Mod.Sense
Volunteer moderator
Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 96166 - Posted: 6 May 2020, 12:55:48 UTC - in response to Message 96160.  

Runtime is based on CPU time, not elapsed time. So, if the task is running along with other work, it may take longer to get all of the CPU time.
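
To make the CPU time versus elapsed time distinction concrete, here is a minimal sketch. The numbers in the example run are hypothetical and are not taken from any task in this thread.

```python
def cpu_efficiency(cpu_seconds, elapsed_seconds):
    """Fraction of elapsed (wall-clock) time the task actually spent on the CPU."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return cpu_seconds / elapsed_seconds


def elapsed_hours_needed(target_cpu_hours, efficiency):
    """Wall-clock hours needed to accumulate the target CPU hours."""
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    return target_cpu_hours / efficiency


if __name__ == "__main__":
    # Hypothetical task: 4 hours of CPU time accumulated over 8 hours elapsed.
    eff = cpu_efficiency(4 * 3600, 8 * 3600)
    print(f"CPU efficiency: {eff:.0%}")
    print(f"Elapsed hours for an 8-hour target: {elapsed_hours_needed(8, eff):.1f}")
```

A machine that gives the task only half of a core would need roughly twice the target runtime in elapsed time, which is the pattern to look for when CPU time and Runtime differ widely in the task list.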

Yes, these recent WUs are using more memory than is common. This may be slowing their progress on your machine as well.

You can adjust the target runtime if needed, keeping in mind that the watchdog now takes 10 hours after that runtime preference to step in and end the WU.
Rosetta Moderator: Mod.Sense




©2024 University of Washington
https://www.bakerlab.org