Questions and Answers : Unix/Linux : boincmgr with rosetta downloaded lots of data and when I rebooted it seemed to start over
Joined: 28 Mar 20 Posts: 1762 Credit: 18,534,891 RAC: 176
Do you have another account here? There is no sign of any computers on the account you used to post here, let alone of them getting any work or producing any errors. When you first join a project (any project) there is a lot of downloading: not only do you have to get Tasks to process, you also need to get the applications that process them, and different types of Tasks require different support files. Once all of those files have been downloaded, the actual data files for each Task are generally only a few hundred kB, although the result files sent back can be as large as 30MB (sometimes more), usually a lot less. If file transfers get sticky (it says it's uploading/downloading but nothing is actually happening), go to the BOINC Manager (Advanced view), Activity menu, select "Suspend network activity", then re-select "Network activity based on preferences". It may also be necessary to use a proxy server to work around the problems with your net connection. Grant Darwin NT
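On a headless Linux box the same toggle can be done without the Manager. A minimal Python sketch, assuming `boinccmd` is installed and can talk to the local client (depending on the setup it may need to run from the BOINC data directory or be given `--passwd`). It switches `boinccmd --set_network_mode` to `never` and back to `auto`, which correspond to "Suspend network activity" and "Network activity based on preferences". The helper names and the 5-second pause are illustrative choices, not anything BOINC prescribes.

```python
import shutil
import subprocess
import time

# Modes accepted by `boinccmd --set_network_mode`.
VALID_MODES = ("always", "auto", "never")
SUSPEND = "never"  # like "Suspend network activity" in the Manager
RESUME = "auto"    # like "Network activity based on preferences"


def network_mode_cmd(mode: str) -> list[str]:
    """Build the argument list for `boinccmd --set_network_mode MODE`."""
    if mode not in VALID_MODES:
        raise ValueError(f"unknown network mode: {mode!r}")
    return ["boinccmd", "--set_network_mode", mode]


def kick_transfers(pause_seconds: float = 5.0) -> None:
    """Suspend network activity briefly, then hand control back to preferences.

    Hypothetical helper: requires boinccmd on PATH and a running client.
    """
    if shutil.which("boinccmd") is None:
        raise RuntimeError("boinccmd not found on PATH")
    subprocess.run(network_mode_cmd(SUSPEND), check=True)
    time.sleep(pause_seconds)
    subprocess.run(network_mode_cmd(RESUME), check=True)
```

Calling `kick_transfers()` on the machine running the client is meant to reproduce the two menu clicks; building the command list separately keeps it easy to check before anything is run.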
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0
The BOINC Manager will take care of retrying downloads that get interrupted. BOINC also has settings where you can limit bandwidth usage if you like. "Avg. work done" is averaged over the last 10 days, and for most of those days it sounds like you did no work because you were not attached to the project, so it is a fairly meaningless number just hours after you sign up. Rosetta Moderator: Mod.Sense
Joined: 28 Mar 20 Posts: 1762 Credit: 18,534,891 RAC: 176
*"Should I be concerned?"* Yes. You need to figure out how many accounts you have, and what they are. The account you are posting here with has no computers doing any work at all (as the linked Account page shows). You need to log in to the project using the name & email address that you used to attach the computer that is presently processing work to Rosetta. Grant Darwin NT
Joined: 28 Mar 20 Posts: 1762 Credit: 18,534,891 RAC: 176
*"Then I will reinstall BOINC and try starting over with a fresh account."* Or just attach to the project with the account you are using here. Grant Darwin NT
Joined: 28 Mar 20 Posts: 1762 Credit: 18,534,891 RAC: 176
*"What am I doing wrong?"* No idea. I have only ever used the graphical Manager; I left the command line behind a very long time ago. The other option would be, instead of posting here with this present account, to log off from the site and log back on using the account the computer is using. (That actually makes more sense, as the other account will have all the history of the work the computer has done, whereas this account doesn't have any processing history *slaps self*.) Grant Darwin NT
Joined: 28 Mar 20 Posts: 1762 Credit: 18,534,891 RAC: 176
OK, from what you have posted previously:
```
$ boinccmd --get_state
======== Projects ========
1) -----------
   name: Rosetta@home
   master URL: https://boinc.bakerlab.org/rosetta/
   user_name: Macuilxochitl
   team_name:
   resource share: 100.000000
   user_total_credit: 26253.309237
   user_expavg_credit: 560.492141
   host_total_credit: 6023.143093
   host_expavg_credit: 560.492141
   GUI URL:
      name: Your tasks
      description: View the last week or so of computational work
      URL: https://boinc.bakerlab.org/rosetta/results.php?userid=283434
   jobs succeeded: 17
   jobs failed: 58
   elapsed time: 585442.120870
   cross-project ID: b234b0bee793944832bb02a56190d855
```
So work is being done, and it is earning Credit for that computer on that account. The user ID for that account is 283434. The user ID for the account you are posting with here is 2157465. So it looks like you've had an account for quite some time, then for some reason created a new account, but your computer is still on the old account. Since you have logged in here with the new account, you can't see the computer. And when you go to check out your account using the BOINC Manager on the computer, you can't, because you are not logged in on that account. If you click on "Log out" at the top right-hand corner of this page, that will log you out of this website. If you then follow Step 2 below, that should allow you to log back in using your original account, the one that has the computer on it. Forgot your account info? You might want to triple-check everything before doing it (I just got up & I'm still not quite awake yet; it's been a looooong and tiring week), but it should get you logged in to this website using the account that your computer is on. Grant Darwin NT
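Reading that output the way Grant did (spotting the userid in the "Your tasks" URL and comparing succeeded against failed jobs) can be scripted. A minimal Python sketch, assuming the text has already been captured from `boinccmd --get_state` and trimmed to one project section; the function names are hypothetical, and the parser only handles the simple `key: value` lines in the posted output.

```python
from urllib.parse import parse_qs, urlparse


def parse_project_block(text: str) -> dict[str, str]:
    """Collect `key: value` pairs from one project section of `boinccmd --get_state`.

    Nested entries (such as those under "GUI URL:") reuse keys like "name",
    so only the first occurrence of each key is kept. Lines with no value
    (e.g. an empty "team_name:") are skipped.
    """
    fields: dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition(": ")
        if sep and key not in fields:
            fields[key] = value
    return fields


def failure_rate(fields: dict[str, str]) -> float:
    """Fraction of finished jobs that failed, from the per-project counters."""
    succeeded = int(fields["jobs succeeded"])
    failed = int(fields["jobs failed"])
    total = succeeded + failed
    return failed / total if total else 0.0


def user_id(fields: dict[str, str]) -> str:
    """Read the account's userid from the "Your tasks" results URL."""
    query = parse_qs(urlparse(fields["URL"]).query)
    return query["userid"][0]
```

With the numbers in this post, 58 of 75 finished jobs had failed (about 77%), and the userid comes out as 283434, the account the computer is actually attached to.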
Macuilxochitl Joined: 11 Oct 08 Posts: 13 Credit: 134,700 RAC: 0
Alright, thanks, that seemed to work. I'm not sure why I have 2 accounts. When I set up BOINC on this box I tried to log in with the username and password I had on record from 12 years ago, but it rejected my password, so I asked to reset it; for some reason that seemed to create a second account. I'm not sure what happened, but now everything seems ducky. At least I have some confidence that my computer's work is being used. I just ordered 3 case fans, so we'll see if I can begin to do some real crunching.
Joined: 28 Mar 20 Posts: 1762 Credit: 18,534,891 RAC: 176
Glad you got that sorted. Now you can start hunting down what is going on with the system; it's putting out a lot of errors. Grant Darwin NT
Macuilxochitl Joined: 11 Oct 08 Posts: 13 Credit: 134,700 RAC: 0
I see the errors in the Manager, but I wouldn't know where to look for the source. Maybe it is because I am using the proprietary Nvidia (Linux) graphics driver? I guess I could try running memtest. I'm not seeing anything in the GUI Event Log that flags my attention. If I had to guess, I'd say the errors I was getting transferring data to and fro over my marginal internet connection are to blame. I took a cursory look in my home directory but didn't see any log file to examine. Can you suggest where I might look?
Joined: 28 Mar 20 Posts: 1762 Credit: 18,534,891 RAC: 176
*"Can you suggest where I might look?"* Unfortunately Linux error messages seem to be on par with old DOS ones: next to useless. It's very unlikely to be related to the video driver (possible, but very unlikely). The internet issues are also not likely to be the cause, since the files were downloaded OK. While several Tasks crashed & burned straight away, others started processing and then crashed. But it is a possibility. One or two of the errors appear to be related to the Tasks themselves (there are issues with some of the Work Units), but all of the others are dying only on your system. A quick search shows that "process got signal 11" errors are either a problem with the program (yet others aren't having the issues you are), or a hardware problem. Since you've got your account sorted out, and hopefully the internet issues too, the usual suggested fix is to Reset project (on the BOINC Manager Projects tab). This clears out all of your local files (data and applications) for the project, re-downloads new copies, then downloads new Tasks to process. Given you have had internet issues, it is possible a file or two is corrupted & responsible for your high error count. This should eliminate the Rosetta software/libraries/databases as the cause. If the problems still occur after doing that, then it's a case of testing the RAM, making sure the CPU isn't overheating, possibly even turning off hyperthreading & seeing how things go. With that number of cores & threads, your present system RAM will result in some memory issues, as some Tasks can require as much as 3GB of RAM. You generally need to allow 1.3GB of RAM per core/thread in use to avoid running into memory limitations. But that shouldn't result in the errors that you are seeing. It's also worth checking the rails of your power supply: if the voltages drop under load, that can also result in "process got signal 11" errors. Grant Darwin NT
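Grant's 1.3GB-per-thread rule of thumb turns into a quick estimate of how many tasks a machine can run at once. A minimal Python sketch; the 1.3GB figure is from the post, while the 2GB reserve for the OS and desktop is an assumption of this example, not a BOINC or Rosetta setting. The function name is hypothetical.

```python
import math

GB_PER_TASK = 1.3  # Grant's rule of thumb: RAM to allow per core/thread in use


def max_concurrent_tasks(total_gb: float, cpu_threads: int,
                         reserve_gb: float = 2.0,
                         gb_per_task: float = GB_PER_TASK) -> int:
    """Estimate how many Rosetta tasks fit in RAM at once.

    reserve_gb is an assumed allowance for the OS and desktop, not a BOINC setting.
    """
    usable = max(total_gb - reserve_gb, 0.0)
    # Small epsilon so exact multiples are not lost to floating-point rounding.
    by_ram = math.floor(usable / gb_per_task + 1e-9)
    # Never more tasks than hardware threads.
    return min(cpu_threads, by_ram)
```

For a 16GB, 12-thread machine this gives 10 tasks with the assumed reserve (12 with no reserve at all); a limit like that could be applied through the "Use at most N% of the CPUs" computing preference.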
Macuilxochitl Joined: 11 Oct 08 Posts: 13 Credit: 134,700 RAC: 0
Well, I let my work unit finish and then reset the project, as suggested. It has been cranking for 3.5 hours and "tasks failed" has remained at 82, so maybe that did the trick. I'll keep an eye on it. It could be that the rocky upload was to blame. I stuck a USB WiFi dongle on the machine and used my neighbor's much faster internet connection to download my work, which went quickly without any interruptions, so maybe it was cleaner. I've got some fans on the way; maybe they will reduce my temps from ~70C, though that doesn't seem excessive. I don't think the RAM is a limiting factor, since I've never gone past 10GB out of 16, but I guess I could stick in another 8GB at some point if it becomes an issue. I really hope the power supply is not the issue; those suckers are expensive at the moment. Also, maybe a source of errors is that my GeForce GT 730 is a refurb I got for $20. But I get the impression that the GPU isn't that important for Rosetta. Still, Nvidia is a crummy choice for Linux, and I get screen weirdness way too often. I wish I could find a cheap AMD graphics card and use the open source driver, but unfortunately, desktop display adapters are largely a thing of the past. Folks who are not gaming use onboard graphics, which is actually faster than this damn card anyway, and it is hard to find a decent video card for under $80 or so.
Macuilxochitl Joined: 11 Oct 08 Posts: 13 Credit: 134,700 RAC: 0
Man, this simultaneously sucks and blows, not unlike the two case fans I just installed to make BOINC work better. Errors are ongoing. For a minute I thought I was through that, but I've gone from 82 to 115 since I reset the project without increasing my completed units from 26. And now it looks like my communication with Rosetta is mucked up again; the Transfers tab shows my download is pending. Specifically: "Download: pending (project backoff: 00:30...." This is using a WiFi dongle and my neighbor's much faster connection; speedtest says: Download: 42.18 Mbit/s. Oh well, at least my new blue LED fan is pretty. Guess I'll try some memtest before I give up on the project. Ah, the download just restarted and mostly went smoothly until it got to the last item in the Transfers tab, then it stopped again, but after a minute it retried and now I'm cranking again. My temps went right back up to 73C despite the new fans, but it looks like I'm using a greater percentage of my CPU according to htop. Ah crap, while I was typing my CPU use dropped off again and now I'm getting: Status: Communication deferred... Oh well. I'll report back after running memtest.
Macuilxochitl Joined: 11 Oct 08 Posts: 13 Credit: 134,700 RAC: 0
https://imgur.com/ViVw0CA Oy. |
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0
Looks like you are getting download errors:
```
<core_client_version>7.16.6</core_client_version>
<![CDATA[
<message>
app_version download error: couldn't get input files:
<file_xfer_error>
  <file_name>database_357d5d93529_n_methyl.zip</file_name>
  <error_code>-120 (RSA key check failed for file)</error_code>
  <error_message>signature verification failed</error_message>
</file_xfer_error>
</message>
]]>
```
Perhaps you have anti-virus software that is blocking the zip file from downloading? Rosetta Moderator: Mod.Sense
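When a host shows lots of failed tasks, it helps to separate download failures like this one from genuine computation errors. A minimal Python sketch, assuming a task's stderr text (as shown on its result page) has been copied into a string. Because that snippet as a whole is not one well-formed XML document, each `<file_xfer_error>` block is cut out with a regular expression and only that block is parsed with the standard library's ElementTree. The class and function names are hypothetical.

```python
import re
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class TransferError:
    file_name: str
    code: int      # numeric BOINC error code, e.g. -120
    message: str


def file_xfer_errors(stderr_text: str) -> list[TransferError]:
    """Extract each <file_xfer_error> block from a task's stderr output.

    The whole stderr snippet is not one well-formed XML document, so each
    block is cut out with a regex and only that block is parsed.
    """
    errors = []
    pattern = r"<file_xfer_error>.*?</file_xfer_error>"
    for block in re.findall(pattern, stderr_text, flags=re.DOTALL):
        elem = ET.fromstring(block)
        # error_code looks like "-120 (RSA key check failed for file)".
        code_text = elem.findtext("error_code", default="").strip()
        errors.append(TransferError(
            file_name=elem.findtext("file_name", default=""),
            code=int(code_text.split()[0]) if code_text else 0,
            message=elem.findtext("error_message", default=""),
        ))
    return errors
```

A task whose stderr yields any of these failed at the download stage, so it says nothing about the stability of the CPU or RAM.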
Macuilxochitl Joined: 11 Oct 08 Posts: 13 Credit: 134,700 RAC: 0
No, I'm using Linux; I don't use AV. But I guess I may have learned why I was getting so many failed units, even if I haven't figured out my download issues. I hadn't been able to run memtest because even though it was installed, it wasn't one of my Ubuntu GRUB choices; I'm not sure why. Maybe I installed the system in UEFI mode; I don't know if that is a factor. But I booted from a live Debian image and was able to run memtest, and it started kicking up errors pretty quickly. I have a G.SKILL Ripjaws V Series 16GB 288-Pin DDR4 SDRAM DDR4 3200 stick and was running it at its XMP-2 profile, which is the rated speed of the RAM. So I set the RAM speed to 2133 MHz, the lowest speed, and it passed memtest. I've been running it at that speed for 5 hours and have gotten no further errors. Also, for some reason I had been using a little of my swap partition even though I always had plenty of reserve memory. Now I'm using 8 of 16GB of RAM, but no swap at all. After I finish using the machine for the day I'll set the memory to its XMP-1 profile (which is probably ~2933 MHz or so) and run some memtest on it. If it is stable maybe I'll try pushing it up just a little bit. I'm not sure how much memory speed affects BOINC crunching speed. I'm just relieved that it doesn't look like my PSU is at fault; that would have been expensive to fix.
Macuilxochitl Joined: 11 Oct 08 Posts: 13 Credit: 134,700 RAC: 0
I dialed the memory timings down from 3200MHz to 2933MHz, which seems like the maximum I can squeeze out of this stick and still pass a round of memtest. But I'm still getting a few errors. Over maybe 10 hours I've gone from 133 total errors to 136. How bad is that? Are errors to be expected, or do any errors indicate a serious issue? Maybe I should dial the RAM down to 2800 or try to RMA the stick?
Joined: 28 Mar 20 Posts: 1762 Credit: 18,534,891 RAC: 176
*"How bad is that?"* Extremely bad. You should not get any errors. However, there will be some tasks that are cancelled by the project and classed as errors, and some tasks that do error out. Actual computation errors (unless there is a batch of bad Work Units) should be 2% or less of your total Task count. So you should have no more than 2 computation errors for that system. Some of your errors are related to the download issues, but the others are computation-related and show memory problems (or data corruption). *"Maybe I should dial the RAM down to 2800 or try to RMA the stick?"* You need to revert your CPU & memory clocks and voltages to stock values. Computation errors show that the overclock is not stable. Grant Darwin NT
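Grant's 2% guideline can be written as a quick check. A minimal Python sketch; it expects only computation errors to be counted, so tasks cancelled by the project and download failures should be excluded first, and the 2% default comes from the post rather than any official BOINC or Rosetta threshold. The function names are hypothetical.

```python
def error_budget(total_tasks: int, max_percent: int = 2) -> int:
    """Largest number of computation errors that stays within max_percent of tasks."""
    return total_tasks * max_percent // 100


def errors_acceptable(computation_errors: int, total_tasks: int,
                      max_percent: int = 2) -> bool:
    """True if computation errors are at or below max_percent of total tasks.

    Integer arithmetic avoids floating-point edge cases at the threshold.
    """
    if total_tasks <= 0:
        return computation_errors == 0
    return computation_errors * 100 <= total_tasks * max_percent
```

For example, 100 finished tasks would allow at most 2 computation errors; a third one already puts the host outside the guideline.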
Sid Celery Joined: 11 Feb 08 Posts: 2207 Credit: 42,134,903 RAC: 21,285
*"I dialed the memory timings down from 3200MHz to 2933MHz, which seems like the maximum I can squeeze out of this stick and still pass a round of memtest. But I'm still getting a few errors. Over maybe 10 hours I've gone from 133 total errors to 136."* I didn't understand the relevance of this earlier in the thread, so I didn't want to interfere, but I just looked up your CPU and it says it can't access RAM faster than 2667. Obviously it has been, with errors, but it sounds like a good idea to step it down until it's fully successful. There's often a margin, so 2800 is worth a try next. An RMA might be tricky if the stick is only failing at a speed you already know your CPU can't handle in the first place. The other thing is you have a 6-core/12-thread processor with 16GB of memory. With the project's RAM demands recently, you'll struggle to run more cores without more memory. You mentioned you could add another 8GB; that sounds like a good idea too. Also, make sure you have the latest BIOS. Some updates improve the stability of higher-speed RAM.
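The step-down approach Sid describes is a simple search from the fastest setting downward. A minimal Python sketch; the `is_stable` callback is a hypothetical stand-in for the manual work (set the speed in the BIOS, pass memtest, then crunch for a while with no new computation errors), and the candidate list is an assumption made up of the speeds mentioned in this thread plus the usual lower DDR4 settings.

```python
from typing import Callable, Iterable, Optional

# Speeds mentioned in this thread plus the usual lower DDR4 settings (assumed list).
CANDIDATE_SPEEDS = [3200, 2933, 2800, 2667, 2400, 2133]


def highest_stable_speed(candidates: Iterable[int],
                         is_stable: Callable[[int], bool]) -> Optional[int]:
    """Step down from the fastest candidate and return the first one that is stable.

    is_stable is a stand-in for the manual work: set the speed in the BIOS,
    pass memtest, then crunch for a while with no new computation errors.
    """
    for speed in sorted(candidates, reverse=True):
        if is_stable(speed):
            return speed
    return None  # nothing passed; fall back to stock settings and check other hardware
```

Starting from the top means the first passing setting is also the fastest stable one, so no further testing is needed once a speed passes.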
Macuilxochitl Joined: 11 Oct 08 Posts: 13 Credit: 134,700 RAC: 0
My $85 Ryzen 5 1600 AF can handle higher RAM speeds even though AMD rather conservatively says its rated speed is only 2667MHz. From what I've read, the motherboard is more of a constraint than the CPU, at least up to about 3200MHz. My motherboard's QVL lists many kits that have been tested to run substantially faster than 2667MHz. https://www.asrock.com/mb/AMD/B450M%20Pro4/index.us.asp#Memory Of the 290 RAM kits that ASRock tested, 61 were rated at 3000 or better, and none of them tested as running slower than 2933MHz; all 29 of the 3200MHz kits apparently ran at their rated speeds, and the 7 tested "2933MHz" sets also ran at their rated speeds. I do so love playing with spreadsheets! I reclocked my memory down to 2800MHz and have not gotten any additional errors over the last few days, running maybe 6-8 hours a day, so I guess that is where I'll stay. I am a bit disappointed that my memory does so much worse than all the other relatively fast sticks tested by ASRock, but it probably won't hurt my folding unduly. I'm not about to overclock my CPU; with the stock AMD processor fan I'm hitting rather high temps (80C) even at the rated default clock speed of 3200MHz (max boost speed is apparently 3700MHz without overclocking, but I've never seen the processor go faster than 3500MHz). On hot days I even reduce the CPU limits in BOINC preferences to keep the machine from overheating. Given my unimpressive performance I apparently wasn't too lucky in the hardware lottery, but what the heck, I built the system for about $300 plus tax, not counting the case and power supply I recycled from an old Athlon XP 1700+ build. I'm only using 9GB of RAM now, and have never seen it go over 10GB on this (or any) machine, so I have 5.6GB in the bank, but if I ever see RAM usage go over 12GB I'll order another stick; RAM prices seem to be falling at the moment after climbing for a few months.
©2025 University of Washington
https://www.bakerlab.org