Work Unit Compression discussion

Scribe
Avatar

Send message
Joined: 2 Nov 05
Posts: 284
Credit: 157,359
RAC: 0
Message 9776 - Posted: 25 Jan 2006, 7:06:44 UTC

It had to happen: Nite Owl, who has crunched DC medical projects since at least 2001 with his farm of 38 machines, has now quit DC altogether. Mainly 'cos he wants to do ONLY Rosetta, and his ISP has imposed a bandwidth limit on him which stops him. He is in the boondocks and on a satellite download.
Please try to get the WU compression done as soon as other work permits and maybe we can get him back.
Please do not litter this thread with "BOINC is multi-project" etc., as he is not interested; we have already tried him on this and we do know about it - see my stats!
If you can keep us up to date about progress in here I can make sure he knows....thanks.



ID: 9776 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Moderator9
Volunteer moderator

Send message
Joined: 22 Jan 06
Posts: 1014
Credit: 0
RAC: 0
Message 9793 - Posted: 25 Jan 2006, 8:36:03 UTC
Last modified: 25 Jan 2006, 15:38:19 UTC

At the request of a user this thread was started as a place for people to discuss issues related to movement of large Work Units, and possible changes to the compression or work unit size to solve the problem for users with low, time limited, rate structure limited, restricted or rationed internet connections.

This should provide a place for the project team to comment on the issue as they work on the problem.
Moderator9
ROSETTA@home FAQ
Moderator Contact
ID: 9793 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Scribe
Avatar

Send message
Joined: 2 Nov 05
Posts: 284
Credit: 157,359
RAC: 0
Message 9794 - Posted: 25 Jan 2006, 8:46:47 UTC

Thanks Mod No. 9
ID: 9794 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
FluffyChicken
Avatar

Send message
Joined: 1 Nov 05
Posts: 1260
Credit: 369,635
RAC: 0
Message 9825 - Posted: 25 Jan 2006, 16:59:15 UTC

To save everyone a lot of retyping, I'll post this link to the previous discussion and ideas on bandwidth usage.

Feedback, .. bandwidth usage :-(
Team mauisun.org
ID: 9825 · Rating: 1 · rate: Rate + / Rate - Report as offensive    Reply Quote
Profile carl.h
Avatar

Send message
Joined: 28 Dec 05
Posts: 555
Credit: 183,449
RAC: 0
Message 9920 - Posted: 26 Jan 2006, 13:33:08 UTC
Last modified: 26 Jan 2006, 13:33:24 UTC

Copied and pasted from our forum: my current viewpoint

After reading (struggling) through FluffyChicken's thread on Rosetta,
https://boinc.bakerlab.org/rosetta/forum_thread.php?id=756
I can now see we've already had an attempt at cutting down bandwidth.

The WU used to fold 10 times (nstruct = 10); now it appears in some cases to be nstruct = 40. In effect this means the same download but 4x the length of time, i.e. 4x the amount of work by one machine.

They are also experimenting with different compression, and thinking of sending one larger file every so often with smaller WUs to go with it.

David Baker and his team are very aware of the problem and are actively looking into various ways of solving it.

Of course, with longer units there may also be the chance that you are a lot further into the unit when you suddenly get an error and abort.



Not all Czech`s bounce but I`d like to try with Barbar ;-)

Make no mistake This IS the TEDDIES TEAM.
ID: 9920 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Profile carl.h
Avatar

Send message
Joined: 28 Dec 05
Posts: 555
Credit: 183,449
RAC: 0
Message 10025 - Posted: 27 Jan 2006, 14:43:20 UTC

I may be wrong with my previous assumption, as the WUs appear to have gone back to a 2 hour 20 finish.
Not all Czech`s bounce but I`d like to try with Barbar ;-)

Make no mistake This IS the TEDDIES TEAM.
ID: 10025 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Moderator9
Volunteer moderator

Send message
Joined: 22 Jan 06
Posts: 1014
Credit: 0
RAC: 0
Message 10080 - Posted: 28 Jan 2006, 4:09:01 UTC - in response to Message 10025.  
Last modified: 28 Jan 2006, 4:11:51 UTC

I may be wrong with my previous assumption, as the WUs appear to have gone back to a 2 hour 20 finish.


Carl.h

Dr. Baker posted a news item in the Home page news area just today (27/1/06). While it does not speak directly to the bandwidth issue, you can gather from what it does say that the problem should be reduced. The Max Time failures should diminish as well.
Moderator9
ROSETTA@home FAQ
Moderator Contact
ID: 10080 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
R/B

Send message
Joined: 8 Dec 05
Posts: 195
Credit: 28,095
RAC: 0
Message 10277 - Posted: 31 Jan 2006, 19:34:22 UTC

This would seem to explain why, of the 7 WUs I downloaded on one of my machines today, 6 have a shortened 1 week deadline and 1 has a deadline of 4 weeks, labeled 'INCREASE_CYCLES_10_dtj1_287_583'.
Founder of BOINC GROUP - Objectivists - Philosophically minded rational data crunchers.


ID: 10277 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
truckpuller

Send message
Joined: 5 Nov 05
Posts: 40
Credit: 229,134
RAC: 0
Message 10298 - Posted: 1 Feb 2006, 2:51:11 UTC

Lately I have also had to drop 2 computers from running Rosetta because of bandwidth (being on dial-up @ 32 Kbps): there is just not enough time to download enough jobs to keep all my computers running. I have left my internet connection on all night, only to get up 8 hours later and find jobs still downloading. I do know that there are a lot of other people out there running dial-up too.
Visit us at Christianboards.org
ID: 10298 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Jeff Gilchrist

Send message
Joined: 7 Oct 05
Posts: 33
Credit: 2,398,990
RAC: 0
Message 10411 - Posted: 3 Feb 2006, 15:26:06 UTC
Last modified: 3 Feb 2006, 15:27:18 UTC

I'm assuming it's the Rosetta core that handles the .gz input files and not the BOINC client, so switching from GZIP to BZIP2 (http://www.bzip.org/) seems like it would save a lot of space when transferring data. A bzip2 library is available as free open source and works very similarly to the gzip library that Rosetta would currently be using.

Taking a few files my client is currently working on:

aa1ogw_09_05.400_v1_3.gz is 3657522 bytes
bb1bm8_09_05.200_v1_3.gz is 2425482 bytes
bb1iibA09_05.200_v1_3.gz is 2492662 bytes

The same files compressed with bzip2 are:

aa1ogw_09_05.400_v1_3.bz2 is 2224708 bytes
bb1bm8_09_05.200_v1_3.bz2 is 1481306 bytes
bb1iibA09_05.200_v1_3.bz2 is 1488119 bytes

So you can see it would make a big difference in bandwidth to switch over to using bzip2 to compress the data files from gzip.
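As a rough check on the figures above, here is a minimal Python sketch that works out the saving from the six file sizes quoted in this post. Only the byte counts come from the post; the `saving` helper and the variable names are made up for illustration. On these three files bzip2 comes out roughly 39 to 40 percent smaller than gzip.

```python
# File sizes in bytes, copied from the post above: (gzip size, bzip2 size).
SIZES = {
    "aa1ogw_09_05.400_v1_3": (3657522, 2224708),
    "bb1bm8_09_05.200_v1_3": (2425482, 1481306),
    "bb1iibA09_05.200_v1_3": (2492662, 1488119),
}


def saving(gz_bytes, bz2_bytes):
    """Fraction of the gzip size that bzip2 removes (hypothetical helper)."""
    return 1 - bz2_bytes / gz_bytes


per_file = {name: saving(gz, bz) for name, (gz, bz) in SIZES.items()}
total_gz = sum(gz for gz, _ in SIZES.values())
total_bz2 = sum(bz for _, bz in SIZES.values())
total_saving = saving(total_gz, total_bz2)

for name, s in per_file.items():
    print(f"{name}: {s:.1%} smaller with bzip2")
print(f"all three: {total_gz} -> {total_bz2} bytes ({total_saving:.1%} smaller)")
```

Three files is a small sample, so the saving on other work units may differ.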

Jeff.

ID: 10411 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
FluffyChicken
Avatar

Send message
Joined: 1 Nov 05
Posts: 1260
Credit: 369,635
RAC: 0
Message 10420 - Posted: 3 Feb 2006, 17:20:54 UTC
Last modified: 3 Feb 2006, 17:23:21 UTC

I was assuming (from the previous discussions) that it is the BOINC client that does the gzipping and Rosetta would need to implement their own (although I could well be wrong :-D)

From memory they were going to look into BZIP2 (due to the similarity with GZIP)
also good old ZIP (as Climateprediction use that)
7ZIP was also talked about (open source as well)
RAR, not open source but nice chaps, and the decompressor is free to use AFAIK.

Around 50% (or 2x) compression for all options was seen.


The layout of the files has also been stated as a way to help compression further.
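For anyone who wants to try the compressors listed above on their own files, here is a minimal sketch using only the Python standard library. It is an illustration, not what the project uses: the sample data is synthetic column-formatted text invented as a stand-in for a real input file, `make_sample` and `compare` are hypothetical names, and RAR is left out because the standard library has no module for it. Here `zlib` stands in for the DEFLATE method used by ZIP, and `lzma` for the LZMA family behind 7-Zip.

```python
import bz2
import gzip
import lzma
import random
import zlib

RESIDUES = "ACDEFGHIKLMNPQRSTVWY"  # the 20 one-letter amino acid codes


def make_sample(lines=20000, seed=0):
    """Build column-formatted text as an invented stand-in for a real input file."""
    rng = random.Random(seed)
    rows = []
    for i in range(lines):
        rows.append(
            f"{i:6d} {rng.choice(RESIDUES)} "
            f"{rng.uniform(-180, 180):9.3f} {rng.uniform(-180, 180):9.3f}\n"
        )
    return "".join(rows).encode("ascii")


CODECS = {
    "gzip": (gzip.compress, gzip.decompress),
    "zlib/deflate (ZIP-style)": (zlib.compress, zlib.decompress),
    "bzip2": (bz2.compress, bz2.decompress),
    "lzma (7-Zip-style)": (lzma.compress, lzma.decompress),
}


def compare(data):
    """Compress data with each codec, verify the round trip, return sizes in bytes."""
    results = {}
    for name, (compress, decompress) in CODECS.items():
        packed = compress(data)
        assert decompress(packed) == data  # every codec must be lossless
        results[name] = len(packed)
    return results


sample = make_sample()
sizes = compare(sample)
print(f"uncompressed: {len(sample)} bytes")
for name, size in sizes.items():
    print(f"{name}: {size} bytes ({size / len(sample):.1%} of original)")
```

To test a real file, read it in binary mode and pass the bytes to `compare` instead of the synthetic sample; the ranking can change with the data.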
Team mauisun.org
ID: 10420 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Johnathon

Send message
Joined: 5 Nov 05
Posts: 120
Credit: 138,226
RAC: 0
Message 10455 - Posted: 4 Feb 2006, 15:05:26 UTC - in response to Message 10420.  

I'm having to think about shutting down Rosetta now... I just can't afford to keep on spending my money on call costs on dial-up, even on off-peak rates.

ID: 10455 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Profile Carlos_Pfitzner
Avatar

Send message
Joined: 22 Dec 05
Posts: 71
Credit: 138,867
RAC: 0
Message 10788 - Posted: 15 Feb 2006, 21:32:28 UTC - in response to Message 10455.  
Last modified: 15 Feb 2006, 21:46:55 UTC

I'm having to think about shutting down Rosetta now... I just can't afford to keep on spending my money on call costs on dial-up, even on off-peak rates.


In Brazil, "Telemar" launched a plan, "Internet sem limites" (unlimited internet),
at a fixed rate of R$ 30.00/month.

This plan arrived at a good time. -:)
*Last month my telephone bill surpassed R$ 400.00 --(: I was about to leave too.

However, speed is still limited to dial-up speeds of 14-40 kbps
because of hundreds of miles of deteriorated telephone cables.

*You must explicitly sign up for this plan or continue being billed by impulse,
R$ 0.14 for each 4 minutes outside 00:00-06:00.
*Valid for any ISP whose dial-up number begins 1500-nnnn.

At my connection speed,
IF the WU size is not reduced, even connected 24 hours/day
I will not be able to keep my PC busy -:(

I believe the best option is bzip2 because it compresses more than gzip,
it is free, and it runs on all OSes.

7zip compresses even more, but consumes too much CPU -:(

Another thing that can be done is leaving out of WUs some files that are not essential
for "crunching", at the user's option,

e.g.: xxxx.pbd -> used by the screen saver to show the "native fold".
Surely there may be other files that can be left out too.
Click signature for global team stats
ID: 10788 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Profile Dimitris Hatzopoulos

Send message
Joined: 5 Jan 06
Posts: 336
Credit: 80,939
RAC: 0
Message 10791 - Posted: 15 Feb 2006, 22:24:55 UTC

Carlos, as David Baker posted yesterday:

David Kim has a very nice fix for all of the work unit time related problems. The new app will have a default target run time of 8 hours, and this rather than -nstruct will determine how many structures are generated per work unit. You will be able to change this target run time to fit your individual preferences--dial up users may wish to make this somewhat longer to reduce traffic still more.


So you just need to wait a few more days, then set R to work for e.g. 24 or 48 or 72 CPU hours (depending on your resource share, so you can meet the 1 week deadline) on the SAME WU, so you won't have to download a new WU every 3 hours.

We'll need to experiment with this new feature, but I think people with big farms and traffic quotas (like NightOwl) might want to increase it quickly to a high number, e.g. 72hrs, so it'll crunch on the same WU for 3 full days (and still have plenty of time to meet the 7 day deadline). That should drop traffic to 30MBytes/month/P4.

This was indeed the best possible solution. Compression will be a welcome, but minor bonus. The use of bzip would be nice, but insignificant compared to the new feature imho.
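The 30 MBytes/month figure can be reproduced with a back-of-the-envelope sketch. The assumptions here are not from the project: a download of about 3 MB per work unit (in line with the 2.4 to 3.7 MB files quoted earlier in the thread), a 30-day month, and a CPU crunching around the clock. `monthly_download_mb` is a hypothetical helper name.

```python
def monthly_download_mb(target_hours, wu_size_mb=3.0, hours_per_month=30 * 24):
    """Estimate monthly download volume for one CPU crunching 24/7.

    Assumes every work unit costs one download of wu_size_mb megabytes
    and runs for exactly target_hours of CPU time.
    """
    work_units = hours_per_month / target_hours
    return work_units * wu_size_mb


for hours in (3, 8, 24, 72):
    print(f"{hours:>2} h per WU: about {monthly_download_mb(hours):.0f} MB/month")
```

Under these assumptions a 72-hour target works out to 10 work units, or 30 MB, per month, against 720 MB per month at 3 hours per work unit.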

Best UFO Resources
Wikipedia R@h
How-To: Join Distributed Computing projects that benefit humanity
ID: 10791 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Profile Carlos_Pfitzner
Avatar

Send message
Joined: 22 Dec 05
Posts: 71
Credit: 138,867
RAC: 0
Message 10846 - Posted: 17 Feb 2006, 14:14:43 UTC
Last modified: 17 Feb 2006, 14:21:19 UTC

This was indeed the best possible solution. Compression will be a welcome, but minor bonus. The use of bzip would be nice, but insignificant compared to the new feature imho.


Some posts back in this thread, Jeff Gilchrist

showed that bzip2 vs gzip gives an effective reduction of about 50%
in the size of some WUs he picked for this test ...

All that remains is the psychological interpretation of "insignificant".

For me, a 50% size reduction is a very significant cut in size/network traffic;
maybe in your interpretation 50% means insignificant?

PS: the use of bzip2 does not preclude the new feature.

Why not implement both at the same time?
Click signature for global team stats
ID: 10846 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
BennyRop

Send message
Joined: 17 Dec 05
Posts: 555
Credit: 140,800
RAC: 0
Message 10855 - Posted: 17 Feb 2006, 20:25:47 UTC

If we're limited to 24 hours (and hopefully we'll be able to ask for 3-6 days of jobs for WUs) we'll be able to download just 1 WU instead of 8ea 3 hour WUs, or 48ea 30 minute WUs (the ones that pounded the server into submission earlier this week). 1/8th to 1/48th of the download bandwidth is a much more significant reduction than a mere 1/2 of the download bandwidth. (And the value decreases even further for each additional day we're allowed to add to a WU we download.)

However, as Carlos implied, reducing the bandwidth usage to half by switching to a much better compression algorithm is yet another quick and easy way of reducing the time we spend communicating with the server.

As the project heads have mentioned an interest in compressing the WU downloads using one of the open source applications we've suggested, we'll see it incorporated into the client - hopefully in the near future.
ID: 10855 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Johnathon

Send message
Joined: 5 Nov 05
Posts: 120
Credit: 138,226
RAC: 0
Message 10912 - Posted: 18 Feb 2006, 22:24:06 UTC


ID: 10912 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Profile Dimitris Hatzopoulos

Send message
Joined: 5 Jan 06
Posts: 336
Credit: 80,939
RAC: 0
Message 10913 - Posted: 18 Feb 2006, 22:35:10 UTC
Last modified: 18 Feb 2006, 22:48:45 UTC

Carlos, I 100% AGREE with you on bzip2 vs gzip. Also bzip2 lib is afaik a plug-in replacement for gzip under Linux and Win, so implementing it should be real easy.

What I meant is that bzip-vs-gzip will at best reduce bandwidth to 1/2 of current bandwidth, whereas the new crunch-wu-for-X-hrs feature can reduce bandwidth to 1/30th or less (if you have your PC on 24/7 and set it to crunch R WUs for 4 CPU-days, i.e. 345600 sec, and still meet even the smallest 7-calendar-day R@H deadline).

If both "crunch-for-x-hrs" and bzip2 are implemented, it'd reduce bandwidth requirements to 1/50th or 1/60th of current levels. Obviously, the less bandwidth the better (I can sympathise with dialup guys: many years ago during the BBS + early internet days, I used to download international e-mail at a speed of 2400bps, i.e. 230 bytes per second, paying $$$/min to the monopoly telecom, some of it uncompressed - MNP5 or V.32bis LAP/M weren't supported by everyone back then).
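The fractions above can be checked with a short sketch. It assumes a current work unit runs about 3 hours (the figure used earlier in the thread) and that the download size per work unit stays the same however long it runs; `bandwidth_fraction` is a hypothetical helper, and the two size ratios are illustrative (0.6 is close to the file sizes Jeff posted, 0.5 is the "half" used in this discussion).

```python
def bandwidth_fraction(old_hours, new_hours, size_ratio=1.0):
    """Fraction of current bandwidth needed after both changes.

    old_hours / new_hours scales the number of downloads;
    size_ratio scales the size of each download (1.0 = no extra compression).
    """
    return (old_hours / new_hours) * size_ratio


FOUR_CPU_DAYS_SEC = 4 * 24 * 3600  # 345600 seconds, as quoted in the post
new_hours = FOUR_CPU_DAYS_SEC / 3600

run_time_only = bandwidth_fraction(3, new_hours)
with_bzip2_60 = bandwidth_fraction(3, new_hours, size_ratio=0.6)
with_bzip2_50 = bandwidth_fraction(3, new_hours, size_ratio=0.5)

print(f"run time only:       1/{1 / run_time_only:.0f}")
print(f"plus bzip2 at 0.6x:  1/{1 / with_bzip2_60:.0f}")
print(f"plus bzip2 at 0.5x:  1/{1 / with_bzip2_50:.0f}")
```

With these inputs the run-time change alone gives 1/32, and adding compression lands between roughly 1/53 and 1/64, which matches the 1/30th and 1/50th to 1/60th estimates.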
Best UFO Resources
Wikipedia R@h
How-To: Join Distributed Computing projects that benefit humanity
ID: 10913 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
truckpuller

Send message
Joined: 5 Nov 05
Posts: 40
Credit: 229,134
RAC: 0
Message 11068 - Posted: 21 Feb 2006, 5:11:25 UTC

I have noticed larger uploads and downloads. I have 1 download waiting that is 6.5MB, I'm on dial-up @ 32Kbps, and I have had uploads of 250Kb-750Kb, so it is very hard for me to keep 3 computers running at all times. I have left my internet connection on all night, only to get up 8 hours later and find it still downloading jobs. I know I'm not the only one here on dial-up.
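To put the 6.5MB download in perspective, here is a minimal sketch of the best-case transfer time. It assumes 1 MB = 1,000,000 bytes, a line that holds its full rated speed, and no protocol overhead or retries, so real transfers will take longer; `download_minutes` is a hypothetical helper.

```python
def download_minutes(size_mb, link_kbps):
    """Best-case transfer time in minutes for size_mb megabytes at link_kbps."""
    bits = size_mb * 1_000_000 * 8
    seconds = bits / (link_kbps * 1000)
    return seconds / 60


one_wu = download_minutes(6.5, 32)
print(f"6.5 MB at 32 kbps: about {one_wu:.0f} minutes")
print(f"best-case downloads in an 8 hour night: {8 * 60 / one_wu:.1f}")
```

Even in this ideal case a single 6.5MB work unit ties up the phone line for close to half an hour.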
Visit us at Christianboards.org
ID: 11068 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
R/B

Send message
Joined: 8 Dec 05
Posts: 195
Credit: 28,095
RAC: 0
Message 11097 - Posted: 21 Feb 2006, 11:25:29 UTC

If I were to adjust my preferences down to a 4 or 6 hour target to completion per work unit, would I be able to keep my 4 week deadline? One of my machines is not operating very often and one work unit just missed the deadline. That one, however, was a 1 week deadline work unit. Thanks.
Founder of BOINC GROUP - Objectivists - Philosophically minded rational data crunchers.


ID: 11097 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote



©2024 University of Washington
https://www.bakerlab.org