How's the Project Production Shaping Up (3)?

Message boards : Number crunching : How's the Project Production Shaping Up (3)?

Mod.Tymbrimi
Volunteer moderator
Joined: 22 Aug 06
Posts: 148
Credit: 153
RAC: 0
Message 26857 - Posted: 15 Sep 2006, 19:58:42 UTC

Stay on topic, be polite, be constructive.


Rosetta Moderator: Mod.Tymbrimi
ROSETTA@home FAQ
Moderator Contact
ID: 26857 · Rating: 1
Mod.Tymbrimi
Volunteer moderator
Joined: 22 Aug 06
Posts: 148
Credit: 153
RAC: 0
Message 26858 - Posted: 15 Sep 2006, 19:59:53 UTC

From Alan Roberts on Sept 11th:
So I'd like to ask an analog to mmciastro's question... how is the rate of project work shaping up? Realizing that credits and teraflops are subject to seemingly never-ending debate, I'm wondering if anyone has a time series for "Successes last 24h"? The home page reports the most recent number, but it doesn't have a graph of the history.

I understand that work units are highly variable, but it seems that we've got a lot of CASP follow-on work (similar populations of jobs) and a large sample (>100K jobs), which we should be able to compare to the CASP7 period to see whether we are recovering from the loss of some participants.


Rosetta Moderator: Mod.Tymbrimi
ROSETTA@home FAQ
Moderator Contact
ID: 26858 · Rating: 0
Mod.Tymbrimi
Volunteer moderator
Joined: 22 Aug 06
Posts: 148
Credit: 153
RAC: 0
Message 26859 - Posted: 15 Sep 2006, 20:01:11 UTC

From Tralala on Sept 11th:
So I'd like to ask an analog to mmciastro's question... how is the rate of project work shaping up? Realizing that credits and teraflops are subject to seemingly never-ending debate, I'm wondering if anyone has a time series for "Successes last 24h"? The home page reports the most recent number, but it doesn't have a graph of the history.

I understand that work units are highly variable, but it seems that we've got a lot of CASP follow-on work (similar populations of jobs) and a large sample (>100K jobs), which we should be able to compare to the CASP7 period to see whether we are recovering from the loss of some participants.


I have not followed the number of successes closely, but what I remember is that for most of CASP we had success numbers between 140,000 and 150,000. After the transition to the new credit system it was around 140,000 the whole time. The past day we saw an unusual climb to 170,000, but that is because SETI is out of work; it should normalize once SETI sends out work again (probably even taking a slight dip, since many hosts will repay their debt to SETI).

The number of successes is not a fully reliable measure either, since participants can change the target run time, which affects the number of successes: if I increase the time, I will report fewer results while doing the same science. However, averaged over all participants this number is probably good enough, since it is unlikely that a significant number of participants will change their runtime preference in the same direction simultaneously.

My own guess is that we went down from almost 150,000 results/day to about 140,000 after the pullout of some participants, a loss of about 7%. On the other hand, we saw an above-average number of new hosts/day after the introduction of the new credit system, and I think after the SETI spike we will again settle around 150,000 successes/day.
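The smoothing idea implicit in this exchange (judging the trend from a large daily aggregate rather than from any single day's reading) can be sketched as a trailing moving average; all numbers below are hypothetical, not actual project figures:

```python
def moving_average(daily_successes, window=7):
    """Return the trailing moving average of a daily time series."""
    averaged = []
    for i in range(len(daily_successes)):
        start = max(0, i - window + 1)          # trailing window, clipped at day 0
        chunk = daily_successes[start:i + 1]
        averaged.append(sum(chunk) / len(chunk))
    return averaged

# Hypothetical week of success counts around the levels discussed above.
daily = [148000, 141000, 139000, 170000, 152000, 143000, 140000]
print(moving_average(daily, window=3))
```

A window of a few days would damp one-off spikes like the SETI outage without hiding a real week-over-week trend.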

Rosetta Moderator: Mod.Tymbrimi
ROSETTA@home FAQ
Moderator Contact
ID: 26859 · Rating: 0
Mod.Tymbrimi
Volunteer moderator
Joined: 22 Aug 06
Posts: 148
Credit: 153
RAC: 0
Message 26860 - Posted: 15 Sep 2006, 20:03:04 UTC

From Mike Gelvin on Sept 12th:
So I'd like to ask an analog to mmciastro's question... how is the rate of project work shaping up? Realizing that credits and teraflops are subject to seemingly never-ending debate, I'm wondering if anyone has a time series for "Successes last 24h"? The home page reports the most recent number, but it doesn't have a graph of the history.

I understand that work units are highly variable, but it seems that we've got a lot of CASP follow-on work (similar populations of jobs) and a large sample (>100K jobs), which we should be able to compare to the CASP7 period to see whether we are recovering from the loss of some participants.


I have not followed the number of successes closely, but what I remember is that for most of CASP we had success numbers between 140,000 and 150,000. After the transition to the new credit system it was around 140,000 the whole time. The past day we saw an unusual climb to 170,000, but that is because SETI is out of work; it should normalize once SETI sends out work again (probably even taking a slight dip, since many hosts will repay their debt to SETI).

The number of successes is not a fully reliable measure either, since participants can change the target run time, which affects the number of successes: if I increase the time, I will report fewer results while doing the same science. However, averaged over all participants this number is probably good enough, since it is unlikely that a significant number of participants will change their runtime preference in the same direction simultaneously.

My own guess is that we went down from almost 150,000 results/day to about 140,000 after the pullout of some participants, a loss of about 7%. On the other hand, we saw an above-average number of new hosts/day after the introduction of the new credit system, and I think after the SETI spike we will again settle around 150,000 successes/day.



Is the number of successes the number of decoys, or the number of work units? Does anyone know?

Rosetta Moderator: Mod.Tymbrimi
ROSETTA@home FAQ
Moderator Contact
ID: 26860 · Rating: 0
Mod.Tymbrimi
Volunteer moderator
Joined: 22 Aug 06
Posts: 148
Credit: 153
RAC: 0
Message 26861 - Posted: 15 Sep 2006, 20:04:25 UTC

From Feet1st on Sept 12th:
Is the number of successes the number of decoys, or the number of work units? Does anyone know?

I'm pretty sure it's just what they inherited from BOINC, which would be the number of reported WUs... not models/decoys. So it's really not a meaningful number for Rosetta with a flexible runtime preference.

Rosetta Moderator: Mod.Tymbrimi
ROSETTA@home FAQ
Moderator Contact
ID: 26861 · Rating: 0
Mod.Tymbrimi
Volunteer moderator
Joined: 22 Aug 06
Posts: 148
Credit: 153
RAC: 0
Message 26862 - Posted: 15 Sep 2006, 20:05:37 UTC

From Alan Roberts on Sept 13th:
So it's really not a meaningful number for Rosetta with a flexible runtime preference.


Feet1st, I'll grant the potential inaccuracies, but I thought that the large (24 hour) bin resulted in an aggregate of a very large number of samples (always >100K results while I've been crunching). If the assumption that the nature of WUs hasn't changed much during an interval (i.e., CASP to CASP follow-up) is true, and only a small percentage net change in runtime preference has happened during the interval, then it seems you could look at the trend across that interval as a proxy for increasing/decreasing "science oriented" production.

I would like the home page to list decoys produced during the last 24 hours as well, if that is possible. That should eliminate the effect of changes in runtime preference, providing an even cleaner science-oriented metric, correct?

I appreciate that the values for decoys per day could shift as the overall workload changes (presumably the long-term trend is downward for a fixed amount of crunching horsepower, as models of larger and larger proteins are attempted). More computers, faster computers, and faster code would push decoys per day up over time?

Mostly I was looking for some way of characterizing short to medium-term changes in our contribution that avoided the Forever Credit/Teraflop War, since that seems subject to vast debate.

Rosetta Moderator: Mod.Tymbrimi
ROSETTA@home FAQ
Moderator Contact
ID: 26862 · Rating: 0
Mod.Tymbrimi
Volunteer moderator
Joined: 22 Aug 06
Posts: 148
Credit: 153
RAC: 0
Message 26863 - Posted: 15 Sep 2006, 20:07:22 UTC

From Feet1st on Sept 13th:
I'll grant the potential inaccuracies, but I thought that the large (24 hour) bin resulted in an aggregate of a very large number of samples (always >100K results while I've been crunching). If the assumption that the nature of WUs hasn't changed much during an interval (i.e., CASP to CASP follow-up) is true, and only a small percentage net change in runtime preference has happened during the interval, then it seems you could look at the trend across that interval as a proxy for increasing/decreasing "science oriented" production.


Really, the best measure of science work performed is teraflops, both before and after the new Rosetta credit system.

I would like the home page to list decoys produced during the last 24 hours as well, if that is possible. That should eliminate the effect of changes in runtime preference, providing an even cleaner science-oriented metric, correct?

Correct, but it doesn't account for how one WU can take 2 hrs to crunch a single model while another WU can crunch 20 models in 2 hrs. So it's still best to use TFLOPS as your measure.

I appreciate that the values for decoys per day could shift as the overall workload changes (presumably the long-term trend is downward for a fixed amount of crunching horsepower, as models of larger and larger proteins are attempted). More computers, faster computers, and faster code would push decoys per day up over time?

There can be several different workloads running at the same time; in fact, that is the case presently, with wide variance in time per model.

Mostly I was looking for some way of characterizing short to medium-term changes in our contribution that avoided the Forever Credit/Teraflop War, since that seems subject to vast debate.


I tend to use this chart; divide by 100,000 to get TFLOPS. It measures actual work done each day, rather than the number of hosts (which may devote only a small fraction of their resource share to Rosetta) or the number of users (which may be the same people with different email addresses). It's not a perfect measure, but I believe it is the best we have available.
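The divide-by-100,000 rule above can be written out explicitly. This follows the thread's own convention that 100,000 credits/day corresponds to roughly 1 TFLOPS (i.e. 100 credits/day is about 1 GFLOPS sustained); treat the constant as the poster's rule of thumb, not an official BOINC definition:

```python
def credit_per_day_to_tflops(credit_per_day):
    """Convert daily credit to TFLOPS using the thread's 100,000:1 rule of thumb."""
    return credit_per_day / 100000.0

# A hypothetical project-wide figure of 14 million credits/day.
print(credit_per_day_to_tflops(14000000))  # -> 140.0
```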


Rosetta Moderator: Mod.Tymbrimi
ROSETTA@home FAQ
Moderator Contact
ID: 26863 · Rating: 0
Mod.Tymbrimi
Volunteer moderator
Joined: 22 Aug 06
Posts: 148
Credit: 153
RAC: 0
Message 26944 - Posted: 16 Sep 2006, 11:13:31 UTC
Last modified: 16 Sep 2006, 12:13:20 UTC

Stay on topic, be polite, be constructive.
One more try at staying on the topic of how the project production is shaping up. If a post is not about how the project production is shaping up, it will be hidden. If it's four paragraphs of rant with one sentence applicable to the subject and we have time, we may post a quote of your message with the one applicable sentence. Feel free to save us the trouble by posting just the applicable, on-topic, polite, and constructive parts of your messages.

This is not a thread for baiting or insulting, as the last two versions turned into. This is not a thread about how great other DC projects are, nor is it a thread about why various people may or may not have left Rosetta. It's about How the Project Production is Shaping Up.

Rosetta Moderator: Mod.Tymbrimi
ROSETTA@home FAQ
Moderator Contact
ID: 26944 · Rating: 1
dag
Joined: 16 Dec 05
Posts: 106
Credit: 1,000,020
RAC: 0
Message 26983 - Posted: 16 Sep 2006, 19:50:26 UTC - in response to Message 26857.  

Stay on topic, be polite, be constructive.

Could I have a thread with nothing but my posts too?
dag
--Finding aliens is cool, but understanding the structure of proteins is useful.
ID: 26983 · Rating: -1
Saenger
Joined: 19 Sep 05
Posts: 271
Credit: 824,883
RAC: 0
Message 27002 - Posted: 16 Sep 2006, 22:14:25 UTC - in response to Message 26983.  

Stay on topic, be polite, be constructive.

Could I have a thread with nothing but my posts too?

You have to be fast ;)
ID: 27002 · Rating: -1
mage492
Joined: 12 Apr 06
Posts: 48
Credit: 17,966
RAC: 0
Message 27222 - Posted: 18 Sep 2006, 0:00:41 UTC

Regarding production, Macs might throw the measurements off significantly.

For example, my Mac is producing the same as before, but overall project RAC would drop a fair amount because Macs aren't counting for as much anymore (on the other hand, Linux boxes would earn more; from my two computers, though, Macs lost more than Linux gained).

Out of curiosity, how does one get their hands on the XML output for the project (like what BoincStats gets, for example)? If we could see the WU times, processor info, etc., that might make "number of successes" more meaningful.

What I think we need is for someone like... I'm having a mental block, but it's that guy who made all those really neat spreadsheets a little while back. We would need someone to take a good-sized sampling of that info (WU times, processor info, etc.) to work out some kind of average. Then we could multiply that average by the number of successes, which would give us a fair estimate.

I hope I explained that clearly...
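The estimate proposed above (sample some results, average the models per result, then scale by the daily success count) might look like the following sketch; all sample numbers are hypothetical:

```python
def estimated_models_per_day(models_per_sampled_result, successes_per_day):
    """Scale an average models-per-result figure up to a daily total."""
    avg = sum(models_per_sampled_result) / len(models_per_sampled_result)
    return avg * successes_per_day

# Hypothetical models-per-result counts scraped from a handful of results,
# scaled by a success count in the range discussed earlier in the thread.
sample = [4, 7, 2, 10, 5]
print(estimated_models_per_day(sample, 140000))
```

The sampling error shrinks as the sample grows, which is why a "good-sized sampling" matters for the multiplication to be trustworthy.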
"There are obviously many things which we do not understand, and may never be able to."
Leela (From the Mac game "Marathon", released 1995)
ID: 27222 · Rating: 0
Astro
Joined: 2 Oct 05
Posts: 987
Credit: 500,253
RAC: 0
Message 27223 - Posted: 18 Sep 2006, 0:15:18 UTC - in response to Message 27222.  
Last modified: 18 Sep 2006, 0:15:41 UTC

Regarding production, Macs might throw the measurements off significantly.

For example, my Mac is producing the same as before, but overall project RAC would drop a fair amount because Macs aren't counting for as much anymore (on the other hand, Linux boxes would earn more; from my two computers, though, Macs lost more than Linux gained).

Out of curiosity, how does one get their hands on the XML output for the project (like what BoincStats gets, for example)? If we could see the WU times, processor info, etc., that might make "number of successes" more meaningful.

What I think we need is for someone like... I'm having a mental block, but it's that guy who made all those really neat spreadsheets a little while back. We would need someone to take a good-sized sampling of that info (WU times, processor info, etc.) to work out some kind of average. Then we could multiply that average by the number of successes, which would give us a fair estimate.

I hope I explained that clearly...


The only stats XML dumps I know of are located on the Rosetta servers at https://boinc.bakerlab.org/stats/. Unfortunately they only give "user and host" data. All the data regarding the other projects and Rosetta, such as WUs, credit (claimed/granted), and most everything else, were manually scraped from my "results page". I believe BoincStats collects credit data daily or so, keeps it recorded, and compares it for its output; it doesn't show individual WUs per user. It takes me about 1.5 hours/day just to keep up on collecting the data before it's purged from the DB.

tony
ID: 27223 · Rating: 0
mage492
Joined: 12 Apr 06
Posts: 48
Credit: 17,966
RAC: 0
Message 27224 - Posted: 18 Sep 2006, 0:21:31 UTC - in response to Message 27223.  
Last modified: 18 Sep 2006, 0:22:17 UTC

Okay, thanks for the link. Too bad it doesn't really give us enough to work with, though.


It takes me about 1.5 hours/day just to keep up on collecting the data before it's purged from the DB.

tony


Wow, I knew it took a bit of work, but I had no idea it took that long! Well, the hard work is certainly appreciated (I'm fascinated by statistics).

Edit: BBCode fix
"There are obviously many things which we do not understand, and may never be able to."
Leela (From the Mac game "Marathon", released 1995)
ID: 27224 · Rating: 0
Astro
Joined: 2 Oct 05
Posts: 987
Credit: 500,253
RAC: 0
Message 27226 - Posted: 18 Sep 2006, 0:23:33 UTC
Last modified: 18 Sep 2006, 0:31:33 UTC

The "user.gz" dump contains the following fields:

/user/country, /user/cpid, /user/create_time, /user/create_time/#agg, /user/expavg_credit, /user/expavg_credit/#agg, /user/expavg_time, /user/expavg_time/#agg, /user/has_profile, /user/id, /user/id/#agg, /user/name, /user/teamid, /user/teamid/#agg, /user/total_credit, /user/total_credit/#agg, /user/url

In English, that's user info on: country, cross-project ID (CPID), creation time (user join date), aggregate creation time, RAC, aggregate RAC, average time, aggregate average time, has-profile flag, user ID, aggregate user ID, username, user team ID, aggregate user team ID, total credit, aggregate total credit, and user URL.

ID: 27226 · Rating: 0
mage492
Joined: 12 Apr 06
Posts: 48
Credit: 17,966
RAC: 0
Message 27227 - Posted: 18 Sep 2006, 0:36:08 UTC

I think I saw it mentioned somewhere that a fair number of people who were using clients other than the standard one are switching to the standard client. This would mean that the total amount everyone claims for the project would go down. Unless there is some way to see which client was used, pure credit numbers really don't help...

If we're not too concerned about being exact, maybe we could come up with a fudge factor to try to account for this? Then we could go by the credit numbers from the XML. Again, this wouldn't be exact, but it might give us something.

Beyond that, I really can't think of any other way off the top of my head...
"There are obviously many things which we do not understand, and may never be able to."
Leela (From the Mac game "Marathon", released 1995)
ID: 27227 · Rating: 0
BennyRop
Joined: 17 Dec 05
Posts: 555
Credit: 140,800
RAC: 0
Message 27241 - Posted: 18 Sep 2006, 3:06:40 UTC - in response to Message 27227.  

I think I saw it mentioned somewhere that a fair number of people who were using clients other than the standard one are switching to the standard client. This would mean that the total amount everyone claims for the project would go down. Unless there is some way to see which client was used, pure credit numbers really don't help...

Was that a message from an XS member saying their team on WCG was switching to standard clients?

I've never seen a pie chart of active systems vs. which BOINC client they're running. Without being able to tell how many of the 60K(?) hosts are running optimized clients, you can't say how much of a change their switching to the standard client will make. If 5% were using an optimized client that claimed 3x the standard client and switched back, we'd go from ((95 x 1.0) + (5 x 3)), or 110, down to 100 with the standard client. If 10% were using a 3x client and moved to the standard client, we'd go from 120 to 100. But since the number of active optimized-client users, their rate of conversion, and several other factors are unknown, it's hard to tell what effect their switching to a standard client will have on our running-averaged new credit scheme.

There's a chart in one of the threads that shows Rosetta contributors by OS and by CPU type. Mac users were around 5%, Linux users were around 5%, and around 65% were using Intel CPUs. Macs are boosting the credit/model value, but will get lower production per day until their Rosetta app gets optimized for their CPU.
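The weighted-average arithmetic above can be wrapped in a small helper (the 5%/3x and 10%/3x figures are the post's own hypotheticals, not measured data):

```python
def relative_claimed_credit(optimized_fraction, multiplier):
    """Aggregate claimed credit, normalizing a standard-client host to 1.0.

    optimized_fraction: share of hosts on an optimized client (0.0 to 1.0)
    multiplier: how much credit that client claims vs. the standard one
    """
    standard_share = 1.0 - optimized_fraction
    return standard_share * 1.0 + optimized_fraction * multiplier

before = relative_claimed_credit(0.05, 3.0)  # ~1.10, the "110 vs 100" case above
after = relative_claimed_credit(0.0, 3.0)    # 1.0, everyone on the standard client
print(before, after)
```

The point of the post survives the formula: without knowing optimized_fraction, the size of the drop from "before" to "after" cannot be predicted.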
ID: 27241 · Rating: 0
Whl.
Joined: 29 Dec 05
Posts: 203
Credit: 275,802
RAC: 0
Message 27257 - Posted: 18 Sep 2006, 8:23:21 UTC

Benny, I posted that about changing to the standard client, but my post was deleted.
ID: 27257 · Rating: -1
Mod.Tymbrimi
Volunteer moderator
Joined: 22 Aug 06
Posts: 148
Credit: 153
RAC: 0
Message 27267 - Posted: 18 Sep 2006, 8:59:46 UTC

Posted by Whl on Sept 13th:
I think you will find a lot who have left have gone to WCG, which operates a quorum of 3 for the Help Defeat Cancer project, Fight Aids At Home, and the soon-to-be-restarted higher-resolution Human Proteome Folding 2 project (which will help Dr. Baker and his team BTW, without any of us having to be here). The majority have also changed, or are in the process of changing, to the standard client as well, so there can be no allegations of cheating.

That is all I am going to say, and I don't want to restart anything here, as it is not fair to the existing and new members. I don't like what happened here, but what is done is done, whether I like it or not. Nothing is going to change now.


I award you 3 devious points, Whl. <grin> This was not deleted. And I believe I asked you to repost it.
Rosetta Moderator: Mod.Tymbrimi
ROSETTA@home FAQ
Moderator Contact
ID: 27267 · Rating: 0
Whl.
Joined: 29 Dec 05
Posts: 203
Credit: 275,802
RAC: 0
Message 27268 - Posted: 18 Sep 2006, 9:03:29 UTC

It was not deleted, but it was hidden. I don't recall you asking me to repost it?
ID: 27268 · Rating: 0
Jose
Joined: 28 Mar 06
Posts: 820
Credit: 48,297
RAC: 0
Message 27271 - Posted: 18 Sep 2006, 9:10:31 UTC - in response to Message 27268.  

It was not deleted, but it was hidden. I don't recall you asking me to repost it?


And given the heavy-handed way those threads have been dealt with, you will have no way to prove he didn't.
ID: 27271 · Rating: -4



©2024 University of Washington
https://www.bakerlab.org