Thursday, January 9, 2003
Happy New Year to all. Some quick updates on the work we've been doing over the last few months, and what 2003 holds in store for the Genome@home project.
i) You can all be very proud to say that you have contributed to three major scientific publications, with a fourth now under review. The details of the scientific contributions stemming from the Genome@home research can be found here.
ii) I presented a portion of our most recent work (stemming from this paper) at the CASP5 meeting on protein structure prediction. Many people were excited by our novel method for identifying structural templates using our designed sequences, and we're eager to continue this work further.
iii) The development of our new G@h2 core for the Folding@home project is almost complete and we will begin beta-testing next week. The new functionality in this algorithm will allow us to explore the design of much larger proteins, as well as design only portions of proteins, or design complexes of proteins. We aim to release this core by the end of January. The first set of proteins that we'll be designing with this new core is a large diverse set of proteins in the range of 100 to 200 amino acids.
iv) Our collaboration with an experimental research group here at Stanford is well under way and we're looking forward to getting some initial results in the next few weeks.
Thursday, September 5, 2002
After a hectic summer, we're back in the full swing of things. A few pieces of news:
i) After almost a full year of reviews and revisions, our first G@h paper is officially accepted for publication! This is a great step forward for the project and the first of many successful papers to come.
"Thoroughly sampling sequence space: large-scale protein design of structural ensembles." Stefan M. Larson, Jeremy L. England, John R. Desjarlais, & Vijay S. Pande. (2002) Protein Science, in press
I'll write up a layman's summary of this paper over the weekend and post it on the website. This is a very solid piece of original scientific work and you should all be proud for having contributed to it. We also have another paper under review at the moment. Finally, G@h and F@h will be the subject of a chapter in an upcoming book:
"Folding@Home and Genome@Home: Using distributed computing to tackle previously intractable problems in computational biology." Stefan M. Larson, Christopher D. Snow, Michael Shirts, and Vijay S. Pande. To appear in Computational Genomics, Richard Grant, editor, Horizon Press, (2002)
ii) I presented our latest work at the 16th Symposium of the Protein Society in San Diego. The work generated a lot of interest, and I had some great discussions with other researchers in the field of protein design. I even won an award for "Best Poster"! You'll all shortly be receiving your share of the $300 prize ;-)
iii) The inclusion of the G@h algorithm in the Folding@home3 client has gone quite well. There are still some bugs that appear on certain systems with larger work units. We've been having trouble replicating these locally, but the bug reports from users are helping us track down the problem. For now, we're running smaller work units, which seem to be OK.
iv) We are in the early stages of developing an exciting new project in collaboration with a structural biology research group here at Stanford. More info as we firm up the details and get some initial results.
v) The stats problems that were plaguing us last week seem to have been remedied by a number of minor "bulletproofing" patches to various analysis scripts and such. I'll be keeping a close watch on things to be sure the servers aren't acting up.
vi) Website updates. I've removed the completed tasks from the todo list and will soon be writing up the latest list of things to do. The link to the old Yahoo forum has been updated to point to the Folding-Community forum.
That's it for now. I'll post some more stuff next week. Happy Genoming!
Friday, June 14, 2002
Folding@home3.0 released today!
The inclusion of the Genome@home algorithm into Folding@home3.0 is complete and ready to go live today. Existing Genome@home users are encouraged to upgrade to this new client.
The F@h3.0 client has the capability to run exclusively G@h workunits. All you need to do is configure the F@h3.0 client with your existing G@h team number, upon first installation. All F@h3.0 clients run with valid G@h team numbers will process exclusively G@h workunits and all completed units will be credited to the Genome@home stats system. Please note that all valid registered G@h teams are identified by team numbers greater than 100,000; G@h users without a team can enter team number 100,000 itself to remain part of the Genome@home project.
If you're happy with your current G@h client (0.99), there's no mandatory requirement to upgrade or change your configuration in any way.
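For the curious, the team-number rule described above can be sketched in a few lines of Python. This is only an illustration, not the actual client or server code: the function name and constant are made up here, and treating every number below 100,000 as an F@h team is an assumption (the announcement only says which numbers count as G@h).

```python
# Hypothetical sketch of the team-number rule; not the real F@h3.0 logic.
GH_NO_TEAM = 100_000  # team number for G@h users without a team


def project_for_team(team_number: int) -> str:
    """Return which project a client with this team number would run."""
    # Registered G@h teams are above 100,000; 100,000 itself is "no team".
    if team_number >= GH_NO_TEAM:
        return "G@h"
    # Assumption: anything below 100,000 is treated as a Folding@home team.
    return "F@h"
```

So a user entering 100,000 stays with Genome@home without belonging to any particular team.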
Tuesday, May 21, 2002
More information about G@h Classic and F@h3.0
Monday, May 20, 2002
Genome@home 2.0 to merge with Folding@home 3.0
The Genome@home project is proud to announce that we will be combining forces with Folding@home, our sister project at the Pande Group. The Genome@home protein design algorithm (specifically the 2.0 version, which has been under development for several months now) will be included as a F@h3.0 scientific "core", which will allow F@h and G@h users to run both projects from the same client. The original Genome@home project (i.e. version 0.99, soon to be referred to as "G@h Classic") will continue as a separate project indefinitely. Current G@h users need not switch to the new F@h3.0 client, but are encouraged to do so, as very little further development will take place on G@h Classic. The vast majority of new work will take place within the new F@h3.0 architecture. User and team statistics from G@h Classic will not be integrated into F@h3.0 and vice-versa. The two will exist as separate distributed computing research projects.
More details will be provided in an extensive FAQ, coming soon.
Friday, April 19, 2002
The science behind and coming out of Genome@home is charging ahead full steam. We're submitting a second research paper next week which outlines an exciting proof-of-concept application of the Genome@home data set to protein structure prediction, a hot field in the post-genomic era. Two recent talks about Genome@home, one at Carleton University in Ottawa and another here at Stanford, were well-received. I'll be presenting our current work and some new results at the Protein Society Annual Symposium in August.
My recent focus on our scientific activities has really eaten into my time spent on G@h2.0. However, most of our new work relies on having the G@h2.0 architecture up and running, so it's getting most of my time these days. Aside from the client itself, I've been working on bulletproofing the stats system and thinking about ways to integrate our stats into the Folding@home project as well, so that users can run both projects, and look at their credits for one or the other or both. I think I've come up with a scheme that will allow this in a meaningful and straightforward way, without removing any of the functionality of the current G@h stats system or in any way penalizing users that only run G@h.
I've started some work, together with Amit Garg, on setting up an enhanced website interface to our results database, which would be useful to other scientists and protein researchers. Amit's pretty busy with course work and med school applications, so this new element might not appear until the summer.
Friday, March 8, 2002
It's been quite some time since our last update, so I'll try to summarize the news on several fronts from the last two months or so.
I presented our work at a number of scientific conferences over the last few weeks, and it was very well-received. Everyone in our field is excited about Genome@home and Folding@home, and it's very satisfying for all of us to be at a stage where we can show that we've achieved real scientific advances through distributed computing. The paper we submitted is under peer review, and I hope to hear back from the journal soon. I've started putting together the data analysis for the next couple of papers, which I plan to write up and submit by the end of May.
We've also recently written two reviews about F@h and G@h; one will be appearing as a book chapter, and the other as part of a special issue of the journal "Biopolymers". Finally, I'll be giving an invited talk next Friday (March 15) at Carleton University in Ottawa, Canada on both G@h and F@h.
It's taken longer than expected (as these things always do) to code up Genome@home2.0. I've started some in-house testing of the Windows and Linux versions, but haven't yet tried to compile a version for Mac OSX. One of the reasons it's taken longer is that we've decided to merge the G@h2.0 client into the current Folding@home2.x client. The client will look just like Folding@home2.x, but the user will have the option of running F@h or G@h, or both.
Aside from looking a lot cooler than a simple text-based console client, the new G@h2.0 will vastly increase the scope of research problems we can tackle. Since this client can auto-download the specific "scientific core" that is necessary to process a certain workunit, we're going to be able to test new design algorithms, design new classes of molecules, and look at a whole host of scientific questions that were out of reach with G@h0.99.
We've had some new people join the G@h team in the last while. Guha Jayachandran is a C.S. grad student who's been helping develop and test the G@h2.0 core and the merged F@h/G@h client. Vishal Vaidynathan is a Chemistry Ph.D. student who's writing up the next generation design code. Sid Elmer, one of the first members of the Pande Group, has begun some work in collaboration with G@h to design peptide-mimetic protein ligands. Last but not least, Lillian Chong is a Ph.D. student from UCSF, who will be joining our group as a post-doctoral fellow this summer.
That's all for now. Happy Genoming!
Friday, December 21, 2001
Happy Holidays from Genome@home!
It's hard to believe that the project is almost one year old. We've come a long way since our initial release in January. The scientific work behind the Genome@home project is moving at a furious pace. Our first publication was submitted a few weeks ago, and it should appear in print in a few months (the peer review process can take seemingly forever!). This is a major milestone for the project and is the result of a lot of hard work from everybody involved in the project, especially all the users.
Our plans for the New Year are very exciting. January will see the introduction of Genome@home2.0, a snazzy graphical client similar to the current Folding@home client. I'll be putting together our next scientific paper, regarding the use of the large sequence libraries generated by Genome@home for homology modeling, protein structure prediction, and genome annotation. We'll be starting some exciting work on peptide and peptide-mimetic ligands, with very real applications in drug design. Finally, I'll be analyzing the data from the SH3 and SH3-ligand projects as part of a larger collaborative effort with Folding@home to computationally simulate how proteins bind ligands, the fundamental molecular action which imparts biological function to all proteins.
Some smaller notes: I've upgraded our web server, which has been having difficulty lately. There were a number of minor problems with it, which I fixed, so it should run smoothly now. I'll be away on holiday next week, but Vijay will likely be checking in on the Yahoo discussion group in case anything comes up.
Best wishes for a Happy Holiday and a Prosperous New Year. - Stefan & The G@h Team.
Monday, November 12, 2001
Some updates on a number of issues today:
Since the change to a new stats system a few weeks ago, many users have been wondering how these "duplicate" units appear in the first place. Under the new stats system, units are considered duplicates if they are labeled with the exact same combination of project identifier, structure name and variant number, cpu id, and random seed. Units are also considered duplicate, regardless of labeling, if they contain the exact same data, down to the last bit.
Earlier development versions of the Genome@home client (keep in mind that 0.99 is still not a true release version) had various bugs that could result in generation of duplicate work units, especially if the files in the Genome@home directory were manually deleted (or inserted) or if the client was stopped and restarted at an inopportune time. Many of our users really ran the client through its paces, and tested the behaviour of these development versions in all sorts of situations, all with "our blessing". The goal of all this was, and still is, of course, to produce a rock-solid client that will happily churn away and produce valid data with 100% efficiency. This invaluable close collaboration with beta-testers has produced a client with features and checks that I alone would never have thought of in any stage of the client design.
The long and the short of it is that numerous bugs in early versions of the client (and glitches in how we ran the server) resulted in duplicate units, nothing more, nothing less.
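To make the two duplicate rules above concrete, here is a minimal Python sketch. It is not the actual stats code: the `WorkUnit` fields and the `find_duplicates` name are hypothetical, and comparing SHA-256 digests is used here as a stand-in for a bit-for-bit comparison of the data.

```python
import hashlib
from typing import List, NamedTuple


class WorkUnit(NamedTuple):
    # Hypothetical field names for the labels described above.
    project: str
    structure: str
    variant: int
    cpu_id: str
    seed: int
    data: bytes  # raw result contents


def find_duplicates(units: List[WorkUnit]) -> List[WorkUnit]:
    """Return units that repeat an earlier unit's labels or its exact data."""
    seen_labels = set()
    seen_digests = set()
    duplicates = []
    for unit in units:
        label = (unit.project, unit.structure, unit.variant,
                 unit.cpu_id, unit.seed)
        # Assumption: a digest stands in for comparing every bit directly.
        digest = hashlib.sha256(unit.data).hexdigest()
        if label in seen_labels or digest in seen_digests:
            duplicates.append(unit)
        seen_labels.add(label)
        seen_digests.add(digest)
    return duplicates
```

In this sketch the first unit seen is kept and any later unit matching it on either rule is flagged, which is one reasonable reading of the description, not necessarily how the server decides which copy to credit.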
Development of Genome@home 2.0
The Genome@home team works closely with the developers and researchers in charge of the Folding@home project. Recently, Folding@home2.0 was released, introducing a new version of their client, based on a highly upgraded architecture, including big changes on the server-side.
I've decided that it will be most efficient for me, the users, and the project, to discontinue development of the Genome@home 1.0 client series, and move directly to the Folding@home2.0 client/server architecture. Our aim is to begin beta-testing the Genome@home 2.0 client in early December, to set up for a release early in the new year. The 2.0 client will be faster and more stable, include graphical output, and will allow us much more freedom in asking and investigating scientific questions. I have a long list of bug fixes and suggestions left over from the 0.9 and 1.0 development effort, which I'll certainly include in the 2.0 development.
The Genome@home project itself will move seamlessly through the transition to the 2.0 architecture, and the stats, results, website features, etc. will all be retained and continued.
Continuing on with new scientific questions, we'll be launching a new set of proteins this week. This project includes a much greater diversity of work units, with lengths up to 150 amino acids. These longer work units will, of course, take longer to run, but since we weight the stats according to the size of the work unit, they'll be worth more points as well.
We're about to submit our first scientific paper for publication this week, based on the work that Jeremy England and I collaborated on this summer (i.e. the "beta2" work units). Unfortunately, we can't discuss the details of this work until it's published, but I'll certainly post the paper here at that time, as well as writing up a layman's account of what this exciting work is telling us. The raw data, of course, is available here. The results of these analyses provide some fundamental insight into the ways that proteins behave, and could not have been obtained without a computational study of this scope, or without the help of each and every one of the dedicated Genome@home users.
This truly fulfills the purpose of the Genome@home project: using distributed computing, we have been able to do important fundamental scientific work that simply could not have been done otherwise. Each and every one of you has contributed to furthering the world's understanding of complex biological phenomena.
Keep on crunching . . . !
Friday, November 2, 2001
One of the advantages of the Genome@home distributed computing project is that the client does not need to run on a machine with a constant internet connection. In fact, it can be run on machines with no internet connection at all. It wasn't a consideration in the original design of the client, but this capability was discovered by users early on in the development of the project, and further client development introduced some features which make the process of "nonetting" or "sneakernetting" slightly more convenient.
Since this has become a very popular and viable way to run the Genome@home client, I'd like to offer some instructions on how this can be done efficiently, without loss of processed work units. The protocol below assumes a setup consisting of one machine with an internet connection, the "net machine" and several machines without, "dummy machines". Any deviation from this protocol, or "variations on the theme", should be used with caution and careful forethought. Please note that, in general, this is a somewhat advanced technique, and you should only try nonetting if the instructions below make sense to you. There are thousands of possible variants of this, and we really can't effectively troubleshoot individual attempts or procedures.
Those familiar with nonetting will see that the order of operations in some of these steps can vary slightly. This is not the only method that will work, but it will work in most cases.
Friday, October 26, 2001
We'll be switching to a new stats system today. This stats system is more discerning in its acceptance and rejection of work. Previously, almost all work units were accepted and given stats credit, regardless of accuracy or quality. Now, we will only be accepting non-duplicate results whose contents match the various labels attached to them. Furthermore, this change will be retroactive, as described below. There has been much discussion about what to do about old results that were duplicates or mismatches, and after much analysis the following changes are being implemented:
The effects of this change are staggering for some, but incredibly minor for the vast majority of Genome@home users (median loss is roughly 1%). Please be assured that I have thought long and hard about this, and done extensive analysis of various options and their effects on the project and the user community. Further analysis can be seen here.
The main impetus for the change in the stats is to prepare the user community and our infrastructure for the launch of Genome@home 2.0 in the new year. I won't get into the details here, except to say that we'll be doing essentially the same scientific work, but with a much improved client and server architecture which will allow us to do more sophisticated computational experiments on the design of new genes and proteins. I'm going to try to make the switch to G@h 2.0 as seamless as possible for the user community, by carrying over the stats from the current phase of the project, trying to keep relative stats weightings fair, etc. The only tricky thing that users will face is that the client upgrade to 2.0 will be mandatory (believe me, it'll be worth it), and acceptance of results from older, beta versions will be slowly phased out.
Now, I know this user community well enough to anticipate that some users will feel cheated and tricked by this stats adjustment. This is obviously not my intent, and I sincerely apologize for causing any frustration. I wish there was a better way around this, but there is not. The new stats will become official this evening, and this will be the final word. I must move on and move forward with this project. We have a lot of interesting scientific findings from the first 9 months of Genome@home, and we've got lots of exciting plans for the years ahead. I'll write more in subsequent updates about our current findings and our future plans.
Thank you all most sincerely for your continued support.
Tuesday, August 21, 2001
To keep everyone abreast of the work going on here at Genome@home, I've posted a to-do list, which I hope to keep relatively up to date. The focus is currently on finishing up the data analysis and getting our first two papers submitted.
We've had some bug reports from the 0.99 client, which I'm pretty sure I've got under control. Once these papers are out, I'll have some time to code up the 1.0 version.
A fond farewell to Jeremy England, an excellent collaborator who worked with us over the summer, analysing the G@h data, looking at questions of diversity, designability, and the utility of backbone flexibility in protein design. Jeremy's gone back to Harvard to pursue his studies in the Biochemical Sciences program.
That's all for now . . . don't take any wooden nickels.
Thursday, August 2, 2001
The new, less buggy client, version 0.99 is ready. If this one's OK, 1.00 is just around the corner! Download or upgrade available now. I encourage everyone to suggest changes and bug fixes in the forum.
Tuesday, July 31, 2001
Feeling refreshed and recharged after some vacation time . . . it's full steam ahead on all fronts for Genome@home.
While I was away, Jeremy and Amit were working hard on the data analysis for the first two scientific papers to come out of the Genome@home project. We're all very excited to have meaningful scientific results from this project, after only running for 6 months. These papers will be the focus of our efforts through to the end of the summer, so expect more updates on the scientific results page over the next few weeks.
The stats weighting algorithm was modified to more accurately reflect the differences in computing time needed to crunch work units of varying length. A few users have submitted some thorough analyses of processing times for various work units, and I'll try to use this data to fine-tune the weighting algorithm a bit.
I spoke recently with the Folding@home team about their progress with the new, much improved folding client. The graphics look great, but they're still working with Adam at COSM to iron out the last bugs in the HTTP networking, so that all firewall setups will be supported. I was advised to wait a week or two before I begin plugging the Genome@home code into their client layout. In the meantime, I'm focusing on ridding us of the horrid page fault bug in the Fortran code once and for all, so that a clean version of the science code can be included in the HTTP client.
A great side effect of doing the data analysis for the scientific papers is that the results are finally organized in a meaningful, easy-to-use format. I'll write up some CGI's, etc. this week to allow access to our results database from the website. Also, I plan to post a prioritized "to-do" list on the website, so that users can keep up with the progress of the project on a daily/weekly basis.
Wednesday, June 27, 2001
It's been a while since the last update, but we've got lots of news for you today.
First, a hearty welcome to the newest members of the Genome@home team. Amit Garg is a Stanford student, who's currently working on analyzing the utility of our designed sequence libraries for structure prediction and gene annotation in newly sequenced genomes. Jeremy England is a visiting student from Harvard, who's addressing the issue of designability. In just a few days, they have both done an incredible amount of work, and the pace of our scientific progress is really picking up.
Folding@home, our sister project, has begun beta-testing the 2.0 version of their client, and it's looking great so far: nice graphics, HTTP support, and some very convenient internal modifications. Once the bugs are worked out, we'll use a very similar architecture to set up the 1.0 Genome@home client.
The stats hounds out there have been clamouring for a more equitable stats weighting adjustment, so that the points given for a completed work unit more accurately reflect the relative time it takes to process the unit. I'm looking into this today, so I should have this remedied soon.
Speaking of stats, congratulations to Ars Technica Team Primordial Soup for breaking the 1-million unit mark recently. Great work from a bunch of dedicated distributed computing enthusiasts. Thanks and congratulations for all your hard work and support!
Finally, I'll be away on vacation for the next three weeks, so if something horrible goes wrong, it might not get fixed for a while. Things have been running very smoothly with little intervention for the last few weeks, so I suspect that everything is pretty much kosher.
Friday, May 25, 2001
I fixed up the speed bugs in the 0.98 version of the Windows client, and managed to speed up the Linux version in the process. The new builds can both be downloaded at the links below (May 18). I've received lots of good suggestions for minor changes, which will be incorporated in the next week or so.
Version 1.0, which will include HTTP proxy support, is just around the corner. The COSM libraries now include server and client side API's, and the Folding@home crew is testing things out, with Adam's help, as we speak.
Busy day, gotta run. More on the client and some new science next week.
Friday, May 18, 2001
The 0.98 version of the client is available for testing. Both the Windows and Linux versions are full installs. They can be downloaded here:
A description of the changes in this version is available in the FAQ. As always, comments, suggestions, and bug reports are welcome. They can be posted at the discussion group or emailed to the help desk.
Friday, May 11, 2001
The bad news is that the 0.98 version of the client is not quite ready for release today. The good news is that the coding is done, but I want to do some more testing. Next week I'll put out the release.
I added over two thousand new structural variants to the SH3 project. These are very minor variations on the experimentally-determined three-dimensional structure of the SH3 proteins. Proteins are in reality very dynamic molecules, so instead of using just one snapshot in time (i.e. the experimental structure), we're using this approach to design to a much larger range of the protein's real physical structure. It's very interesting to see how different the designed sequences can be for very small structural variations.
I've also made a small change to the way that seeds are handed out from the server. It seems that the longer seeds (10-digit) were giving some users problems, so the seeds are now 6-digit, which seems to be working well.
Next week, the new client will be released, and we'll get started in earnest on the SH3 ligand-binding project.
Friday, May 3, 2001
Vijay and I have been busy the last couple of weeks mapping out the details of the next few experiments that Genome@home will be running. We've decided to change our technique for generating sequences slightly, taking advantage of the current large flow of data to tackle a couple of fundamental problems in protein design. It's a bit brute force, but it has an elegance in that one simple change will allow us to generate sequence diversity and incorporate structural flexibility, two major stumbling blocks in the field of protein design. I'll likely put together a blurb about this in the scientific results section soon.
I'll be upgrading the SH3 project this weekend by adding a much greater diversity of structures (related to the above). I'll also be launching a new experiment wherein we're addressing the ligand-binding function of the SH3 domain. One of the challenges for protein design is to maintain function of new genes created. Taking it a step further, we hope to be able to introduce new functions into existing protein scaffolds. This project will allow us to test the feasibility of various methods on a well-known system. Another potential target of interest for this type of research is the unimaginatively named protein A. More detailed info will be available soon in the current experiments section.
We've also got some interesting analyses planned for our current data set. One of them involves using the designed sequence libraries to scan microbial genomes for genes of unknown function, with the hope that we can use the Genome@home data to help predict structures and functions of unknown genes.
Coding of the next client version, 0.98, is going well. A number of the simpler features have already been added, and with some hard work next week, things look good for a May 11 release.
Friday, April 27, 2001
It looks like we'll have two bright young students (one studying here at Stanford, and one visiting us from Harvard) joining the project over the summer break. They'll be arriving at a great time, since we've got lots of data ready for analysis, a bunch of exciting new protein design projects "in the pipeline", and both client and server are nearing maturity. With their help, we'll be able to bring Genome@home to full maturity by the end of the summer.
We had some problems this week with users being unable to upload results. It seems that Stanford networking has been doing some work, which was slowing things down intermittently. This should be finished over the weekend. By way of explanation, the client (and server) has a built-in time-out function, which tells it to kill the net send/receive if it's taking too long. The exact time is a function of the total data size, but there's also an absolute upper limit, independent of the data size. Since the download to the client is smaller than the results upload, when the net is slow, it's common that the download finishes under the time limit, but the upload does not. The upper time limit is fairly generous, but we've increased it on the server, and will do the same for the new version of the client. This should help avoid problems during slow network periods.
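The time-out behaviour described above can be sketched roughly as follows. This is an illustration only, not the client or server code: the function names and all of the numbers (base time, per-kilobyte allowance, upper limit) are made-up placeholders.

```python
def transfer_timeout(data_bytes: int,
                     base_seconds: float = 30.0,
                     seconds_per_kb: float = 0.5,
                     max_seconds: float = 300.0) -> float:
    """Time allowed for a send/receive: grows with size, up to a hard cap."""
    # All default values are hypothetical placeholders.
    scaled = base_seconds + seconds_per_kb * (data_bytes / 1024)
    return min(scaled, max_seconds)


def transfer_completes(data_bytes: int, bytes_per_second: float) -> bool:
    """True if the transfer would finish before the time-out kills it."""
    needed = data_bytes / bytes_per_second
    return needed <= transfer_timeout(data_bytes)
```

With these placeholder numbers, on a slow link of 1,000 bytes per second a 10 KB download finishes well inside its limit, while a 2,000 KB upload runs into the absolute cap and is cut off. That is the same pattern users saw this week, and raising the cap (`max_seconds` here) is the kind of change we made on the server.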
My poster presentation at the SGF Research Symposium went over well. Look for details here.
Finally, I've made some changes to the website, and will continue to do so over the next few days, to accommodate the fact that the project has grown to a fairly large size in the last couple of months. Look for minor changes in the stats reporting, as well as much-improved accessibility to the compiled Genome@home data.
Friday, April 20, 2001
We've broken many milestones in the last few weeks. We've now had over 10,000 downloads of the G@H client software. We've peaked at over 5,000 genes per day, with a steady flow of several thousand per day for the last few weeks. Finally, we passed the 100,000 gene mark. The lucky user that designed our 100,000th gene is:
Mark your calendars; Genome@home will be appearing on TechTV again, on Thursday, May 3. More on that later.
We've begun more thorough analysis of the data from the first two experiments (beta and SH3). Some of the preliminary analyses from this phase will be presented at a Stanford-internal fellowship symposium next week. The data from these two projects will likely form the core of the first scientific paper from the Genome@home project, which we will begin writing in the near future.
I sat down recently with Vijay, Adam (cosm), Siraj (Folding@home), and Chris (Folding@home) to discuss the layout of the new client versions. G@H 1.0 will be put out in the near future, with built-in HTTP support. Some of you may have noticed that the long-ago-promised new client version has not been forthcoming. I decided to wait until the HTTP libraries were available and will use these to build G@H 1.0. I sincerely hope to have this version out by the end of May. With the data pouring in like it is, it's hard to balance the work between client/server development and science/data analysis. The data has been winning the last few weeks and will likely continue to take precedence for a few more. Future versions (i.e. G@H 2.0) will be similar to F@H 2.0 (not out yet), and I'll be working closely with Siraj on that part of the project.
Thursday, March 15, 2001
Monday, February 26, 2001
Monday, February 19, 2001