Folding@home on ATI's GPUs: a major step forward

Please note that the GPU1 core described in this FAQ below has now been retired and a next generation GPU2 core is available for both ATI and NVIDIA cards. Please see these links for more details: FAQ-NVIDIA for NVIDIA and FAQ-ATI2 for ATI.

Table of Contents

- Introduction
- A Brief History of FAH: From Tinker to Gromacs to GPUs
- Folding@home debuts with the Tinker core (October 2000)
- A major step forward: the Gromacs core (May 2003)
- The next major step forward: Streaming Processor cores (September 2006)
- The second generation GPU core, aka GPU2 (April 2008)
- Retirement of the GPU1 core (June 2008)
- How to Run the Windows FAH GPU Beta Client
- Known Bugs, etc.
- Known Bugs
- Known compatibility issues
- Notes for running
- Troubleshooting EUE's
- Policy Notes
- Known bugs, etc., from previous software versions (fixed in current version)
- GPU and OS Support
- Which cards are supported?
- What about video cards with other (non-ATI) chipsets?
- Is the GPU client for Windows XP only? Has it been tested on other OSs like Linux, Mac, and Vista/Win7?
- Are there any plans to enable the client to take advantage of multiple GPUs?
- Running the new client
- How long do you estimate this program will remain an open beta before it turns into a final client?
- What scientific cores does the FAH GPU client support? Only Gromacs cores? Other cores like Amber?
- Does the FAH GPU client run the same WUs as the regular FAH client?
- How will points be awarded?
- What impacts will this have on work units and Folding Team scores?
- How big will the download/upload files be?
- Can we expect modem users to have new problems with the size of a GPU WU?
- Will the Collection Servers accept uploads for these projects or will their size be a problem?
- Are these WUs compatible with other fahcores?
- How to run multiple GPUs (GPU client version 5.91 or later only)

Introduction

Since 2000, Folding@home (FAH) has led to a major jump in the capabilities of molecular simulation. By joining together hundreds of thousands of PCs throughout the world, calculations that were previously considered impossible have now become routine. FAH has targeted the study of protein folding and protein folding diseases, and numerous scientific advances have come from the project.

Now, in 2006, we are looking toward another major advance in capabilities. This advance utilizes the new, high performance Graphics Processing Units (GPUs) from ATI to achieve performance previously only possible on supercomputers. With this new technology, as well as the new Cell processor in Sony's PlayStation 3, we will soon be able to attain performance on the 100 gigaflop scale per computer. With this new software and hardware, we will be able to push Folding@home a major step forward.

Our goal is to apply this new technology to dramatically advance the capabilities of Folding@home, applying our simulations to further study of protein folding and related diseases, including Alzheimer's Disease, Huntington's Disease, and certain forms of cancer. With these computational advances, coupled with new simulation methodologies to harness the new techniques, we will be able to address questions previously considered impossible to tackle computationally, and make even greater impacts on our knowledge of folding and folding related diseases.

A Brief History of FAH: From Tinker to Gromacs to GPUs

Folding@home debuts with the Tinker core (October 2000)

In October 2000, Folding@home was officially released. The main software core engine was the Tinker molecular dynamics (MD) code. Tinker was chosen as the first scientific core due to its versatility and well laid out software design. In particular, Tinker was the only code to support a wide variety of MD force fields and solvent models. With the Tinker core, we were able to make several advances, including the first folding of a small protein starting purely from sequence (subsequently published in Nature).

A major step forward: the Gromacs core (May 2003)

After many months of testing, Folding@home officially rolled out a new core based on the Gromacs MD code in May 2003. Gromacs is the fastest MD code available, and likely one of the most optimized scientific codes in the world. By using hand-tuned assembly code and utilizing new hardware in many PCs and Intel-based Macs (the SSE instructions), Gromacs runs about 10x faster than most MD codes, and approximately 20x to 30x faster than Tinker (which was written for flexibility and functionality, but not for speed).

However, while Gromacs is faster than Tinker, it has limits to what it can do; for example, it does not support many implicit solvent models, which play a key role in our folding simulations with Tinker. Thus, while Gromacs significantly sped up certain calculations, it was not a replacement for Tinker, and so the Tinker core will continue to play an important role in Folding@home (including a recent paper in Science). For these reasons, points for Gromacs WUs were set to be consistent with points for Tinker WUs, as both play an important role in the science of FAH. Moreover, we switched the benchmark machine to a 2.8 GHz Pentium 4 (from a 500 MHz Celeron) in order to fairly benchmark these types of WUs (as the benchmark machine needed hardware support for SSE).

The next major step forward: Streaming Processor cores (September 2006)

Much like the Gromacs core greatly enhanced Folding@home with a 20x to 30x speed increase by utilizing new hardware (SSE) in PCs, in 2006 Folding@home developed a new streaming processor core to utilize another new generation of hardware: GPUs with programmable floating-point capability. By writing highly optimized, hand-tuned code to run on ATI X1900 class GPUs, the science of Folding@home will see another 20x to 30x speed increase over its previous software (Gromacs) for certain applications. This great speed increase is achieved by running essentially the complete molecular dynamics calculation on the GPU; while this is a challenging software development task, it appears to be the way to achieve the highest speed improvement on GPUs.

In addition, through collaboration with the Pande Group, Sony has developed an analogous core for the PS3's Cell processor (another streaming processor), which should also see a significant speed increase for the science over the types of calculations we could previously do with an x86/SSE Gromacs core. Following what we did with the introduction of Gromacs, we will now switch benchmark machines and include an ATI X1900XT GPU in order to benchmark streaming WUs (which cannot be run on non-GPU machines). This machine will also benchmark CPU units (which continue to be of value, since GPUs work only for certain simulations) without using its GPU.

The second generation GPU core, aka GPU2 (April 2008)

After running the original GPU core for quite some time and analyzing its results, we have learned a lot about running GPGPU software. For example, it has become clear that a GPGPU approach via DirectX (DX) is not sufficiently reliable for what we need to do. We have also learned a great deal about GPU algorithms and improvements. One of the really exciting aspects of GPUs is that not only can they accelerate existing algorithms significantly, they can also open doors to new algorithms that we would never think to run on CPUs at all (due to their very slow speed on CPUs, but not on GPUs).

After much effort, we have taken all we've learned about GPUs from the first generation client and produced a second generation client. This new client appears to be faster and more reliable, and has more scientific functionality. The preliminary results so far look very promising, and we're excited to now open up the client for FAH donors to run.

You can find more about the GPU2 client on its FAQ page.

Retirement of the GPU1 core (June 2008)

After two years, we have retired the GPU1 core on June 6, 2008. We have learned a lot from the GPU1 core and those lessons have gone into creating the GPU2 core. You can find more about the GPU2 client on its FAQ page.

How to Run the Windows FAH GPU Beta Client

This is a beta release and we expect that there will be several bugs, flaws, problems, etc. -- releasing software for GPUs is still very new in the software industry and we expect that there will be problems at the start. To minimize the problems, we have been testing the client and cores extensively in house, and they run well there. However, it's our experience that running in the controlled setup of our lab and running "out in the wild" are very different situations. In alpha testing involving non-Stanford clients, we have seen a lot of EUE's. We are working to track this down, but we need more data to learn why this is happening, hence the need for an open beta.

As in the use of any beta software, please make sure to back up your hard drive, and do not run this client on any machine which cannot tolerate even the slightest instability or problems.

There are two steps:

  1. Download the client file from the Folding@home web site download page. For the console client, just run it as normal. For the GUI client, unzip the file to its own folder and then run the winFAH binary. Further instructions can be found on our console and GUI client pages. We do not recommend this beta client for those who have never run FAH before, as in this beta test several parts of the client will not work correctly and/or will require knowledge of how FAH works.
  2. Download and install the necessary system software. Due to the complex nature of performing scientific calculations on GPUs, the FAH GPU client needs very specific system software to work. We think that this is a big nuisance, and we are working on a way to avoid this in the future. But for now there is no way around this (please keep in mind that a graphics driver is really a compiler of sorts, and thus any GPU code is very sensitive to this issue). Before installing the new system software, don't forget to back up your computer's hard drive (always a good idea in these situations) and then install the following components:
  • Catalyst driver version 6.5 was the original working driver version. Versions 6.10 and 6.11 work well with the newest core. Avoid versions 6.6 - 6.9, 6.12, and 7.1, which either run slowly or not at all. Due to the complexities of support, we will only support known good driver versions. Versions 7.2 - 7.11 are also known good drivers and can be downloaded from ATI.
  • DirectX: 9.0c (4.09.0000.0904) or later, which yields d3dx9_30.dll (the critical part for FAH). Download from here.
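The known-good and known-bad driver ranges above can be captured in a small check. This is an illustrative sketch, not part of the FAH client; the function name and version tables are our own, transcribed from the list above.

```python
# Known-good and known-bad Catalyst versions, per the FAQ's driver list.
KNOWN_GOOD = [(6, 5), (6, 10), (6, 11)] + [(7, m) for m in range(2, 12)]
KNOWN_BAD = [(6, m) for m in range(6, 10)] + [(6, 12), (7, 1)]

def driver_status(version):
    """Return 'good', 'bad', or 'untested' for a 'major.minor' version string."""
    major, minor = (int(part) for part in version.split("."))
    if (major, minor) in KNOWN_GOOD:
        return "good"
    if (major, minor) in KNOWN_BAD:
        return "bad"
    return "untested"
```

For example, `driver_status("6.11")` returns `"good"`, while `driver_status("7.1")` returns `"bad"`.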

This system software configuration is critical, as the wrong settings can lead to problems, such as excessive Early Unit Ends (EUE's). EUE's may also occur frequently in the current GPU cores due to our testing of new types of WUs. We need the help of beta testers to pin down this issue with more data.

The current GPU clients expire on February 1, 2008. Once we go final (i.e. out of public beta), we will remove the expiration date code from the client. The expiration dates are useful to make sure donors upgrade their clients, as older beta SMP or GPU clients often have bugs that have since been fixed.

Known Bugs, etc.

Please note the Windows GPU client for Folding@home is a beta release. While we have done lots of testing in house, there are limits to the bugs we can find in these limited tests (and hence the need for a beta test).

Thus, we expect there will be many problems with the client that need to be resolved. Below is a list of some of the relevant known issues or bugs for beta testers of this new client.

Known Bugs

  1. The GUI pause core command doesn't work
  2. The GUI client will automatically quit when full screen DirectX apps (games) are running, though it may be a little slow to do so

Known compatibility issues

  1. One must use specific Catalyst drivers (listed above) and a recent DX version -- see the HOW TO above.
  2. The GPU core does not run under WINE in Linux (nor do we have plans to support WINE).
  3. The service install option does not work like it did in previous console clients. The main issue is that the client must be run under the user account that owns the display (context) in order for us to access the GPU for calculations. This is not a bug, but a fundamental limitation of Windows and DirectX when used for GPU computation. Novec has suggested this workaround: Running the GPU client as a service is possible; you just have to give it desktop privileges. After installing the client as a service, open the service in Computer Manager, go to the Log On tab, keep it at "Local System account", and check "Allow service to interact with desktop". Then it crunches along just as usual. You probably shouldn't set it to start up automatically, though, since whichever program you use to increase clock speeds will load after the client service. Although I haven't had any client crashes or EUE's when changing clock speeds on the fly, it's generally not recommended.
  4. Several Windows features cause the GPU context to be lost from the client. When this happens, the client closes and must be restarted manually. The following operations are known to cause this behavior: pressing CTRL+ALT+DEL to access the Windows Security Dialog (Win XP Pro and Win 2000); pressing WIN+L to either switch user (Win XP Pro/Home) or to lock Windows (Win XP Pro); initiating a Remote Desktop session to the local machine. If you want to access Task Manager without losing GPU context, you can press CTRL+SHIFT+ESC, or right click on the taskbar and select Task Manager. Again, this is not a bug, but a fundamental limitation of Windows when used for GPU computation.
  5. Screen saver mode in the GUI does not work. This is an issue with DirectX and likely cannot be easily resolved.

Notes for running

  1. The GPU GUI client will slow down the scientific core somewhat (since both use the GPU heavily). We do not recommend the GUI client for long term use, unless you have two graphics cards.
  2. Multiple GPUs are currently supported (see instructions below), but Crossfire is not supported -- Crossfire will make FAH run more slowly than using a single GPU (this isn't a bug as much as an issue regarding the nature of a Crossfire/SLI type architecture).
  3. The GPU client is not meant to be run in the background while one uses the computer for applications with heavy GPU usage -- it will greatly slow down the response of programs which make heavy use of the GPU, such as video watching or editing.
  4. Do not run multiple GPU clients simultaneously on a single GPU board -- there will be a huge (non-linear) slow down in performance.
  5. Client appears to be using lots of CPU time: Graphics drivers must poll the GPU to update the next screen of data. This will look like a lot of CPU time being used, but nothing is really being done. As such, we do not recommend running multiple FAH clients, as this can significantly slow down the GPU client. We recommend dedicating one CPU core to keep the GPU fed with data for best performance.
  6. Do not adjust GPU clocks (e.g. with ATI Tool) while Folding@home is running. This will reset our code and generally cause problems. This is not a bug in FAH as much as the way that these boards work.
  7. In some rare situations, one needs to set the Hardware acceleration to maximum if the Catalyst install didn't do this by default. To do this, go into the Display Properties/Settings/Advanced/Troubleshoot tab and check to be sure that the Hardware acceleration was set to "Max".

Troubleshooting EUE's

  1. If you are seeing lots of EUE's, please download the ATI Tool application and check that your core voltage is set to 1.4 volts. Undervolting has been shown to lead to problems in certain cases (although we have seen the client work without the voltage set to 1.4 V as well).
  2. Be careful about overheating and/or overclocking the GPU. If you need further help, please see the Folding Support Forum.

Policy Notes

  1. The client will stop working after 3 months (this is a limited release beta -- new clients will be available before the current version ends its test period)
  2. Deadlines will be set to be much shorter than normal, as we need to get data back quickly in this beta test and we are releasing to a very specific set of hardware. This will change in time, as we move from a beta test and as we move towards supporting more graphics cards.

Known bugs, etc., from previous software versions (fixed in current version)

  1. Some testers have found an excessive number of Early Unit Ends (EUE's) -- we have not reproduced that at Stanford or ATI, and we need help from beta testers to track this down. If you see many EUE's (i.e. more than 20%), please make a post in the GPU section of the forum with your system configuration (ATI driver version, DX version, OS) and hardware (card, CPU, motherboard type). With the help of beta testers, we hope to nail down what's going on. (fixed in Core v6)
  2. There have been some reports that the GPU core does not save (checkpoint) correctly when quit (fixed in Core v7)
  3. Some people have posted logs showing GPU work going to "Completed 85", then sending results. It appears that this is a reporting issue and that the core is actually working during this time (there is a two-hour lapse between 85% and the send, which is consistent with computing the last 15% -- about 8 min per frame, and the returned log and xtc files seem to be just fine). (fixed in Core v7)

GPU and OS Support

Which cards are supported?

Please note the GPU1 core described in this FAQ has been retired and a next generation GPU2 core is available for both ATI and NVIDIA cards. Please see these links for more details: FAQ-NVIDIA for NVIDIA and FAQ-ATI2 for ATI. The 1xxx series of ATI GPUs is no longer supported.

We support several classes of GPU boards, including X1600, X1800, and X1900 series GPUs from ATI. At the launch, we supported X1900 series cards only. X1800 cards do not provide the performance seen in X1900's and so we strongly recommend X1900 class cards. X1900 and X1800 cards are actually quite different -- they have different processors (R520, R530 vs. the R580). The R580 in the X1900 makes a huge difference in performance -- its 48 pixel shaders are key, as we use pixel shaders for our computations. Also note that the card should have at least 512MB of RAM, otherwise the GPU client will put a huge load on the client machine (although we do note that the 256MB X1950Pro using PCIe does work reasonably well on current projects).

What about video cards with other (non-ATI) chipsets?

The R580 (in the X1900XT, etc.) performs particularly well for molecular dynamics, due to its 48 pixel shaders. Currently, other cards (such as those from nVidia and other ATI cards) do not perform well enough for our calculations as they have fewer pixel shaders. Also, nVidia cards in general have some technical limitations beyond the number of pixel shaders that makes them perform poorly in our calculations.

Is the GPU client for Windows XP only? Has it been tested on other OSs like Linux, Mac, and Vista/Win7?

We will launch with Windows XP (32 bit only) support due to driver and compiler support issues. In time, we hope to support Linux as well. Macintosh OSX support is much further out, as the compilers and drivers we need are not supported in OSX, and thus we cannot port our code until that has been resolved.

Users have reported the GPU client works in Vista/Win7, but due to a different DX version, the performance characteristics vary slightly from Windows XP/2003.

Are there any plans to enable the client to take advantage of multiple GPUs?

We will not support this at launch, but we are aggressively working to support multi-GPU systems. See the multiple GPU section below.

Running the new client

How long do you estimate this program will remain an open beta before it turns into a final client?

This is hard to predict, as it depends on how well the code works "in the wild." Also, using GPUs for calculations is pioneering new territory, so there may be unexpected consequences that nobody could foresee.

What scientific cores does the FAH GPU client support? Only Gromacs cores? Other cores like Amber?

We will support a special core for streaming processors only (FahCore_10); this core has elements of Gromacs (mainly for the "bookkeeping"), but has a completely rewritten set of inner loops -- the part which does all the work. Other core support (Amber or Tinker) is not planned, but is in principle possible if the science requires it.

Does the FAH GPU client run the same WUs as the regular FAH client?

No, the GPU client will run a set of WUs specially constructed for the FahCore's new functionality. While the FahCore_10 WUs use the same file format as Gromacs WUs, the scientific code that performs the calculation is different, and WUs for FahCore_10 will yield incorrect results if run with Gromacs (and vice versa).

How will points be awarded?

What impacts will this have on work units and Folding Team scores?

We will continue to award points using the same method as we've always used in Folding@home. To award points for a WU, the WU is run on a benchmark machine. Points are currently awarded at 110 points per day as timed on the benchmark machine. We will continue with this method of calibrating points by adding an ATI X1900XT GPU to the new benchmark machine (otherwise, without a GPU, we could not benchmark GPU WUs on the benchmark machine!). Since FahCore_10 GPU WUs cannot be processed on the CPU alone, we must assign a new set of points for GPU WUs, and we are setting that to 440 points per day (PPD) to reflect the added resources that GPU donors give to FAH. In cases where we need to use CPU time in addition to the GPU (as in the current GPU port), we will give extra points to compensate donors for the additional resources used. Right now, GPU WUs are set to 660 PPD. As we go through the beta process, we will examine the issue of points for WUs. We do understand the significance of this in compensating donor contributions. We do not expect this will have a significant effect on Team scoring.
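The arithmetic above is simple: a WU's point value is the target points-per-day rate multiplied by how long the WU takes on the benchmark machine. The following is an illustrative sketch of that calculation; the names and constants are transcribed from the text, not from any FAH tool.

```python
# PPD rates described above: 110 for standard WUs on the benchmark machine,
# and 660 for current GPU WUs (440 base plus compensation for the CPU time
# the current GPU port also consumes).
CPU_PPD = 110
GPU_PPD = 660

def points_for_wu(benchmark_days, ppd):
    """Points awarded for a WU taking `benchmark_days` on the benchmark machine."""
    return ppd * benchmark_days
```

For example, a GPU WU that takes half a day on the benchmark machine would be worth `points_for_wu(0.5, GPU_PPD)` = 330 points.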

How big will the download/upload files be?

Can we expect modem users to have new problems with the size of a GPU WU?

Will the Collection Servers accept uploads for these projects or will their size be a problem?

The WUs will be small to start (<1MB download, a couple of MB upload). The download is small because implicit solvent calculations only need the protein atoms to start (no explicit water molecules); the upload will be bigger, as we need to bring back more data than in normal FAH WUs. As we add more functionality to the GPU code, the WUs will get bigger. As the downloads and uploads won't be larger than current FAH WUs, we won't have problems with the Collection Servers.

Are these WUs compatible with other fahcores?

The GPU Gromacs core isn't a true port of Gromacs, but rather we've taken key elements from Gromacs we need and enhanced them based on the unique capabilities of GPUs. Thus, it's really a new and different core. GPU WUs cannot correctly run on non-GPU cores and vice versa. This enables some new and exciting science, but it's important to be clear that this isn't just a port.

How to run multiple GPUs (GPU client version 5.91 or later only)

1. Make sure you have two supported GPUs
2. Make sure you have installed supported versions of the Catalyst Drivers
3. Enable the primary output from each card, using the "Settings" tab on "Display Properties"

To do this, you will need to temporarily boot your machine with two monitors attached. If you have a dual input monitor, simply attach one input to card 1 and one input to card 2. This is necessary so that Windows can detect 2 outputs, thus allowing you to activate the second display. Once this has happened, you can safely remove the second monitor (if desired) and Windows will continue to think it is attached. To enable the second display, click on the newly attached monitor and check the "Extend my Windows desktop onto this monitor" box. This will turn on the graphical display for the primary output of the second GPU.

Note: This is not to be confused with the "Secondary Display" of these cards. These can be active (i.e. you could have 4 monitors attached) but you need to know what display number Windows has assigned to each output.

In the examples below, Monitor 1 is connected to the primary output of the 1st "X1900 Series", and Monitor 2 is connected to the 2nd "X1900 Series". If you only have 2 displays enabled, your output should look similar to that shown below.

The following screenshot shows which monitors are attached to which outputs:

When the list says "Default Monitor", it is the equivalent of saying that no monitor is attached and that the output is disabled. You shouldn't be able to "Extend my Windows desktop onto this monitor" (even though the checkbox is not disabled).

4. Now you need to get the GPU ID in order to tell the GPU client which GPU to send the cores to.

The GPUs are numbered starting from zero, so the GPU ID number is always one less than the screen number. In the example above, where monitor 1 is attached to the primary output of card 1, GPUID=0; where monitor 2 is connected to the primary output of card 2, GPUID=1.

If you have all 4 outputs active, and your monitors are arranged such that the primary output of card 2 is connected to monitor 3, then GPUID=2.

5. These GPU IDs are then passed to separate clients via the -gpu [GPU ID] flag.

Using the above example: to run the GPU client/core on card 1, start the GPU client with -gpu 0 as an additional flag. To run the client/core on card 2, run a client with -gpu 1 as an additional flag. These can be used in conjunction with other standard flags (-local, -verbosity 9, etc.)
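The monitor-to-GPU-ID mapping and flag construction above can be sketched as follows. This is an illustrative helper, not part of the FAH client; the function names are our own.

```python
def gpu_id_for_monitor(monitor_number):
    """Windows numbers monitors from 1; the client's -gpu flag is zero-based."""
    return monitor_number - 1

def client_flags(monitor_number, extra=("-local",)):
    """Build the flag list for a GPU client bound to a given monitor's GPU."""
    return ["-gpu", str(gpu_id_for_monitor(monitor_number))] + list(extra)
```

For the two-card example above, `client_flags(1)` yields `["-gpu", "0", "-local"]` for card 1 and `client_flags(2)` yields `["-gpu", "1", "-local"]` for card 2.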

6. If you attempt to run the client with an invalid GPU ID (i.e. a display is not attached to the relevant card output, or you specified an ID that corresponds to the "Secondary Display" of a GPU that already has a GPU client running), the client should exit gracefully with an mdrun 99 error stating that the GPU could not be initialized.

Thanks to the Folding@home Moderator Uncle_Fungus for his help compiling these instructions for how to run multiple GPUs.


Last Updated on June 11, 2012, at 11:57 AM