Updated May 5, 2009: Additional information, clarifications and corrections have been added inline. After the publication of this blog post, I discussed an unrelated issue with the CrashPlan Technical Support team. They provided the information I've added here; this update covers changes in product naming, decryption, decompression and a partial explanation for slow restore times.
One particular note about product offerings and features: there is a consumer product and a business product, and within the consumer release there are different versions. To compare the consumer versions, visit this web page. This article covers CrashPlan Pro, the enterprise solution.
I've deployed CrashPlan Pro Server (CPPS) as a backup solution for a department here at Stanford. They use it for both their desktops and servers. As is best practice, I performed a dry-run restoration of their files kept on a network-attached storage device. I then used the performance metrics to make a back-of-the-envelope guess at how this would compare if we used our direct-attached FireWire 800 device as the backup repository.
CrashPlan Pro Server background
CPP is a terrific backup solution for small businesses that want to hold onto their data on the premises (see the first side note below). It should not be confused with CrashPlan Pro (personal edition), which allows users to back up their computer to magical, hosted servers in the foggy internet (not unlike Mozy Pro).
With the business version, you install the client on your workstations and the CrashPlan Pro Server on just about any other type of machine (Mac, Windows, Linux or Solaris; they even have a VMware virtual appliance).
You run a client on each workstation; the CrashPlan Pro Server talks to the clients and moves the files to your own storage device. They also offer a hosted service if you would like your data off-site, just like the personal version. You can, in fact, do both concurrently.
And aside from impressive performance and a (somewhat) easy-to-use web-based management console, CPP has all the advanced features one expects nowadays, including de-duplication, compression and encryption. Best of all, they give away the CrashPlan Pro Server software; you just pay for licenses for the number of clients you're backing up.
Regarding encryption: The CrashPlan Pro Server does not do the encryption and decryption. Instead, it is the client running on the workstation that encrypts the files to be backed up, sending them across the network as secure blocks. During a restore, the process is reversed: the CrashPlan Pro Server sends the encrypted files back across the network to the client, which handles the task of decryption.
Regarding decompression: This is where restoration really benefits from having a workhorse of a client (not, as it turns out, of the server). The CrashPlan client will verify the integrity of each incoming block from the CrashPlan Pro Server; decrypt and decompress the file; write the file to disk; then verify the file's checksum to further ensure integrity, which might have been compromised by client activity or other workstation hardware issues.
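To illustrate the general shape of that client-side pipeline — and only the shape; this is not CrashPlan's actual code, and the XOR "decryption" and SHA-256 digests are purely illustrative stand-ins — a restore step might look like this:

```python
import hashlib
import zlib

def decrypt(block: bytes, key: bytes) -> bytes:
    """Placeholder for the client's decryption step. CrashPlan uses its
    own cipher; a simple XOR stream stands in here for illustration."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(block))

def restore_block(block: bytes, block_digest: str,
                  file_digest: str, key: bytes) -> bytes:
    # 1. Verify the integrity of the block as received from the server.
    if hashlib.sha256(block).hexdigest() != block_digest:
        raise IOError("incoming block failed integrity check")
    # 2. Decrypt, then decompress -- the reverse of the backup pipeline.
    data = zlib.decompress(decrypt(block, key))
    # 3. Verify the restored file's checksum before trusting the result.
    if hashlib.sha256(data).hexdigest() != file_digest:
        raise IOError("restored file failed checksum verification")
    return data
```

The point of the sketch is simply that every restored byte passes through hashing, decryption and decompression on the workstation, which is why client horsepower matters.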
Regarding factors influencing restoration speed: The three primary factors that determine restoration speed are the disk read speed on the Xserve (or whatever hosts your storage device); the speed of your network, both between the Xserve and its network datastore and between the Xserve and your workstation; and, naturally, the speed of disk writes on the workstation.
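To make that concrete, here's a trivial model of the chain: the end-to-end rate is capped by its slowest hop. The 413 and 39 Mb/s figures come from the throughput tests in Side note #2; the other two numbers are purely hypothetical.

```python
def restore_ceiling_mbps(server_disk_read: float,
                         datastore_link: float,
                         client_link: float,
                         client_disk_write: float) -> float:
    """A restore can go no faster than its slowest hop (all rates in Mb/s).
    Real-world rates will be lower still, once decryption, decompression
    and checksum verification are added on top."""
    return min(server_disk_read, datastore_link, client_link, client_disk_write)

# Example: the 39 Mb/s CIFS link to the NAS dominates everything else.
# (413 and 39 are measured in Side note #2; 1000 and 500 are made up.)
ceiling = restore_ceiling_mbps(413, 39, 1000, 500)  # -> 39
```

In our setup, this is why the CIFS link to the LCCS datastore, not the gigabit LAN, sets the ceiling.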
In the test of our backup and restore procedure, I initiated a full restore of all the Dean's Office data that normally lives on a dedicated, internal RAID 1 volume. This is an Xserve (Late 2006) model with two dual-core Xeon processors, 4 GB of RAM and three SATA drives. The machine is primarily their workgroup file server for about thirty to forty people, typically with less than five concurrent AFP and SMB sessions.
Our CrashPlan Pro Server runs on this Xserve. It backs up the department's workstations, as well as the server on which it runs. I wanted to get some (very) rough metrics on how long a restore process would take; in the event of a catastrophe, the Dean's Office would have a tough time waiting it out.
I am looking first at how long it took to restore 451,418 files (184.47 GB) from our current storage repository, the campus Low Cost Central Storage (LCCS) option. This is a cost-recovery service for campus affiliates looking for expandable CIFS storage at 35¢ a gigabyte. The Mac server is connected to the EMC network-attached storage via a static CIFS mount. (That itself was a challenge.)
Restore results from our NAS
Here is the entry from the CPP Server restore log:
04/21/09 11:07PM 356326319858113082 Restore from HSDO CrashPlan Server completed: 445,910 Files restored @ 2812.78 KB/s [22 Mb/s]
This log entry tells us that it took about six hours to restore the files, with an average speed of 22 megabits per second. Exactly what that number measures is unclear to me, but the client has to decrypt and decompress everything en route to the restore destination. So even though the network is (at least) 1000baseT, there's significantly more overhead than just moving data. (It stands to reason that a beefy workstation to do the heavy mathematical lifting helps improve things.)
Estimating the same activity from the DAS
Let's now make a back-of-the-envelope estimation of how this would compare if we restored from our other backup archive repository, a directly-attached 1U RAID 0+1 device, configured as one volume dedicated to CPP. It's attached via two FireWire 800 connections (one for each mirror, I suppose).
To see how fast data could move between the DAS and the internal hard drives on the server, I averaged the results of moving a 10 GB file three times. The figure was 339 megabits per second. (You weren't expecting 800 Mbps, were you?) Before you can say, "wait, that's one fat contiguous file, without decryption and decompression; of course that will be fast," keep in mind that I'll account for a real-life scenario by applying the performance penalty demonstrated by the original restore from the NAS. While this might not be entirely reliable, it's still informative.
In looking at the roughly 21 Mb/s transfer rate, we see from the throughput test results below that the approximate penalty in the restoration process is 46%; that is, restoring approximately 450,000 files comprising 180 GB runs at only about 54% of the speed of moving one contiguous 10 GB file. Using this as a baseline, the restoration of the same archive from the FW800 volume can be estimated as follows:
339 Mb/s * .54 [the restoration penalty] is about 183 Mb/s
183 Mb/s is about 23 MB/s
180 GB restored at 23 MB/s is about 8,013s, or 2:15 hours
[or, to compute this another way]
339 Mb/s * .54 is about 183 Mb/s
184.47 GB is 1,511,178 Mb
1,511,178 Mb / 183 Mb/s is 8,258 seconds, or 2:18 hours
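The same back-of-the-envelope math can be checked in a few lines of Python (a sketch only; it uses the ~21 Mb/s restore rate rounded down from the log entry, and the binary convention of 8,192 megabits per gigabyte, as above):

```python
# Observed rates from the throughput tests and the restore log.
nas_raw_mbps = 39.0      # one 10 GB file, server -> NAS (Side note #2)
nas_restore_mbps = 21.0  # restore rate from the NAS, rounded from the log
das_raw_mbps = 339.0     # one 10 GB file, server -> FW800 DAS

# The restore achieved only ~54% of the raw single-file throughput.
efficiency = nas_restore_mbps / nas_raw_mbps  # ~0.54

# Apply the same penalty to the DAS to estimate its restore rate.
das_restore_mbps = das_raw_mbps * efficiency  # ~183 Mb/s

# Time to restore the 184.47 GB archive (1 GB = 8,192 megabits).
archive_mbits = 184.47 * 8 * 1024             # ~1,511,178 Mb
seconds = archive_mbits / das_restore_mbps    # ~8,300 s
print(f"{das_restore_mbps:.0f} Mb/s, {seconds / 3600:.1f} hours")
```

Rounding the intermediate figures differently moves the answer by a few minutes either way, but it lands in the same two-and-a-quarter-hour neighborhood.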
Now, this is just some rough math, and there are lots of other variables in play (the network might have some exceptional congestion, the server was simultaneously busy with other tasks, who knows). But it's safe to estimate the restoration time from the directly-attached FireWire 800 storage array would take a little over two hours, versus the six it took to go over the network.
Side note #1
I have no relationship with CrashPlan Pro except that I'm a customer who has paid them money for serial numbers and licenses, and I've used their technical support. They have no association with this blog, my review, or Stanford University.
Side note #2
Using rsync, I moved one contiguous 10 GB file from the operating system's internal RAID volume to four different destinations, three times each. I'm adding the averages here, should they interest anyone.
Average throughput moving one 10 GB file from the server's OS volume to...
- Low Cost Central Storage (CIFS NAS) : 39 Mb/sec
- AFP to another Mac server across campus : 179 Mb/sec
- The FireWire 800 DAS : 339 Mb/sec
- The internal data volume on the same server : 413 Mb/sec
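For anyone who wants to reproduce a quick throughput check of their own, here's a minimal Python sketch. The paths in the comment are hypothetical, and shutil.copyfile is a simple stand-in for the rsync runs described above:

```python
import os
import shutil
import time

def copy_throughput_mbps(src: str, dst: str, runs: int = 3) -> float:
    """Average throughput, in megabits/second, of copying src to dst
    `runs` times. A stand-in for timed rsync runs, for illustration."""
    size_bits = os.path.getsize(src) * 8
    rates = []
    for _ in range(runs):
        start = time.monotonic()
        shutil.copyfile(src, dst)
        elapsed = max(time.monotonic() - start, 1e-9)  # guard tiny files
        rates.append(size_bits / elapsed / 1_000_000)
    return sum(rates) / len(rates)

# Hypothetical paths; in my test, the source lived on the OS volume and
# the destination was the NAS, another Mac server, the DAS, or the
# internal data volume.
# copy_throughput_mbps("/Volumes/OS/test.bin", "/Volumes/DAS/test.bin")
```

Note that rsync adds checksumming overhead a plain copy doesn't, so the two methods won't produce identical numbers; either is fine for a rough comparison of destinations.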
Side note #3
One thing about the campus' LCCS offering: at some point in the not-too-distant future, data from the EMC NAS will be replicated across the Bay to a data center being built in Livermore. It would be interesting to perform a similar restore from 40 miles away (and on another fault line).
Side note #4
Within the CrashPlan Pro web-based interface, there is an option to do a speed test to gauge the throughput between the CrashPlan Pro Server and your chosen archive repository. After submitting a help ticket with CrashPlan Pro's (pretty great) technical support, I was informed these results are unreliable and "not intended for large servers." This speed test does a one-time transfer of a single ~60 MB file. It presents a red herring result: a dismal 3-8 Mb/sec transfer rate to the NAS.
Really important side note #5
Currently, CrashPlan Pro (the enterprise backup solution I've been discussing in this article) does not respect access control lists (ACLs). It's important to review the list of supported metadata provided in the CrashPlan Support Wiki.
Support for ACLs is expected in the next release (M22), which has no announced release date at the time of this posting.