To RAID or not to RAID, that is the question

People often ask: Should I raid my disks?

 

The question is simple, unfortunately the answer is not. So here I’m going to give you another guide to help you decide when a raid array is advantageous and how to go about it. Notice that this guide also applies to SSD’s, with the expection of the parts about mechanical failure.

 

What is a RAID?

 

RAID is the acronym for “Redundant Array of Inexpensive Disks”. The concept originated at the University of Berkely in 1987 and was intended to create large storage capacity with smaller disks without the need for very expensive and reliable disks, that were very expensive at that time, often a tenfold of smaller disks. Today prices of hard disks have fallen so much that it often is more attractive to buy a single 1 TB disk than two 500 GB disks. That is the reason that today RAID is often described as “Redundant Array of Independent Disks”.

 

The idea behind RAID is to have a number of disks co-operate in such a way that it looks like one big disk. Note that ‘Spanning’ is not in any way comparable to RAID, it is just a way, like inverse partitioning, to extend the base partition to use multiple disks, without changing the method of reading and writing to that extended partition.

 

Why use a RAID?

 

Now with these lower disks prices today, why would a video editor consider a raid array? There are two reasons:

 

 

1. Redundancy (or security)

 

 

2. Performance

 

 

Notice that it can be a combination of both reasons, it is not an ‘either/or’ reason.

 

 

Does a video editor need RAID?

 

No, if the above two reasons, redundancy and performance are not relevant. Yes if either or both reasons are relevant.

 

Re 1. Redundancy

Every mechanical disk will eventually fail, sometimes on the first day of use, sometimes only after several years of usage. When that happens, all data on that disk are lost and the only solution is to get a new disk and recreate the data from a backup (if you have one) or through tedious and time-consuming work. If that does not bother you and you can spare the time to recreate the data that were lost, then redundancy is not an issue for you. Keep in mind that disk failures often occur at inconvenient moments, on a weekend when the shops are closed and you can’t get a replacement disk, or when you have a tight deadline.

 

 

Re 2. Performance

Opponents of RAID will often say that any modern disk is fast enough for video editing and they are right, but only to a certain extent. As fill rates of disks go up, performance goes down, sometimes by 50%. As the number of disk activities on the disk go up , like accessing (reading or writing) pagefile, media cache, previews, media, project file, output file, performance goes down the drain. The more tracks you have in your project, the more strain is put on your disk. 10 tracks require 10 times the bandwidth of a single track. The more applications you have open, the more your pagefile is used. This is especially apparent on systems with limited memory.

 

The following chart shows how fill rates on a single disk will impact performance:

 

HD Tach B.jpg

Remember that I said previously the idea behind RAID is to have a number of disks co-operate in such a way that it looks like one big disk. That means a RAID will not fill up as fast as a single disk and not experience the same performance degradation.

 

 

RAID basics

 

Now that we have established the reasons why people may consider RAID, let’s have a look at some of the basics.

 

Single or Multiple?

 

There are three methods to configure a RAID array: mirroring, striping and parity check. These are called levels and levels are subdivided in single or multiple levels, depending on the method used. A single level RAID0 is striping only and a multiple level RAID15 is a combination of mirroring (1) and parity check (5). Multiple levels are designated by combining two single levels, like a multiple RAID10, which is a combination of single level RAID0 with a single level RAID1.

 

Hardware or Software?

 

The difference is quite simple: hardware RAID controllers have their own processor and usually their own cache. Software RAID controllers use the CPU and the RAM on the motherboard. Hardware controllers are faster but also more expensive. For RAID levels without parity check like Raid0, Raid1 and Raid10 software controllers are quite good with a fast PC.

 

The common Promise and Highpoint cards are all software controllers that (mis)use the CPU and RAM memory. Real hardware RAID controllers all use their own IOP (I/O Processor) and cache (ever wondered why these hardware controllers are expensive?).

 

There are two kinds of software RAID’s. One is controlled by the BIOS/drivers (like Promise/Highpoint) and the other is solely OS dependent. The first kind can be booted from, the second one can only be accessed after the OS has started. In performance terms they do not differ significantly.

 

For the technically inclined: Cluster size, Block size and Chunk size

 

In short: Cluster size applies to the partition and Block or Stripe size applies to the array.

 

With a cluster size of 4 KB, data are distributed across the partition in 4 KB parts. Suppose you have a 10 KB file, three full clusters will be occupied: 4 KB – 4 KB – 2 KB. The remaining 2 KB is called slackspace and can not be used by other files. With a block size (stripe) of 64 KB, data are distributed across the array disks in 64 KB parts. Suppose you have a 200 KB file, the first part of 64 KB is located on disk A, the second 64 KB is located on disk B, the third 64 KB is located on disk C and the remaining 8 KB on disk D. Here there is no slackspace, because the block size is subdivided into clusters. When working with audio/video material a large block size is faster than smaller block size. Working with smaller files a smaller block size is preferred.

 

Sometimes you have an option to set ‘Chunk size’, depending on the controller. It is the minimal size of a data request from the controller to a disk in the array and only useful when striping is used. Suppose you have a block size of 16 KB and you want to read a 1 MB file. The controller needs to read 64 times a block of 16 KB. With a chunk size of 32 KB the first two blocks will be read from the first disk, the next two blocks from the next disk, and so on. If the chunk size is 128 KB. the first 8 blocks will be read from the first disk, the next 8 block from the second disk, etcetera. Smaller chunks are advisable with smaller filer, larger chunks are better for larger (audio/video) files.

 

 

RAID Levels

 

For a full explanation of various RAID levels, look here: http://www.acnc.com/04_01_00/html

 

What are the benefits of each RAID level for video editing and what are the risks and benefits of each level to help you achieve better redundancy and/or better performance? I will try to summarize them below.

 

RAID0

 

The Band AID of RAID. There is no redundancy! There is a risk of losing all data that is a multiplier of the number of disks in the array. A 2 disk array carries twice the risk over a single disk, a X disk array carries X times the risk of losing it all.

 

A RAID0 is perfectly OK for data that you will not worry about if you lose them. Like pagefile, media cache, previews or rendered files. It may be a hassle if you have media files on it, because it requires recapturing, but not the end-of-the-world. It will be disastrous for project files.

 

Performance wise a RAID0 is almost X times as fast as a single disk, X being the number of disks in the array.

 

RAID1

 

The RAID level for the paranoid. It gives no performance gain whatsoever. It gives you redundancy, at the cost of a disk. If you are meticulous about backups and make them all the time, RAID1 may be a better solution, because you can never forget to make a backup, you can restore instantly. Remember backups require a disk as well. This RAID1 level can only be advised for the C drive IMO if you do not have any trust in the reliability of modern-day disks. It is of no use for video editing.

 

RAID3

 

The RAID level for video editors. There is redundancy! There is only a small performance hit when rebuilding an array after a disk failure due to the dedicated parity disk. There is quite a perfomance gain achieveable, but the drawback is that it requires a hardware controller from Areca. You could do worse, but apart from it being the Rolls-Royce amongst the hardware controllers, it is expensive like the car.

 

Performance wise it will achieve around 85% (X-1) on reads and 60% (X-1) on writes over a single disk with X being the number of disks in the array. So with a 6 disk array in RAID3, you get around 0.85x (6-1) = 425% the performance of a single disk on reads and 300% on writes.

 

 

RAID5 & RAID6

 

The RAID level for non-video applications with distributed parity. This makes for a somewhat severe hit in performance in case of a disk failure. The double parity in RAID6 makes it ideal for NAS applications.

 

The performance gain is slightly lower than with a RAID3. RAID6 requires a dedicated hardware controller, RAID5 can be run on a software controller but the CPU overhead negates to a large extent the performance gain.

 

 

RAID10

 

The RAID level for paranoids in a hurry. It delivers the same redundancy as RAID 1, but since it is a multilevel RAID, combined with a RAID0, delivers twice the performance of a single disk at four times the cost, apart from the controller. The main advantage is that you can have two disk failures at the same time without losing data, but what are the chances of that happening?

 

 

RAID30, 50 & 60

 

Just striped arrays of RAID 3, 5 or 6 which doubles the speed while keeping redundancy at the same level.

 

 

EXTRAS

 

RAID level 0 is striping, RAID level 1 is mirroring and RAID levels 3, 5 & 6 are parity check methods. For parity check methods, dedicated controllers offer the possibility of defining a hot-spare disk. A hot-spare disk is an extra disk that does not belong to the array, but is instantly available to take over from a failed disk in the array. Suppose you have a 6 disk RAID3 array with a single hot-spare disk and assume one disk fails. What happens? The data on the failed disk can be reconstructed in the background, while you keep working with negligeable impact on performance, to the hot-spare. In mere minutes your system is back at the performance level you were before the disk failure. Sometime later you take out the failed drive, replace it for a new drive and define that as the new hot-spare.

 

As stated earlier, dedicated hardware controllers use their own IOP and their own cache instead of using the memory on the mobo. The larger the cache on the controller, the better the performance, but the main benefits of cache memory are when handling random R+W activities. For sequential activities, like with video editing it does not pay to use more than 2 GB of cache maximum.

 

 

REDUNDANCY

(or security)

 

 

Not using RAID entails the risk of a drive failing and losing all data. The same applies to using RAID0 (or better said AID0), only multiplied by the number of disks in the array.

 

RAID1 or 10 overcomes that risk by offering a mirror, an instant backup in case of failure at high cost.

 

RAID3, 5 or 6 offers protection for disk failure by reconstructing the lost data in the background (1 disk for RAID3 & 5, 2 disks for RAID6) while continuing your work. This is even enhanced by the use of hot-spares (a double assurance).

 

 

PERFORMANCE

 

RAID0 offers the best performance increase over a single disk, followed by RAID3, then RAID5 amd finally RAID6. RAID1 does not offer any performance increase.

 

Hardware RAID controllers offer the best performance and the best options (like adjustable block/stripe size and hot-spares), but they are costly.

 

 

SUMMARY

 

If you only have 3 or 4 disks in total, forget about RAID. Set them up as individual disks, or the better alternative, get more disks for better redundancy and better performance. What does it cost today to buy an extra disk when compared to the downtime you have when a single disk fails?

 

If you have room for at least 4 or more disks, apart from the OS disk, consider a RAID3 if you have an Areca controller, otherwise consider a RAID5.

 

If you have even more disks, consider a multilevel array by striping a parity check array to form a RAID30, 50 or 60.

 

If you can afford the investment get an Areca controller with battery backup module (BBM) and 2 GB of cache. Avoid as much as possible the use of software raids, especially under Windows if you can.

 

RAID, if properly configured will give you added redundancy (or security) to protect you from disk failure while you can continue working and will give you increased performance.

 

Look carefully at this chart to see what a properly configured RAID can do to performance and compare it to the earlier single disk chart to see the performance difference, while taking into consideration that you can have one disks (in each array) fail at the same time without data loss:

Areca_HDTach1.jpg

 

Hope this helps in deciding whether RAID is worthwhile for you.

 

WARNING: If you have a power outage without a UPS, all bets are off.

 

A power outage can destroy the contents of all your disks if you don’t have a proper UPS. A BBM may not be sufficient to help in that case.

via Adobe Community : Popular Discussions – Adobe Premiere Forums http://forums.adobe.com/thread/525263