RAID resync – Best practices

Feb 27, 2019

RAID, or Redundant Array of Independent Disks, provides fault tolerance for your servers.

But, what if there are errors within your RAID array?

Unfortunately, that can result in loss of data. RAID resync helps to keep the disk data in sync.

At Bobcares, we regularly monitor the RAID resync process in servers as part of our Server Monitoring Services.

Today, we’ll look at the RAID resync process in detail and at how our Support Engineers actively monitor it to avoid potential hard disk failures.

 

Understanding RAID resync

Firstly, let’s get an understanding of the RAID resync process.

In production servers, a device can be added to the RAID array at any time. RAID is only as reliable as the data sync among its disks. But a newly added disk does not yet hold the same data as the other devices. That’s where RAID re-syncing helps.

In the re-syncing process, the kernel scans the original devices and writes the correct blocks to the new device. Usually, a periodic resync check is also set up as a cron job that runs at regular intervals. For example, in Debian, it is based on a Linux utility called mdadm, which manages and monitors software RAID devices.

Similarly, CentOS systems use the script /usr/sbin/raid-check.

 

Best practices in RAID resync

A server performs a resync of its software RAID at defined intervals. Usually, this causes a heavy I/O load that can affect all services until the resync is complete. Unfortunately, the resync can take several hours, depending on the size of the disks.

Now, let’s see the best practices that our Support Engineers follow to make the RAID resync process faster.

 

1. Resource allocation limits

Normally, the kernel throttles the RAID resync automatically when other I/O is active, to limit the impact on server performance. But, in our experience in managing servers, we often see degraded server performance as the resync progresses.

To overcome this, our Support Engineers limit the bandwidth allocated to the resync process. For this, we set the minimum and maximum limits in /proc/sys/dev/raid/speed_limit_min and /proc/sys/dev/raid/speed_limit_max. Both values are in KiB/s per device.

For example, to restrict the maximum speed of RAID reconstruction to about 5 MB/s, we set the value as:

echo 5000 > /proc/sys/dev/raid/speed_limit_max
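Both limits are often adjusted together. The following is a minimal sketch of that, assuming the standard md sysctl files named above. The function name `set_raid_speed_limits` and the `RAID_SYSCTL_DIR` override are hypothetical, introduced here only so the helper can be tried against a scratch directory before touching the live system.

```shell
#!/bin/sh
# Hypothetical helper: write the min/max resync speed limits (KiB/s per device).
# RAID_SYSCTL_DIR is an illustrative override, not a kernel or mdadm setting.
set_raid_speed_limits() {
    min_kib="$1"
    max_kib="$2"
    dir="${RAID_SYSCTL_DIR:-/proc/sys/dev/raid}"

    # Sanity guard in this sketch: refuse inverted limits instead of writing them.
    if [ "$min_kib" -gt "$max_kib" ]; then
        echo "min ($min_kib) must not exceed max ($max_kib)" >&2
        return 1
    fi

    echo "$min_kib" > "$dir/speed_limit_min" || return 1
    echo "$max_kib" > "$dir/speed_limit_max" || return 1

    # Read the values back so the caller can confirm what is in effect.
    echo "min=$(cat "$dir/speed_limit_min") max=$(cat "$dir/speed_limit_max")"
}
```

Run as root, `set_raid_speed_limits 1000 5000` would cap the resync at 5000 KiB/s while keeping a 1000 KiB/s floor. Values written this way do not survive a reboot.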

Similarly, we’ve seen cases where the resync has to be put off until later because the websites are in their peak hours. Here, to stop the RAID check and prevent it from restarting, we run:

echo frozen > /sys/block/md0/md/sync_action

This will stop the check, but still leave the array in a partially checked state. Again, the next time a check starts, it will start from where it left off. Thus, it can really help with managing server resources.
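A pause is usually paired with a later resume. The sketch below wraps the sync_action writes in small helpers. It assumes that writing `idle` clears the frozen state and that writing `check` requests a new check, which matches our understanding of the md driver but should be verified on your kernel. The function names and the `MD_SYSFS_ROOT` override are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers around /sys/block/<array>/md/sync_action.
# MD_SYSFS_ROOT is an illustrative override for trying this on a scratch tree.
md_sync_action_file() {
    echo "${MD_SYSFS_ROOT:-/sys/block}/$1/md/sync_action"
}

# Stop the running check/resync and keep it from restarting.
pause_md_sync() {
    echo frozen > "$(md_sync_action_file "$1")"
}

# Assumption: writing "idle" lifts the frozen state again.
resume_md_sync() {
    echo idle > "$(md_sync_action_file "$1")"
}

# Request a consistency check of the array.
start_md_check() {
    echo check > "$(md_sync_action_file "$1")"
}
```

For example, `pause_md_sync md0` before peak hours and `resume_md_sync md0` followed by `start_md_check md0` afterwards, all run as root.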

 

2. Using read_ahead

Again, from our experience in managing RAID, we see that setting read_ahead per RAID device also helps to make the resync faster. During a disk read operation, the read-ahead setting determines how much additional data is read into cache ahead of the request.

For workloads that read data sequentially, such as a resync, a larger read_ahead can improve performance. The value is given in 512-byte sectors. For example, to set read-ahead to 32 MiB (65536 sectors), we use the command:

blockdev --setra 65536 /dev/md0
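Because `blockdev --setra` counts 512-byte sectors rather than bytes, the conversion is easy to get wrong. A small sketch of the arithmetic, with hypothetical helper names:

```shell
#!/bin/sh
# Hypothetical helpers: convert between MiB and the 512-byte sectors
# that blockdev --setra / --getra work with.
mib_to_sectors() {
    echo $(( $1 * 1024 * 1024 / 512 ))
}

sectors_to_mib() {
    echo $(( $1 * 512 / 1024 / 1024 ))
}
```

With these, `blockdev --setra "$(mib_to_sectors 32)" /dev/md0` is equivalent to the command above, and `sectors_to_mib "$(blockdev --getra /dev/md0)"` shows the current setting in MiB.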

 

3. Set stripe_cache_size

Similarly, increasing stripe_cache_size shows better results for some RAID levels, such as RAID5 and RAID6. The stripe cache is used to synchronise all write operations to the array, and all read operations if the array is degraded.

However, high values can cause ‘Out of memory’ errors on the server, because the cache consumes roughly page size × number of member disks × stripe_cache_size bytes of memory. Therefore, our Support Engineers set the value according to the resources available on the server. The value is a number of cache entries, not a size in bytes. To set stripe_cache_size to 16384 entries for /dev/md3, we use:

echo 16384 > /sys/block/md3/md/stripe_cache_size
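Before raising the value, it helps to estimate how much memory the stripe cache would take. The sketch below applies the formula from the kernel md documentation (page size × member disks × stripe_cache_size). The helper name and the 4-disk example are assumptions for illustration.

```shell
#!/bin/sh
# Hypothetical helper: estimate stripe cache memory in bytes.
#   $1 = stripe_cache_size (entries)
#   $2 = number of member disks in the array
#   $3 = page size in bytes (defaults to 4096; see `getconf PAGESIZE`)
stripe_cache_bytes() {
    entries="$1"
    nr_disks="$2"
    page_size="${3:-4096}"
    echo $(( entries * nr_disks * page_size ))
}
```

For an assumed 4-disk array with 4 KiB pages, `stripe_cache_bytes 16384 4` gives 268435456 bytes (256 MiB), compared with 4194304 bytes (4 MiB) for an entry count of 256.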

 

4. Disable NCQ

Yet another way to reduce the RAID resync time is to disable Native Command Queuing (NCQ).

NCQ allows each hard disk to internally optimize the order in which it executes the read and write commands it receives. But, in some setups, it can slow down the resync process. Therefore, we disable it for all the drives in the array.
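One common way to disable NCQ on Linux is to set the drive's queue depth to 1 through sysfs. The sketch below does that for a list of drives. The function name, the `BLOCK_SYSFS_ROOT` override, and the device names in the usage note are hypothetical, and the setting does not persist across reboots.

```shell
#!/bin/sh
# Hypothetical helper: set queue_depth to 1 for each named disk,
# which effectively turns NCQ off for that drive.
# BLOCK_SYSFS_ROOT is an illustrative override for a scratch tree.
disable_ncq() {
    root="${BLOCK_SYSFS_ROOT:-/sys/block}"
    for disk in "$@"; do
        qd="$root/$disk/device/queue_depth"
        if [ -w "$qd" ]; then
            echo 1 > "$qd"
            echo "$disk: queue_depth=$(cat "$qd")"
        else
            # Not every device exposes a writable queue_depth.
            echo "$disk: no writable queue_depth, skipped" >&2
        fi
    done
}
```

For an array built on sda and sdb, this would be `disable_ncq sda sdb`, run as root. It is worth measuring resync speed before and after, since the effect depends on the drives and controller.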

 

5. Regular monitoring

Again, regular monitoring of the RAID resync always helps. When you suddenly see a RAID resync for no apparent reason, it can be a warning sign that something is wrong. It can be a bad disk, or even a RAID failure.

That’s why we always keep a check on the RAID resync process. On the managed servers, our Support Engineers set up monitoring tools like Nagios that constantly monitor the RAID status from the file /proc/mdstat.

[Need advice on setting up RAID? Our Support Engineers can help you.]
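As an illustration of what such a check might look at, here is a minimal sketch of a Nagios-style plugin that reads /proc/mdstat. It is not the plugin used on any particular server. It assumes the usual mdstat layout, where member status appears as `[UU]` and an underscore marks a missing member, and it uses the standard Nagios exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The function name `check_mdstat` is hypothetical.

```shell
#!/bin/sh
# Hypothetical Nagios-style check. $1 is the mdstat file (default /proc/mdstat).
check_mdstat() {
    file="${1:-/proc/mdstat}"

    if [ ! -r "$file" ]; then
        echo "UNKNOWN: cannot read $file"
        return 3
    fi

    # Count status fields such as [U_] or [_U]: an underscore is a missing member.
    degraded=$(grep -c '\[[U_]*_[U_]*]' "$file" || true)
    # Count progress lines such as "resync = 12.6%" or "resync=DELAYED".
    syncing=$(grep -cE '(resync|recovery|check) *=' "$file" || true)

    if [ "$degraded" -gt 0 ]; then
        echo "CRITICAL: $degraded degraded array(s)"
        return 2
    elif [ "$syncing" -gt 0 ]; then
        echo "WARNING: resync/recovery/check in progress"
        return 1
    fi

    echo "OK: all arrays healthy"
    return 0
}
```

Treating an in-progress resync as a warning rather than an error is a design choice in this sketch: it surfaces unexpected resyncs without paging anyone for a scheduled check.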

 

Conclusion

In a nutshell, RAID resync helps devices catch up with the RAID array and brings their data back in sync. Today, we saw the best practices our Support Engineers follow to make the resync process faster.
