Bobcares

Burning in your new server: How server hardware load testing helps us improve data center infrastructure reliability

The data was conclusive. The servers orion-47, orion-50 and orion-52 needed RAM upgrades. Over the past month, their RAM usage had mostly stayed above 85% and was trending upward. Swap usage had grown by more than 20%, which resulted in higher I/O wait and a slight tendency toward high load.

These servers were part of a load-balancing cluster that served a SaaS application in a data center managed by our Dedicated Linux Systems Administrators. The occasion was our weekly review of alert trends and of the corrective actions needed to prevent performance degradation. Regular analysis of alert trends allows us to predict future resource bottlenecks and prevent service deterioration.
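As a rough illustration of the kind of check behind such a review, the sketch below computes the percentage of RAM in use from a Linux meminfo file and compares it with the 85% figure mentioned above. The function names `mem_used_pct` and `ram_needs_review` are hypothetical, and the sketch assumes a kernel that reports `MemAvailable`. It is not the monitoring system used for the servers in this story.

```shell
#!/bin/sh
# Hypothetical helper: print the percentage of RAM in use.
# Reads /proc/meminfo by default; a different file can be passed for testing.
# Assumes the kernel reports MemAvailable.
mem_used_pct() {
    meminfo_file=${1:-/proc/meminfo}
    awk '/^MemTotal:/     {total = $2}
         /^MemAvailable:/ {avail = $2}
         END { if (total > 0) printf "%d\n", (total - avail) * 100 / total }' "$meminfo_file"
}

# Hypothetical helper: succeed when usage is strictly above the threshold.
# The default of 85 mirrors the figure quoted in the text.
ram_needs_review() {
    pct=$1
    threshold=${2:-85}
    [ "$pct" -gt "$threshold" ]
}
```

A single sample says little on its own. The review described here looks at the trend, so a check like `ram_needs_review "$(mem_used_pct)"` would need to be run and logged regularly.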

We decided to upgrade the RAM over the weekend, with 3 days allocated for reliability testing of the new hardware. Reliability testing, sometimes also referred to as torture testing, stress testing or load testing, allows us to understand the limits of the system in its new configuration, and thereby helps in capacity planning.

Since all servers ran Linux, we used the command “stress” to simulate high load on the server. An example usage is:

stress --cpu 300 --vm 3 --vm-bytes 31G --io 4 --timeout 4d --hdd 4 --verbose
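In this command, --cpu, --io, --vm and --hdd each set a number of worker processes, and --vm-bytes is the amount of memory each --vm worker allocates, so the example asks 3 workers for 31G each. The right numbers depend on the machine. The sketch below shows one possible way to derive them from the core count and installed RAM. The function name `build_stress_cmd` and the 90% memory ratio are assumptions made for illustration, not values from our test plan.

```shell
#!/bin/sh
# Hypothetical helper: print (not run) a stress command sized to a machine.
# Arguments: CPU worker count, total RAM in MiB, duration (for example 3d).
build_stress_cmd() {
    cpus=$1
    ram_mib=$2
    duration=$3
    vm_workers=3
    # Assumed ratio: spread about 90% of RAM evenly across the memory workers.
    per_worker_mib=$(( ram_mib * 90 / 100 / vm_workers ))
    echo "stress --cpu $cpus --vm $vm_workers --vm-bytes ${per_worker_mib}M --io 4 --hdd 4 --timeout $duration --verbose"
}
```

On a Linux host, the inputs could come from `nproc` and from the `MemTotal` line of /proc/meminfo. The function only prints the command, so it can be reviewed before anything is run.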

 

Run over a period of 3 days, a successful test leaves the system stable and responsive throughout. Test failures usually show up as system freezes or segmentation faults. The causes of failure range from defective or incompatible hardware to deliberate probing of performance limits. After the testing period, the system is truly broken in, with the failure limits recorded in our asset log. This allows us to predict when the server might be due for another upgrade, or for retirement.
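One simple way to keep a record of each run is a small wrapper that logs when the test started, when it ended, and how it ended. The sketch below is an illustration only: `classify_exit` and `run_and_record` are hypothetical names, and it assumes the load generator reports failure through its exit status. It relies on the common shell convention that a status above 128 means the process was killed by a signal (139 corresponds to signal 11, a segmentation fault).

```shell
#!/bin/sh
# Hypothetical helper: turn an exit status into a short verdict.
classify_exit() {
    status=$1
    if [ "$status" -eq 0 ]; then
        echo "pass"
    elif [ "$status" -gt 128 ]; then
        # Common shell convention: 128 + N means killed by signal N.
        echo "killed by signal $(( status - 128 ))"
    else
        echo "fail (exit $status)"
    fi
}

# Hypothetical helper: run a command and append START and END lines to a log.
# Usage: run_and_record LOGFILE COMMAND [ARGS...]
run_and_record() {
    log=$1
    shift
    # START is written before the run, so a freeze leaves a START without an END.
    echo "$(date -u +%Y-%m-%dT%H:%M:%SZ) START $*" >> "$log"
    "$@"
    status=$?
    echo "$(date -u +%Y-%m-%dT%H:%M:%SZ) END $(classify_exit "$status"): $*" >> "$log"
    return "$status"
}
```

Because the START line is written first, a run that ends in a system freeze is still visible in the log as a START entry with no matching END.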

Servers need to be “broken in” and load tested before they are deployed into a production environment, to make sure they can take the load expected of them. Here are a couple of pointers:

  1. Load testing can be conservatively done by creating 3 scenarios: normal load conditions, maximum load conditions and extreme load conditions. Normal load conditions help you find out whether there are hardware defects. Maximum and extreme load conditions help you detect hardware spec incompatibilities and the limits of system capability.
  2. Constantly monitor the system for failures. Constant monitoring shows you errors in their context, and it lets you intervene if the system develops critical issues such as overheating.
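The two pointers above can be sketched in shell as follows. Everything here is illustrative: the function names `scenario_cpu_workers` and `sample_health` are hypothetical, and the multipliers that define normal, maximum and extreme load are assumptions that would need tuning for real hardware. The health sample covers only load average and available memory. Temperature readings depend on the sensor tooling available on the server and are left out.

```shell
#!/bin/sh
# Hypothetical helper: map a scenario name to a CPU worker count.
# The multipliers are illustrative assumptions.
scenario_cpu_workers() {
    scenario=$1
    cores=$2
    case "$scenario" in
        normal)  echo $(( (cores + 1) / 2 )) ;;  # about half the cores busy
        maximum) echo "$cores" ;;                # one worker per core
        extreme) echo $(( cores * 4 )) ;;        # heavily oversubscribed
        *) echo "unknown scenario: $scenario" >&2; return 1 ;;
    esac
}

# Hypothetical helper: print one health sample with UTC time,
# 1-minute load average and available RAM in MiB.
# Reads Linux /proc files by default; other files can be passed for testing.
sample_health() {
    loadavg_file=${1:-/proc/loadavg}
    meminfo_file=${2:-/proc/meminfo}
    load1=$(cut -d ' ' -f 1 "$loadavg_file")
    avail_mib=$(awk '/^MemAvailable:/ {print int($2 / 1024)}' "$meminfo_file")
    echo "$(date -u +%Y-%m-%dT%H:%M:%SZ) load1=$load1 mem_available_mib=$avail_mib"
}
```

During a test run, a loop such as `while sleep 60; do sample_health >> health.log; done` in a second terminal would give a timeline to read errors against.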

Bobcares systems administrators take care of tech support and infrastructure management for data centers and web hosts. Are you looking for ways to improve your service quality?


 
