Bobcares

Identifying resource usage and performing migration to address performance issues in a Xen server virtualization environment

Jan 13, 2016

When Netflix faced an outage in 2011, they had to offer a 3% credit to their customers. The outage was due to a resource crunch in the Amazon cloud they were hosted in. It’s tough for a business to survive if such outages happen frequently.

A cloud hosting provider achieves economies of scale by sharing resources between multiple tenants in the cloud. These resources include computing, storage and network devices. But at times, the cloud provider may oversell the resources to achieve high density, or a single tenant may abuse the server resources. Either can degrade the performance of the cloud system.

A common performance metric in server virtualization systems is “CPU steal time”. It refers to the amount of time a VM is ready to run but cannot, because other VMs are competing for the CPU. Recently we were contacted by a cloud hosting provider whose customers were complaining of intermittent slowness and high CPU steal times.
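To make the metric concrete, here is a minimal Python sketch of how steal time could be computed inside a Linux guest from two readings of the aggregate `cpu` line in `/proc/stat`. The function names and the sample counter values are made up for illustration; they are not taken from the systems described in this article.

```python
def parse_cpu_line(line):
    """Return the numeric counters from the aggregate 'cpu' line of /proc/stat."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    counters = [int(v) for v in fields[1:]]
    if len(counters) < 8:
        raise ValueError("kernel does not report a steal counter")
    return counters


def steal_percent(before, after):
    """Percentage of elapsed CPU time that was stolen between two samples."""
    deltas = [b - a for a, b in zip(parse_cpu_line(before), parse_cpu_line(after))]
    # Counter order: user nice system idle iowait irq softirq steal [guest guest_nice].
    # guest and guest_nice are already counted in user and nice, so only
    # the first eight counters are summed.
    total = sum(deltas[:8])
    if total <= 0:
        return 0.0
    return 100.0 * deltas[7] / total


# Hypothetical samples, invented for this example.
sample_1 = "cpu  1000 0 200 5000 0 0 0 800 0 0"
sample_2 = "cpu  1300 0 230 5000 0 0 0 1470 0 0"
print(round(steal_percent(sample_1, sample_2), 1))  # 67.0
```

In practice the two samples would be read from `/proc/stat` a few seconds apart; the longer the interval, the smoother the figure.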

Debugging the slowness issue

Steal time is the time a VM spends waiting for a physical CPU while the hypervisor is servicing another VM. In this cloud solution, the Xen hypervisor was used to manage the VMs. CPU steal in Xen can happen due to:

  1. Resource-intensive applications taking up too much CPU in individual VMs.
  2. The cloud servers running short of resources.

To identify the reason for the CPU steal, we monitored the individual VMs in the cloud. We audited resource-intensive applications and the frequency of high steal times using the ‘top’ utility. We noticed that the CPU steal times of the slow VMs were continuously over 60%.

top - 10:26:21 up 45 days, 2:00, 2 users, load average: 8.01, 4.04, 2.46
Tasks: 496 total, 21 running, 475 sleeping, 0 stopped, 0 zombie
Cpu(s): 29.8%us, 3.1%sy, 0.7%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si, 61.1%st
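A check like this can be automated. The following Python sketch parses the `Cpu(s):` summary line in the format shown above (newer versions of `top` print the fields differently) and reports whether steal stayed above a threshold across every sample. The function names and the 60% default are illustrative assumptions, not part of any existing tool.

```python
import re


def parse_top_cpu(line):
    """Map each field suffix (us, sy, ..., st) in a 'Cpu(s):' line to its percentage."""
    return {name: float(value)
            for value, name in re.findall(r"([\d.]+)%\s*([a-z]+)", line)}


def sustained_steal(lines, threshold=60.0):
    """True when every sampled summary line reports steal above the threshold."""
    samples = [parse_top_cpu(line).get("st", 0.0) for line in lines]
    return bool(samples) and all(s > threshold for s in samples)


busy = "Cpu(s): 29.8%us, 3.1%sy, 0.7%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si, 61.1%st"
idle = "Cpu(s): 0.0%us, 0.1%sy, 0.0%ni, 99.9%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st"

print(parse_top_cpu(busy)["st"])      # 61.1
print(sustained_steal([busy, busy]))  # True
print(sustained_steal([busy, idle]))  # False
```

The summary lines could be collected with `top` in batch mode (`top -b -n 1`) at regular intervals.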

Adding more CPU to those VMs was not possible, however, as all the cloud servers were already near their physical limits.

Migrating the slow VMs

To improve speed and responsiveness, we decided to migrate the slow VMs to another host. We executed the following plan to accomplish this.

1. We configured a new host server with CentOS 6 and installed the Xen hypervisor packages on it.

[root@Host02 ~]# yum install centos-release-xen
[root@Host02 ~]# yum install xen

2. We then updated the boot loader to boot into the Xen hypervisor and rebooted the machine.

[root@Host02 ~]# sh /usr/bin/grub-bootxen.sh
 Updating grub config

3. Once the Xen hypervisor started running, we migrated the slow VMs from the old host to the new server. The VMs were allotted enough CPU and resources to ensure that they ran without any hiccups.

[root@Host02 ~]# xm list 
Name                                 ID   Mem VCPUs State   Time(s) 
Domain-0                              0  8192   32 r----- 4589703.7 
lhry2j7b1108p3                       02  3048   3  -b---- 785333.0 
res69fvhdx8h12                       03   512   2  -b----  84255.3 
tqr01qycerunum                       04   512   2  r----- 3381600.7
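When planning capacity on a host, it can help to total the vCPUs and memory handed out to guests. The sketch below parses `xm list` output in the column layout shown above and sums both for every domain except Domain-0. The function names are hypothetical, and the parser assumes exactly six whitespace-separated columns per row.

```python
def parse_xm_list(text):
    """Parse `xm list` output into a list of per-domain dictionaries."""
    rows = []
    for line in text.strip().splitlines()[1:]:  # skip the header row
        parts = line.split()
        if len(parts) != 6:
            continue  # ignore rows that do not match the expected layout
        name, dom_id, mem, vcpus, state, time_s = parts
        rows.append({
            "name": name,
            "id": int(dom_id),
            "mem_mb": int(mem),
            "vcpus": int(vcpus),
            "state": state,
            "time_s": float(time_s),
        })
    return rows


def guest_totals(rows):
    """Return (total vCPUs, total memory in MB) for all guests, excluding Domain-0."""
    guests = [r for r in rows if r["name"] != "Domain-0"]
    return sum(r["vcpus"] for r in guests), sum(r["mem_mb"] for r in guests)


xm_output = """
Name                                 ID   Mem VCPUs State   Time(s)
Domain-0                              0  8192   32 r----- 4589703.7
lhry2j7b1108p3                       02  3048   3  -b---- 785333.0
res69fvhdx8h12                       03   512   2  -b----  84255.3
tqr01qycerunum                       04   512   2  r----- 3381600.7
"""

print(guest_totals(parse_xm_list(xm_output)))  # (7, 4072)
```

Comparing these totals with the host's physical cores and RAM gives a rough view of how much headroom remains before the new host becomes oversubscribed.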

4. After the migration was done, we audited the resource usage in the VM nodes and found that the steal time was normal.

top - 11:15:01 up 2 days, 2:00, 2 users, load average: 0.01, 0.04, 0.06
Tasks: 96 total, 5 running, 91 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.0%us, 0.1%sy, 0.0%ni, 99.9%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st

Monitoring script for Xen VMs

To avoid downtime and slowness issues in future, we configured a Perl script to monitor the VMs on our Xen nodes. We installed the ‘xentop’ utility, which reports per-VM (domain) resource usage on a Xen host, to identify the VMs that consume the most resources. The script monitors the CPU usage of the VMs and sends an alert notification to our support team whenever there is a spike. On receiving an alert, we debug and resolve the issue before it affects customers.
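The Perl script itself is not reproduced here. As an illustration only, the following Python sketch shows one way such a check could work: it parses text in the batch-mode layout of `xentop` (as produced by `xentop -b`), reads the name and CPU(%) columns, and lists the domains above a threshold. The function names, the 90% threshold and the sample figures are assumptions made up for this example, and the notification step is left out.

```python
def parse_xentop(text):
    """Return {domain name: CPU(%)} from xentop batch-mode text.

    Only the first four columns (NAME, STATE, CPU(sec), CPU(%)) are read.
    When the text holds several iterations, the last reading for each
    domain overwrites earlier ones.
    """
    usage = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 4 or parts[0] == "NAME":
            continue  # skip blank lines and header rows
        try:
            usage[parts[0]] = float(parts[3])
        except ValueError:
            continue  # skip rows whose CPU(%) column is not numeric
    return usage


def domains_over(usage, threshold):
    """Return the sorted names of domains whose CPU(%) exceeds the threshold."""
    return sorted(name for name, cpu in usage.items() if cpu > threshold)


# Hypothetical readings, invented for this example.
sample = """
          NAME  STATE   CPU(sec) CPU(%)     MEM(k) MEM(%)
      Domain-0 -----r     458970    4.2    8388608   12.5
lhry2j7b1108p3 --b---      78533   93.5    3121152    4.7
res69fvhdx8h12 --b---       8425    1.3     524288    0.8
"""

print(domains_over(parse_xentop(sample), 90.0))  # ['lhry2j7b1108p3']
```

A scheduler such as cron could run a check like this every few minutes on each host and pass the resulting list to whatever alerting channel the support team uses.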

Performance is a key requirement for any cloud hosting solution. Here we’ve covered how we scaled out a Xen cloud environment with an additional host and set up proactive monitoring to ensure 100% uptime for all VMs. Bobcares helps web hosts, VPS providers and cloud providers deliver industry-standard VPS services through custom configuration and preventive maintenance of virtualized systems.

 
