Data Center Management Blogs


How to troubleshoot high load in linux web hosting servers

High load, also known as a high load average, is the most common reason for business downtime in the web hosting industry.

Applications freeze, websites time out, and customers abandon their carts.

Yeah, we hate it too. That’s why, here at Bobcares, our Dedicated Server Admins monitor our customers’ servers 24/7 and fix load issues in as little as 5 minutes.
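As a rough illustration of what "high load" means in practice, here is a minimal Python sketch that compares the load average against the number of CPU cores. The threshold of 1.0 per core and the function names are assumptions made for this example, not the procedure our admins follow.

```python
import os


def load_per_core(load_avg: float, cores: int) -> float:
    """Normalize a load average by the number of CPU cores."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return load_avg / cores


def is_high_load(load_avg: float, cores: int, threshold: float = 1.0) -> bool:
    """Return True when the per-core load exceeds the threshold.

    The default threshold of 1.0 is an illustrative assumption.
    """
    return load_per_core(load_avg, cores) > threshold


if __name__ == "__main__":
    # os.getloadavg() is available on Unix-like systems only.
    one_min, five_min, fifteen_min = os.getloadavg()
    cores = os.cpu_count() or 1
    for label, value in (("1 min", one_min), ("5 min", five_min), ("15 min", fifteen_min)):
        state = "HIGH" if is_high_load(value, cores) else "ok"
        print(f"{label}: load {value:.2f} on {cores} core(s) -> {state}")
```

A per-core figure is used because a load of 4 may be harmless on an 8-core server and serious on a single-core VPS.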

How to configure VM backups in oVirt

People often assume that backups are not important in a cloud setup because of its automatic fail-over feature. Automatic fail-over migrates the virtual machines in a cloud from one hypervisor to another when a hypervisor goes down.

But this fail-over feature does not help in cases such as degradation of RAID arrays, server hacks, malware attacks or human errors, which can lead to accidental deletion or modification of data.

Agile infrastructure security – How central configuration management was used to quickly patch GHOST glibc vulnerability in data centers

The GHOST vulnerability in glibc was disclosed on 27th January. As with any breaking news about vulnerabilities, the initial reports were muddled about the severity of the impact and the extent of exploits running in the wild.

Bobcares Dedicated Linux Systems Administrators deliver zero-day protection against breaking vulnerabilities through agile security reaction procedures. In this case, the announcement said attackers could exploit the gethostbyname() function provided by glibc, with a proof-of-concept exploit demonstrated on an Exim server. So, the first order of business was to prevent any such attacks on the servers under our care.

Burning in your new server: How server hardware load testing helps us improve data center infrastructure reliability

The data was conclusive. The servers orion-47, orion-50 and orion-52 needed RAM upgrades. Over the past month, their RAM usage had mostly been above 85% and was trending upward. Swap usage had grown by more than 20%, resulting in higher I/O wait and a slight tendency toward high load.

These servers were part of a load balancing cluster that served a SaaS application in a data center managed by our Dedicated Linux Systems Administrators. The occasion was our weekly review of alert trends and the corrective actions needed to prevent performance degradation. Regular analysis of alert trends allows us to predict future resource bottlenecks and prevent service deterioration.
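To make the idea of trend analysis concrete, here is a small Python sketch that flags a server when its RAM usage samples are mostly above a threshold and still rising. The 85% figure comes from the review described here; the function names, the "more than half the samples" rule, and the sample values are assumptions made for this example.

```python
from statistics import mean

RAM_ALERT_PERCENT = 85.0  # threshold mentioned in the review


def slope(samples):
    """Least-squares slope of samples taken at equal intervals."""
    n = len(samples)
    if n < 2:
        return 0.0
    x_mean = (n - 1) / 2
    y_mean = mean(samples)
    numerator = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(samples))
    denominator = sum((x - x_mean) ** 2 for x in range(n))
    return numerator / denominator


def needs_ram_review(samples, threshold=RAM_ALERT_PERCENT):
    """Flag usage that is mostly above the threshold and trending upward."""
    above = sum(1 for s in samples if s > threshold)
    mostly_above = above > len(samples) / 2
    return mostly_above and slope(samples) > 0


if __name__ == "__main__":
    # Hypothetical weekly RAM usage percentages for one server.
    weekly_usage = [84.0, 86.5, 88.0, 90.5]
    print("review needed:", needs_ram_review(weekly_usage))
```

A positive slope on its own is not alarming; combining it with the threshold keeps lightly loaded servers out of the report.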

Safe data recovery: Dos and Don’ts of rebuilding RAID arrays in data centers after a hard disk drive failure

It is unwelcome, it is tedious, but it is inevitable.

Every service provider dreads a hard disk crash, and the downtime it can lead to, but it is one eventuality that will happen sooner or later.

Today was one such day. A high priority alert notified our Dedicated Linux Server Administrators about a degraded RAID array in a data center we managed. Hard disk crashes are a P0 (highest priority) alert in our infrastructure management procedures, and they initiate an emergency response.

GHOST hunting – Resolving glibc Remote Code Execution vulnerability (CVE-2015-0235) in CentOS, Red Hat, Ubuntu, Debian and SUSE Linux servers

Reports are coming in from our Dedicated Linux Systems Administrators about an evolving threat, disclosed earlier today.

A heap buffer overflow vulnerability in the GNU C Library (glibc) allows remote or local attackers to execute arbitrary code with the privileges of the user running the application that calls gethostbyname(). Qualys, who reported the bug, was able to exploit it remotely against an Exim mail server.
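For a quick first pass, a server's glibc version can be compared against the upstream range that contained the flaw (introduced in glibc 2.2 and fixed upstream before 2.18). The Python sketch below does only that, and the function names are assumptions made for this example. Distributions backport fixes without changing the upstream version number, so treat a positive result as "check the vendor advisory", not as proof of exposure.

```python
import platform
import re

FIRST_VULNERABLE = (2, 2)   # upstream release that introduced the bug
FIRST_FIXED = (2, 18)       # first upstream release containing the fix


def parse_version(text):
    """Extract (major, minor) from a version string such as '2.17' or '2.12.1'."""
    match = re.match(r"(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"unrecognized glibc version: {text!r}")
    return int(match.group(1)), int(match.group(2))


def is_possibly_affected(version_text):
    """Return True if the upstream version falls in the vulnerable range.

    This ignores distribution backports, so it is only a heuristic.
    """
    return FIRST_VULNERABLE <= parse_version(version_text) < FIRST_FIXED


if __name__ == "__main__":
    lib, version = platform.libc_ver()
    if lib == "glibc" and version:
        verdict = "check vendor advisory" if is_possibly_affected(version) else "outside upstream range"
        print(f"glibc {version}: {verdict}")
    else:
        print("glibc not detected on this system")
```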

Linux servers running stable distributions marked as long term support are likely to be affected by this bug (CVE-2015-0235). The distributions we have counted so far include:
(more…)

Surviving the blue screen of death – How a Hyper-V Windows VPS fatal error was resolved

I did nothing. It just crashes all the time!
So began a professional administration request at the help desk of a data center we managed. The customer’s unmanaged Windows 2008 R2 VPS started crashing one fine day without any apparent reason.

The event logs didn’t show anything out of the ordinary. So, the next step was to analyze the crash dump.

Fault tolerant service logging – How remote logging was made resilient to crashes

Logs from alpha-p3 are missing!

We were responding to an issue raised by an onsite technician for a data center we managed. System logs from one server were missing from the central log server. It looked like the Rsyslog service used for central logging had crashed on the source server, leading to 2 hours of lost log information.

Logs are critical to day-to-day server management, so the missing logs were an urgent priority issue. The Rsyslog service was restarted on the source server, and debugging was enabled to identify what had gone wrong. Looking at the update logs, we noted that the Rsyslog package had recently been updated, which pointed to a possible bug. A quick stop at the Rsyslog GitHub issue tracker confirmed that crashes had been reported and that a patch was available. All servers were updated to fix the issue. But that still left a question: what if a future update causes a similar crash? We needed a solution to make central logging resilient to failure.

Reliable, scalable DNS – How DNS clustering and Centralized name servers resulted in fast, scalable and fault tolerant DNS service

The mood was upbeat. It was our weekly business review with a web host we support. Server improvements had resulted in zero service downtime and zero customer complaints about service reliability. It was time to figure out how to improve the infrastructure even further, and for that, we looked at the support requests.

Support requests are a gold mine of information on how customers perceive the service. Happy customers do not open trouble tickets, so every support request is a potential pointer to a system or process improvement. We started by looking at the top reasons for support tickets.

How to turn your hosting Green

We know that internet usage is increasing day by day. We cannot even imagine a day without using the internet. Statistics show that internet usage is growing by 400 to 1000% a year worldwide. Web hosting providers consume a large amount of energy to run servers and supporting systems such as cooling. If energy consumption keeps increasing at this rate, then by 2020 this industry will pollute more than the airline industry. Data centers alone emit 0.2% of the world’s carbon dioxide.
