People often assume that backups are not important in a cloud setup due to its automatic fail-over feature. Automatic fail-overs migrate Virtual machines in a cloud from one hypervisor to another, when any one hypervisor goes down.
But this fail-over feature does not help in cases such as degradation of RAID arrays, server hacks, malware attacks or human errors, which can lead to accidental deletion or modification of data.
GHOST vulnerability of Glibc was disclosed on 27th Jan. As with any breaking news about vulnerabilities, the initial reports were muddled about the severity of impact, and the extend of exploits running in the wild.
Bobcares Dedicated Linux Systems Administrators deliver zero-day protection against breaking vulnerabilities through agile security reaction procedures. In this case, the announcement said attackers can exploit the gethostbyname() function provided by Glibc, with a proof of concept hack done on an Exim server. So, the first order of business was to prevent any such hacks taking place in servers under our care. (more…)
Agile infrastructure security – How central configuration management was used to quickly patch GHOST glibc vulnerability in data centers was last modified: July 14th, 2018 by Visakh S
The data was conclusive. The servers orion-47, orion-50 and orion-52 needed RAM upgrades. In the past one month their RAM usage has mostly been above 85% and was showing an increasing trend. Swap usage has grown by more than 20% and it was resulting in higher I/O wait and thereby a slight tendency for high load.
These servers were part of a load balancing cluster that served a SaaS application in a data center managed by our Dedicated Linux Systems Administrators. The occasion was our weekly review of alert trends, and corrective actions needed to prevent a performance degradation. Regular analysis of alert trends allow us to predict future resource bottle necks, and prevent service deterioration.
It is unwelcome, it is tedious, but it is inevitable.
Every service provider dreads a hard disk crash, and the downtime it can lead to, but it is one eventuality that will happen sooner or later.
Today was one such day. A high priority alert notified our Dedicated Linux Server Administrators about a degraded RAID array in a data center we managed. Hard disk crashes are a P0 (highest priority) alert in our infrastructure management procedures, and initiates an emergency response.
A heap buffer overflow vulnerability in GNU C Library (glibc), allows remote or local actors to execute arbitrary code under the privilege of user running the function gethostbyname(). Qualsys, who reported the bug was able to remotely exploit this bug in an Exim mail server.
Linux servers with stable distributions marked as long term support are likely to be affected by this bug (CVE-2015-0235). The distributions we have counted till now include: (more…)
GHOST hunting – Resolving glibc Remote Code Execution vulnerability (CVE-2015-0235) in CentOS, Red Hat, Ubuntu, Debian and SUSE Linux servers was last modified: July 9th, 2018 by Visakh S
“I did nothing. It just crashes all the time!” So began a professional administration request at the help desk of a data center we managed. The customer’s unmanaged Windows 2008 R2 VPS started crashing one fine day without any apparent reason.
The event logs didn’t show anything out of the ordinary. So, the next step was to analyze the crash dump.
We were responding to an issue raised by an onsite technician for a data center we managed. System logs from one server was missing in the central log server. It looked like the Rsyslog service that was used for central logging had crashed in the source server, leading to 2 hours of lost log information.
Logs are critical to day-to-day server management and missing logs were an urgent priority issue. Rsyslog service was restarted in the source server, and debugging was enabled to identify what had gone wrong. Looking at the update logs, we noted that the Rsyslog package was recently updated, which pointed to a possible bug. A quick stop at the Rsyslog github bug database confirmed that crashes were reported, and a patch was available. An update was done in all servers to fix the issue. But it still left the question, what if a future update causes a similar crash? We needed a solution to ensure the central logging is resilient to failure. (more…)
Fault tolerant service logging – How remote logging was made resilient to crashes was last modified: July 12th, 2018 by Visakh S
The mood was upbeat. It was our weekly business review with a web host we support. Server improvements had resulted in zero service downtimes, and zero customer complaints on service reliability. It was time to figure out how to improve the infrastructure even further, and for that, we looked at the support requests.
Support requests give a gold mine of information on how customers are perceiving the service. Happy customers do not open trouble tickets. So, all support requests are a potential pointer to a system or process improvement. So, we started by looking at the top reasons for support tickets.
Bobcares is a server management company that helps businesses deliver uninterrupted and secure online services. Our engineers manage close to 51,500 servers that include virtualized servers, cloud infrastructure, physical server clusters, and more.