Fixing OnApp error “onappstore onlineVDisk [] failed on frontend [] with error map”

Nov 23, 2015

A website suddenly goes offline. The server is unresponsive. A reboot is attempted, but the server fails to come back online.

This is how a typical server crash unfolds. As a server management company, we have often seen such issues with dedicated servers. Recently, however, we have started seeing them in cloud systems as well. People tend to associate cloud systems with stability, but even cloud hardware and software are susceptible to failures.

When a server (or cloud instance) fails, potentially thousands of users are cut off from their business transactions. For us, server downtime is therefore a high-priority issue, which we resolve as soon as possible.

Recently, our team fixed a cloud instance downtime as part of our dedicated support services. A US-based cloud service provider used our services to manage their OnApp cloud management systems and to deliver 24/7 technical support. As part of that support, we help server instance owners recover from service downtimes and custom-configure their servers.

One day, an emergency server recovery request was logged by a website owner who used a server instance on the OnApp system. The server instance had failed to respond, and even after repeated reboot requests, the server instance stayed offline.

Our team quickly got on the case, and as the first step, analyzed the OnApp system logs. We saw a log entry related to the customer’s server instance:

Fatal: OnApp::Actions::Fatal Storage API Call failed: {"result"=>"FAILURE", "error"=>"onappstore onlineVDisk fhxishv3q51grj failed on frontend 677973259 with error map: [('677973259', u'Failed to find any active members in sync')] and optional error: API call failed for a subset of nodes. Failures: [('677973259', u'Failed to find any active members in sync')]"}
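When many such entries have to be triaged, the identifiers can be pulled out with a short script. The following is a minimal sketch, not an OnApp tool: the pattern is written against the single entry quoted above, and the function and field names (parse_vdisk_failure, vdisk, frontend, member, reason) are our own assumptions for illustration.

```python
import re

# Pattern derived from the one log entry quoted in this article; other OnApp
# versions may word the message differently (assumption).
ERROR_RE = re.compile(
    r"onappstore onlineVDisk (?P<vdisk>\w+) failed on frontend (?P<frontend>\d+) "
    r"with error map: \[\('(?P<member>\d+)', u'(?P<reason>[^']+)'\)\]"
)


def parse_vdisk_failure(line):
    """Return the vdisk ID, frontend ID and failure reason, or None if no match."""
    match = ERROR_RE.search(line)
    return match.groupdict() if match else None


# The entry from the article, abridged after the first error map.
entry = (
    'Fatal: OnApp::Actions::Fatal Storage API Call failed: {"result"=>"FAILURE", '
    '"error"=>"onappstore onlineVDisk fhxishv3q51grj failed on frontend 677973259 '
    "with error map: [('677973259', u'Failed to find any active members in sync')]"
)

failure = parse_vdisk_failure(entry)
print(failure["vdisk"], "-", failure["reason"])
```

The named groups keep the virtual disk ID and the failure reason separate, which makes it easy to group failures by disk or by message.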

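This log showed that the virtual disk of the customer’s server instance (with the ID fhxishv3q51grj) had a storage error: no active, in-sync members could be found for it. To fix this error, we had to trace the disk back to its Data Store and storage Node, find out what had gone wrong there, and repair it.

This is the story of how we did it.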

Tracing error “onappstore onlineVDisk [] failed”

Before we get into the error details, let’s get a bit of background.

OnApp allows aggregating different storage devices (such as hard disks, SSDs, SAS drives, etc.) to form a single virtual storage system called “Integrated Data Store”. In this case, it looked like the Data Store hosting the website had developed an error that was preventing the server instance from booting up.

To confirm this, we needed to identify the Data Store linked to the server instance. So, we looked further into the system logs and found this entry:

Running: Storage API Call: PUT 10.0.0.51:8080/is/Datastore/pmjl4tghe52pzq/VDisk/fhxishv3q51grj "{\"state\":2,\"frontend_uuid\":\"677973259\"}"

It showed that the Data Store being used was pmjl4tghe52pzq.
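The Data Store and virtual disk IDs sit at fixed positions in the request path, so they can also be read out with a script. This is a rough sketch that assumes the log line always has the shape shown above; parse_storage_call and its field names are hypothetical helpers, not part of OnApp.

```python
import json
import re

# Shape assumed from the single "Storage API Call" line quoted in this article.
CALL_RE = re.compile(
    r'Storage API Call: (?P<method>[A-Z]+) (?P<host>[^/\s]+)'
    r'/is/Datastore/(?P<datastore>\w+)/VDisk/(?P<vdisk>\w+) "(?P<body>.*)"$'
)


def parse_storage_call(line):
    """Split a Storage API call log line into its parts, or return None."""
    match = CALL_RE.search(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    # The JSON body is logged with escaped quotes (\"), so unescape it first.
    fields["payload"] = json.loads(fields.pop("body").replace('\\"', '"'))
    return fields


line = (
    r'Running: Storage API Call: PUT 10.0.0.51:8080/is/Datastore/pmjl4tghe52pzq'
    r'/VDisk/fhxishv3q51grj "{\"state\":2,\"frontend_uuid\":\"677973259\"}"'
)

call = parse_storage_call(line)
print(call["datastore"], call["vdisk"], call["payload"])
```

Note that the frontend_uuid in the payload is the same value that appears as the frontend in the failure message, which ties the two log entries together.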

Now, to find the exact reason why the Data Store was showing errors, we needed to locate the “Node” on which the website was hosted. OnApp divides each Data Store into several “Nodes”, each of which hosts several server instances.

Figure: How OnApp stores data in Nodes and Data Stores.

A quick look at the “Nodes” list in the OnApp admin panel showed the Node ID of the affected server instance as 3843437812. Once we had the Node ID, we logged into that node to check what could be wrong with the disks. We saw several Input/Output errors related to the XFS file system in the Node logs:

XFS (xvda): metadata I/O error: block 0x22eb087a ("xlog_iodone") error 5 buf count 12288
XFS (xvda): xfs_log_force: error 5 returned. -- [xvda]

This showed that the XFS file system on one of the hard disks in Node 3843437812 was corrupted and needed to be repaired.
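Messages like these are easy to miss in a busy log. As an illustration only, the sketch below scans log lines for XFS error messages and translates the numeric code into its symbolic name; error 5 is EIO, the generic I/O error. The function name xfs_io_errors and the regular expression are our own assumptions, based on the two lines quoted above.

```python
import errno
import re

# Matches kernel messages of the form "XFS (<device>): ... error <N> ...".
XFS_ERROR_RE = re.compile(r"XFS \((?P<device>[^)]+)\): .*\berror (?P<code>\d+)")


def xfs_io_errors(lines):
    """Return (device, errno name) pairs for every XFS error line found."""
    found = []
    for line in lines:
        match = XFS_ERROR_RE.search(line)
        if match:
            code = int(match.group("code"))
            # errno.errorcode maps numbers to names, e.g. 5 -> "EIO".
            found.append((match.group("device"), errno.errorcode.get(code, str(code))))
    return found


node_log = [
    'XFS (xvda): metadata I/O error: block 0x22eb087a ("xlog_iodone") error 5 buf count 12288',
    "XFS (xvda): xfs_log_force: error 5 returned.",
    "an unrelated kernel message",
]

for device, name in xfs_io_errors(node_log):
    print(device, name)
```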

 

Fixing the file system corruption

OnApp stores the data of different server instances in storage areas called “Nodes”. Server instances run on huge servers called Hypervisors, and storage “Nodes” are linked to Hypervisors so that each server instance can access its data. In this case, Node 3843437812 was linked to a Hypervisor called “Xen Kappa 02”.
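The chain we followed in this incident (virtual disk, Data Store, Node, Hypervisor) can be pictured as a small data model. This is a deliberately simplified sketch of the relationships as described in this article, not OnApp's actual schema: the class and field names are hypothetical, and it maps each virtual disk to a single Node for clarity.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    node_id: str
    hypervisor: str                       # Hypervisor the Node is linked to
    disks: List[str] = field(default_factory=list)


@dataclass
class DataStore:
    datastore_id: str
    nodes: Dict[str, Node] = field(default_factory=dict)   # node_id -> Node
    vdisks: Dict[str, str] = field(default_factory=dict)   # vdisk_id -> node_id

    def locate(self, vdisk_id: str) -> Node:
        """Return the Node that holds the given virtual disk."""
        return self.nodes[self.vdisks[vdisk_id]]


# Values taken from the incident described in this article.
store = DataStore("pmjl4tghe52pzq")
store.nodes["3843437812"] = Node("3843437812", "Xen Kappa 02", ["/dev/sdc"])
store.vdisks["fhxishv3q51grj"] = "3843437812"

node = store.locate("fhxishv3q51grj")
print(node.node_id, "is linked to", node.hypervisor)
```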

In order to fix the XFS file system error, the corrupted hard disk in Node 3843437812 first had to be de-linked from Xen Kappa 02. For that, we logged in to Xen Kappa 02 and issued a diskhotplug unassign command, which de-linked /dev/sdc (the corrupted hard disk).

Then, we fixed the XFS error with the command xfs_repair /dev/sdc. Once it completed, the logs showed no more errors on the disk. We then re-attached /dev/sdc using the command diskhotplug assign, and restarted the customer’s server instance.
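If you want to script the repair step, a cautious approach is to run xfs_repair in no-modify mode (-n) first and only repair when it reports corruption. The sketch below is one possible way to do that and not the exact procedure we ran: check_then_repair is a hypothetical helper, the exit code handling follows the xfs_repair man page (0 for clean, 1 for corruption in -n mode), and the diskhotplug steps are left out because their arguments depend on your setup.

```python
import subprocess


def xfs_check_command(device):
    """Command for a read-only check; -n means 'no modify' in xfs_repair."""
    return ["xfs_repair", "-n", device]


def xfs_repair_command(device):
    """Command for the actual repair, as used in this article."""
    return ["xfs_repair", device]


def check_then_repair(device, run=subprocess.run):
    """Check the device first and repair it only if corruption is reported.

    The device must already be detached and unmounted. The `run` parameter
    exists so the logic can be exercised without touching a real disk.
    """
    check = run(xfs_check_command(device))
    if check.returncode == 0:
        return "clean"
    if check.returncode != 1:
        # Any other status means the check itself could not complete.
        return "check failed"
    repair = run(xfs_repair_command(device))
    return "repaired" if repair.returncode == 0 else "repair failed"


# Example (run as root, only on a disk that has been unassigned):
# print(check_then_repair("/dev/sdc"))
```

Keeping the check and the repair as separate commands means a healthy disk is never modified by accident.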

Everything worked perfectly fine. 🙂

 
