How to fix OnApp error “Fatal: Errno::ECONNREFUSED Connection refused – connect(2)”

Stability is a popular reason why people choose cloud servers for their business. It is widely perceived that cloud systems can automatically recover from failures, and keep the data safe. While that is true to a large extent, cloud systems are as susceptible to failure as any other system.

One common point of failure is the network. Cloud systems comprise many sub-systems, such as storage, backups, and compute devices. These sub-systems talk to one another to perform critical functions such as creating new cloud instances and editing resource limits. If there is a network issue, the servers cannot talk to each other, and cloud management functions fail.

Network issues can happen at any time. They can range from simple authentication errors to network hardware failures. Bobcares helps data centers and cloud providers trace and fix network issues through our dedicated support services and server administration services. As part of these services, we monitor server infrastructure 24/7 and resolve service issues.

One day, we got a notification that attempts to create new cloud instances were failing in a data center we manage. We tried creating a new cloud instance, and saw that everything worked until OnApp tried to allocate storage using the “BuildDisk Action”. The error shown was:

Remote Server: 10.0.1.21
Running: Storage API Call: POST 10.0.1.21:8080/is/Datastore/pmjl4tghe52pzq/VDisk "{\"name\":\"dfrnlizerqysvz\",\"size\":\"49152\",\"hostids\":\"3,2,4\"}"
Errno::ECONNREFUSED Connection refused - connect(2)
Fatal: Errno::ECONNREFUSED Connection refused - connect(2)
Executing Rollback...
Remote Server: 10.0.1.21
Running: Storage API Call: GET 10.0.1.21:8080/is/Id nil
Errno::ECONNREFUSED Connection refused - connect(2)

The error indicated that the OnApp management server was unable to communicate with a service called “Storage API” that runs on a backup server. “Connection refused” means the TCP connection to 10.0.1.21:8080 was actively rejected rather than left unanswered. OnApp relies on the Storage API to allocate storage for new cloud instances, so instance creation was failing at the storage step.

Communication between two servers can break for many reasons. In this post, we go through how we resolved this particular issue.

Resolving “Fatal: Errno::ECONNREFUSED Connection refused”

For OnApp to communicate with the Storage API, three conditions must hold:

  • The OnApp management server must be able to connect to the backup servers via SSH.
  • The “Storage API” service must be running on the backup server.
  • Connections to port 8080 on the backup server must be allowed.

To troubleshoot the issue, we went through each of the above possibilities.

Checking OnApp SSH keys

The OnApp management server relies on a set of SSH keys to connect to all servers in the cloud system. If these keys are lost or corrupted, OnApp cannot connect to the servers, and management actions fail. The keys are installed under a user called “onapp” on all of these servers, which allows the OnApp management server to execute commands remotely.

To test this, we tried connecting to these servers:

# sudo -u onapp ssh root@10.0.1.21

The connection went through without any problem, so we knew the SSH keys were not the cause.
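
A check like this can also be run non-interactively, so that it fails fast instead of stopping at a password prompt. The following is only a sketch, not the exact procedure we used: `check_key_login` is a hypothetical helper name, the optional port argument is our own addition, and it assumes the standard OpenSSH client and that it is run as the “onapp” user.

```shell
# check_key_login HOST [PORT]
# Prints "key login ok" if root@HOST accepts key-based login, else "key login failed".
check_key_login() {
    local host="$1" port="${2:-22}"
    # BatchMode=yes disables password prompts, so only key authentication can succeed.
    # ConnectTimeout=5 keeps the check from hanging on an unreachable host.
    if ssh -o BatchMode=yes -o ConnectTimeout=5 -p "$port" "root@${host}" true 2>/dev/null; then
        echo "key login ok"
    else
        echo "key login failed"
    fi
}
```

For the backup server in this post, the call would be `check_key_login 10.0.1.21`.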

Checking if Storage API is running

Next we tried connecting to port 8080 of the backup server (with IP 10.0.1.21). The connection failed, which meant that either the Storage API was not running or that firewall rules were blocking the connections.
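
The probe itself needs very little tooling. As a rough sketch (the helper name `probe_port` and the 5-second limit are our own choices for illustration), bash's built-in /dev/tcp redirection together with the coreutils `timeout` command is enough to tell a reachable port from an unreachable one:

```shell
# probe_port HOST PORT
# Prints "open", "timeout", or "refused-or-unreachable".
probe_port() {
    local host="$1" port="$2" rc
    # A child bash opens the TCP connection via /dev/tcp; timeout stops it after 5 seconds.
    timeout 5 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null && rc=0 || rc=$?
    case "$rc" in
        0)   echo "open" ;;
        124) echo "timeout" ;;                  # exit status timeout(1) uses on expiry
        *)   echo "refused-or-unreachable" ;;   # includes an immediate "Connection refused"
    esac
}
```

Running `probe_port 10.0.1.21 8080` from the OnApp management server would exercise the same path as the failing Storage API call. A quick "refused" result matches the ECONNREFUSED in the OnApp log, while a "timeout" usually points to packets being silently dropped.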

So, we logged in to the backup server and checked if Storage API was indeed running:

tcp     0      0 0.0.0.0:8080     0.0.0.0:*     LISTEN    0     21694     3644/python

The Storage API service, which listens on port 8080, was indeed running, and it was bound to all interfaces (0.0.0.0). So it was likely that a firewall rule was blocking connections to port 8080.
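
The socket listing above has the shape of `netstat -tlnpe` output. We are assuming that command here; `ss -tlnp` reports the same information in a different layout. A small filter, with the hypothetical name `has_listener`, can turn netstat-style output into a yes/no answer for a given port:

```shell
# has_listener PORT
# Reads netstat-style output on stdin and succeeds (exit 0) if a socket
# in LISTEN state is bound to PORT, for IPv4 or IPv6 local addresses.
has_listener() {
    local port="$1"
    awk -v port="$port" '
        $6 == "LISTEN" {
            # Column 4 is the local address, e.g. 0.0.0.0:8080 or :::8080.
            n = split($4, parts, ":")
            if (parts[n] == port) found = 1
        }
        END { exit (found ? 0 : 1) }
    '
}
```

On the backup server this could be used as `netstat -tlnpe | has_listener 8080 && echo "Storage API port is listening"`.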

Checking for firewall blocks

For situations exactly like this, we keep backups of critical configuration files on each server. We restored the firewall configuration file from the daily backup, then restarted the firewall service and the Storage API service.
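
Before restoring a configuration, it can help to confirm which rule is responsible. The following is only a sketch that assumes an iptables-based firewall (other firewalls need a different command): it filters `iptables -S` output for rules that DROP or REJECT traffic to a given destination port. `blocking_rules` is a hypothetical helper, and it will not catch a default-deny policy where the ACCEPT rule is simply missing.

```shell
# blocking_rules PORT
# Reads `iptables -S` output on stdin and prints every rule that matches
# --dport PORT and jumps to DROP or REJECT. Prints nothing if none match.
blocking_rules() {
    local port="$1"
    # The `--` stops grep from treating the patterns (which start with -) as options.
    grep -E -- "--dport ${port}( |$)" | grep -E -- '-j (DROP|REJECT)' || true
}
```

Under that assumption, `iptables -S INPUT | blocking_rules 8080` on the backup server would list any explicit block on the Storage API port.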

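Connections to port 8080 now got a response. Any HTTP reply, even the 400 Bad Request shown here, confirms that the port is reachable and the Storage API is answering: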

# nc  10.0.1.21  8080
HTTP/1.1 400 Bad Request
Content-Length: 30
Content-Type: text/plain

Next, we tried creating a new cloud instance, and everything worked perfectly fine. 🙂

Network errors can happen for a variety of reasons. Here we’ve covered some common causes in OnApp systems. A network can fail if any of its sub-components fails, so to troubleshoot network downtime quickly, it is important to know how the components interact with each other. Bobcares helps data centers and cloud service providers minimize service downtime through proactive system audits, 24/7 monitoring, and 24/7 emergency administration.

 


 

