How to fix OnApp error “Fatal: Errno::ECONNREFUSED Connection refused – connect(2)”

Stability is a popular reason why people choose cloud servers for their business. It is widely perceived that cloud systems can automatically recover from failures, and keep the data safe. While that is true to a large extent, cloud systems are as susceptible to failure as any other system.

One common failure point is the network. Cloud systems comprise many sub-systems such as storage, backups, compute devices, etc. These sub-systems talk to each other to perform critical functions such as creating new cloud instances, editing resource limits, etc. If there’s a network issue, servers can’t talk to each other, and cloud management functions fail.

Network issues can happen at any time. They can range from simple authentication errors to network hardware failures. Bobcares helps data centers and cloud providers trace and fix network issues through our dedicated support services and server administration services. As part of our services, we monitor server infrastructure 24/7 and resolve service issues.

One day, we got a notification that attempts to create new cloud instances were failing in a data center we manage. We tried creating a new cloud instance ourselves, and saw that everything worked well until OnApp tried to allocate storage using the “BuildDisk Action”. The error shown was:

Remote Server: 10.0.1.21
Running: Storage API Call: POST 10.0.1.21:8080/is/Datastore/pmjl4tghe52pzq/VDisk "{\"name\":\"dfrnlizerqysvz\",\"size\":\"49152\",\"hostids\":\"3,2,4\"}"
Errno::ECONNREFUSED Connection refused - connect(2)
Fatal: Errno::ECONNREFUSED Connection refused - connect(2)
Executing Rollback...
Remote Server: 10.0.1.21
Running: Storage API Call: GET 10.0.1.21:8080/is/Id nil
Errno::ECONNREFUSED Connection refused - connect(2)

 

The error indicated that the OnApp Management Server was unable to communicate with a service called “Storage API” running on a backup server. OnApp relies on this Storage API service to allocate storage locations for new cloud instances. New cloud instance creation was failing because OnApp was unable to allocate storage.

Figure: OnApp Management Server unable to communicate with the Storage API running on the backup system.

 

A break in communication between two servers can have many causes. In this post we’ll go through how we resolved this particular issue.

 

Resolving “Fatal: Errno::ECONNREFUSED Connection refused”

For OnApp to communicate with the Storage API, three conditions must hold:

  • The OnApp management server should be able to connect to the backup servers via SSH.
  • The “Storage API” service should be running on the backup server.
  • Connections should be allowed to the backup server’s port 8080.

To troubleshoot the issue, we went through each of the above possibilities.

 

Checking OnApp SSH keys

The OnApp management server relies on a set of SSH keys to connect to all servers in the cloud system. If these SSH keys are lost or corrupted in any way, OnApp cannot connect to the servers, and management actions fail. The SSH keys are installed under a user called “onapp” on all these servers, which allows the OnApp management server to execute commands remotely.

To test this, we tried connecting to these servers:

# sudo -u onapp ssh root@10.0.1.21

The connection went through perfectly fine, and we then knew the SSH keys were not the problem.
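
When this check is scripted rather than run by hand, it helps to make SSH fail outright instead of falling back to a password prompt. The following is a minimal sketch, assuming an OpenSSH client on the management server. The function name check_ssh_key is hypothetical; BatchMode and ConnectTimeout are standard OpenSSH client options.

```shell
# Hypothetical helper: succeed only if key-based SSH login works.
# BatchMode=yes disables password prompts, so a missing or rejected key
# yields a non-zero exit status instead of an interactive prompt.
check_ssh_key() {
    target=$1   # for example root@10.0.1.21
    ssh -o BatchMode=yes -o ConnectTimeout=5 "$target" true
}

# Usage, run as the user that owns the key ("onapp" in this post):
#   check_ssh_key root@10.0.1.21 && echo "SSH key OK" || echo "SSH key login failed"
```

A non-zero exit status here only says that the login failed. It does not distinguish a bad key from an unreachable host, so the SSH error message is still worth reading.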

 

Checking if Storage API is running

Next we tried connecting to port 8080 of the backup server (with IP 10.0.1.21). The connection failed, which meant that either the Storage API was not running or that firewall rules were blocking the connections.
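
The port check described above can be sketched as follows. This assumes a netcat build that supports -z (connect without sending data) and -w (timeout in seconds), which most common builds do. The function name check_port is hypothetical.

```shell
# Hypothetical helper: report whether a TCP port accepts connections.
# Prints "open" and returns 0 on success, "unreachable" and returns 1 otherwise.
check_port() {
    host=$1
    port=$2
    if nc -z -w 5 "$host" "$port" >/dev/null 2>&1; then
        echo "open"
        return 0
    fi
    echo "unreachable"
    return 1
}

# Usage from the OnApp management server:
#   check_port 10.0.1.21 8080
```

The result "unreachable" covers both a stopped service and a firewall block, which is why the next step is to look at the backup server itself.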

So, we logged in to the backup server and checked if Storage API was indeed running:

tcp     0      0 0.0.0.0:8080     0.0.0.0:*     LISTEN    0     21694     3644/python

The Storage API service, which listens on port 8080, was indeed running and was bound to all interfaces (0.0.0.0). So, it was likely that a firewall rule was blocking connections to port 8080.
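
The command that produced the listing is not shown in this post. Its columns are consistent with what netstat -lntpe prints on Linux (listening TCP sockets, numeric addresses, with user, inode and PID/program), so the sketch below assumes that format. The function name listener_on_port is hypothetical.

```shell
# Hypothetical helper: read netstat-style lines on stdin and print the
# PID/program column of every TCP socket in LISTEN state whose local
# address ends in ":<port>". Returns 1 if no such listener is found.
listener_on_port() {
    awk -v port="$1" '
        $1 ~ /^tcp/ && $6 == "LISTEN" && $4 ~ (":" port "$") {
            print $NF
            found = 1
        }
        END { exit (found ? 0 : 1) }
    '
}

# Usage on the backup server (as root, so the PID column is populated):
#   netstat -lntpe | listener_on_port 8080
```

The local address matters as much as the PID. A service bound only to 127.0.0.1 would also refuse remote connections, which is not the case in the output above.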

 

Checking for firewall blocks

For situations exactly like this, we keep backups of critical configuration files on each server. We restored the firewall configuration file from the daily backup and restarted the firewall service and the Storage API service.
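
This post does not say which firewall was in use or how the blocking rule looked. As an illustration only, assuming an iptables-based firewall, the active rules can be inspected before a restore to confirm that something is rejecting port 8080. The function name blocking_rules_for_port is hypothetical; it filters the rule listing that iptables -S prints.

```shell
# Hypothetical helper: read `iptables -S` output on stdin and print the
# rules that match the given destination port and jump to DROP or REJECT.
# Returns non-zero if no such rule is found.
blocking_rules_for_port() {
    grep -E -- "--dport $1( |\$)" | grep -E -- '-j (DROP|REJECT)'
}

# Usage on the backup server (as root):
#   iptables -S INPUT | blocking_rules_for_port 8080
```

This simple filter does not catch multiport rules, port ranges, or a default DROP policy with no matching ACCEPT rule, so an empty result does not prove the firewall is innocent.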

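Now, connections to port 8080 started giving a response. The 400 Bad Request below is expected, because a raw nc connection does not send a valid HTTP request; what matters is that the Storage API answered.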

# nc  10.0.1.21  8080
HTTP/1.1 400 Bad Request
Content-Length: 30
Content-Type: text/plain
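
For an HTTP-level check, a well-formed request can be sent with curl instead of raw nc. This is a sketch only: the function name http_status is hypothetical, and the /is/Id path is taken from the rollback log earlier in this post. Which status code the Storage API returns for it is not documented here.

```shell
# Hypothetical helper: print only the HTTP status code for a URL.
# curl prints 000 when no HTTP response is received at all,
# for example when the connection is refused.
http_status() {
    curl -s -o /dev/null -w '%{http_code}' --max-time 5 "$1"
}

# Usage from the OnApp management server:
#   http_status http://10.0.1.21:8080/is/Id
```

Any three-digit code other than 000 means the request reached the service through the firewall.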

Next, we tried creating a new cloud instance, and everything worked perfectly fine. 🙂

 

Network errors can happen for a variety of reasons. Here we’ve covered some common causes in OnApp systems. A network can fail if any of its sub-components fails. To troubleshoot network downtime quickly, it is important to know how the network components interact with each other. Bobcares helps data centers and cloud service providers minimize service downtimes through proactive systems audits, 24/7 monitoring, and 24/7 emergency administration.

 
