Bobcares

Ceph Replace OSD – How to Remove, Replace and Re-create OSD?

by | Oct 14, 2021

Generally, to replace an OSD in Ceph, we remove the OSD from the cluster, replace the drive, and then re-create the OSD.

At Bobcares, we often get requests to manage Ceph as part of our Infrastructure Management Services.

Today, let us see how our techs replace an OSD.

 

Ceph replace OSD

Because Ceph is designed for fault tolerance, it can usually keep running in a degraded state without losing data.

For example, it continues to operate when a data storage drive fails.

When a drive fails, its OSD is marked down, and the cluster's health warnings report the down OSD.

Modern servers use hot-swappable drives, so we can pull a failed drive and replace it with a new one without bringing down the node.

However, with Ceph Storage we also have to deal with the software-defined side of the OSD: its entry in the CRUSH map, its authentication key, and its record in the cluster.

Let us see how our Support Techs handle this.

First, we check the cluster health:

# ceph health
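To use this check in a script, the output of `ceph health` can be reduced to a pass/fail result. The sketch below assumes the `ceph` CLI is on the PATH with admin credentials, and the function names are hypothetical. It relies on `ceph health` printing HEALTH_OK, HEALTH_WARN, or HEALTH_ERR as its first word.

```shell
#!/bin/sh
# Hypothetical helpers; assume the ceph CLI is on PATH with admin credentials.

# Print only the status word, e.g. "HEALTH_WARN" from "HEALTH_WARN 1 osds down".
cluster_health_status() {
    ceph health | awk '{ print $1; exit }'
}

# Succeed only when the cluster reports HEALTH_OK.
cluster_is_healthy() {
    [ "$(cluster_health_status)" = "HEALTH_OK" ]
}
```

For example, `cluster_is_healthy || ceph health detail` prints the full list of warnings only when the cluster is not healthy.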

If an OSD is down, we identify its location in the CRUSH hierarchy:

# ceph osd tree | grep -i down
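Grepping for "down" can also match other text in the tree. A slightly stricter sketch, assuming the plain-text `ceph osd tree` layout in which each OSD row has its name (such as `osd.3`) immediately followed by its up/down status, prints only the names of down OSDs. The function name `down_osds` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper. Assumes the plain-text `ceph osd tree` layout where the
# NAME column (e.g. "osd.3") is immediately followed by the up/down status.
# Reads the tree on stdin.
down_osds() {
    awk '{
        for (i = 1; i < NF; i++) {
            # Find the OSD name column and inspect the status column after it.
            if ($i ~ /^osd\.[0-9]+$/ && $(i + 1) == "down") {
                print $i
            }
        }
    }'
}
```

Run it as `ceph osd tree | down_osds`.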

If the OSD is down but still in, we log in to the OSD node and try to restart it:

# ssh {osd-node}
# systemctl start ceph-osd@{osd-id}

If the output indicates that the OSD is already running, the problem may be a heartbeat or networking issue.

If the OSD fails to restart, the drive itself may have failed.
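The restart attempt and the follow-up check can be combined. This sketch assumes a systemd-managed OSD using the `ceph-osd@<id>` unit shown above, running as root on the OSD node. The function name and the `RESTART_WAIT` variable are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper. Assumes a systemd-managed OSD with the ceph-osd@<id> unit,
# run as root on the OSD node. RESTART_WAIT (seconds) is a hypothetical knob.
restart_osd() {
    osd_id="$1"
    if [ -z "$osd_id" ]; then
        echo "usage: restart_osd <osd-id>" >&2
        return 2
    fi
    systemctl start "ceph-osd@${osd_id}" || return 1
    # Wait briefly, then check that the unit is still active.
    sleep "${RESTART_WAIT:-5}"
    if systemctl is-active --quiet "ceph-osd@${osd_id}"; then
        echo "osd.${osd_id} is running"
    else
        echo "osd.${osd_id} did not stay up; check its mount point and drive" >&2
        return 1
    fi
}
```

A daemon on a failing drive can start and then exit shortly afterwards, which is why the sketch waits before checking whether the unit is still active.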

  • Check the failed OSD’s mount point

If the OSD fails to restart, we check its mount point.

If the mount point no longer appears, we try re-mounting the OSD drive and restarting the OSD. For example, if the server rebooted and the mount point was lost because it was missing from fstab, we remount the drive.

To see which filesystems are currently mounted, we run:

# df -h
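The mount point check can also be scripted instead of reading `df` output by eye. This sketch assumes the default OSD data path `/var/lib/ceph/osd/<cluster>-<id>` with the cluster name `ceph`, and a Linux host where `/proc/mounts` lists active mounts. The helper names and the `CEPH_CLUSTER` and `MOUNTS_FILE` variables are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers. Assume the default data path /var/lib/ceph/osd/<cluster>-<id>;
# CEPH_CLUSTER (hypothetical) overrides the default cluster name "ceph".
osd_data_path() {
    echo "/var/lib/ceph/osd/${CEPH_CLUSTER:-ceph}-$1"
}

# Succeed if the path appears as a mount target (second field) in the mounts table.
# MOUNTS_FILE (hypothetical) defaults to /proc/mounts on Linux.
path_is_mounted() {
    awk -v target="$1" '$2 == target { found = 1 } END { exit !found }' \
        "${MOUNTS_FILE:-/proc/mounts}"
}
```

For example, `path_is_mounted "$(osd_data_path 3)" || echo "osd.3 data directory is not mounted"`.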

However, if the drive itself has failed, re-mounting will not restore the mount point.

To check the drive's health, we can use a drive utility such as smartctl from the smartmontools package.

For example:

# yum install smartmontools
# smartctl -H /dev/{drive}
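Reading the `smartctl -H` result can be automated as well. This sketch assumes the two summary lines smartctl commonly prints: `SMART overall-health self-assessment test result:` for ATA/SATA drives and `SMART Health Status:` for SAS/SCSI drives. Any other output is reported as unknown. The function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper. Assumes smartctl's usual summary lines:
#   ATA/SATA: "SMART overall-health self-assessment test result: PASSED"
#   SAS/SCSI: "SMART Health Status: OK"
# Reads smartctl output on stdin; prints passed, failed or unknown.
smart_verdict() {
    awk '
        /self-assessment test result:/ { verdict = ($NF == "PASSED") ? "passed" : "failed" }
        /SMART Health Status:/         { verdict = ($NF == "OK") ? "passed" : "failed" }
        END { print (verdict == "" ? "unknown" : verdict) }
    '
}
```

Run it as `smartctl -H /dev/{drive} | smart_verdict`.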

If the drive has failed, we need to replace it.

First, we mark the OSD out of the cluster:

# ceph osd out osd.<num>

Then we make sure the OSD process is stopped:

# systemctl stop ceph-osd@<osd-id>

Next, we watch the cluster to confirm that the failed OSD's data is backfilling to other OSDs:

# ceph -w
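Instead of watching `ceph -w` by hand, the wait can be scripted. The Ceph documentation suggests polling `ceph osd safe-to-destroy`, which succeeds once the OSD's data is fully stored on other OSDs. This sketch assumes a release that provides that command (Luminous or later); the function name and the `POLL_SECONDS` and `MAX_POLLS` variables are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper. Assumes a Ceph release with `ceph osd safe-to-destroy`
# (Luminous or later). POLL_SECONDS and MAX_POLLS are hypothetical knobs.
wait_until_safe_to_destroy() {
    osd_id="$1"
    polls=0
    # Poll until Ceph confirms no data depends solely on this OSD.
    until ceph osd safe-to-destroy "osd.${osd_id}" >/dev/null 2>&1; do
        polls=$((polls + 1))
        if [ "$polls" -ge "${MAX_POLLS:-360}" ]; then
            echo "osd.${osd_id} still not safe to destroy after ${polls} checks" >&2
            return 1
        fi
        sleep "${POLL_SECONDS:-10}"
    done
    echo "osd.${osd_id} is safe to destroy"
}
```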

Now, we remove the OSD from the CRUSH map:

# ceph osd crush remove osd.<num>

Then we remove the OSD’s authentication keys:

# ceph auth del osd.<num>

Next, we remove the OSD from the Ceph cluster:

# ceph osd rm osd.<num>
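The three removal commands can be wrapped so that they run in order and stop at the first failure. This is a sketch assuming the `ceph` CLI with admin credentials; the function name is hypothetical. On Luminous and later, `ceph osd purge <id> --yes-i-really-mean-it` performs the same three steps in a single command.

```shell
#!/bin/sh
# Hypothetical helper. Assumes the ceph CLI with admin credentials.
# Runs crush remove, auth del and osd rm in order, stopping at the first failure.
# (On Luminous and later, `ceph osd purge <id> --yes-i-really-mean-it` does all three.)
remove_osd_from_cluster() {
    osd_id="$1"
    case "$osd_id" in
        ''|*[!0-9]*)
            echo "usage: remove_osd_from_cluster <numeric-osd-id>" >&2
            return 2 ;;
    esac
    ceph osd crush remove "osd.${osd_id}" &&
        ceph auth del "osd.${osd_id}" &&
        ceph osd rm "osd.${osd_id}"
}
```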

Then we unmount the failed drive's path:

# umount /var/lib/ceph/{daemon}/{cluster}-{daemon-id}

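After that, we replace the physical drive. If the drive is not hot-swappable and we have to bring the node down temporarily, we first set the noout flag so that the cluster does not mark the node's OSDs out and start rebalancing:

# ceph osd set noout

Once we have replaced the drive and brought the node and its OSDs back online, we remove the noout setting:

# ceph osd unset noout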
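Because a forgotten noout flag leaves the cluster unable to mark failed OSDs out, it can help to tie the flag to the maintenance step itself. This sketch assumes the `ceph` CLI with admin credentials; the `with_noout` name is hypothetical. It clears the flag even when the wrapped command fails and returns that command's exit status.

```shell
#!/bin/sh
# Hypothetical helper. Assumes the ceph CLI with admin credentials.
# Sets noout, runs the given command, then always clears noout again.
with_noout() {
    ceph osd set noout || return 1
    "$@"
    status=$?
    ceph osd unset noout || echo "warning: could not unset noout; clear it manually" >&2
    return "$status"
}
```

For example, `with_noout sh -c 'read -r _'` keeps noout set until the operator presses Enter after the node is back online.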

Before going further, we wait for the new drive to appear under /dev and note its device path.

First, we locate the new drive and wipe it.

Then, we re-create the OSD on it.
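On releases that ship ceph-volume (Luminous and later), wiping the new disk and re-creating the OSD can be done with `ceph-volume lvm zap` and `ceph-volume lvm create`. The sketch below assumes LVM-based OSDs, root access on the OSD node, and the device path noted in the previous step; the function name is hypothetical. Zapping with `--destroy` erases everything on the device, so the path must be double-checked first.

```shell
#!/bin/sh
# Hypothetical helper. Assumes ceph-volume (Luminous or later), LVM-based OSDs
# and root on the OSD node. WARNING: zap --destroy erases the whole device.
recreate_osd_on() {
    dev="$1"
    case "$dev" in
        /dev/?*) ;;
        *) echo "usage: recreate_osd_on /dev/<drive>" >&2; return 2 ;;
    esac
    # Wipe old partitions/LVM metadata, then build a new OSD on the clean disk.
    ceph-volume lvm zap "$dev" --destroy &&
        ceph-volume lvm create --data "$dev"
}
```

For example, `recreate_osd_on /dev/sdb`, where `/dev/sdb` stands in for the path noted earlier.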

Next, we check the CRUSH hierarchy to make sure the new OSD sits in the right place:

# ceph osd tree

If needed, we can change where a bucket (for example, the OSD's host) sits in the CRUSH hierarchy with the move command:

# ceph osd crush move <bucket-to-move> <bucket-type>=<parent-bucket>
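When scripting a move, a small guard can catch a malformed location argument before it reaches the cluster. This sketch assumes the `ceph` CLI with admin credentials; the function name and the bucket names in the usage example are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper. Assumes the ceph CLI with admin credentials.
# Rejects a location that is not of the form <bucket-type>=<parent-bucket>.
move_crush_bucket() {
    bucket="$1"
    location="$2"
    if [ -z "$bucket" ]; then
        echo "usage: move_crush_bucket <bucket> <bucket-type>=<parent-bucket>" >&2
        return 2
    fi
    case "$location" in
        ?*=?*) ;;
        *) echo "usage: move_crush_bucket <bucket> <bucket-type>=<parent-bucket>" >&2; return 2 ;;
    esac
    ceph osd crush move "$bucket" "$location"
}
```

For example, `move_crush_bucket node2 rack=rack1` would move a host bucket named `node2` under a rack named `rack1`, after which `ceph osd tree` shows the new placement.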

Finally, we make sure the new OSD is up and running.
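To confirm the result from a script, the OSD's state can be read from `ceph osd dump`. This sketch assumes the plain-text dump format, in which each OSD line begins with `osd.<id>` followed by its up/down and in/out state; the function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers. Assume plain-text `ceph osd dump` lines such as
# "osd.4 up   in  weight 1 ...". Both read the dump on stdin.

# Print "<up|down> <in|out>" for the given OSD id; fail if it is not listed.
osd_state() {
    awk -v name="osd.$1" '$1 == name { print $2, $3; found = 1 } END { exit !found }'
}

# Succeed only when the OSD is both up and in.
osd_is_up_and_in() {
    [ "$(osd_state "$1")" = "up in" ]
}
```

For example, `until ceph osd dump | osd_is_up_and_in 4; do sleep 5; done` waits for a hypothetical osd.4 to come up and in.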


Conclusion

To conclude, we saw how our Support Techs remove, replace, and re-create a failed Ceph OSD for our customers.
