
Soft Lockup on Xen Hypervisor – How to Prevent Dom0 CPU Starvation

Jun 26, 2021

Wondering how to fix a soft lockup on a Xen hypervisor? We can help you.

As part of our Server Virtualization Technologies and Services, we assist our customers with several OnApp queries.

Today, let us discuss Soft Lockup on Xen Hypervisor.

 

Soft Lockup on Xen Hypervisor

Fresh hypervisors running CentOS 6.x and Xen 4 can hang or kernel panic even without any VMs running.

The error may look like this:

kernel:BUG: soft lockup - CPU#16 stuck for 22s! [stress:6229] 

Message from syslogd@HV3-cloud at Aug 30 09:56:27 ... 
 kernel:BUG: soft lockup - CPU#16 stuck for 22s! [stress:6229]

The dmesg output will be similar to this:

Aug 30 09:59:00 HV3-cloud kernel: [<ffffffff81140a53>] exit_mmap+0xe3/0x160 
 Aug 30 09:59:00 HV3-cloud kernel: [<ffffffff8104fde4>] mmput+0x64/0x140 
 Aug 30 09:59:00 HV3-cloud kernel: [<ffffffff81056d25>] exit_mm+0x105/0x130 
 Aug 30 09:59:00 HV3-cloud kernel: [<ffffffff81056fcd>] do_exit+0x16d/0x450 
 Aug 30 09:59:00 HV3-cloud kernel: [<ffffffff8113df2c>] ? handle_pte_fault+0x1ec/0x210 
 Aug 30 09:59:00 HV3-cloud kernel: [<ffffffff81057305>] do_group_exit+0x55/0xd0 
 Aug 30 09:59:00 HV3-cloud kernel: [<ffffffff81067294>] get_signal_to_deliver+0x224/0x4d0 
 Aug 30 09:59:00 HV3-cloud kernel: [<ffffffff8101489b>] do_signal+0x5b/0x140 
 Aug 30 09:59:00 HV3-cloud kernel: [<ffffffff8126f17d>] ? rb_insert_color+0x9d/0x160 
 Aug 30 09:59:00 HV3-cloud kernel: [<ffffffff81083863>] ? finish_task_switch+0x53/0xe0 
 Aug 30 09:59:00 HV3-cloud kernel: [<ffffffff81576fe7>] ? __schedule+0x3f7/0x710 
 Aug 30 09:59:00 HV3-cloud kernel: [<ffffffff810149e5>] do_notify_resume+0x65/0x80 
 Aug 30 09:59:00 HV3-cloud kernel: [<ffffffff8157862c>] retint_signal+0x48/0x8c 
 Aug 30 09:59:00 HV3-cloud kernel: Code: cc 51 41 53 b8 10 00 00 00 0f 05 41 5b 59 c3 cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc 51 41 53 b8 11 00 00 00 0f 05 <41> 5b 59 c3 cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc 
 Aug 30 09:59:00 HV3-cloud kernel: Call Trace: 
 Aug 30 09:59:00 HV3-cloud kernel: [<ffffffff81009e2d>] ? xen_force_evtchn_callback+0xd/0x10 
 Aug 30 09:59:00 HV3-cloud kernel: [<ffffffff8100a632>] check_events+0x12/0x20 
 Aug 30 09:59:00 HV3-cloud kernel: [<ffffffff8100a61f>] ? xen_restore_fl_direct_reloc+0x4/0x4 
 Aug 30 09:59:00 HV3-cloud kernel: [<ffffffff8111dc06>] ? free_hot_cold_page+0x126/0x1b0 
 Aug 30 09:59:00 HV3-cloud kernel: [<ffffffff81005660>] ? xen_get_user_pgd+0x40/0x80 
 Aug 30 09:59:00 HV3-cloud kernel: [<ffffffff8111dfe4>] free_hot_cold_page_list+0x54/0xa0 
 Aug 30 09:59:00 HV3-cloud kernel: [<ffffffff81121b18>] release_pages+0x1b8/0x220 
 Aug 30 09:59:00 HV3-cloud kernel: [<ffffffff8114da64>] free_pages_and_swap_cache+0xb4/0xe0 
 Aug 30 09:59:00 HV3-cloud kernel: [<ffffffff81268da1>] ? cpumask_any_but+0x31/0x50 
 Aug 30 09:59:00 HV3-cloud kernel: [<ffffffff81139bbc>] tlb_flush_mmu+0x6c/0x90 
 Aug 30 09:59:00 HV3-cloud kernel: [<ffffffff8113a0a4>] tlb_finish_mmu+0x14/0x40 
 Aug 30 09:59:00 HV3-cloud kernel: [<ffffffff81140a53>] exit_mmap+0xe3/0x160 
 Aug 30 09:59:00 HV3-cloud kernel: [<ffffffff8104fde4>] mmput+0x64/0x140 
 Aug 30 09:59:00 HV3-cloud kernel: [<ffffffff81056d25>] exit_mm+0x105/0x130 
 Aug 30 09:59:00 HV3-cloud kernel: [<ffffffff81056fcd>] do_exit+0x16d/0x450 
 Aug 30 09:59:00 HV3-cloud kernel: [<ffffffff8113df2c>] ? handle_pte_fault+0x1ec/0x210 
 Aug 30 09:59:00 HV3-cloud kernel: [<ffffffff81057305>] do_group_exit+0x55/0xd0 
 Aug 30 09:59:00 HV3-cloud kernel: [<ffffffff81067294>] get_signal_to_deliver+0x224/0x4d0 
 Aug 30 09:59:00 HV3-cloud kernel: [<ffffffff8101489b>] do_signal+0x5b/0x140 
 Aug 30 09:59:00 HV3-cloud kernel: [<ffffffff8126f17d>] ? rb_insert_color+0x9d/0x160 
 Aug 30 09:59:00 HV3-cloud kernel: [<ffffffff81083863>] ? finish_task_switch+0x53/0xe0 
 Aug 30 09:59:00 HV3-cloud kernel: [<ffffffff81576fe7>] ? __schedule+0x3f7/0x710 
 Aug 30 09:59:00 HV3-cloud kernel: [<ffffffff810149e5>] do_notify_resume+0x65/0x80 
 Aug 30 09:59:00 HV3-cloud kernel: [<ffffffff8157862c>] retint_signal+0x48/0x8c 
 Aug 30 09:59:02 HV3-cloud kernel: BUG: soft lockup - CPU#5 stuck for 22s! [stress:6233] 
 Aug 30 09:59:02 HV3-cloud kernel: Modules linked in: arptable_filter arp_tables ip6t_REJECT ip6table_mangle ipt_REJECT iptable_filter ip_tables bridge stp llc xen_pciback xen_gntalloc bonding nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb4i cxgb4 cxgb3i libcxgbi cxgb3 ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr ipv6 iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi dm_round_robin dm_multipath xen_acpi_processor blktap xen_netback xen_blkback xen_gntdev xen_evtchn xenfs xen_privcmd ufs(O) coretemp hwmon crc32c_intel ghash_clmulni_intel aesni_intel cryptd aes_x86_64 aes_generic microcode pcspkr sb_edac edac_core joydev i2c_i801 sg iTCO_wdt iTCO_vendor_support igb evdev ixgbe mdio ioatdma myri10ge dca ext4 mbcache jbd2 raid1 sd_mod crc_t10dif ahci libahci isci libsas scsi_transport_sas wmi dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan] 
 Aug 30 09:59:02 HV3-cloud kernel: CPU 5

 

Cause

When we install the hypervisor, we set a few parameters on Dom0 to share the CPU fairly between the VMs.

Though this works on previous Xen/CentOS versions, it seems to starve Dom0 on current Xen 4 / CentOS releases.

Generally, this is the parameter set by default:

xm sched-credit -d 0 -c 200

Here, -c is the cap. It optionally fixes the maximum amount of CPU a domain will be able to consume, even if the host system has idle CPU cycles. The cap is expressed as a percentage of one physical CPU, so a cap of 200 allows Domain-0 at most two full cores, while a cap of 0 means no upper limit.

[root@HV3-cloud ~]# xm sched-credit
Name                  ID  Weight  Cap
Domain-0               0   65535  200

This seems to starve the Dom0 CPU on servers that scale down their CPU frequency.
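On a Xen host, CPU frequency scaling is handled by the hypervisor rather than the Dom0 kernel, so the xenpm tool that ships with Xen 4 can show whether cores are being clocked down. A minimal check, assuming xenpm is in the PATH; forcing the performance governor is optional and, when no CPU id is given, applies to all cores:

# Show the scaling governor and frequencies Xen reports for CPU 0:
xenpm get-cpufreq-para 0

# Optional: pin all cores at full clock speed to rule out scaling:
xenpm set-scaling-governor performance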

 

Resolution

Our Support Techs recommend these steps to handle crashes on RHEL/CentOS 6.x with Xen 4.2.x hypervisors:

1. Initially, to check the ratelimit and tslice values for the CPU pool (only for CentOS 6/Xen 4), we run:

[root@HV3-cloud ~]# xl -f sched-credit

If the output shows different values, we set them like below:

[root@HV3-cloud ~]# xl -f sched-credit -s -t 5ms -r 100us

Or:

[root@HV3-cloud ~]# service xend stop

[root@HV3-cloud ~]# xl -f sched-credit -s -t 5ms -r 100us

[root@HV3-cloud ~]# service xend start
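Either way, we can confirm the change took effect by re-reading the scheduler parameters; the pool summary line of the output should now report the new values:

# Expect the pool summary to show tslice=5ms ratelimit=100us:
xl -f sched-credit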

2. After that, we set the default Credit Scheduler CAP and Weight values for Domain-0:

# xm sched-credit -d Domain-0 -w <WEIGHT> -c <CAP>

Here,

WEIGHT=600 for small HVs, or cpu_cores/2*100 for large HVs;

CAP=0 for small HVs with few VMs and low CPU overselling, or cpu_cores/2*100 for large HVs with huge CPU overselling (a short sketch after the examples below computes this value from the core count).

For example, for an HV with 8 cores:

# xm sched-credit -d Domain-0 -w 600 -c 0

Otherwise, we can set the weight to 6000 by default:

# xm sched-credit -d Domain-0 -w 6000 -c 0
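For large HVs, the cpu_cores/2*100 rule can be computed straight from the host's core count; a minimal sketch, assuming nproc is available and that weight and cap take the same value on a heavily oversold HV:

#!/bin/bash
# Derive Domain-0 weight and cap from the large-HV rule of thumb:
# WEIGHT = CAP = cpu_cores / 2 * 100
CORES=$(nproc)
VALUE=$(( CORES / 2 * 100 ))
echo "Setting Domain-0 weight=${VALUE} cap=${VALUE} (${CORES} cores)"
xm sched-credit -d Domain-0 -w "${VALUE}" -c "${VALUE}"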

If the changes help, we persist the new CAP and Weight values in the /onapp/onapp-hv.conf file:

# vi /onapp/onapp-hv.conf
 XEN_DOM0_SCHEDULER_WEIGHT=<WEIGHT>
 XEN_DOM0_SCHEDULER_CAP=<CAP>
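To make this step scriptable, sed can patch both keys in place; a minimal sketch, assuming both variables already exist in the file (shown with the 8-core example values):

# Persist weight 600 and cap 0; -i.bak keeps a backup of the original.
sed -i.bak \
  -e 's/^XEN_DOM0_SCHEDULER_WEIGHT=.*/XEN_DOM0_SCHEDULER_WEIGHT=600/' \
  -e 's/^XEN_DOM0_SCHEDULER_CAP=.*/XEN_DOM0_SCHEDULER_CAP=0/' \
  /onapp/onapp-hv.conf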

3. Finally, we try to assign a fixed number of vCPUs to Domain-0 in /etc/grub.conf (a symlink to /boot/grub/grub.conf on CentOS 6), like below:

# grep dom0 /boot/grub/grub.conf
kernel /xen.gz dom0_mem=409600 dom0_max_vcpus=2
If the changes help, we also persist the maximum number of vCPUs in the /onapp/onapp-hv.conf file:

# vi /onapp/onapp-hv.conf

XEN_DOM0_MAX_VCPUS=2

For the changes to take effect, we reboot the system.
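After the reboot, a quick sanity check confirms that Dom0 came up with the intended limits:

# Dom0 should now show the 2 vCPUs set on the grub kernel line:
xl list Domain-0

# Weight and Cap should match the values persisted in onapp-hv.conf:
xl sched-credit -d Domain-0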

[Need help with the procedures? We can help you]

 

Conclusion

In short, we saw how our Support Techs prevent the Dom0 CPU starvation that causes soft lockups on a Xen hypervisor.
