Troubleshooting Redis for Pivotal Platform is easier when we know which specific errors to look for.
At Bobcares, we fix Redis issues as a part of our Server Management Services for our clients.
Today, let’s see how our Support Engineers troubleshoot and fix these issues.
Specific errors to troubleshoot Redis for Pivotal Platform:
Here are some specific errors that our Support Engineers check for while troubleshooting Redis for Pivotal Platform:
- AOF File Corrupted, Cannot Start Redis Instance
- Saving Error
- Failed Backup
- Orphaned Instances: BOSH Director Cannot See Your Instances
- Orphaned Instances: Pivotal Platform Cannot See Your Instances
- Failed to Set Credentials in Runtime CredHub
- Service Outage after Disabling TLS
We will discuss the symptoms to check for each of the above errors and what causes these errors along with the solutions to fix them.
1. AOF File Corrupted, Cannot Start Redis Instance
A corrupted AOF file can be one of the specific errors to check for while troubleshooting Redis for Pivotal Platform.
Symptom: One or more VMs might fail to start the Redis server during pre-start, with the following error message logged in syslog:
[ErrorLog-TimeStamp] # Bad file format reading the append only file: make a backup of your AOF file, then use ./redis-check-aof --fix <filename>
Cause: In cases of hard crashes, for example, due to power loss or VM termination without running drain scripts, our AOF file might become corrupted. The error log printed out by Redis provides a clear means of recovery.
Solution 1: For shared-VM instances:
1. SSH into the cf-redis-broker instance.
2. Navigate to the directory where our AOF file is stored.
3. Run the following command:
/var/vcap/packages/redis/redis-check-aof --fix appendonly.aof
4. Exit the SSH session on the cf-redis-broker instance and restart the instance by running:
bosh restart INSTANCE-GROUP/INSTANCE-ID
Solution 2: For on-demand VM instances:
1. SSH into the affected service instance.
2. Navigate to the directory where our AOF file is stored (usually /var/vcap/store/redis/).
3. Run:
/var/vcap/packages/redis/redis-check-aof --fix appendonly.aof
4. Exit the SSH session on the service instance and restart the instance by running:
bosh restart INSTANCE-GROUP/INSTANCE-ID
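The Redis error message itself advises backing up the AOF file before repairing it, a step the procedures above leave implicit. Here is a minimal sketch of that backup step. It assumes the AOF file is named appendonly.aof and lives in /var/vcap/store/redis; the backup_aof helper name is hypothetical and not part of the Redis tile.

```shell
# backup_aof AOF_PATH
# Copies the AOF file to a timestamped backup next to it and prints the backup path.
# The helper name and the ".bak.TIMESTAMP" suffix are illustrative assumptions.
backup_aof() {
  aof_path="$1"
  backup_path="${aof_path}.bak.$(date +%Y%m%d%H%M%S)"
  cp -p "$aof_path" "$backup_path" && printf '%s\n' "$backup_path"
}

# On the affected VM, the sequence would then look like this (not run here):
#   cd /var/vcap/store/redis
#   backup_aof appendonly.aof
#   /var/vcap/packages/redis/redis-check-aof --fix appendonly.aof
```

Keeping the untouched copy matters because the repair truncates the AOF at the first bad entry, so any writes after that point are dropped from the repaired file.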
2. Saving Error
Symptom: One of the following error messages is logged in syslog:
Background saving error
Failed opening the RDB file dump.rdb (in server root dir /var/vcap/store/redis) for saving: No space left on device
Cause: This might be logged when the configured disk size is too small, or if the Redis AOF uses all the disk space.
Solution:
1. Ensure the disk is configured to at least 2.5x the VM memory for the on-demand broker and 3.5x the VM memory for cf-redis-broker.
2. Check if the AOF is using too much disk space by doing the following:
a. BOSH SSH into the affected service instance VM.
b. Run cd /var/vcap/store/redis; ls -la to list the size of each file.
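The sizing rule in step 1 is simple arithmetic, sketched below. The min_disk_mb helper and its on-demand / shared-vm labels are hypothetical names chosen for illustration; only the 2.5x and 3.5x multipliers come from the guidance above.

```shell
# min_disk_mb MEMORY_MB BROKER_TYPE
# Prints the minimum persistent disk size in MB:
#   2.5x VM memory for on-demand instances,
#   3.5x VM memory for cf-redis-broker (shared-VM) instances.
min_disk_mb() {
  case "$2" in
    on-demand) echo $(( $1 * 5 / 2 )) ;;
    shared-vm) echo $(( $1 * 7 / 2 )) ;;
    *) echo "unknown broker type: $2" >&2; return 1 ;;
  esac
}

min_disk_mb 4096 on-demand   # prints 10240
min_disk_mb 4096 shared-vm   # prints 14336

# On the affected VM, standard tools show how full the disk is (not run here):
#   df -h /var/vcap/store
#   du -sh /var/vcap/store/redis/*
```

For a VM with 4 GB of memory, this gives at least 10 GB of disk for an on-demand instance and 14 GB for cf-redis-broker.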
3. Failed Backup
Failed backup can be one of the specific errors to check for while troubleshooting Redis for Pivotal Platform.
Symptom: The following error message is logged:
Backup has failed. Redis must be running for a backup to run
Cause: This is logged if a backup is initiated against a Redis server that is down.
Solution:
We need to ensure that the Redis server is running during the backup. To bring it back up, we run a bosh restart against the affected service instance VM.
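The three log messages covered so far can be searched for in one pass before deciding which fix to apply. The sketch below is illustrative only: the classify_redis_log helper and its output labels are hypothetical, and the log file path is an assumption to adjust for your VM.

```shell
# classify_redis_log LOGFILE
# Prints one label per known problem whose message appears in the log.
# The labels are made up for this sketch; the matched strings come from
# the error messages quoted in sections 1 to 3.
classify_redis_log() {
  grep -qF 'Bad file format reading the append only file' "$1" && echo 'aof-corrupted'
  grep -qF 'No space left on device' "$1" && echo 'disk-full'
  grep -qF 'Redis must be running for a backup to run' "$1" && echo 'backup-failed-redis-down'
  return 0
}

# Example on a service instance VM (path is an assumption, not run here):
#   classify_redis_log /var/log/syslog
# If the backup failure shows up, restart the instance from the BOSH side:
#   bosh -e MY-ENV -d MY-DEPLOYMENT restart VM-NAME/GUID
```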
4. Orphaned Instances: BOSH Director Cannot See Your Instances
Symptom: BOSH Director cannot see our instances, but they are visible on Pivotal Platform.
When we run cf curl /v2/service_instances, some of the service instances listed are not visible to the BOSH Director.
These orphaned instances can create issues. For example, they might hold on to a static IP address, causing IP conflicts.
Cause: Orphaned instances can occur in the following situations:
a. Both Pivotal Platform and BOSH maintain state. Orphaned instances can occur if the Pivotal Platform state is out of sync with BOSH.
b. A call to de-provision a service instance was made directly to BOSH rather than through the cf CLI.
Solution:
We can solve this issue by doing one of the following:
- If this is the first occurrence: Pivotal recommends that we purge instances by running cf purge-service-instance SERVICE-INSTANCE.
- If this is a repeated occurrence: Contact Pivotal Support for further help, and include a snippet of your broker.log around the time of the incident.
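Before purging, it helps to list exactly which instances the platform knows about but BOSH does not. The sketch below compares two plain-text lists. It rests on several assumptions: the find_orphans helper is hypothetical, the input files are prepared by hand (one GUID or deployment name per line), and on-demand deployments are assumed to be named service-instance_GUID, which should be verified in your environment.

```shell
# find_orphans CF_GUIDS_FILE BOSH_DEPLOYMENTS_FILE
# Prints GUIDs present in the platform list that have no matching BOSH deployment.
# Assumes BOSH deployment names look like "service-instance_GUID".
find_orphans() {
  cf_sorted=$(mktemp)
  bosh_sorted=$(mktemp)
  sort -u "$1" > "$cf_sorted"
  sed 's/^service-instance_//' "$2" | sort -u > "$bosh_sorted"
  comm -23 "$cf_sorted" "$bosh_sorted"   # lines only in the platform list
  rm -f "$cf_sorted" "$bosh_sorted"
}

# The input lists might be produced along these lines (not run here):
#   cf curl /v2/service_instances   # collect the on-demand Redis instance GUIDs
#   bosh -e MY-ENV deployments      # collect the deployment names
#   find_orphans cf-guids.txt bosh-deployments.txt
```

Note that /v2/service_instances returns instances of every service, so the platform-side list should be narrowed to on-demand Redis instances first; otherwise unrelated instances will be reported as orphans.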
5. Orphaned Instances: Pivotal Platform Cannot See Your Instances
Symptom: Pivotal Platform cannot see our broker or service instances. These instances exist but Pivotal Platform and apps cannot communicate with them.
Cause: If we run cf purge-service-instance while our service instance or broker still exists, our service instance becomes orphaned.
Solution:
If Pivotal Platform lost the details of our instances, but BOSH still has the deployment details, we can solve this issue by backing up the data on our service instance and creating a new service.
Steps to back up the data and create a new service instance:
1. Retrieve the orphaned service instance GUID by running:
bosh -d MY-DEPLOYMENT run-errand orphan-deployments
(Where MY-DEPLOYMENT is the name of the deployment.)
2. SSH into the orphaned service instance by running:
bosh -e MY-ENV -d MY-DEPLOYMENT ssh VM-NAME/GUID
(Where MY-ENV is the name of our environment, MY-DEPLOYMENT is the name of our deployment and VM-NAME/GUID is the name of our service instance and guid that we got in step 1.)
3. Create a new RDB file by running:
/var/vcap/jobs/redis-backups/bin/backup --snapshot
This creates a new RDB file in /var/vcap/store/redis-backup.
4. Push the RDB file to the backup location by running:
/var/vcap/jobs/service-backup/bin/manual-backup
5. Create a new service instance with the same configuration of the database we backed up.
6. Retrieve your new service instance GUID, by running:
bosh -e MY-ENV -d MY-DEPLOYMENT vms
7. SSH into the new service instance by repeating step 2 above with the GUID that we retrieved in step 6.
8. Create a new directory in the new service instance by running:
mkdir /var/vcap/store/MY-BACKUPS
9. Transfer the RDB file to the new instance and save it in /var/vcap/store/MY-BACKUPS/.
10. Verify the RDB file has not been corrupted by running:
md5sum RDB-FILE
11. Restore your data by running:
sudo /var/vcap/jobs/redis-backups/bin/restore --sourceRDB RDB-FILE
(Where RDB-FILE is the path to our RDB file.)
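The md5sum check in step 10 is only meaningful when compared against the checksum of the original file. A minimal sketch, assuming we recorded the checksum on the old instance before the transfer; the checksum_matches helper name is hypothetical.

```shell
# checksum_matches FILE EXPECTED_MD5
# Succeeds only if the MD5 checksum of FILE equals EXPECTED_MD5.
checksum_matches() {
  actual=$(md5sum "$1" | awk '{print $1}')
  [ "$actual" = "$2" ]
}

# Sketch of the flow (not run here):
#   on the old instance:  md5sum RDB-FILE        # note the checksum
#   on the new instance:
#     checksum_matches /var/vcap/store/MY-BACKUPS/RDB-FILE RECORDED-MD5 \
#       && sudo /var/vcap/jobs/redis-backups/bin/restore --sourceRDB /var/vcap/store/MY-BACKUPS/RDB-FILE
```

Chaining the restore behind the check means a file damaged in transit is never loaded into the new instance.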
6. Failed to Set Credentials in Runtime CredHub
Symptom: Developers report errors such as:
error: failed to set credentials in credential store: The request includes an unrecognized parameter ‘mode’. Please update or remove this parameter and retry your request.. error for user: There was a problem completing your request. Please contact your operations team providing the following information: service: p.redis, service-instance-guid: , broker-request-id: , operation: bind
Cause: The service instances are not running the latest version of Redis for Pivotal Platform. We experience compatibility issues with CredHub if the service instances are running Redis for Pivotal Cloud Foundry v1.14.3 or earlier.
Solution:
1. We need to ensure that we have the latest patch version of Redis for Pivotal Platform installed.
2. Run the upgrade-all-service-instances errand to ensure all service instances are running the latest service offering.
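To decide quickly whether an instance falls in the affected range, a version comparison like the one below can help. It is a sketch only: the needs_upgrade helper is hypothetical, it relies on GNU sort -V, and it expects a plain MAJOR.MINOR.PATCH version string.

```shell
# needs_upgrade VERSION
# Succeeds if VERSION is v1.14.3 or earlier, the range described above.
needs_upgrade() {
  newest=$(printf '%s\n%s\n' "$1" "1.14.3" | sort -V | tail -n 1)
  [ "$newest" = "1.14.3" ]
}

needs_upgrade 1.14.3 && echo "1.14.3: upgrade needed"
needs_upgrade 1.14.4 || echo "1.14.4: not in the affected range"

# After installing the latest patch version, the errand can be run with the
# BOSH CLI (MY-ENV and MY-DEPLOYMENT are placeholders, not run here):
#   bosh -e MY-ENV -d MY-DEPLOYMENT run-errand upgrade-all-service-instances
```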
7. Service Outage after Disabling TLS
Symptom: After disabling TLS, apps that require on-demand Redis service instances become unresponsive.
Cause: When TLS is first enabled, all on-demand service instances are re-created with two ports. Every new or re-created app receives the new credentials.
Spring and Steeltoe apps are configured for enabled TLS by default, but other languages and frameworks require further configuration.
When TLS is disabled, the TLS port is removed from all on-demand instances. This prevents the apps from connecting to the instance.
Solution:
We can either re-enable TLS or continue with TLS disabled. Re-enabling is usually the safer choice, since the compliance body that oversees the apps may require TLS.
We should keep in mind that switching between enabled and disabled TLS incurs downtime.
Steps to enable TLS:
1. Go to the Ops Manager home page and select the Redis tile.
2. Navigate to On-Demand Service Settings.
3. In the Enable TLS section, ensure it is set to Optional.
4. Click Save.
5. Navigate back to the Ops Manager home page and click Review Pending Changes.
6. Ensure the Recreate All On-Demand Service Instances errand is enabled under the Redis section and then click Apply Changes.
Steps to continue with TLS disabled:
1. Unbind, bind, and re-stage every app that was affected by disabling TLS. This makes Spring and Steeltoe apps default to non-TLS configuration.
2. Manually configure any other relevant languages and frameworks to work with TLS disabled.
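When many apps are affected, the unbind, bind, and re-stage cycle from step 1 can be generated as a reviewable list of cf CLI commands. The sketch below only prints the commands rather than running them; the rebind_commands helper and the my-redis, app-one, and app-two names are hypothetical.

```shell
# rebind_commands SERVICE_INSTANCE APP...
# Prints the cf CLI commands that unbind, re-bind, and re-stage each app.
rebind_commands() {
  service="$1"
  shift
  for app in "$@"; do
    printf 'cf unbind-service %s %s\n' "$app" "$service"
    printf 'cf bind-service %s %s\n' "$app" "$service"
    printf 'cf restage %s\n' "$app"
  done
}

rebind_commands my-redis app-one app-two
```

Printing first and executing later gives a chance to review the list, since each re-stage restarts the app and causes a short interruption.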
Conclusion
In short, we’ve discussed how to troubleshoot Redis for Pivotal Platform. Also, we saw how our Support Engineers check for specific errors while troubleshooting Redis for Pivotal Platform.