Bobcares

Troubleshoot ML Workspaces on AWS | Know More

Dec 9, 2022

Let’s look at how to troubleshoot ML Workspaces on AWS. At Bobcares, we handle AWS-related issues like this as part of our AWS Support Services.

How to troubleshoot ML Workspaces on AWS?

ML workspaces are quick and easy to install, and they get users up and running in no time. Each workspace offers a complete web-based IDE that is optimized for data science and machine learning. When we create a new ML Workspace on AWS, Cloudera Machine Learning (CML) carries out the following tasks:

  • Contacts the CDP Management Console to verify the AWS credentials. It also enables Single Sign-On so that authorized CDP users can sign in to the new workspace automatically.
  • Creates an NFS filesystem for the workspace on the cloud provider. On AWS, CML sets up this storage on EFS.
  • Creates a Kubernetes cluster on the cloud provider’s platform. This cluster manages the workspace infrastructure and compute resources. On AWS, CML sets up an EKS cluster.
  • Attaches the NFS filesystem to the Kubernetes cluster.
  • Uses LetsEncrypt to provision TLS certificates for the workspace.
  • Registers the workspace with the cloud provider’s DNS service. On AWS, this is Route53.
  • Installs Cloudera Machine Learning on the EKS cluster.
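
When a step fails, it helps to know which AWS service to look at first. The sketch below simply encodes the steps listed above together with the AWS services this article names for them. The function name `consoles_to_check` and the step wording are illustrative assumptions, not part of CML.

```python
from typing import List, Tuple

# Provisioning steps in the order listed above, each paired with the AWS
# services the article associates with it. An empty list means the step is
# handled outside AWS (CDP Management Console, LetsEncrypt).
PROVISIONING_STEPS: List[Tuple[str, List[str]]] = [
    ("Verify AWS credentials with the CDP Management Console", []),
    ("Create the NFS filesystem", ["EFS"]),
    ("Create the Kubernetes cluster", ["EKS"]),
    ("Attach the NFS filesystem to the cluster", ["EFS", "EKS"]),
    ("Issue TLS certificates with LetsEncrypt", []),
    ("Register the workspace in DNS", ["Route53"]),
    ("Install Cloudera Machine Learning on the cluster", ["EKS"]),
]


def consoles_to_check(step_number: int) -> List[str]:
    """Return the AWS services to inspect when the given step (1-based) fails."""
    if not 1 <= step_number <= len(PROVISIONING_STEPS):
        raise ValueError(f"step_number must be between 1 and {len(PROVISIONING_STEPS)}")
    _description, services = PROVISIONING_STEPS[step_number - 1]
    return list(services)
```

For example, `consoles_to_check(6)` points at Route53, while a failure in step 1 returns an empty list because the CDP side needs checking instead.
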
Troubleshoot ML Workspaces on AWS

Any of the above steps can fail. When one does, we need access to one or more of the resources below. The details they provide make it much easier to pinpoint and fix the issue.

Knowing The Resources

Workspace >> Details Page

Every workspace has a Details page that contains essential information about it. Log in to CDP, go to ML Workspaces, and click on the workspace name to reach this page.

Basic workspace details, such as who created it and when, are listed on this page. It also provides a list of tags associated with the workspace, along with links to the environment where the workspace was created, the underlying EKS cluster on AWS, the compute resources being used, and more.

Workspace >> Events Page

Every workspace also has an associated Events page that records each action taken on the workspace. This includes, among other things, creating, upgrading, and removing the workspace. Sign in to CDP, navigate to ML Workspaces, click the workspace name, and then click Events to access this page. To get a high-level breakdown of the steps CML took to execute an action, click the View Logs button next to that action.

When an action fails, the Request ID assigned to it is very helpful, since it lets the Support team quickly identify the sequence of steps that led to the failure.

Environment >> Summary Page

The environment in which CML workspaces are provisioned has a significant impact on how they behave. The Summary page for each environment contains data that helps when troubleshooting CML service problems. The workspace Details page links directly to the environment.

The Summary page also contains details about Credential Setup, Region, Network, and Logs.

AWS Management Console

Once we have the necessary details about the environment and the workspace, we can investigate further in the AWS console (for the region where the environment was created). The dashboards for all the AWS services that CML uses are accessible from the AWS Management Console.

The EC2 service dashboard shows the instance types in use.
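
The same check can be scripted. Below is a minimal sketch, assuming the boto3 SDK and that the workspace's instances carry a tag we know (the tag key and value are hypothetical). It counts instance types using the EC2 `describe_instances` call. The client is passed in so the function can be tried without AWS access.

```python
from collections import Counter


def instance_types_by_tag(ec2_client, tag_key, tag_value):
    """Count EC2 instance types among instances carrying the given tag."""
    counts = Counter()
    paginator = ec2_client.get_paginator("describe_instances")
    pages = paginator.paginate(
        Filters=[{"Name": f"tag:{tag_key}", "Values": [tag_value]}]
    )
    for page in pages:
        for reservation in page.get("Reservations", []):
            for instance in reservation.get("Instances", []):
                counts[instance["InstanceType"]] += 1
    return dict(counts)


# Usage (assumes boto3 is installed and AWS credentials are configured;
# the region and tag below are placeholders):
#   import boto3
#   ec2 = boto3.client("ec2", region_name="us-west-2")
#   print(instance_types_by_tag(ec2, "my-workspace-tag", "my-value"))
```
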

We can get further details from EKS, like the version of Kubernetes CML is using, network details, and cluster status.
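
As a rough sketch of pulling those EKS details programmatically, the function below uses the boto3 EKS `describe_cluster` call and keeps only the fields mentioned above. The cluster name has to come from the workspace Details page. The function name and the region in the usage comment are assumptions for illustration.

```python
def summarize_eks_cluster(eks_client, cluster_name):
    """Return status, Kubernetes version, and network details of an EKS cluster."""
    cluster = eks_client.describe_cluster(name=cluster_name)["cluster"]
    vpc_config = cluster.get("resourcesVpcConfig", {})
    return {
        "status": cluster.get("status"),
        "kubernetes_version": cluster.get("version"),
        "vpc_id": vpc_config.get("vpcId"),
        "subnet_ids": vpc_config.get("subnetIds", []),
    }


# Usage (assumes boto3 is installed and AWS credentials are configured;
# the region and cluster name are placeholders):
#   import boto3
#   eks = boto3.client("eks", region_name="us-west-2")
#   print(summarize_eks_cluster(eks, "name-from-the-details-page"))
```

A status other than `ACTIVE` (for example `CREATING` or `FAILED`) is usually the first thing worth noting when provisioning appears stuck.
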

To find the VPC where the ML workspace is provisioned (or is being provisioned), use the VPC ID shown on the CDP environment Summary page.
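
With the VPC ID in hand, a short script can pull the VPC and its subnets in one go. This is a minimal sketch using the boto3 EC2 calls `describe_vpcs` and `describe_subnets`. The function name is an assumption, and the sketch ignores pagination of the subnet list.

```python
def describe_environment_vpc(ec2_client, vpc_id):
    """Return basic details of a VPC and the subnets that belong to it."""
    vpcs = ec2_client.describe_vpcs(VpcIds=[vpc_id])["Vpcs"]
    if not vpcs:
        # Defensive check; the real API normally raises an error for unknown IDs.
        raise LookupError(f"VPC {vpc_id} not found in this region")
    vpc = vpcs[0]
    subnets = ec2_client.describe_subnets(
        Filters=[{"Name": "vpc-id", "Values": [vpc_id]}]
    )["Subnets"]
    return {
        "vpc_id": vpc["VpcId"],
        "state": vpc["State"],
        "cidr_block": vpc["CidrBlock"],
        "subnets": [
            {
                "subnet_id": subnet["SubnetId"],
                "availability_zone": subnet["AvailabilityZone"],
                "available_ips": subnet["AvailableIpAddressCount"],
            }
            for subnet in subnets
        ],
    }


# Usage (assumes boto3 is installed and AWS credentials are configured;
# use the VPC ID from the CDP environment Summary page):
#   import boto3
#   ec2 = boto3.client("ec2", region_name="us-west-2")
#   print(describe_environment_vpc(ec2, "vpc-id-from-summary-page"))
```
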

For further debugging, we can view or download logs from the S3 bucket that was set up for the environment.
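
To avoid clicking through the bucket by hand, the newest log objects can be listed with a script. The sketch below uses the boto3 S3 `list_objects_v2` paginator. The bucket name and prefix must be taken from the environment's Logs settings; the values in the usage comment are placeholders.

```python
def latest_log_objects(s3_client, bucket, prefix="", limit=10):
    """Return the keys of the most recently modified objects under a prefix."""
    objects = []
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # "Contents" is absent when a page has no matching objects.
        objects.extend(page.get("Contents", []))
    objects.sort(key=lambda obj: obj["LastModified"], reverse=True)
    return [obj["Key"] for obj in objects[:limit]]


# Usage (assumes boto3 is installed and AWS credentials are configured;
# bucket and prefix are placeholders):
#   import boto3
#   s3 = boto3.client("s3")
#   for key in latest_log_objects(s3, "my-env-log-bucket", "logs/"):
#       s3.download_file("my-env-log-bucket", key, key.replace("/", "_"))
```
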

When creating an ML workspace, we can assign it one or more tags. These tags are then applied to all of the underlying AWS resources used by the workspace. If provisioning or de-provisioning fails, quickly checking the workspace’s tags helps determine whether any resources need to be cleaned up manually.
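
One way to run that tag check is the AWS Resource Groups Tagging API. The sketch below, assuming boto3 and a known workspace tag (the key and value are hypothetical), lists the ARNs that still carry the tag and groups them by service. The API is regional and only covers services that support it, so an empty result is not proof that nothing is left over.

```python
from collections import defaultdict


def leftover_resources_by_service(tagging_client, tag_key, tag_value):
    """Group ARNs of resources carrying the given tag by AWS service name."""
    grouped = defaultdict(list)
    paginator = tagging_client.get_paginator("get_resources")
    pages = paginator.paginate(TagFilters=[{"Key": tag_key, "Values": [tag_value]}])
    for page in pages:
        for mapping in page.get("ResourceTagMappingList", []):
            arn = mapping["ResourceARN"]
            # ARN layout: arn:partition:service:region:account-id:resource
            service = arn.split(":")[2]
            grouped[service].append(arn)
    return dict(grouped)


# Usage (assumes boto3 is installed and AWS credentials are configured;
# the region and tag are placeholders):
#   import boto3
#   tagging = boto3.client("resourcegroupstaggingapi", region_name="us-west-2")
#   print(leftover_resources_by_service(tagging, "my-workspace-tag", "my-value"))
```
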

We can also use Trusted Advisor (available with AWS Support), which offers a high-level view of how the AWS account is doing.

Conclusion

This article briefly described the resources we rely on when troubleshooting ML workspaces on AWS, and how each one makes the process simpler.
