Building a model using SageMaker
SageMaker is a combination of two very useful tools:
- A managed Jupyter notebook instance
- An API that simplifies training and deploying an ML model
So, why use SageMaker?: It makes a lot of the Machine Learning tasks we need to perform much easier. Consider the ML workflow that I discussed in detail in my earlier post.
SageMaker provides tools to make each of the steps involved simpler. The notebook can be used to explore and process the data, and the API can help simplify the modeling and deployment steps.
So, how does SageMaker actually work?: For the most part, when we talk about using SageMaker, we really mean working in the managed notebook. This notebook has all the benefits that we talked about earlier, with the added benefit of having access to the SageMaker API. The SageMaker API itself can be thought of as a collection of tools that deal with the training process and the inference process.
Training Process: The training process is exactly what you think it is:
- First, a computational task is constructed. Generally, this task is meant to fit a machine learning model to some data.
- Then, this task is executed on a virtual machine. The resulting model, such as the tree built by a decision tree model or the layers of a neural network, is saved to a file. This saved data is called the model artifacts.
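To make the idea of model artifacts concrete, here is a minimal sketch in plain Python. It is a toy illustration, not how SageMaker itself is implemented: the `train` and `save_artifacts` helpers, the tiny dataset, and the `model.json` file name are all assumptions made up for this example. The "task" fits a straight line, and the fitted parameters are what gets written to disk.

```python
import json
import tempfile
from pathlib import Path


def train(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = cov_xy / var_x
    return {"slope": slope, "intercept": mean_y - slope * mean_x}


def save_artifacts(model, directory):
    """Write the fitted parameters to a file: our stand-in for model artifacts."""
    path = Path(directory) / "model.json"  # hypothetical file name
    path.write_text(json.dumps(model))
    return path


# The "computational task": fit the model, then persist the result.
artifact_dir = tempfile.mkdtemp()
model = train([1.0, 2.0, 3.0], [3.0, 5.0, 7.0])
artifact_path = save_artifacts(model, artifact_dir)
```

Whatever the model type, the pattern is the same: training ends by writing out everything needed to make predictions later.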
Inference Process: The inference process is very similar to the training process:
- First, a computational task is constructed for the purpose of performing inference.
- Then, this task is executed on a virtual machine. In this case, however, the VM waits for us to send it some data. When we do, it takes that data along with the model artifacts created during the training process, performs inference, and returns the result.
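The inference side can be sketched in the same toy style. Again, this is only an illustration under assumptions: `ToyEndpoint`, the `model.json` file, and its slope/intercept contents are invented for the example and are not part of the SageMaker API. The object loads the artifacts once at start-up, then answers each request it is sent.

```python
import json
import tempfile
from pathlib import Path


class ToyEndpoint:
    """Stand-in for the VM that loads model artifacts and waits for data."""

    def __init__(self, artifact_path):
        # Load the artifacts produced by an earlier training step.
        self.model = json.loads(Path(artifact_path).read_text())

    def predict(self, rows):
        # Apply the stored linear model to each incoming value.
        slope = self.model["slope"]
        intercept = self.model["intercept"]
        return [slope * x + intercept for x in rows]


# Pretend a training job already wrote these artifacts (hypothetical values).
artifact_path = Path(tempfile.mkdtemp()) / "model.json"
artifact_path.write_text(json.dumps({"slope": 2.0, "intercept": 1.0}))

endpoint = ToyEndpoint(artifact_path)
predictions = endpoint.predict([4.0, 5.0])
```

Note that no training happens here: the inference task only needs the artifacts and the incoming data.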
Setting up the notebook instance
The first thing we are going to need to do is set up a notebook instance!
This will be the primary way in which we interact with the SageMaker ecosystem. Of course, this is not the only way to interact with SageMaker’s functionality, but it is the way that we will use for now.
Note: Once a notebook instance has been set up, by default, it will be InService, which means that the notebook instance is running. This is important to know because the cost of a notebook instance is based on the length of time that it has been running. This means that once you are finished using a notebook instance you should Stop it so that you are no longer incurring a cost. Don’t worry though, you won’t lose any data provided you don’t delete the instance. Just start the instance back up when you have time and all of your saved data will still be there.
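If you prefer to check and stop the instance from code rather than from the console, something like the sketch below should work. It assumes a boto3 SageMaker client is passed in, and `stop_if_running` and the instance name `my-notebook` are hypothetical names chosen for this example.

```python
def stop_if_running(sm_client, notebook_name):
    """Stop the notebook instance if it is InService; return the status seen."""
    response = sm_client.describe_notebook_instance(
        NotebookInstanceName=notebook_name
    )
    status = response["NotebookInstanceStatus"]
    if status == "InService":
        # Only a running instance keeps incurring cost, so only stop those.
        sm_client.stop_notebook_instance(NotebookInstanceName=notebook_name)
    return status


# Typical use (needs AWS credentials; "my-notebook" is a hypothetical name):
#   import boto3
#   stop_if_running(boto3.client("sagemaker"), "my-notebook")
```

Passing the client in as an argument keeps the helper easy to try out without touching a real account.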
Create a new notebook instance and give it a name. We also need to make sure we set a role. A role acts as a security certificate, letting Amazon know what other resources our notebook will have access to. We need to make sure that we allow our notebook to access S3. To do that, create a new IAM role. The default selections will be fine for us.
Note: We select None here, since we only want buckets that have sagemaker in their name to be accessible from our notebook. If you want access to any other specific bucket, select the first option and enter the bucket name.
Now that your notebook instance has been set up and is running, it’s time to get the notebooks that we will be using from here:
sh-4.2$ pwd
/home/ec2-user
sh-4.2$ ls
anaconda3 examples LICENSE Nvidia_Cloud_EULA.pdf README SageMaker sample-notebooks sample-notebooks-1606850748 src tools tutorials
sh-4.2$ cd SageMaker/
sh-4.2$ ls
lost+found
sh-4.2$ git clone https://github.com/udacity/sagemaker-deployment.git
Cloning into 'sagemaker-deployment'...
remote: Enumerating objects: 10, done.
remote: Counting objects: 100% (10/10), done.
remote: Compressing objects: 100% (9/9), done.
remote: Total 259 (delta 3), reused 4 (delta 1), pack-reused 249
Receiving objects: 100% (259/259), 258.78 KiB | 23.52 MiB/s, done.
Resolving deltas: 100% (156/156), done.
sh-4.2$ ls
lost+found sagemaker-deployment
sh-4.2$ cd sagemaker-deployment/
sh-4.2$ ls
LICENSE Mini-Projects Project README.md Tutorials
sh-4.2$
Boston Housing Example
As our first example of using SageMaker, we are going to take a look at the Boston Housing dataset, and we are going to use that dataset to predict the median cost of a house in the Boston area. Refer to this notebook
SageMaker Sessions & Execution Roles
SageMaker has some unique objects and terminology that will become more familiar over time. There are a few objects that you’ll see come up, over and over again:
Session - A session is a special object that allows you to do things like manage data in S3 and create and train any machine learning models. The upload_data function should be close to the top of the list! You’ll also see functions like create_model, all of which we will use regularly.
Role - Sometimes called the execution role, this is the IAM role that you created when you created your notebook instance. The role basically defines how data that your notebook uses/creates will be stored. You can even try printing out the role with print(role) to see the details of this creation.
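As a rough sketch of how these two objects show up in a notebook, the helper below wraps the session's upload_data call. The helper name `upload_dataset`, the local path `data/train.csv`, and the `boston-housing` prefix are assumptions for illustration; the commented lines show how the session and role would typically be obtained inside a SageMaker notebook instance.

```python
def upload_dataset(session, local_path, key_prefix):
    """Upload local_path to the session's default S3 bucket; return the S3 URI."""
    # With no bucket argument, upload_data falls back to the session's default bucket.
    return session.upload_data(path=local_path, key_prefix=key_prefix)


# In a SageMaker notebook (requires the sagemaker package and AWS credentials):
#   import sagemaker
#   session = sagemaker.Session()
#   role = sagemaker.get_execution_role()
#   print(role)  # shows the ARN of the IAM role attached to the notebook
#   s3_uri = upload_dataset(session, "data/train.csv", "boston-housing")
```

The returned S3 location is what you would later hand to a training job as its input data.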
What is AWS SageMaker?:
AWS (or Amazon) SageMaker is a fully managed service for quickly building, training, tuning, deploying, and managing large-scale machine learning (ML) models. SageMaker provides tools to make each of the following steps simpler:
- Explore and process data
- Clean and explore
- Prepare and transform
- Develop and train the model
- Validate and evaluate the model
- Deploy to production
- Monitor, and update model & data
What tools are provided in SageMaker?: Amazon SageMaker provides the following tools:
- Ground Truth - To manage labeling jobs, labeling datasets, and labeling workforces
- Notebook - To create Jupyter notebook instances, configure the lifecycle of the notebooks, and attach Git repositories
- Training - To choose an ML algorithm, define training jobs, and tune hyperparameters
- Inference - To compile and configure trained models, and to set up endpoints for deployment
SageMaker Instance types start with
SageMaker instances are dedicated VMs that are optimized for different machine learning (ML) use cases. The supported instance types, names, and pricing in SageMaker differ from those of EC2. Refer to the following links for more detail: