AWS SageMaker Tutorial

Step 4.1: Download the MNIST Dataset

The code in the first notebook cell does the following:

  1. Downloads the MNIST dataset (mnist.pkl.gz) from the deeplearning.net data repository to your notebook instance.
  2. Unzips the file and reads the following three datasets into the notebook's memory:

    • train_set – You use these images of handwritten digits to train the model.
    • valid_set – The XGBoost algorithm uses these images to evaluate the model's progress during training.
    • test_set – You use this set to get inferences from the deployed model and validate it.
In [1]:
%%time 
import pickle, gzip, urllib.request, json
import numpy as np

# Load the dataset
urllib.request.urlretrieve("http://deeplearning.net/data/mnist/mnist.pkl.gz", "mnist.pkl.gz")
with gzip.open('mnist.pkl.gz', 'rb') as f:
    train_set, valid_set, test_set = pickle.load(f, encoding='latin1')
print(train_set[0].shape)
(50000, 784)
CPU times: user 974 ms, sys: 296 ms, total: 1.27 s
Wall time: 14.1 s

Step 4.2: Explore the Training Dataset

train_set contains the following structures:

  • train_set[0] – Contains images.
  • train_set[1] – Contains labels.

The code uses the matplotlib library to get and display the first 10 images from the training dataset.

In [2]:
%matplotlib inline
import matplotlib.pyplot as plt

plt.rcParams["figure.figsize"] = (2,10)

for i in range(0, 10):
    img = train_set[0][i]
    label = train_set[1][i]
    img_reshape = img.reshape((28,28))
    imgplot = plt.imshow(img_reshape, cmap='gray')
    print('This is a {}'.format(label))
    plt.show()
This is a 5
This is a 0
This is a 4
This is a 1
This is a 9
This is a 2
This is a 1
This is a 3
This is a 1
This is a 4

Step 4.3: Transform the Training Dataset and Upload It to Amazon S3

The XGBoost algorithm expects comma-separated values (CSV) as its training input, but the dataset is currently held as numpy.array objects. The following code converts each partition to CSV and uploads it to Amazon S3.

In [3]:
%%time

import os
import boto3
import re
import copy
import time
import io
import struct
from time import gmtime, strftime
from sagemaker import get_execution_role

role = get_execution_role()

region = boto3.Session().region_name

bucket='sagemaker-cn7030' # Replace with your s3 bucket name
prefix = 'sagemaker/xgboost-mnist' # Used as part of the path in the bucket where you store data

def convert_data():
    data_partitions = [('train', train_set), ('validation', valid_set), ('test', test_set)]
    for data_partition_name, data_partition in data_partitions:
        print('{}: {} {}'.format(data_partition_name, data_partition[0].shape, data_partition[1].shape))
        labels = [t.tolist() for t in data_partition[1]]
        features = [t.tolist() for t in data_partition[0]]

        # For the train and validation partitions, XGBoost expects the label in the first column.
        # The test partition stays unlabeled because it is only used for inference.
        if data_partition_name != 'test':
            examples = np.insert(features, 0, labels, axis=1)
        else:
            examples = features

        # Write the partition to a local CSV file, then upload it to the S3 bucket.
        np.savetxt('data.csv', examples, delimiter=',')

        key = "{}/{}/examples".format(prefix, data_partition_name)
        url = 's3://{}/{}'.format(bucket, key)
        boto3.Session().resource('s3').Bucket(bucket).Object(key).upload_file('data.csv')
        print('Done writing to {}'.format(url))

convert_data()
train: (50000, 784) (50000,)
Done writing to s3://sagemaker-cn7030/sagemaker/xgboost-mnist/train/examples
validation: (10000, 784) (10000,)
Done writing to s3://sagemaker-cn7030/sagemaker/xgboost-mnist/validation/examples
test: (10000, 784) (10000,)
Done writing to s3://sagemaker-cn7030/sagemaker/xgboost-mnist/test/examples
CPU times: user 37.6 s, sys: 5.84 s, total: 43.5 s
Wall time: 58.6 s

Step 5: Train the Model

Create and Run a Training Job (Amazon SageMaker Python SDK)

The Amazon SageMaker Python SDK includes the generic sagemaker.estimator.Estimator class. You can use this class, from the sagemaker.estimator module, with any algorithm. For more information, see https://sagemaker.readthedocs.io/en/stable/estimators.html#sagemaker.estimator.Estimator.

To run a model training job (Amazon SageMaker Python SDK)

1. Import the Amazon SageMaker Python SDK and get the XGBoost container image URI.

In [4]:
import sagemaker

from sagemaker.amazon.amazon_estimator import get_image_uri

container = get_image_uri(boto3.Session().region_name, 'xgboost', '0.90-1')

2. Specify the Amazon S3 locations of the training and validation data that you uploaded in Step 4.3: Transform the Training Dataset and Upload It to Amazon S3, and set the location where you want to store the training output.

In [5]:
train_data = 's3://{}/{}/{}'.format(bucket, prefix, 'train')

validation_data = 's3://{}/{}/{}'.format(bucket, prefix, 'validation')

s3_output_location = 's3://{}/{}/{}'.format(bucket, prefix, 'xgboost_model_sdk')
print(train_data)
s3://sagemaker-cn7030/sagemaker/xgboost-mnist/train

3. Create an instance of the sagemaker.estimator.Estimator class.

In [6]:
xgb_model = sagemaker.estimator.Estimator(container,
                                         role, 
                                         train_instance_count=1, 
                                         train_instance_type='ml.m4.xlarge',
                                         train_volume_size = 5,
                                         output_path=s3_output_location,
                                         sagemaker_session=sagemaker.Session())

In the constructor, you specify the following parameters:

  • role – The AWS Identity and Access Management (IAM) role that Amazon SageMaker can assume to perform tasks on your behalf (for example, reading training results, called model artifacts, from the S3 bucket and writing training results to Amazon S3). This is the role that you got in Step 3: Create a Jupyter Notebook.

  • train_instance_count and train_instance_type – The number and type of ML compute instances to use for model training. For this exercise, you use only a single training instance.

  • train_volume_size – The size, in GB, of the Amazon Elastic Block Store (Amazon EBS) storage volume to attach to the training instance. This must be large enough to store training data if you use File mode (File mode is the default).

  • output_path – The path to the S3 bucket where Amazon SageMaker stores the training results.

  • sagemaker_session – The session object that manages interactions with Amazon SageMaker APIs and any other AWS service that the training job uses.

4. Set the hyperparameter values for the XGBoost training job by calling the set_hyperparameters method of the estimator. For a description of XGBoost hyperparameters, see XGBoost Hyperparameters.

In [7]:
xgb_model.set_hyperparameters(max_depth = 5,
                              eta = .2,
                              gamma = 4,
                              min_child_weight = 6,
                              silent = 0,
                              objective = "multi:softmax",
                              num_class = 10,
                              num_round = 10)

5. Create the data channels to use for the training job. For this example, you use both the train and validation channels.

In [8]:
train_channel = sagemaker.session.s3_input(train_data, content_type='text/csv')
valid_channel = sagemaker.session.s3_input(validation_data, content_type='text/csv')

data_channels = {'train': train_channel, 'validation': valid_channel}

6. To start model training, call the estimator's fit method.

In [9]:
xgb_model.fit(inputs=data_channels,  logs=True)
2020-04-12 08:30:41 Starting - Starting the training job...
2020-04-12 08:30:43 Starting - Launching requested ML instances...
2020-04-12 08:31:41 Starting - Preparing the instances for training......
2020-04-12 08:32:36 Downloading - Downloading input data......
2020-04-12 08:33:38 Training - Training image download completed. Training in progress...INFO:sagemaker-containers:Imported framework sagemaker_xgboost_container.training
INFO:sagemaker-containers:Failed to parse hyperparameter objective value multi:softmax to Json.
Returning the value itself
INFO:sagemaker-containers:No GPUs detected (normal if no gpus installed)
INFO:sagemaker_xgboost_container.training:Running XGBoost Sagemaker in algorithm mode
INFO:root:Determined delimiter of CSV input is ','
INFO:root:Determined delimiter of CSV input is ','
INFO:root:Determined delimiter of CSV input is ','
[08:33:47] 50000x784 matrix with 39200000 entries loaded from /opt/ml/input/data/train?format=csv&label_column=0&delimiter=,
INFO:root:Determined delimiter of CSV input is ','
[08:33:49] 10000x784 matrix with 7840000 entries loaded from /opt/ml/input/data/validation?format=csv&label_column=0&delimiter=,
INFO:root:Single node training.
INFO:root:Train matrix has 50000 rows
INFO:root:Validation matrix has 10000 rows
[0]#011train-merror:0.17074#011validation-merror:0.1664
[1]#011train-merror:0.12624#011validation-merror:0.1273
[2]#011train-merror:0.11272#011validation-merror:0.1143
[3]#011train-merror:0.10072#011validation-merror:0.1052
[4]#011train-merror:0.09216#011validation-merror:0.097
[5]#011train-merror:0.08544#011validation-merror:0.0904
[6]#011train-merror:0.08064#011validation-merror:0.0864
[7]#011train-merror:0.0769#011validation-merror:0.0821
[8]#011train-merror:0.0731#011validation-merror:0.0809
[9]#011train-merror:0.06942#011validation-merror:0.0773

2020-04-12 08:35:02 Uploading - Uploading generated training model
2020-04-12 08:35:02 Completed - Training job completed
Training seconds: 146
Billable seconds: 146

This is a synchronous operation. The method displays progress logs and waits until training completes before returning. For more information about model training, see Train a Model with Amazon SageMaker.
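
If you prefer not to block the notebook while the job runs, you can start training asynchronously and re-attach to the job later. This is a minimal sketch (not part of the original walkthrough), assuming the same 1.x SDK and estimator used above:

# Start the training job without waiting for it to finish (assumes SageMaker SDK 1.x).
xgb_model.fit(inputs=data_channels, wait=False, logs=False)

# Record the job name so you can re-attach later, even from a new notebook session.
job_name = xgb_model.latest_training_job.name
print(job_name)

# Later: rebuild an estimator bound to that job and block until it completes.
attached_model = sagemaker.estimator.Estimator.attach(job_name)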

Step 6: Deploy the Model to Amazon SageMaker

To get predictions, deploy your model. The method you use depends on how you want to generate inferences:

  • To get one inference at a time in real time, set up a persistent endpoint using Amazon SageMaker hosting services.
  • To get inferences for an entire dataset, use Amazon SageMaker batch transform (a minimal sketch follows this list).
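
This tutorial follows the hosted-endpoint path. For reference, a batch transform over the unlabeled test file from Step 4.3 might look roughly like the following. This is a sketch rather than part of the original steps; it reuses the trained xgb_model estimator and the bucket and prefix variables defined earlier:

# Sketch: run a batch transform job over the test partition uploaded in Step 4.3.
batch_output = 's3://{}/{}/batch-output'.format(bucket, prefix)

transformer = xgb_model.transformer(instance_count=1,
                                    instance_type='ml.m4.xlarge',
                                    output_path=batch_output)

# The test file contains one unlabeled CSV row per image, so each line is one record.
transformer.transform('s3://{}/{}/test/examples'.format(bucket, prefix),
                      content_type='text/csv',
                      split_type='Line')
transformer.wait()
print('Batch predictions written to {}'.format(batch_output))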

Step 6.1: Deploy the Model to Amazon SageMaker Hosting Services

Deploy the model that you trained in Create and Run a Training Job (Amazon SageMaker Python SDK) by calling the deploy method of the sagemaker.estimator.Estimator object. This is the same object that you used to train the model. When you call the deploy method, specify the number and type of ML instances that you want to use to host the endpoint.

In [10]:
xgb_predictor = xgb_model.deploy(initial_instance_count=1,
                                content_type='text/csv',
                                instance_type='ml.t2.medium'
                                )
-----------------!

The deploy method creates the deployable model, configures the Amazon SageMaker hosting services endpoint, and launches the endpoint to host the model. For more information, see https://sagemaker.readthedocs.io/en/stable/estimators.html#sagemaker.estimator.Estimator.deploy.

It also returns a sagemaker.predictor.RealTimePredictor object, which you can use to get inferences from the model. For information, see https://sagemaker.readthedocs.io/en/stable/predictors.html#sagemaker.predictor.RealTimePredictor.
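
The predictor wraps the low-level runtime API, so you can also invoke the endpoint directly with the boto3 sagemaker-runtime client, for example from an application outside the notebook. A minimal sketch, not part of the original tutorial, sending the first test image as a single CSV row:

# Sketch: invoke the hosted endpoint directly through the SageMaker runtime API.
runtime = boto3.client('sagemaker-runtime')

# Build one CSV row of 784 pixel values from the first test image held in memory.
payload = ','.join(str(px) for px in test_set[0][0])

response = runtime.invoke_endpoint(EndpointName=xgb_predictor.endpoint,
                                   ContentType='text/csv',
                                   Body=payload)
print(response['Body'].read())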

Step 7: Validate the Model

Step 7.1: Validate a Model Deployed to Amazon SageMaker Hosting Services

If you deployed a model to Amazon SageMaker hosting services in Step 6.1: Deploy the Model to Amazon SageMaker Hosting Services, you now have an endpoint that you can invoke to get inferences in real time. To validate the model, invoke the endpoint with example images from the test dataset and check whether the inferences you get match the actual labels of the images.

Validate a Model Deployed to Amazon SageMaker Hosting Services (Amazon SageMaker Python SDK)

To validate the model by using the Amazon SageMaker Python SDK, use the sagemaker.predictor.RealTimePredictor object that you created in Deploy the Model to Amazon SageMaker Hosting Services (Amazon SageMaker Python SDK). For information, see https://sagemaker.readthedocs.io/en/stable/predictors.html#sagemaker.predictor.RealTimePredictor.

To validate the model (Amazon SageMaker Python SDK)

1. Download the test data from Amazon S3.

In [11]:
s3 = boto3.resource('s3')

test_key = "{}/test/examples".format(prefix)

s3.Bucket(bucket).download_file(test_key, 'test_data')

2. Plot the first 10 images from the test dataset with their labels.

In [12]:
%matplotlib inline

for i in range(0, 10):
    img = test_set[0][i]
    label = test_set[1][i]
    img_reshape = img.reshape((28,28))
    imgplot = plt.imshow(img_reshape, cmap='gray')
    print('This is a {}'.format(label))
    plt.show()
This is a 7
This is a 2
This is a 1
This is a 0
This is a 4
This is a 1
This is a 4
This is a 9
This is a 5
This is a 9

3. To get inferences for the first 10 examples in the test dataset, call the predict method of the sagemaker.predictor.RealTimePredictor object.

In [13]:
with open('test_data', 'r') as f:
    for j in range(0,10):
        single_test = f.readline()
        result = xgb_predictor.predict(single_test)
        print(result)
b'7.0'
b'2.0'
b'1.0'
b'0.0'
b'4.0'
b'1.0'
b'4.0'
b'9.0'
b'5.0'
b'9.0'

To see if the model is making accurate predictions, check the output from this step against the numbers that you plotted in the previous step.
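
Rather than eyeballing ten predictions, you can score the whole test set against its labels. The following sketch is an addition to the tutorial; it assumes the endpoint accepts multiple newline-separated CSV rows per request and returns the predictions as a comma-separated string, which you can verify on a small batch first:

# Sketch: compute accuracy over the full test set (assumes newline-separated rows in,
# comma-separated predictions out, for the built-in XGBoost endpoint).
with open('test_data', 'r') as f:
    test_rows = [line.strip() for line in f if line.strip()]

predictions = []
batch_size = 100  # keep each request well under the endpoint payload limit
for start in range(0, len(test_rows), batch_size):
    batch = '\n'.join(test_rows[start:start + batch_size])
    result = xgb_predictor.predict(batch).decode('utf-8')
    predictions.extend(float(p) for p in result.split(','))

accuracy = np.mean(np.array(predictions) == test_set[1])
print('Test accuracy: {:.4f}'.format(accuracy))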

You have now trained, deployed, and validated your first model in Amazon SageMaker.
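
The endpoint keeps an ml.t2.medium instance running until you delete it. When you have finished experimenting, you can remove it with the predictor object from Step 6.1; a one-line sketch:

# Delete the hosted endpoint so the instance stops accruing charges.
xgb_predictor.delete_endpoint()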