Trigger Pre-built Framework Training Job via Amazon SageMaker API

TL;DR

A SageMaker training job that runs a customized training script on a pre-built framework container (TensorFlow, PyTorch, scikit-learn, etc.) can also be triggered through the SageMaker API alone, by setting these request body fields:

  • HyperParameters.sagemaker_submit_directory: the S3 location of the uploaded source.tar.gz file, which contains the training script.
  • HyperParameters.sagemaker_program: the name of the entry point file.
  • AlgorithmSpecification.TrainingImage: the Amazon ECR registry path of the pre-built framework container image. The image URLs are listed in the AWS documentation.

Reason for This Blog

The Amazon SageMaker Python SDK is a great package for working with SageMaker. Still, some scenarios require triggering a pre-built framework (TensorFlow, PyTorch, scikit-learn, etc.) training job through the SageMaker API directly. However, there does not seem to be any explicit documentation or tutorial that describes how to do this. So I wrote this short article, and I hope it helps you. This article can also be found on my blog.


Running Pre-built Framework Training Job with Amazon SageMaker Python SDK

The Amazon SageMaker Python SDK is an open source library for training and deploying machine learning models on Amazon SageMaker. There are a bunch of examples for TensorFlow, PyTorch, scikit-learn and other frameworks in the open source repository amazon-sagemaker-examples. In short, we create an Estimator with the customized script and fit the estimator, as in the code piece below. For the PyTorch estimator, entry_point indicates the training script, while framework_version and py_version determine the pre-built container image.

Example code for starting a SageMaker PyTorch training job with the SageMaker Python SDK. The code is copied from an example in the amazon-sagemaker-examples repository.
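For readers without the linked example at hand, here is a minimal sketch of what such a job might look like. It is not the code from the repository: the script name mnist.py, the framework and Python versions, the instance type, the hyperparameters and the S3 path are all placeholder assumptions. The sagemaker package is imported inside the function, so the snippet can be read and loaded without it installed.

```python
# Keyword arguments for the PyTorch estimator (all values are illustrative assumptions).
ESTIMATOR_ARGS = {
    "entry_point": "mnist.py",        # the training script
    "framework_version": "1.8.0",     # with py_version, selects the pre-built image
    "py_version": "py3",
    "instance_count": 1,
    "instance_type": "ml.c5.xlarge",
    "hyperparameters": {"epochs": 1},
}


def run_training(role_arn, training_data_s3_uri):
    """Create a PyTorch estimator and start a managed training job."""
    # Imported lazily: requires `pip install sagemaker` and AWS credentials.
    from sagemaker.pytorch import PyTorch

    estimator = PyTorch(role=role_arn, **ESTIMATOR_ARGS)
    # "training" is the channel name the script would read its data from.
    estimator.fit({"training": training_data_s3_uri})
    return estimator


# Example call (hypothetical role and bucket):
# run_training("arn:aws:iam::123456789012:role/ExampleSageMakerRole",
#              "s3://bucket/prefix/data")
```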

Scenarios of Using SageMaker API Directly

Although the SageMaker Python SDK is very convenient for running managed training jobs across a variety of machine learning frameworks, there are still scenarios in which we need to trigger a SageMaker training job directly through the Amazon SageMaker CreateTrainingJob API, such as:

  • Machine learning engineers or software developers use a language other than Python.
  • The machine learning operational pipeline is built with AWS Step Functions, whose SageMaker integration uses the SageMaker API interface.

Use SageMaker API to Trigger Pre-built Framework Training Job with Training Scripts

At first glance, the SageMaker API seems to have no place to specify a training script location, so people may think it cannot trigger a TensorFlow/PyTorch/… training job with a customized training script. The good news is that it can!

First, we need to tar the training script as source.tar.gz and upload it to an S3 location, e.g., s3://bucket/prefix/source.tar.gz. This can be done as a step in the Step Functions state machine or in the CI/CD build stage, depending on how we operate the ML pipeline.
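The packaging step can be sketched with the Python standard library plus boto3. The function names package_source and upload_source, the script name and the bucket/key are hypothetical; the script is placed at the root of the archive so that its file name matches the entry point name used later.

```python
import tarfile
from pathlib import Path


def package_source(script_path, archive_path="source.tar.gz"):
    """Tar and gzip a single training script at the root of the archive."""
    script = Path(script_path)
    with tarfile.open(archive_path, "w:gz") as tar:
        # arcname drops the local directory so the archive holds just "<script>.py".
        tar.add(script, arcname=script.name)
    return Path(archive_path)


def upload_source(archive_path, bucket, key):
    """Upload the archive to S3 and return its s3:// URI."""
    import boto3  # imported lazily: needs AWS credentials at call time

    boto3.client("s3").upload_file(str(archive_path), bucket, key)
    return f"s3://{bucket}/{key}"


# Example (hypothetical names):
# archive = package_source("mnist.py")
# uri = upload_source(archive, "bucket", "prefix/source.tar.gz")
```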

Then, we need to set these fields in the SageMaker CreateTrainingJob API request body, or in the state definition of the Step Functions SageMaker connector:

  • HyperParameters.sagemaker_submit_directory: the S3 location of the uploaded source.tar.gz file, e.g., s3://bucket/prefix/source.tar.gz
  • HyperParameters.sagemaker_program: the name of the entry point file
  • AlgorithmSpecification.TrainingImage: the Amazon ECR registry path of the pre-built framework container image. The image URLs are listed in the AWS documentation.
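For the Step Functions case, the three fields go under the Parameters of a task state that uses the SageMaker createTrainingJob integration. The sketch below builds such a state definition as a Python dictionary and prints it as JSON. The state name, role ARN, image placeholder, instance settings and the input path $.job_name are assumptions for illustration only.

```python
import json


def training_task_state(parameters, next_state=None):
    """Wrap CreateTrainingJob parameters in a Step Functions task state."""
    state = {
        "Type": "Task",
        # ".sync" makes the state wait until the training job finishes.
        "Resource": "arn:aws:states:::sagemaker:createTrainingJob.sync",
        "Parameters": parameters,
    }
    if next_state is None:
        state["End"] = True
    else:
        state["Next"] = next_state
    return state


parameters = {
    "TrainingJobName.$": "$.job_name",  # taken from the state input (assumed shape)
    "RoleArn": "arn:aws:iam::123456789012:role/ExampleSageMakerRole",
    "AlgorithmSpecification": {
        "TrainingImage": "<ECR path of the pre-built framework image>",
        "TrainingInputMode": "File",
    },
    "HyperParameters": {
        "sagemaker_submit_directory": "s3://bucket/prefix/source.tar.gz",
        "sagemaker_program": "mnist.py",
    },
    "OutputDataConfig": {"S3OutputPath": "s3://bucket/prefix/output"},
    "ResourceConfig": {
        "InstanceType": "ml.c5.xlarge",
        "InstanceCount": 1,
        "VolumeSizeInGB": 30,
    },
    "StoppingCondition": {"MaxRuntimeInSeconds": 3600},
}

state_machine = {
    "StartAt": "Train",
    "States": {"Train": training_task_state(parameters)},
}
print(json.dumps(state_machine, indent=2))
```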

Below is an example that triggers the same training job using the Python boto3 SDK. For more information about HyperParameters, you can refer to the code in the sagemaker-training-toolkit package.
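A minimal sketch of such a request follows, assuming a PyTorch script named mnist.py and a single input channel called "training". The helper names, instance settings, channel name and the epochs hyperparameter are illustrative assumptions; the image URI is passed in by the caller and must be the ECR path of a real pre-built image for your region. Note that every HyperParameters value has to be a string.

```python
def build_training_request(job_name, role_arn, image_uri, source_s3_uri,
                           entry_point, train_s3_uri, output_s3_uri):
    """Build a CreateTrainingJob request that runs a custom script."""
    return {
        "TrainingJobName": job_name,
        "RoleArn": role_arn,
        "AlgorithmSpecification": {
            "TrainingImage": image_uri,          # pre-built framework image
            "TrainingInputMode": "File",
        },
        "HyperParameters": {
            "sagemaker_submit_directory": source_s3_uri,  # source.tar.gz on S3
            "sagemaker_program": entry_point,             # script inside the tar
            "epochs": "1",                                # user hyperparameter
        },
        "InputDataConfig": [{
            "ChannelName": "training",
            "DataSource": {"S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": train_s3_uri,
                "S3DataDistributionType": "FullyReplicated",
            }},
        }],
        "OutputDataConfig": {"S3OutputPath": output_s3_uri},
        "ResourceConfig": {
            "InstanceType": "ml.c5.xlarge",
            "InstanceCount": 1,
            "VolumeSizeInGB": 30,
        },
        "StoppingCondition": {"MaxRuntimeInSeconds": 3600},
    }


def start_training_job(request):
    """Submit the request; needs boto3 and AWS credentials at call time."""
    import boto3

    return boto3.client("sagemaker").create_training_job(**request)


# Example (hypothetical values):
# request = build_training_request(
#     "pytorch-mnist-api-demo",
#     "arn:aws:iam::123456789012:role/ExampleSageMakerRole",
#     "<ECR path of the pre-built PyTorch training image>",
#     "s3://bucket/prefix/source.tar.gz", "mnist.py",
#     "s3://bucket/prefix/data", "s3://bucket/prefix/output")
# start_training_job(request)
```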

Reference

sun-analytics.nl

Zhe Sun
