Amazon SageMaker is a fully managed machine learning service from Amazon Web Services (AWS) that simplifies building, training, and deploying ML models. Whether you're an experienced data scientist or a beginner, SageMaker handles infrastructure, scaling, and optimization so you can focus on your model. In this guide, we'll walk through setting up your first ML project with SageMaker, complete with code examples and clear explanations.
1. Log into your AWS Console and navigate to SageMaker.
2. Click Create notebook instance.
3. Name your instance (e.g., MyFirstSageMakerInstance).
4. Under Permissions, create a new IAM role (AWS will auto-configure permissions).
5. Choose ml.t2.medium as the instance type (cost-effective for beginners).
6. Click Create.
# Sample code to create a notebook instance via the AWS SDK for Python (boto3)
import boto3

sm_client = boto3.client('sagemaker', region_name='us-east-1')
response = sm_client.create_notebook_instance(
    NotebookInstanceName='MyFirstSageMakerInstance',
    InstanceType='ml.t2.medium',
    RoleArn='arn:aws:iam::123456789012:role/service-role/AmazonSageMaker-ExecutionRole'
)
Explanation: This code uses AWS's Boto3 library to create a notebook instance programmatically. Replace RoleArn with your own IAM role's ARN. (The client is named sm_client rather than sagemaker so it won't clash with the SageMaker Python SDK module we import later.)
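If you don't have the ARN handy, you can look it up programmatically. Below is a minimal sketch using boto3's IAM client; the role name AmazonSageMaker-ExecutionRole is an assumption, so substitute whatever you named the role in step 4.
# A sketch: fetch a role's ARN with the boto3 IAM client
# 'AmazonSageMaker-ExecutionRole' is a hypothetical name -- use your own role
import boto3

iam = boto3.client('iam')
role_arn = iam.get_role(RoleName='AmazonSageMaker-ExecutionRole')['Role']['Arn']
print(role_arn)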
Upload your dataset to Amazon S3 or use built-in datasets. We’ll use the Iris dataset for this example.
# Load the Iris dataset using Pandas
import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris()
df = pd.DataFrame(data=iris.data, columns=iris.feature_names)
df['species'] = iris.target

# SageMaker's built-in XGBoost expects the label in the first column
# and no header row, so reorder the columns before saving
df = df[['species'] + iris.feature_names]
df.to_csv('iris.csv', index=False, header=False)

import sagemaker

sess = sagemaker.Session()
bucket = sess.default_bucket()
s3_path = sess.upload_data(path='iris.csv', bucket=bucket, key_prefix='data')
Explanation: This code loads the Iris dataset, converts it to a headerless CSV with the label in the first column (the format SageMaker's built-in XGBoost expects), and uploads it to an S3 bucket. SageMaker uses S3 for scalable data storage.
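As a quick sanity check, you can confirm the upload landed where you expect. This is an optional sketch using the boto3 S3 client against the default bucket from the previous step.
# Optional: list objects under the 'data' prefix to verify the upload
import boto3

s3 = boto3.client('s3')
for obj in s3.list_objects_v2(Bucket=bucket, Prefix='data').get('Contents', []):
    print(obj['Key'], obj['Size'])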
We’ll use SageMaker’s built-in XGBoost algorithm.
from sagemaker.estimator import Estimator
from sagemaker.inputs import TrainingInput

# Retrieve the XGBoost container image for the current region
# (image_uris.retrieve replaces the deprecated get_image_uri from SDK v1)
container = sagemaker.image_uris.retrieve('xgboost', sess.boto_region_name, version='1.2-1')
# Configure the training job
estimator = Estimator(
    image_uri=container,
    role=sagemaker.get_execution_role(),
    instance_count=1,
    instance_type='ml.m5.large',
    output_path=f's3://{bucket}/output/'
)
# Set hyperparameters
estimator.set_hyperparameters(
    objective='multi:softmax',
    num_class=3,
    num_round=50
)
# Start training (declare the CSV content type so XGBoost parses the data correctly)
estimator.fit({'train': TrainingInput(s3_path, content_type='text/csv')})
Explanation: This code initializes an XGBoost estimator, sets hyperparameters, and starts the training job. Outputs are saved to S3.
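Once fit() returns, you can check where the trained model artifact was written; it should sit under the output_path configured above. A small sketch:
# The fitted estimator exposes the S3 location of the model artifact
print(estimator.model_data)  # e.g., s3://<bucket>/output/.../model.tar.gz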
from sagemaker.serializers import CSVSerializer

predictor = estimator.deploy(
    initial_instance_count=1,
    instance_type='ml.t2.medium',
    serializer=CSVSerializer()  # the endpoint expects CSV input
)

# Sample prediction: one flower's measurements
sample = [5.1, 3.5, 1.4, 0.2]
result = predictor.predict(sample)
print(f"Predicted class: {result}")
Explanation: The model is deployed as an endpoint for real-time predictions. Always delete endpoints after testing to avoid costs.
# Delete the endpoint
predictor.delete_endpoint()

# Stop the notebook instance via the boto3 client
# (the name `sagemaker` now refers to the SDK module, so create a fresh client)
import boto3
boto3.client('sagemaker').stop_notebook_instance(
    NotebookInstanceName='MyFirstSageMakerInstance'
)
Explanation: Always terminate unused resources to avoid unexpected charges.
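To double-check that nothing is left running, you can list the endpoints that are still in service in your region. A quick sketch with boto3:
# List any endpoints still running in this region
import boto3

sm = boto3.client('sagemaker')
for ep in sm.list_endpoints()['Endpoints']:
    print(ep['EndpointName'], ep['EndpointStatus'])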
You’ve just built and deployed your first ML model with AWS SageMaker! From data preparation to deployment, SageMaker abstracts infrastructure complexities, letting you focus on the ML workflow. Practice with larger datasets and explore SageMaker’s advanced features like AutoML and pipelines.