Amazon Bedrock supports custom model deployment, enabling businesses to combine proprietary models with AWS-managed foundation models. This tutorial guides beginners through deploying a custom Flan-T5 model end-to-end, including model preparation, safety configurations, and production monitoring.
Objective: Configure Bedrock model deployment permissions
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "bedrock:CreateModelCustomizationJob",
        "bedrock:CreateProvisionedModelThroughput",
        "bedrock:InvokeModel",
        "s3:GetObject",
        "s3:PutObject",
        "iam:PassRole"
      ],
      "Resource": "*"
    }
  ]
}
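Once the policy JSON exists, it can be attached to the calling identity programmatically; a minimal boto3 sketch (the role and policy names here are placeholders, and the attachment call is left commented because it needs live credentials):

```python
import json

# Placeholder policy mirroring the permissions discussed above
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": [
            "bedrock:CreateModelCustomizationJob",
            "bedrock:CreateProvisionedModelThroughput",
            "bedrock:InvokeModel",
            "s3:PutObject"
        ],
        "Resource": "*"
    }]
}
policy_json = json.dumps(policy)

# Attach as an inline policy (requires iam:PutRolePolicy on the caller):
# import boto3
# boto3.client("iam").put_role_policy(
#     RoleName="BedrockModelRole",
#     PolicyName="BedrockDeployPolicy",
#     PolicyDocument=policy_json,
# )
```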
aws bedrock list-custom-models --region us-west-2
Requirements: Model artifacts in TorchScript or TensorFlow SavedModel format
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
model_name = "google/flan-t5-base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name, torchscript=True)
model.eval()
# Save for Bedrock deployment: torch.jit.trace needs example tensors,
# and a seq2seq model needs decoder input IDs as well
example = tokenizer("Translate to French: Hello world", return_tensors="pt")
decoder_input_ids = torch.tensor([[model.config.decoder_start_token_id]])
traced = torch.jit.trace(
    model, (example.input_ids, example.attention_mask, decoder_input_ids)
)
torch.jit.save(traced, "model.pt")
model_card = {
    "modelName": "flan-t5-custom",
    "description": "Custom fine-tuned Flan-T5 for customer support",
    "inferenceSpec": {
        "containerImage": "123456789012.dkr.ecr.us-west-2.amazonaws.com/bedrock-custom",
        "supportedContentTypes": ["application/json"]
    }
}
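The model card must exist as a JSON file on disk before it can be copied to S3 in the upload step; one way to produce it:

```python
import json

model_card = {
    "modelName": "flan-t5-custom",
    "description": "Custom fine-tuned Flan-T5 for customer support",
    "inferenceSpec": {
        "containerImage": "123456789012.dkr.ecr.us-west-2.amazonaws.com/bedrock-custom",
        "supportedContentTypes": ["application/json"]
    }
}

# Serialize to the file referenced by the S3 upload command
with open("model_card.json", "w") as f:
    json.dump(model_card, f, indent=2)
```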
Best Practice: Use versioned buckets for model artifacts
aws s3 cp model.pt s3://your-bucket/models/v1.0.0/model.pt
aws s3 cp model_card.json s3://your-bucket/models/v1.0.0/
aws s3 ls s3://your-bucket/models/v1.0.0/ --recursive
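To keep the version scheme consistent across uploads, a tiny helper can build the S3 keys; the layout mirrors the commands above, and the function name is purely illustrative:

```python
def artifact_key(version: str, filename: str) -> str:
    """Build a versioned S3 key like models/v1.0.0/model.pt."""
    return f"models/v{version}/{filename}"

print(artifact_key("1.0.0", "model.pt"))  # models/v1.0.0/model.pt
```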
Note: Bedrock creates custom models through customization jobs, and the training data must be a JSONL dataset in S3 (a model card is not training data):
aws bedrock create-model-customization-job \
  --customization-type FINE_TUNING \
  --custom-model-name "flan-t5-custom" \
  --job-name "flan-deployment-job" \
  --role-arn "arn:aws:iam::123456789012:role/BedrockModelRole" \
  --base-model-identifier "arn:aws:bedrock:us-west-2::foundation-model/amazon.titan-text-express-v1" \
  --training-data-config '{"s3Uri": "s3://your-bucket/training/data.jsonl"}' \
  --output-data-config '{"s3Uri": "s3://your-bucket/outputs/"}' \
  --region us-west-2
aws bedrock get-model-customization-job \
  --job-identifier "flan-deployment-job" \
  --region us-west-2
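Customization jobs can take a while, so a small polling helper saves re-running the status command by hand. `fetch_status` is any callable returning the current status string; a possible boto3 wiring is sketched in the comment (treat the response field name as an assumption):

```python
import time

def wait_for_status(fetch_status, done="Completed", failed="Failed",
                    poll_seconds=30, max_attempts=60):
    """Poll fetch_status() until it reports completion or failure."""
    for _ in range(max_attempts):
        status = fetch_status()
        if status == done:
            return status
        if status == failed:
            raise RuntimeError("deployment job failed")
        time.sleep(poll_seconds)
    raise TimeoutError("job did not finish in time")

# Possible wiring against the status command above:
# import boto3
# bedrock = boto3.client("bedrock", region_name="us-west-2")
# wait_for_status(lambda: bedrock.get_model_customization_job(
#     jobIdentifier="flan-deployment-job")["status"])
```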
Bedrock serves custom models through Provisioned Throughput rather than user-managed endpoints with instance types:
aws bedrock create-provisioned-model-throughput \
  --provisioned-model-name "flan-t5-endpoint" \
  --model-id "arn:aws:bedrock:us-west-2:123456789012:custom-model/flan-t5-custom" \
  --model-units 1 \
  --region us-west-2
import json
import boto3

bedrock = boto3.client('bedrock-runtime', region_name='us-west-2')
response = bedrock.invoke_model(
    # Use the provisioned model ARN returned by create-provisioned-model-throughput
    modelId="arn:aws:bedrock:us-west-2:123456789012:provisioned-model/flan-t5-endpoint",
    body=json.dumps({
        "text_inputs": "Translate to French: Hello world",
        "max_length": 50
    })
)
print(json.loads(response['body'].read())['generated_text'])
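The response body arrives as a streaming blob, so the parsing step is easy to isolate in a helper and exercise with a canned payload; the 'generated_text' key is whatever your inference container actually returns:

```python
import io
import json

def parse_generation(body_stream):
    """Read and decode an invoke_model response body."""
    payload = json.loads(body_stream.read())
    return payload["generated_text"]

# Exercise the helper with a canned response instead of a live call
fake_body = io.BytesIO(json.dumps({"generated_text": "Bonjour le monde"}).encode())
print(parse_generation(fake_body))  # Bonjour le monde
```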
aws cloudwatch put-metric-alarm \
--alarm-name "Bedrock-High-Latency" \
--metric-name "InvocationLatency" \
--namespace "AWS/Bedrock" \
--statistic "Average" \
--period 300 \
--threshold 1000 \
--comparison-operator "GreaterThanThreshold" \
--evaluation-periods 1 \
--alarm-actions "arn:aws:sns:us-west-2:123456789012:BedrockAlerts"
aws cloudwatch get-metric-statistics \
--namespace "AWS/Bedrock" \
--metric-name "Invocations" \
--start-time $(date -u +"%Y-%m-%dT%H:%M:%SZ" -d "-7 days") \
--end-time $(date -u +"%Y-%m-%dT%H:%M:%SZ") \
--period 3600 \
--statistics "Sum"
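The get-metric-statistics output is a list of hourly datapoints; summing the `Sum` statistics gives the weekly invocation volume. A sketch using a canned response shaped like CloudWatch's JSON:

```python
def total_invocations(datapoints):
    """Sum the 'Sum' statistic across CloudWatch datapoints."""
    return sum(dp["Sum"] for dp in datapoints)

# Sample datapoints in the shape returned under the "Datapoints" key
sample = [{"Sum": 120.0}, {"Sum": 95.0}, {"Sum": 210.0}]
print(total_invocations(sample))  # 425.0
```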
To roll out a new model version, point the existing Provisioned Throughput at the new custom model (Bedrock swaps the model in place; it does not expose a blue/green traffic-routing configuration):
aws bedrock update-provisioned-model-throughput \
  --provisioned-model-id "flan-t5-endpoint" \
  --desired-model-id "arn:aws:bedrock:us-west-2:123456789012:custom-model/flan-t5-custom-v2" \
  --region us-west-2
After mastering basic deployment, explore Amazon Bedrock Guardrails for safety configurations and CloudWatch dashboards for deeper production monitoring.
Reference: AWS Bedrock Custom Models Guide