By using DynamoDB.Table.batch_writer() you can speed up bulk writes and reduce the number of write requests made to the service.
The batch_writer() method returns a handle to a batch writer object that automatically handles buffering and sending items in batches. In addition, the batch writer also handles any unprocessed items and resends them as needed.
Let's have a look at the following code snippets to see how to use DynamoDB.Table.batch_writer().
Before starting this tutorial, follow the steps in How to create a DynamoDB Table.
Once you have followed those steps, you should have a DynamoDB table with the following attributes.
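For reference, a minimal sketch of creating such a table with boto3 is shown below. The table name sample-movie-table-resource matches the one used later in this tutorial, while the on-demand billing mode is just an assumption for the sketch.
import boto3

dynamodb = boto3.resource("dynamodb")

# Assumed schema: "year" (Number) as the HASH key and "title" (String) as the
# RANGE key, matching the attributes used later in this tutorial.
table = dynamodb.create_table(
    TableName="sample-movie-table-resource",
    KeySchema=[
        {"AttributeName": "year", "KeyType": "HASH"},
        {"AttributeName": "title", "KeyType": "RANGE"},
    ],
    AttributeDefinitions=[
        {"AttributeName": "year", "AttributeType": "N"},
        {"AttributeName": "title", "AttributeType": "S"},
    ],
    BillingMode="PAY_PER_REQUEST",
)
table.wait_until_exists()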
Now, let's download a sample JSON file containing movie data: moviedata.zip.
Create a new folder named tutorial, extract the zip file, place moviedata.json in the newly created folder, and open the folder in VS Code.
PS C:\Users\welcome\Downloads> Expand-Archive moviedata.zip
PS C:\Users\welcome\Downloads> copy .\moviedata\moviedata.json .\tutorial\
PS C:\Users\welcome\Downloads> cd .\tutorial\
PS C:\Users\welcome\Downloads\tutorial> code .
PS C:\Users\welcome\Downloads\tutorial>
Let's have a look at a sample item from moviedata.json. It has year as an integer attribute and title as a string attribute, which are the HASH and RANGE keys of the table, respectively.
{
    "year": 2013,
    "title": "We're the Millers",
    "info": {
        "directors": ["Rawson Marshall Thurber"],
        "release_date": "2013-08-03T00:00:00Z",
        "rating": 7.2,
        "genres": [
            "Comedy",
            "Crime"
        ],
        "image_url": "http://ia.media-imdb.com/images/M/MV5BMjA5Njc0NDUxNV5BMl5BanBnXkFtZTcwMjYzNzU1OQ@@._V1_SX400_.jpg",
        "plot": "A veteran pot dealer creates a fake family as part of his plan to move a huge shipment of weed into the U.S. from Mexico.",
        "rank": 13,
        "running_time_secs": 6600,
        "actors": [
            "Jason Sudeikis",
            "Jennifer Aniston",
            "Emma Roberts"
        ]
    }
}
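One thing worth noting: the rating attribute is a float, and the boto3 DynamoDB resource does not accept Python float values; numbers have to be passed as Decimal. The snippets below therefore round-trip each item through json.dumps/json.loads with parse_float=Decimal. A minimal sketch of that conversion (the inline item dictionary here is just an illustration):
import json
from decimal import Decimal

item = {"year": 2013, "title": "We're the Millers", "info": {"rating": 7.2}}

# DynamoDB (via the boto3 resource) rejects float values, so re-parse the
# JSON and turn every float into a Decimal before calling put_item.
item = json.loads(json.dumps(item), parse_float=Decimal)
print(item["info"]["rating"])  # Decimal('7.2')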
batch_writer
Table.batch_writer() creates a batch writer object. The batch writer automatically handles buffering and sending items in batches. It also automatically handles any unprocessed items and resends them for processing.
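Conceptually, the batch writer behaves roughly like the simplified sketch below, built on the low-level batch_write_item API. This is not boto3's actual implementation, only an illustration of the buffering and unprocessed-item retries it performs for you (a single batch request holds at most 25 items).
import boto3

client = boto3.client("dynamodb")

def naive_batch_put(table_name, items):
    # Illustrative only: items must already be in the low-level attribute-value
    # format, e.g. {"year": {"N": "2013"}, "title": {"S": "..."}}.
    # table.batch_writer() does this chunking and retrying for you.
    for start in range(0, len(items), 25):
        requests = [{"PutRequest": {"Item": item}} for item in items[start:start + 25]]
        while requests:
            response = client.batch_write_item(RequestItems={table_name: requests})
            # Resend whatever DynamoDB reports back as unprocessed.
            requests = response.get("UnprocessedItems", {}).get(table_name, [])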
batch_writer() to put items
Create a new file demo.py inside the tutorial directory and copy the code snippet below.
import json
from decimal import Decimal

import boto3

dynamodb_resource = boto3.resource("dynamodb")
table_name = "sample-movie-table-resource"
file_path = "moviedata.json"
table = dynamodb_resource.Table(table_name)


def read_json_data(file_path):
    # Load the sample movie data and keep only the first 100 items for the demo.
    movies_data = []
    with open(file_path) as f:
        movies_data = json.loads(f.read())
    print(type(movies_data))
    print(len(movies_data))
    return movies_data[:100]


def write_in_batches(batch_items):
    # batch_writer() buffers the put requests, sends them in batches and
    # retries any unprocessed items.
    with table.batch_writer() as batch:
        for item in batch_items:
            # DynamoDB does not accept floats, so convert them to Decimal.
            item = json.loads(json.dumps(item), parse_float=Decimal)
            batch.put_item(Item=item)


if __name__ == "__main__":
    movies_data = read_json_data(file_path=file_path)
    write_in_batches(batch_items=movies_data)
In the above code snippet, the read_json_data function reads data from the sample file and returns only the first 100 items for the demo. The write_in_batches function then writes the items to the DynamoDB table using table.batch_writer(), which takes care of buffering and resending unprocessed items.
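After running demo.py, you can spot-check that the items landed in the table. A minimal sketch, assuming the same table name and the sample item shown earlier:
import boto3

table = boto3.resource("dynamodb").Table("sample-movie-table-resource")

# Look up one of the items that was just written.
response = table.get_item(Key={"year": 2013, "title": "We're the Millers"})
print(response.get("Item"))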
Create a new file delete_demo.py inside the tutorial directory and copy the code snippet below.
import json
from decimal import Decimal

import boto3

dynamodb_resource = boto3.resource("dynamodb")
table_name = "sample-movie-table-resource"
file_path = "moviedata.json"
table = dynamodb_resource.Table(table_name)


def read_json_data(file_path):
    # Load the sample movie data and keep only the first 100 items for the demo.
    movies_data = []
    with open(file_path) as f:
        movies_data = json.loads(f.read())
    print(type(movies_data))
    print(len(movies_data))
    return movies_data[:100]


def write_in_batches(batch_items):
    # batch_writer() buffers the put requests, sends them in batches and
    # retries any unprocessed items.
    with table.batch_writer() as batch:
        for item in batch_items:
            # DynamoDB does not accept floats, so convert them to Decimal.
            item = json.loads(json.dumps(item), parse_float=Decimal)
            batch.put_item(Item=item)


def delete_in_batches(batch_items):
    # Build the primary key (year + title) for every item, then delete the
    # items in batches using the same batch writer.
    batch_keys = [
        {"year": item["year"], "title": item["title"]} for item in batch_items
    ]
    with table.batch_writer() as batch:
        for key in batch_keys:
            batch.delete_item(Key=key)


if __name__ == "__main__":
    movies_data = read_json_data(file_path=file_path)
    # write_in_batches(batch_items=movies_data)
    delete_in_batches(movies_data)
In the above code snippet, the read_json_data function reads data from the sample file and returns only the first 100 items for the demo. The delete_in_batches function then builds the primary key for each item and deletes the items using batch.delete_item inside table.batch_writer(), which takes care of buffering and resending unprocessed items.
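After running delete_demo.py, the same lookup should come back empty. A minimal sketch, again assuming the table name and sample item from above:
import boto3

table = boto3.resource("dynamodb").Table("sample-movie-table-resource")

# The item was deleted, so the response should no longer contain an "Item" key.
response = table.get_item(Key={"year": 2013, "title": "We're the Millers"})
print("Item" in response)  # False once the delete has gone through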