By using DynamoDB.Table.batch_writer() you can speed up bulk writes and reduce the number of write requests made to the service.
The batch_writer() method returns a handle to a batch writer object that automatically handles buffering and sending items in batches. In addition, the batch writer also handles any unprocessed items and resends them as needed.
Let's have a look at the following code snippets to see how to use DynamoDB.Table.batch_writer().
Before starting this tutorial, follow the steps in How to create a DynamoDB Table.
Once you have followed those steps, you should have a DynamoDB table with the following attributes.
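For reference, a minimal sketch of creating such a table with boto3 is shown below. The table name sample-movie-table-resource matches the one used later in this tutorial, while the on-demand billing mode is just an assumption for the sketch.
import boto3

dynamodb = boto3.resource("dynamodb")

# Assumed schema: "year" (Number) as the HASH key and "title" (String) as the
# RANGE key, matching the attributes used later in this tutorial.
table = dynamodb.create_table(
    TableName="sample-movie-table-resource",
    KeySchema=[
        {"AttributeName": "year", "KeyType": "HASH"},
        {"AttributeName": "title", "KeyType": "RANGE"},
    ],
    AttributeDefinitions=[
        {"AttributeName": "year", "AttributeType": "N"},
        {"AttributeName": "title", "AttributeType": "S"},
    ],
    BillingMode="PAY_PER_REQUEST",
)
table.wait_until_exists()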
Now, let's download a sample JSON file containing movie data: moviedata.zip.
Create a new folder named tutorial, extract the zip file, place moviedata.json in the newly created folder, and open the folder in VS Code.
PS C:\Users\welcome\Downloads> Expand-Archive moviedata.zip
PS C:\Users\welcome\Downloads> copy .\moviedata\moviedata.json .\tutorial\
PS C:\Users\welcome\Downloads> cd .\tutorial\
PS C:\Users\welcome\Downloads\tutorial> code .
PS C:\Users\welcome\Downloads\tutorial>
Let's have a look at a sample item from moviedata.json. It has year as an integer attribute and title as a string attribute, which are the HASH and RANGE keys of the table, respectively.
{
    "year": 2013,
    "title": "We're the Millers",
    "info": {
        "directors": ["Rawson Marshall Thurber"],
        "release_date": "2013-08-03T00:00:00Z",
        "rating": 7.2,
        "genres": [
            "Comedy",
            "Crime"
        ],
        "image_url": "http://ia.media-imdb.com/images/M/MV5BMjA5Njc0NDUxNV5BMl5BanBnXkFtZTcwMjYzNzU1OQ@@._V1_SX400_.jpg",
        "plot": "A veteran pot dealer creates a fake family as part of his plan to move a huge shipment of weed into the U.S. from Mexico.",
        "rank": 13,
        "running_time_secs": 6600,
        "actors": [
            "Jason Sudeikis",
            "Jennifer Aniston",
            "Emma Roberts"
        ]
    }
}
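One thing worth noting: the rating attribute is a float, and the boto3 DynamoDB resource does not accept Python float values; numbers have to be passed as Decimal. The snippets below therefore round-trip each item through json.dumps/json.loads with parse_float=Decimal. A minimal sketch of that conversion (the inline item dictionary here is just an illustration):
import json
from decimal import Decimal

item = {"year": 2013, "title": "We're the Millers", "info": {"rating": 7.2}}

# DynamoDB (via the boto3 resource) rejects float values, so re-parse the
# JSON and turn every float into a Decimal before calling put_item.
item = json.loads(json.dumps(item), parse_float=Decimal)
print(item["info"]["rating"])  # Decimal('7.2')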
batch_writer
Table.batch_writer() creates a batch writer object. The batch writer automatically handles buffering and sending items in batches. It also automatically handles any unprocessed items and resends them for processing.
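Conceptually, the batch writer behaves roughly like the simplified sketch below, built on the low-level batch_write_item API. This is not boto3's actual implementation, only an illustration of the buffering and unprocessed-item retries it performs for you (a single batch request holds at most 25 items).
import boto3

client = boto3.client("dynamodb")

def naive_batch_put(table_name, items):
    # Illustrative only: items must already be in the low-level attribute-value
    # format, e.g. {"year": {"N": "2013"}, "title": {"S": "..."}}.
    # table.batch_writer() does this chunking and retrying for you.
    for start in range(0, len(items), 25):
        requests = [{"PutRequest": {"Item": item}} for item in items[start:start + 25]]
        while requests:
            response = client.batch_write_item(RequestItems={table_name: requests})
            # Resend whatever DynamoDB reports back as unprocessed.
            requests = response.get("UnprocessedItems", {}).get(table_name, [])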
batch_writer() to put items
Create a new file demo.py inside the tutorial directory and copy the code snippet below.
import json
from decimal import Decimal

import boto3

dynamodb_resource = boto3.resource("dynamodb")
table_name = "sample-movie-table-resource"
file_path = "moviedata.json"
table = dynamodb_resource.Table(table_name)


def read_json_data(file_path):
    # Load the sample movie data and keep only the first 100 items for the demo.
    movies_data = []
    with open(file_path) as f:
        movies_data = json.loads(f.read())
    print(type(movies_data))
    print(len(movies_data))
    return movies_data[:100]


def write_in_batches(batch_items):
    # batch_writer() buffers the put requests, sends them in batches and
    # retries any unprocessed items.
    with table.batch_writer() as batch:
        for item in batch_items:
            # DynamoDB does not accept floats, so convert them to Decimal.
            item = json.loads(json.dumps(item), parse_float=Decimal)
            batch.put_item(Item=item)


if __name__ == "__main__":
    movies_data = read_json_data(file_path=file_path)
    write_in_batches(batch_items=movies_data)
In the above code snippet, the read_json_data function reads data from the sample file and returns only the first 100 items for the demo. The write_in_batches function then writes the items to the DynamoDB table using table.batch_writer(), which takes care of buffering and resending unprocessed items.
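After running demo.py, you can spot-check that the items landed in the table. A minimal sketch, assuming the same table name and the sample item shown earlier:
import boto3

table = boto3.resource("dynamodb").Table("sample-movie-table-resource")

# Look up one of the items that was just written.
response = table.get_item(Key={"year": 2013, "title": "We're the Millers"})
print(response.get("Item"))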
Create a new file delete_demo.py inside the tutorial directory and copy the code snippet below.
import json
from decimal import Decimal

import boto3

dynamodb_resource = boto3.resource("dynamodb")
table_name = "sample-movie-table-resource"
file_path = "moviedata.json"
table = dynamodb_resource.Table(table_name)


def read_json_data(file_path):
    # Load the sample movie data and keep only the first 100 items for the demo.
    movies_data = []
    with open(file_path) as f:
        movies_data = json.loads(f.read())
    print(type(movies_data))
    print(len(movies_data))
    return movies_data[:100]


def write_in_batches(batch_items):
    # batch_writer() buffers the put requests, sends them in batches and
    # retries any unprocessed items.
    with table.batch_writer() as batch:
        for item in batch_items:
            # DynamoDB does not accept floats, so convert them to Decimal.
            item = json.loads(json.dumps(item), parse_float=Decimal)
            batch.put_item(Item=item)


def delete_in_batches(batch_items):
    # Build the primary key (year + title) for every item, then delete the
    # items in batches using the same batch writer.
    batch_keys = [
        {"year": item["year"], "title": item["title"]} for item in batch_items
    ]
    with table.batch_writer() as batch:
        for key in batch_keys:
            batch.delete_item(Key=key)


if __name__ == "__main__":
    movies_data = read_json_data(file_path=file_path)
    # write_in_batches(batch_items=movies_data)
    delete_in_batches(movies_data)
In the above code snippet, the read_json_data function reads data from the sample file and returns only the first 100 items for the demo. The delete_in_batches function then builds the primary key for each item and deletes the items using batch.delete_item inside table.batch_writer(), which takes care of buffering and resending unprocessed items.
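After running delete_demo.py, the same lookup should come back empty. A minimal sketch, again assuming the table name and sample item from above:
import boto3

table = boto3.resource("dynamodb").Table("sample-movie-table-resource")

# The item was deleted, so the response should no longer contain an "Item" key.
response = table.get_item(Key={"year": 2013, "title": "We're the Millers"})
print("Item" in response)  # False once the delete has gone through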