
Reading and Writing to DynamoDB with Python

19 June 2021 - 5 min read

Data, such as that generated by The Lambda Hunter, needs to be stored somewhere so that it can be reported on.

This is the role of DynamoDB, a massively scalable NoSQL database. We’ve already seen that metric data can produce very useful graphs using tools such as Wavefront. However, there will always be a need for explicit detail: it’s no use knowing that there are 15 Lambda Functions using a deprecated runtime if you don’t know where those functions are or what they are called.

DynamoDB Format

Data stored in DynamoDB has to be typed - the full list of types can be found in the AWS Documentation. We will be using only two of them, the String type and the generic Number type.

DynamoDB stores the data in partitions and choosing a unique enough partition key is important. All data with the same partition key will be in the same partition (and probably on the same physical disk). Data rows are then differentiated by using a sort key, which is used to order the data within the partition.

For the data provided by The Lambda Hunter we’ll use the Account Number as the partition key and the Lambda Name as the sort key. Combined, they should be unique enough.

There is one other interesting feature of DynamoDB: the data can have a lifetime. You can set an expiry time, after which the data is automatically expunged from the DynamoDB table. We will make use of this to ensure that we don’t report on stale data. We’ll be running The Lambda Hunter daily, so we’ll set a lifetime (TTL) of 24 hours on the data. The TTL attribute must hold a Unix epoch timestamp in seconds, stored as a Number.
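
A minimal sketch of computing that expiry value in Python, 24 hours ahead of now:

import time

# expire this item 24 hours from now (epoch seconds)
ttl = int(time.time()) + 24 * 60 * 60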

Creating the DynamoDB Table

Below is a CloudFormation template to create the DynamoDB table for The Lambda Hunter. We’ll have a partition key of acctnum and a sort key of functionname. The TimeToLiveSpecification names the attribute that holds the lifetime data, in our case ttl.

Create the table either from the aws cloudformation command line application or using the AWS console.

---
AWSTemplateFormatVersion: 2010-09-09
Description: sreLambdaHunter DynamoDB Table

Resources:
  SRELambdaHunterTable:
    Type: AWS::DynamoDB::Table
    Properties:
      TableName: SRELambdaHunterTable
      ProvisionedThroughput:
        ReadCapacityUnits: 5
        WriteCapacityUnits: 5
      AttributeDefinitions:
        -
          AttributeName: "acctnum"
          AttributeType: "S"
        -
          AttributeName: "functionname"
          AttributeType: "S"
      KeySchema:
        -
          AttributeName: "acctnum"
          KeyType: HASH
        -
          AttributeName: "functionname"
          KeyType: RANGE
      TimeToLiveSpecification:
        AttributeName: "ttl"
        Enabled: true
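
For example, with the template saved to a file (table.yaml is an assumed name, as is the stack name), it could be deployed from the CLI:

aws cloudformation deploy \
    --template-file table.yaml \
    --stack-name sre-lambda-hunter-table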

Packing Data into DynamoDB Format

When transmitting data to DynamoDB all values are sent as strings, which is why it is necessary to tell DynamoDB what type each one is. The format that DynamoDB expects is:

{
    "Item": {
        "columnname": {
            "Type": "Value"
        },
        "columnname": {
            "Type": "Value"
        },
        ...
    }
}

We have our data item in a Python dictionary, which will need ‘packing’ into DynamoDB format. A simple Python function to do that is shown below.


def buildDynamoDBRow(itemdict):
    # create an output item dictionary
    item = {}

    # read each key in turn of the itemdict input
    for key in itemdict:

        # ensure that the data is in string form ready for DynamoDB
        svalue = str(itemdict[key])

        # decide whether to tell DynamoDB this is a string or a number
        columntype = "N" if key in ["ttl"] else "S"

        # create the item column in the output dict.
        item[key] = {columntype: svalue}

    # return the DynamoDB packed item
    return item
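
As a quick illustration (the values here are made up), packing a single item might look like this:

row = buildDynamoDBRow(
    {"acctnum": "012345678901", "functionname": "wibblemasher", "ttl": 1718900000}
)
# row is now:
# {"acctnum": {"S": "012345678901"},
#  "functionname": {"S": "wibblemasher"},
#  "ttl": {"N": "1718900000"}}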

Storing Data in DynamoDB

Once we have packed our item of data we can send it to DynamoDB. If a row with the same keys already exists, put_item replaces it in its entirety. Should the row not currently exist, it will be created.

import boto3

ddb = boto3.client("dynamodb")

# item is a packed dictionary, as returned by buildDynamoDBRow()
ddb.put_item(TableName="tablename", Item=item)

Iterate over all the data items, packing each one into DynamoDB format and sending it off to DynamoDB, as sketched below.
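
A minimal sketch, assuming results is a list of plain dictionaries produced by The Lambda Hunter:

for itemdict in results:
    ddb.put_item(
        TableName="SRELambdaHunterTable",
        Item=buildDynamoDBRow(itemdict),
    )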

And that is all there is to it, our data will now be available in DynamoDB whenever we choose to report on it.

Reading Data from DynamoDB

Individual rows can be retrieved from DynamoDB by using the partition and sort keys, or you can scan the whole table and return all rows.

def ddbGetItem(table, acctnum="012345678901", funcname="wibblemasher"):
    # make a boto3 (AWS API) client
    ddb = boto3.client("dynamodb")

    # pack the keys into ddb form - the attribute names must match
    # the table's key schema
    pkey = {"acctnum": {"S": str(acctnum)}, "functionname": {"S": funcname}}

    # issue the get_item request
    resp = ddb.get_item(TableName=table, Key=pkey)

    # extract the data (get_item returns no "Item" key if nothing matched)
    data = resp.get("Item")

    # return the data
    return data


def ddbScan(table):
    # make a boto3 (AWS API) client
    ddb = boto3.client("dynamodb")

    # issue the scan request
    resp = ddb.scan(TableName=table)

    # store the returned data
    data = resp["Items"]

    # check that there is more data to retrieve
    # scanning returns a sub-set of data each time
    while resp.get("LastEvaluatedKey"):

        # repeat the scan request, noting where it stopped last time
        resp = ddb.scan(TableName=table, ExclusiveStartKey=resp["LastEvaluatedKey"])

        # add the new data to the output
        data.extend(resp["Items"])

    # return the data
    return data

However we get our data, each item will need to be unpacked from the DynamoDB format into a usable dictionary.

def unpackItem(item):
    # create an output dictionary
    oitem = {}

    # iterate over each column in the table (key)
    for key in item:

        # read the 'letter' type
        for skey in item[key]:

            # retrieve the data, stripping off any extraneous white space
            oitem[key] = item[key][skey].strip()

    # return the item data as a dictionary
    return oitem
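
Putting the two together, a full table read might look like this (the table name is the one assumed throughout):

rows = [unpackItem(item) for item in ddbScan("SRELambdaHunterTable")]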

Writing Data as CSV

Data from DynamoDB can be easily converted into a CSV format for storing in a file and importing into those overly engineered spreadsheet applications that managers love using.

The column names in DynamoDB would probably need tidying up to make clear headings for each column in the output; apart from that, the data should be a simple one-to-one mapping.

def makeCSV(data):
    # fields that we want in our output
    wanted = [
        "acctnum",
        "accountname",
        "region",
        "functionname",
        "runtime",
        "latesthms",
    ]

    # headings that line up with those fields
    headings = [
        "Account Number",
        "Account Name",
        "Region",
        "Function Name",
        "Runtime",
        "Last Run (D:H:M:S)",
    ]

    # list of lines for the csv
    lines = []

    # go through the data line by line
    for item in data:

        # make a list of the wanted data from this current line
        line = [item[field] for field in wanted]

        # append the current line to the list of lines
        lines.append(line)

    # make the initial line of the csv out of the headings list
    csv = ",".join(headings)

    # add on each line from the lines list
    # separating each one with a new line
    for line in lines:
        csv += "\n" + ",".join(line)

    # save the csv data to a file
    with open("lambdadata.csv", "w") as ofn:
        ofn.write(csv)

    # or return the csv data for further processing (or do both)
    return csv
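
One caveat: joining fields with commas breaks if a value itself contains a comma. If that’s a concern, the standard csv module handles the quoting for you; a minimal sketch (writeCSV is a hypothetical helper, not part of the code above):

import csv

def writeCSV(filename, headings, lines):
    # the csv module quotes any field containing a comma or newline
    with open(filename, "w", newline="") as ofn:
        writer = csv.writer(ofn)
        writer.writerow(headings)
        writer.writerows(lines)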

As shown, we’ve stored data in DynamoDB, retrieved it, and converted it into a form that others can use.