AWS DynamoDB Introduction

Introduction

Goals:

  • Define the differences between DynamoDB and other databases
  • Understand when to use DynamoDB, or when not to use it
  • Create DynamoDB tables, understanding and using best practices for table design
  • Read and write data in DynamoDB tables

We’ll go through examples using the AWS console and the DynamoDB API, and build queries to retrieve data from DynamoDB.

What is DynamoDB?

Amazon DynamoDB is a fully managed NoSQL database service. By “fully managed,” we mean that the DynamoDB service is run entirely by the team at Amazon Web Services. There’s no database administration required on your end, no servers to manage, no levers to tune, and nothing to back up. All of this is handled for you by AWS. All you have to do is set up your tables and configure the level of provisioned throughput that each table should have. Provisioned throughput refers to the level of read and write capacity that you want AWS to reserve for your table. You are charged for the total amount of throughput that you configure for your tables, plus the total amount of storage space used by your data.
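To make provisioned throughput more concrete, here is a rough sketch of how you might estimate the capacity to reserve. It assumes the unit sizes from the DynamoDB documentation: one read capacity unit covers one strongly consistent read per second of an item up to 4 KB (or two eventually consistent reads), and one write capacity unit covers one write per second of an item up to 1 KB. The function names are hypothetical helpers, not part of any AWS library.

```python
import math

READ_UNIT_BYTES = 4 * 1024   # assumed size covered by one read capacity unit
WRITE_UNIT_BYTES = 1024      # assumed size covered by one write capacity unit


def read_capacity_units(item_size_bytes, reads_per_second, strongly_consistent=True):
    """Estimate the read capacity units needed for a steady read rate."""
    units_per_read = max(1, math.ceil(item_size_bytes / READ_UNIT_BYTES))
    total = units_per_read * reads_per_second
    if not strongly_consistent:
        # Eventually consistent reads cost half as much.
        total = total / 2
    return math.ceil(total)


def write_capacity_units(item_size_bytes, writes_per_second):
    """Estimate the write capacity units needed for a steady write rate."""
    units_per_write = max(1, math.ceil(item_size_bytes / WRITE_UNIT_BYTES))
    return units_per_write * writes_per_second
```

For example, under these assumptions a 6,000-byte item read 10 times per second with strong consistency works out to 20 read capacity units, or 10 if eventual consistency is acceptable.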

DynamoDB is a NoSQL database, which means that it doesn’t use the common Structured Query Language, or SQL. It’s not a relational database. Instead, it falls into a category of databases known as key-value stores. A key-value store is a collection of items or records. You can look up data by the primary key for each item, or through the use of indexes.

DynamoDB tables are considered schemaless, because there’s no strict design schema that every record must conform to. As long as each item has an appropriate primary key, the item can contain varying sets of attributes. The records in a table do not need to have the same attributes or even the same number of attributes. This can be very convenient for rapid application development. If you want to add a new column to your table, you don’t need to alter the table. Just start including the new field as an attribute when you insert new records. Likewise, you never need to adjust the data type for a column. Apart from the primary key attributes, whose types are fixed when the table is created, DynamoDB doesn’t enforce data types for individual attributes.
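As a small illustration of what schemaless means in practice, here are two items that could live side by side in the same table. The table, its "UserId" key, and all attribute names are made up for this sketch; the nested {"S": ...} and {"N": ...} wrappers are the attribute-value format that the low-level DynamoDB API uses.

```python
# Two hypothetical items for a hypothetical "Users" table keyed on "UserId".
user_a = {
    "UserId": {"S": "u-100"},
    "Name": {"S": "Alice"},
}

user_b = {
    "UserId": {"S": "u-200"},
    "Name": {"S": "Bob"},
    "Age": {"N": "42"},            # numbers travel as strings in this format
    "Newsletter": {"BOOL": True},  # an attribute that user_a simply doesn't have
}


def has_primary_key(item, key_name="UserId"):
    """The only structural requirement sketched here: the key attribute is present."""
    return key_name in item
```

Both items pass the primary key check even though the second one carries two extra attributes.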

DynamoDB is offered as a service, available from inside the AWS network or over the internet. DynamoDB uses Amazon Web Services’ standard features for identity and access management. You can interact with DynamoDB using the AWS web console, but more often you’ll write application code that connects to DynamoDB through its application programming interface, or API.

Advantages

One advantage of DynamoDB is that it’s fully managed by Amazon Web Services. You don’t have to worry about backups or redundancy, although you’re welcome to set up these kinds of safeguards using some of the more advanced DynamoDB features. As just described, DynamoDB tables are schemaless, so you don’t have to define the exact data model in advance. The data model can evolve with your application’s needs without any changes to the table itself.

DynamoDB is designed to be highly available. Your data is automatically replicated across three different availability zones within a geographic region. In the case of an outage or an incident affecting an entire hosting facility, DynamoDB transparently routes around the affected availability zone.

DynamoDB is also designed to be fast. Reads and writes take just a few milliseconds to complete, and DynamoDB will be fast no matter how large your tables grow. Unlike a relational database, which can slow down as a table gets large, DynamoDB performance stays consistent even with tables that are many terabytes in size. You don’t have to do anything to handle this, except adjust the provisioned throughput levels to make sure you’ve reserved enough read and write capacity for your transaction volume.

Disadvantages

But there are also some downsides to using DynamoDB. As mentioned above, your data is automatically replicated, with three copies stored in three different availability zones. That replication usually happens quickly, in milliseconds, but sometimes it can take longer. This is known as eventual consistency. This happens transparently, and many operations make sure that they’re always working on the latest copy of your data. But reads, including queries and table scans, are eventually consistent by default and may return an older version of the data before the most recent copy has reached every replica. You need to be aware of how this works, and you may need to adjust certain reads to require strong consistency.
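Requiring strong consistency is a per-request choice. The sketch below builds the parameters for a GetItem call with the API’s ConsistentRead flag; the helper name, table name, and key are hypothetical, and the resulting dictionary is meant to be passed to an SDK client (for example, boto3’s `client.get_item(**params)`).

```python
def get_item_params(table_name, key, strongly_consistent=False):
    """Build GetItem parameters, optionally asking for a strongly consistent read."""
    return {
        "TableName": table_name,
        "Key": key,
        # False (the default) allows an eventually consistent read.
        "ConsistentRead": strongly_consistent,
    }


# Hypothetical usage: read back a record immediately after writing it.
params = get_item_params("Users", {"UserId": {"S": "u-100"}}, strongly_consistent=True)
```

Strongly consistent reads consume more read capacity than eventually consistent ones, so it usually makes sense to ask for them only where stale data would actually cause a problem.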

DynamoDB’s queries aren’t as flexible as what you can do with SQL. If you’re used to writing advanced queries with joins, groupings, and summaries, you won’t be able to do that with DynamoDB. You’ll have to do more of the computation in your application code. This is done for performance reasons, to ensure that every query finishes quickly and that complicated queries can’t hog the resources on a database server.

DynamoDB doesn’t offer the wide range of data types that many relational databases do. DynamoDB has only a few native scalar data types: strings, numbers, the Boolean values true and false, binary data, and null. It also supports sets, lists, and maps built from those types. If you work with other data types like dates, you’ll need to represent those as strings or numbers in order to store them in DynamoDB.
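Here is one way you might represent a date, shown as a minimal sketch using only the Python standard library. Using ISO 8601 strings and Unix epoch seconds are common conventions rather than anything DynamoDB requires, and the helper names are hypothetical.

```python
from datetime import datetime, timezone


def date_to_string_attr(dt):
    """Encode a datetime as an ISO 8601 string attribute (sorts chronologically in UTC)."""
    return {"S": dt.astimezone(timezone.utc).isoformat()}


def date_to_number_attr(dt):
    """Encode a datetime as a number attribute holding Unix epoch seconds."""
    return {"N": str(int(dt.timestamp()))}


def string_attr_to_date(attr):
    """Decode the ISO 8601 string attribute back into a datetime."""
    return datetime.fromisoformat(attr["S"])


created = datetime(2020, 1, 2, 3, 4, 5, tzinfo=timezone.utc)
as_string = date_to_string_attr(created)   # {"S": "2020-01-02T03:04:05+00:00"}
as_number = date_to_number_attr(created)   # {"N": "1577934245"}
```

The string form stays readable in the console, while the number form is convenient for range comparisons and arithmetic.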

DynamoDB also has some strict limitations in the way you’re allowed to work with it. Two important limitations are the maximum item size of 400 kilobytes and the cap on the number of secondary indexes per table (5 local secondary indexes, plus a default quota of 20 global secondary indexes). There are other limitations that can be adjusted by contacting AWS Customer Support, like the maximum number of tables in an AWS account.

Finally, although DynamoDB performance can scale up as your needs grow, your performance is limited to the amount of read and write throughput that you’ve provisioned for each table. If you expect a spike in database use, you will need to provision more throughput in advance, or database requests will fail with a ProvisionedThroughputExceededException. Fortunately, you can adjust throughput at any time, and it only takes a couple of minutes to take effect. Still, this means that you’ll need to monitor the throughput being used on each table, or you’ll risk running out of throughput if your usage grows.
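When a request is throttled, a common reaction is to wait briefly and try again. The sketch below shows that idea as a generic retry loop with exponential backoff. It assumes the exception carries a botocore-style `response` dictionary with an error code; the helper names and the delay values are illustrative choices. The AWS SDKs already retry throttled requests on their own, so treat this as an explanation of the pattern rather than something you must add.

```python
import time


def is_throughput_error(exc):
    """Assumes a botocore-style error exposing exc.response['Error']['Code']."""
    response = getattr(exc, "response", None) or {}
    code = response.get("Error", {}).get("Code")
    return code == "ProvisionedThroughputExceededException"


def call_with_backoff(operation, should_retry, max_attempts=5,
                      base_delay=0.05, sleep=time.sleep):
    """Call operation(), retrying with doubling delays while should_retry(exc) is true."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception as exc:
            last_attempt = attempt == max_attempts - 1
            if last_attempt or not should_retry(exc):
                raise
            # Wait 0.05s, 0.1s, 0.2s, ... before the next attempt.
            sleep(base_delay * (2 ** attempt))
```

Retrying only smooths over short bursts. If requests are throttled regularly, the table needs more provisioned throughput.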

Comparing DynamoDB to Other Databases

Compared to SQL databases

Let’s take a look at how DynamoDB compares to other database technologies that you might already be familiar with. You’ve probably already worked with a relational database like MySQL, Oracle, or Microsoft SQL Server. DynamoDB is a NoSQL database, which means that you won’t be writing SQL queries to read and write data. Instead, you’ll need to make calls to specific DynamoDB APIs with names like PutItem, GetItem, or Query.

DynamoDB is also schemaless so you don’t have to define a fixed table structure in advance. When you insert a DynamoDB record, it can have any attributes that it needs. So you can adjust columns and data types on the fly.

DynamoDB was designed from the beginning to be extremely scalable. Your data can grow to many terabytes, with hundreds of thousands of reads and writes per second. You don’t have to manage the servers or the partitions; this is all handled for you by AWS. It’s much more difficult to do this in a relational database because of the way data structures like indexes work in a relational system. But this flexibility means that ordinary DynamoDB reads and writes do not follow all of the typical ACID properties of atomicity, consistency, isolation, and durability.

Data in DynamoDB is quite durable, but because of the way data is replicated across the AWS infrastructure, it’s only eventually consistent by default. If you modify a record and then read it back, you may sometimes receive the older version of the data until the data is fully consistent across all three replicas.

Finally, be aware that using DynamoDB will tie you to Amazon Web Services. There aren’t really other options for hosting DynamoDB databases except using the AWS public cloud. If you need to keep all of your data on premises, or if you want to be able to easily switch cloud hosting vendors, then DynamoDB might not be right for you.

Compared to other NoSQL databases

If you’ve used NoSQL databases like MongoDB or Cassandra, then you might feel more at home with DynamoDB. DynamoDB is considered a document-oriented database, much like MongoDB. The database can understand a little bit about the structure of each record or document, but each item can have its own unique structure.

Other NoSQL databases fall into different categories. For example, HBase is a column store. DynamoDB is eventually consistent by default, with strongly consistent reads available as a per-request option. Some other NoSQL databases can be used on a single server, where there’s no need to worry about consistency across a cluster, or they can be used in a mode that enforces strict consistency across the cluster.

DynamoDB scales transparently to the users. It doesn’t matter whether your table has 10 rows or 10 billion rows. You don’t have to do anything differently as your tables grow. Many other NoSQL databases do allow you to scale up as your data grows. But you’ll have to manage individual servers or add servers to the cluster as you need them.

Compared to other hosted databases

Finally, DynamoDB is always managed by Amazon Web Services. This is great because it reduces the operational burden for your team, but again it means that you are tied to the AWS public cloud. If you’re trying to decide between different ways of running a database on AWS, use Amazon DynamoDB if you want a fully managed database that can scale easily and you don’t mind learning new concepts. Use Amazon’s Relational Database Service if you want a managed database instance using a standard relational database like MySQL, Postgres, Oracle, or Microsoft SQL Server. And if you’re concerned about giving control of your data to a hosting provider, or if you need the data to live on premises, then you should probably host your own database, either in the cloud or in your own data center.

Using DynamoDB in your application

I’ve mentioned the DynamoDB API a few times in this post. Now we will go into more detail on what that actually means, and how you can integrate DynamoDB into your application code.

There are two ways you can interact with DynamoDB. You can use the Amazon Web Services web console, which provides a graphical interface for administering DynamoDB tables, looking at the data in your tables, and adding and modifying your data. You can also write code which interacts with DynamoDB programmatically. We will take a look at both ways of working with DynamoDB.

Your software interacts with DynamoDB using its application programming interface, or API. The API is organized as a set of remote procedure calls or different operations that you can execute on your DynamoDB data. Each operation has a name like CreateTable, a set of parameters that are required like the name of the table to be created, and a set of outputs that are sent back in the response like whether this operation was successful. Amazon provides comprehensive documentation about all of the API methods for DynamoDB on the web, and you can also download them as a PDF.
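To give a feel for what the parameters of an operation look like, here is a minimal sketch that assembles the inputs for CreateTable. The parameter names (TableName, AttributeDefinitions, KeySchema, ProvisionedThroughput) come from the DynamoDB API; the helper function, the "Users" table, the "UserId" key, and the throughput numbers are assumptions made up for the example.

```python
def create_table_params(table_name, hash_key, read_units=5, write_units=5):
    """Build CreateTable parameters for a table with a single string hash key."""
    return {
        "TableName": table_name,
        # Only key attributes are declared up front; other attributes are free-form.
        "AttributeDefinitions": [
            {"AttributeName": hash_key, "AttributeType": "S"},
        ],
        "KeySchema": [
            {"AttributeName": hash_key, "KeyType": "HASH"},
        ],
        "ProvisionedThroughput": {
            "ReadCapacityUnits": read_units,
            "WriteCapacityUnits": write_units,
        },
    }


# Hypothetical usage with an SDK client: client.create_table(**params)
params = create_table_params("Users", "UserId")
```

The response to a real CreateTable call describes the new table, including a status field that tells you whether it is still being created.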

There are 13 different methods for the DynamoDB API, which can be broken down into 3 categories:

  • The first methods are for managing the tables in your DynamoDB account. There are methods for understanding what tables you have in your account like ListTables and DescribeTable, and methods for making changes to your tables like CreateTable, UpdateTable, and DeleteTable.
  • There are four methods for reading data: GetItem and BatchGetItem for reading single records by their ID or primary key, Query for reading many records using a key or an index, and Scan for reading through an entire table or index.
  • There are four methods for modifying data, PutItem to store a single new record, UpdateItem and DeleteItem to modify or delete a single record, and BatchWriteItem to make many changes at once.
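The reading and writing methods listed above can be sketched in a few lines. The functions below expect a low-level DynamoDB client such as the one returned by `boto3.client("dynamodb")`; taking the client as an argument keeps the sketch free of credentials and network setup. The "UserId" key and the sample values are hypothetical.

```python
def save_and_load(client, table_name):
    """Store one item with PutItem, then read it back by primary key with GetItem."""
    client.put_item(
        TableName=table_name,
        Item={"UserId": {"S": "u-100"}, "Name": {"S": "Alice"}},
    )
    response = client.get_item(
        TableName=table_name,
        Key={"UserId": {"S": "u-100"}},
    )
    # GetItem omits "Item" from the response when nothing matches the key.
    return response.get("Item")


def query_by_user(client, table_name, user_id):
    """Use Query to fetch every item whose hash key equals user_id."""
    response = client.query(
        TableName=table_name,
        KeyConditionExpression="UserId = :uid",
        ExpressionAttributeValues={":uid": {"S": user_id}},
    )
    return response["Items"]
```

Notice that Query needs a condition on the key, which is what lets DynamoDB answer it without reading the whole table.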

DynamoDB API is a set of HTTP endpoints

The DynamoDB API is actually a set of HTTP endpoints. This is why it’s referred to as a web service. When you make an API call, you are actually making an HTTP request with very specific parameters in a very specific structure. The request has a signature, which is how Amazon authenticates that the call was made by a user with the appropriate permissions. If you call the DescribeTable API method, for example, what goes over the network is a signed HTTP POST request whose body is a small JSON document naming the table, and what comes back is an HTTP response whose body is a JSON description of that table.
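As a rough illustration of the shape of a DescribeTable request, the sketch below assembles its method, path, headers, and body without sending anything. The signature is deliberately left out: a real request also needs an Authorization header computed with AWS Signature Version 4, which the SDKs handle for you. The function name, table name, and region are placeholders.

```python
import json


def describe_table_request(table_name, region="us-east-1"):
    """Assemble the unsigned parts of a DescribeTable HTTP request."""
    body = json.dumps({"TableName": table_name})
    headers = {
        "Host": f"dynamodb.{region}.amazonaws.com",
        "Content-Type": "application/x-amz-json-1.0",
        # The operation name travels in a header, not in the URL path.
        "X-Amz-Target": "DynamoDB_20120810.DescribeTable",
        # Omitted here: the date and Authorization headers that carry the signature.
    }
    return "POST", "/", headers, body


method, path, headers, body = describe_table_request("Users")
```

The response body is likewise JSON, with the table’s description nested under a "Table" key.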

Fortunately, you probably won’t have to work with these HTTP requests and responses directly. There are libraries for most popular programming languages, which make it easy to make calls to the various Amazon Web Services APIs. Amazon provides libraries and software development kits for many common platforms, including Java, .NET, PHP, Python, Ruby, Go, C++, and for mobile development using Android or iOS.

For Python, we would typically use boto3. Boto3 code looks very similar to the raw API calls.
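To show how closely the two line up, here is a small sketch that turns an API operation name into the matching boto3 client method name and calls it. The snake_case naming mirrors how boto3 exposes operations such as DescribeTable as `describe_table`; the two helper functions are hypothetical conveniences, and the client is assumed to come from `boto3.client("dynamodb")`.

```python
import re


def boto3_method_name(api_operation):
    """Convert an API operation name like 'BatchGetItem' to 'batch_get_item'."""
    return re.sub(r"(?<!^)(?=[A-Z])", "_", api_operation).lower()


def call_operation(client, api_operation, **params):
    """Look up the client method for an API operation and call it with the API's parameters."""
    method = getattr(client, boto3_method_name(api_operation))
    return method(**params)


# Hypothetical usage with a real client:
#   client = boto3.client("dynamodb")
#   call_operation(client, "DescribeTable", TableName="Users")
```

In everyday code you would simply call `client.describe_table(TableName="Users")` directly; the indirection here is only to make the naming relationship visible.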

More DynamoDB topics not covered here

  • Create your first table (this is fairly simple from console and also from API)
  • Create a table with Composite Primary Key (simple again)
  • Understand the Provisioned Throughput option while creating your tables in DynamoDB
  • Read and write data in a DynamoDB table
  • Understand the difference between Queries and Scans
  • Create Secondary Indexes and Query Secondary Indexes
  • Working with large tables using Partitioning