How to keep accurate counts in DynamoDB

Querying counts

You have a DynamoDB table containing the registered users. But how many users you have? With an SQL-based database, that would be a simple SELECT COUNT(*)
FROM users
. But DynamoDB is not that and it does have native support for efficient counting.

At first sight, it might seem like a deficiency. But the above SQL query hides iterating over all the records in a table, which is O(n) so what is one command
might not be efficient at all. This also happens with
offset pagination. Since DynamoDB
does not support anything besides the scan that may perform poorly when the number of rows in a table begins to grow, it has no count(*)
functionality.

A bad solution in DynamoDB is to use a scan. While it accurately returns the number of elements in a table (or a subset of it) without storing it as a
separate entry, it does so by reading all the items in the table. While it works for small tables, the read cost goes through the roof for any realistic
production app. So don’t be tempted to use it even if it looks good on demos.

Another suboptimal solution (though it’s mentioned in many places) is to use a DynamoDB stream to call a Lambda function that keeps a value stored somewhere in
the database up-to-date with the changes. A stream collects modified (added, deleted, and changed) elements and sends them to a target that can be a variety of
things inside AWS, such as an SNS topic, an SQS queue, or a Lambda function. A piece of code can then use these change items to increment/decrement a number.

App«DynamoDB»[]«DynamoDBItems»[]«Lambda»[]Insert userpublishcallupdate countUsing DynamoDB streams to update counts

This solution has some nice properties. It decouples the logic of keeping a count from the main application logic. The app code does not need any changes, it
writes the user records as before and it can also read the number of items. All the magic is encapsulated in the stream configuration.

But it has two significant downsides.

First, it is async, meaning that the count is updated later than the item. This can lead to inconsistencies as the latest changes might not be reflected in
the derived value.

And second, it’s not easy to make sure that the updater script runs to completion every single time. With a DDB stream + Lambda function solution you introduce
quite a bit of moving parts, each having some guarantees but errors might happen. Lambda has an at-least-once processing model, but even that can fail, sending
the call to the DLQ. And since errors accumulate, even an occasional mistake makes a permanent offset between the value and the records in the table.

Despite these downsides, it is a good solution in many cases, especially when you only need an approximation. In that case, the asynchronicity and the
occasional errors can be tolerated.

But with DynamoDB transactions, we can do better. In this article, you’ll learn how to keep accurate counts in the application logic.

Keeping counts with transactions

With transactions, you can group multiple change operations and they are guaranteed to either succeed or fail together. This allows inserting/removing an item
from the users table and simultaneously updating a count value to be an atomic operation.

Users [email protected] tabletype (PK)countusers1

There are 3 operations we need to implement to provide counting functionality: initialization, adding, and removing elements.

Initialization

Since all the other operations increase or decrease a number in the counts table, we need to make sure there is a number there. This has to be done once,
usually when the app is started.

Users tableIDemailCounts tabletype (PK)countusers0

First, we need to check whether the count is there, and add if not:

const item = await client.send(new GetItemCommand({
	TableName: COUNTS_TABLE,
	Key: marshall({type: "users"}),
}));

if (!item.Item) {
	// initialize count to 0
	await client.send(new PutItemCommand({
		TableName: COUNTS_TABLE,
		Item: marshall({type: "users", count: 0}),
		ConditionExpression: "attribute_not_exists(#pk)",
		ExpressionAttributeNames: {"#pk": "type"},
	}));
}

The ConditionExpression: "attribute_not_exists(#pk)" makes sure that the database is in the state we expect it to be. As there are no transactions between
the commands (the GetItemCommand and the PutItemCommand), another thread or app might have inserted a value. This check makes sure that it’s not
overwritten.

Adding an item

To add an item that requires changing the count value, we need a transaction to modify both at the same time.

Users [email protected]@example.comCounts tabletype (PK)countusers12

The ConditionExpression: attribute_not_exists(#pk) is also important here as we need to make sure that the user item is indeed inserted into the table.
Without that, it might overwrite an existing record and still increase the count.

To modify a number by a given value, DynamoDB supports an UpdateExpression. This has the advantage of multiple transactions can write it simultaneously
without the need for ConditionExpressions.

// the user object
const item = ...;

// insert new user
await client.send(new TransactWriteItemsCommand({
	TransactItems: [
		{
			Put: {
				TableName: USERS_TABLE,
				ConditionExpression: "attribute_not_exists(#pk)",
				ExpressionAttributeNames: {"#pk": "ID"},
				Item: marshall(item),
			}
		},
		{
			Update: {
				TableName: COUNTS_TABLE,
				UpdateExpression: "ADD #count :count",
				ExpressionAttributeNames: {"#count": "count"},
				ExpressionAttributeValues: marshall({":count": 1}),
				Key: marshall({type: "users"}),
			}
		}
	]
}));

The above code can only insert a new user item but it can not update one. This is because of the ConditionExpression: "attribute_not_exists(#pk)". To
implement modifications, use a PutItemCommand with a condition to make sure the item exists:

// the user object
const item = ...;

// update existing user
await client.send(new PutItemCommand({
	TableName: USERS_TABLE,
	ConditionExpression: "attribute_exists(#pk)",
	ExpressionAttributeNames: {"#pk": "ID"},
	Item: marshall(item),
}));

Deleting an item

Deletion works similar to addition, with the exception that it needs a Delete operation.

Users [email protected]@example.comCounts tabletype (PK)countusers21

It needs to do 3 things:

  • check that the item exists (ConditionExpression: "attribute_exists(#pk)")
  • decrement the count
  • delete the item
await client.send(new TransactWriteItemsCommand({
	TransactItems: [
		{
			Delete: {
				TableName: USERS_TABLE,
				ConditionExpression: "attribute_exists(#pk)",
				ExpressionAttributeNames: {"#pk": "ID"},
				Key: marshall({ID}),
			}
		},
		{
			Update: {
				TableName: COUNTS_TABLE,
				UpdateExpression: "ADD #count :count",
				ExpressionAttributeNames: {"#count": "count"},
				ExpressionAttributeValues: marshall({":count": -1}),
				Key: marshall({type: "users"}),
			}
		}
	]
}));

Conclusion

Keeping accurate counts is not easy in DynamoDB but it can be done using transactions. Make sure to use the correct ConditionExpressions in every
operation to avoid accumulating errors when multiple processes are changing the same data.

Source: Advanced Web Machinery

Leave a Reply

Your email address will not be published.


*