---
title: "Making the Slow Explicit: Dynamodb vs SQL"
date: 2023-02-26T15:51:19-05:00
toc: false
images:
tags:
- dev
- web
---
SQL databases like MySQL, MariaDB, and PostgreSQL are highly performant and can
scale well. However, in practice it's not rare for people to run into
performance issues with these databases and turn to NoSQL solutions like
DynamoDB. Proponents of DynamoDB, like Alex DeBrie, the author of ["The DynamoDB Book"](https://www.dynamodbbook.com/),
point to a few things to explain this difference: the HTTP-based APIs of NoSQL
databases are more efficient than the TCP connections SQL databases use, table
joins are slow, and SQL databases are designed to save disk space while NoSQL
databases take advantage of large modern disks.[^1]

[^1]: I don't have my copy of the book handy, so I wrote these arguments from
    memory. I'm confident that I remember them correctly, but apologies if I
    misremembered some details.

These claims don't make a lot of sense to me though. HTTP runs over TCP, so
it's not going to be magically faster. Table joins do make queries complex, but
they are a common feature that SQL engines are designed to optimize. And I
don't understand the point about SQL databases being designed to save space.
While disk capacities have skyrocketed, even the fastest disks are extremely
slow compared to how fast CPUs can crunch numbers. A single cache miss can
stall a CPU core for hundreds of cycles, so it's critical to fit data in
cache. That means making your data take up as little space as possible.
Perhaps Alex is talking about data normalization, which is a property of
database schemas and not the database itself, but normalization is not about
saving space either; it's about keeping a single source of truth for
everything. I feel like at the end of the day, these arguments just boil down
to "SQL is old and ugly, NoSQL is new and fresh".

That being said, I think there is still the undeniable truth that in practice
people do hit performance issues with SQL databases far more often than they
hit performance issues with NoSQL databases like DynamoDB. And I think I know
why: it's because DynamoDB makes what is slow explicit.

Look at these two SQL queries. Can you spot the performance difference between
them?
```SQL
SELECT * FROM users WHERE user_id = ?;
SELECT * FROM users WHERE group_id = ?;
```
It's a trick question: of course you can't! Not without looking at the table
schema to check whether there are indexes on `user_id` or `group_id`. And for
anything more complex, you'd likely have to run `EXPLAIN ...` to make sure the
database will actually execute the query the way you think it will.
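
To make this concrete, here's a minimal sketch using Python's built-in
`sqlite3` (the schema is made up for illustration, and the exact plan text
varies by SQLite version):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE users (user_id INTEGER PRIMARY KEY, group_id INTEGER)")

# user_id is the primary key, so this lookup can use an index.
for row in con.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM users WHERE user_id = ?", (1,)
):
    print(row)  # e.g. (..., 'SEARCH users USING INTEGER PRIMARY KEY (rowid=?)')

# group_id has no index, so the same-looking query walks the whole table.
for row in con.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM users WHERE group_id = ?", (1,)
):
    print(row)  # e.g. (..., 'SCAN users')
```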
I think this makes it easy to write bad queries. Look at [Jesse Skinner's article](https://www.codingwithjesse.com/blog/debugging-a-slow-web-app/)
about the time he found a web app where all the `SELECT` queries used `LIKE`
instead of `=`, which meant the queries were not using indexes at all! While
it's easy to think that the developer who used `LIKE` everywhere was just a
bad developer, I think the realization we need to come to is that it is too
easy to make these mistakes. The same `SELECT` query could be looking up a
single item by its primary key, or it could be doing a slow table scan. The
same syntax could return a single result, or it could return a million
results. If you make a mistake, there is no indication that you made one until
your application has been live for months or even years and your database has
grown to a size where these queries are choking.
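
Here's a sketch of that exact trap, again with SQLite and a made-up schema:
even with an index in place, a `LIKE` with a leading wildcard quietly falls
back to scanning every row, and the only hint is in the query plan:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE users (name TEXT)")
con.execute("CREATE INDEX idx_users_name ON users (name)")

# Equality can use the index...
for row in con.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM users WHERE name = 'kaan'"
):
    print(row)  # e.g. (..., 'SEARCH users USING COVERING INDEX idx_users_name (name=?)')

# ...but a leading-wildcard LIKE cannot, despite looking nearly identical.
for row in con.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM users WHERE name LIKE '%kaan%'"
):
    print(row)  # e.g. (..., 'SCAN users')
```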
On one hand, I think this speaks to how performant SQL databases are. You can
write garbage queries and still get decent performance until your tables grow
to hundreds of thousands of rows! But at the same time, I think this is
exactly why DynamoDB ends up being more scalable in production: bad queries
are explicit.

With DynamoDB, if you want to get just one item by its unique key, you use a
`Get` operation that makes this explicit. If you select items based on a key
condition, that's an explicit `Query` operation, and it will only return a
small number of results and require you to paginate with a cursor, again
making it explicit that you could be querying for many items. And a query
never falls back to scanning the entire table; you have to use a `Scan`
operation for that, which makes it explicit that you are doing something
wrong.
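
Here's what that looks like as a minimal sketch with boto3 (the table, key,
and index names are all hypothetical); note that the slow path is a separate
API call you can grep for in code review, not a slightly different `WHERE`
clause:

```python
import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("users")  # hypothetical table

# Get: exactly one item, addressed by its full primary key.
item = table.get_item(Key={"user_id": "123"}).get("Item")

# Query: items matching a key condition, returned one page at a time.
# Assumes a (hypothetical) global secondary index keyed on group_id.
kwargs = {
    "IndexName": "by_group_id",
    "KeyConditionExpression": Key("group_id").eq("admins"),
}
page = table.query(**kwargs)
items = page["Items"]
while "LastEvaluatedKey" in page:  # more results: you must follow the cursor
    page = table.query(ExclusiveStartKey=page["LastEvaluatedKey"], **kwargs)
    items += page["Items"]

# Scan: reads the entire table. The cost is impossible to miss.
everything = table.scan()
```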
Rather than any magic about table joins or differences in connection types, I
think this is really the biggest difference in what makes DynamoDB more
scalable. It's not because DynamoDB is magic; it's because it makes bad
patterns more visible. I think it's critical that our tools are explicit, and
even painful, when we use them in bad patterns, because we will accidentally
follow bad patterns if it's easy to do so.

I want to add, though, that DynamoDB is not perfect in this regard either. I
see this particularly with filters. It's easy to see why Amazon added filters,
but it's not rare for people to use them without understanding how they work
and end up making mistakes (for example, [here](https://stackoverflow.com/questions/64814040/dynamodb-scan-filter-not-returning-results-for-some-requests)).
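
The usual surprise, sketched below with boto3 (names hypothetical), is that
DynamoDB applies the filter *after* reading items and after `Limit` is
applied, so a filtered page can come back empty even when plenty of matching
items exist further along in the table:

```python
import boto3
from boto3.dynamodb.conditions import Attr

table = boto3.resource("dynamodb").Table("users")  # hypothetical table

# Reads at most 10 items, THEN filters them. If none of those 10 happen to
# match, this page is empty, even if the table is full of matches.
page = table.scan(FilterExpression=Attr("active").eq(True), Limit=10)
print(page["Items"])  # possibly []

# You still pay to read every item the scan touches; the filter only trims
# what gets sent back. Finding all matches means following LastEvaluatedKey
# all the way to the end of the table.
```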