---
title: "Making the Slow Explicit: DynamoDB vs SQL"
date: 2023-02-26T15:51:19-05:00
toc: false
images:
tags:
  - dev
  - web
---

SQL databases like MySQL, MariaDB, and PostgreSQL are highly performant and can
scale well. However, in practice it's not rare for people to run into
performance issues with these databases and turn to NoSQL solutions like
DynamoDB.

Proponents of DynamoDB like Alex DeBrie, the author of ["The DynamoDB Book"](https://www.dynamodbbook.com/),
point to a few things to explain this difference: the HTTP-based APIs of NoSQL
databases are more efficient than the TCP connections used by SQL databases,
table joins are slow, and SQL databases are designed to save disk space while
NoSQL databases take advantage of large modern disks.[^1]

[^1]: I don't have my copy of the book handy, so I wrote these arguments from
    memory. I'm confident that I remember them correctly, but apologies if I
    misremembered some details.

These claims don't make a lot of sense to me, though. HTTP runs over TCP, so
it's not going to be magically faster. Table joins do make queries complex, but
they are a common feature that SQL engines are designed to optimize. And I
don't understand the point about SQL databases being designed to save space.
While disk capacities have skyrocketed, even the fastest disks are extremely
slow compared to how fast CPUs can crunch numbers. A read that misses the
in-memory caches and has to go to disk can stall a query for the equivalent of
millions of CPU cycles, so it's critical to fit data in cache, which means
making your data take up as little space as possible. Perhaps Alex is talking
about data normalization, which is a property of database schemas and not of
the database itself, but normalization is not about saving space either; it's
about keeping a single source of truth for everything. I feel like at the end
of the day, these arguments just boil down to "SQL is old and ugly, NoSQL is
new and fresh".

That being said, I think there is still the undeniable truth that in practice
people hit performance issues with SQL databases far more often than they do
with NoSQL databases like DynamoDB. And I think I know why: it's because
DynamoDB makes what is slow explicit.

Look at these two SQL queries. Can you spot the performance difference between
them?

```SQL
SELECT * FROM users WHERE user_id = ?;
SELECT * FROM users WHERE group_id = ?;
```

It's a trick question: of course you can't! Not without looking at the table
schema to check whether there are indexes on `user_id` or `group_id`. And for
anything more complex, you'd likely have to run `EXPLAIN ...` to make sure the
database will actually execute the query the way you think it will.

I think this makes it easy to write bad queries. Look at [Jesse Skinner's article](https://www.codingwithjesse.com/blog/debugging-a-slow-web-app/)
about the time he found a web app where all the `SELECT` queries were using
`LIKE` instead of `=`, which meant the queries were not using indexes at all!
While it's easy to think that the developer who used `LIKE` everywhere was just
a bad developer, I think the realization we need to come to is that it is too
easy to make these mistakes. The same `SELECT` query could be looking up a
single item by its primary key, or it could be doing a slow table scan. The
same syntax could return you a single result, or it could return you a million
results. If you make a mistake, there is no indication of it until your
application has been live for months or even years and your database has grown
to a size where these queries are choking.

On one hand, I think this speaks to how performant SQL databases are. You can
write garbage queries and still get decent performance until your tables grow
to hundreds of thousands of rows! But at the same time, I think this is exactly
why DynamoDB ends up being more scalable in production: because bad queries are
explicit.

With DynamoDB, if you want to get just one item by its unique key, you use a
`Get` operation that makes this explicit. If you make a query that selects
items based on a key condition, that's an explicit `Query` operation, and it
will only return a limited page of results and require you to paginate with a
cursor, again making it explicit that you could be querying for many items. And
a query never falls back to scanning an entire table; you have to do a `Scan`
operation for that, which makes it explicit that you are doing something wrong.

Rather than any magic about table joins or differences in connection types, I
think this is really the biggest difference in what makes DynamoDB more
scalable. It's not because DynamoDB is magic; it's because it makes bad
patterns more visible. I think it's critical that our tools are explicit, and
even painful, when we use them in bad patterns, because we will accidentally
follow bad patterns if it's easy to do so.

I want to add, though, that DynamoDB is not perfect in this regard either. I
particularly see this with filters. It's easy to see why Amazon added filters,
but it's not rare that people use them without understanding how they work and
end up making mistakes (for example, [here](https://stackoverflow.com/questions/64814040/dynamodb-scan-filter-not-returning-results-for-some-requests)).

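The usual trap is that a filter expression is applied *after* the items are
read, so a single page of a filtered `Scan` or `Query` can come back empty even
though matching items exist later in the table. Here's a hedged sketch of the
pattern that avoids the surprise, again with made-up table and attribute names:

```python
import boto3
from boto3.dynamodb.conditions import Attr

table = boto3.resource("dynamodb").Table("users")

# The filter runs after each page is read, so any individual page may contain
# zero matching items. An empty "Items" list does not mean you're done; only a
# missing LastEvaluatedKey does.
matches = []
kwargs = {"FilterExpression": Attr("status").eq("active")}
while True:
    resp = table.scan(**kwargs)
    matches.extend(resp["Items"])
    if "LastEvaluatedKey" not in resp:
        break
    kwargs["ExclusiveStartKey"] = resp["LastEvaluatedKey"]
```

Even here, the pagination cursor at least hints that DynamoDB may be reading
far more items than it returns.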