---
title: "Making the Slow Explicit: DynamoDB vs SQL"
date: 2023-02-26T15:51:19-05:00
toc: false
images:
tags:
  - dev
  - web
---

SQL databases like MySQL, MariaDB, and PostgreSQL are highly performant and can
scale well. However, in practice it's not rare for people to run into
performance issues with these databases and turn to NoSQL solutions like
DynamoDB.

Proponents of DynamoDB like Alex DeBrie, the author of ["The DynamoDB Book"](https://www.dynamodbbook.com/),
point to a few things to explain this difference: the HTTP-based APIs of NoSQL
databases are more efficient than the TCP connections used by SQL databases,
table joins are slow, and SQL databases are designed to save disk space while
NoSQL databases take advantage of large modern disks.[^1]

[^1]: I don't have my copy of the book handy, so I wrote these arguments from
    memory. I'm confident that I remember them correctly, but apologies if I
    misremembered some details.

These claims don't make a lot of sense to me, though. HTTP runs over TCP, so
it's not going to be magically faster. Table joins do make queries complex, but
they are a common feature that SQL engines are designed to optimize. And I
don't understand the point about SQL databases being designed to save space.
While disk capacities have skyrocketed, even the fastest disks are extremely
slow compared to how fast CPUs can crunch numbers. A read that misses the
in-memory caches and has to go to disk can stall a query for the equivalent of
millions of CPU cycles, so it's critical to fit data in cache, which means
making your data take up as little space as possible. Perhaps Alex is talking
about data normalization, which is a property of database schemas and not of
the database itself, but normalization is not about saving space either; it's
about keeping a single source of truth for everything. I feel like at the end
of the day, these arguments just boil down to "SQL is old and ugly, NoSQL is
new and fresh".

That being said, I think there is still the undeniable truth that in practice
people hit performance issues with SQL databases far more often than they do
with NoSQL databases like DynamoDB. And I think I know why: it's because
DynamoDB makes what is slow explicit.

Look at these two SQL queries. Can you spot the performance difference between
them?

```SQL
SELECT * FROM users WHERE user_id = ?;
SELECT * FROM users WHERE group_id = ?;
```

It's a trick question: of course you can't! Not without looking at the table
schema to check whether there are indexes on `user_id` or `group_id`. And for
anything more complex, you'd likely have to run `EXPLAIN ...` to make sure the
database will actually execute the query the way you think it will.

I think this makes it easy to write bad queries. Look at [Jesse Skinner's article](https://www.codingwithjesse.com/blog/debugging-a-slow-web-app/)
about the time he found a web app where all the `SELECT` queries were using
`LIKE` instead of `=`, which meant the queries were not using indexes at all!
While it's easy to think that the developer who used `LIKE` everywhere was just
a bad developer, I think the realization we need to come to is that it is too
easy to make these mistakes. The same `SELECT` query could be looking up a
single item by its primary key, or it could be doing a slow table scan. The
same syntax could return you a single result, or it could return you a million
results. If you make a mistake, there is no indication of it until your
application has been live for months or even years and your database has grown
to a size where these queries are choking.

On one hand, I think this speaks to how performant SQL databases are. You can
write garbage queries and still get decent performance until your tables grow
to hundreds of thousands of rows! But at the same time, I think this is exactly
why DynamoDB ends up being more scalable in production: because bad queries are
explicit.

With DynamoDB, if you want to get just one item by its unique key, you use a
`Get` operation that makes this explicit. If you make a query that selects
items based on a key condition, that's an explicit `Query` operation, and it
will only return a limited page of results and require you to paginate with a
cursor, again making it explicit that you could be querying for many items. And
a query never falls back to scanning an entire table; you have to do a `Scan`
operation for that, which makes it explicit that you are doing something wrong.

Rather than any magic about table joins or differences in connection types, I
think this is really the biggest difference in what makes DynamoDB more
scalable. It's not because DynamoDB is magic; it's because it makes bad
patterns more visible. I think it's critical that our tools are explicit, and
even painful, when we use them in bad patterns, because we will accidentally
follow bad patterns if it's easy to do so.

I want to add, though, that DynamoDB is not perfect in this regard either. I
particularly see this with filters. It's easy to see why Amazon added filters,
but it's not rare that people use them without understanding how they work and
end up making mistakes (for example, [here](https://stackoverflow.com/questions/64814040/dynamodb-scan-filter-not-returning-results-for-some-requests)).

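The usual trap is that a filter expression is applied *after* the items are
read, so a single page of a filtered `Scan` or `Query` can come back empty even
though matching items exist later in the table. Here's a hedged sketch of the
pattern that avoids the surprise, again with made-up table and attribute names:

```python
import boto3
from boto3.dynamodb.conditions import Attr

table = boto3.resource("dynamodb").Table("users")

# The filter runs after each page is read, so any individual page may contain
# zero matching items. An empty "Items" list does not mean you're done; only a
# missing LastEvaluatedKey does.
matches = []
kwargs = {"FilterExpression": Attr("status").eq("active")}
while True:
    resp = table.scan(**kwargs)
    matches.extend(resp["Items"])
    if "LastEvaluatedKey" not in resp:
        break
    kwargs["ExclusiveStartKey"] = resp["LastEvaluatedKey"]
```

Even here, the pagination cursor at least hints that DynamoDB may be reading
far more items than it returns.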