The datastore is often the most important part of an application. Code can be changed easily, and new code can be deployed without much fuss if you discover that some of your original choices were wrong, but the data model and the way it is handled are much harder to change. This means that you need to give the data model as much thought as you can when starting out, and the choice of datastore greatly influences that decision.
This post is meant to guide you through some common pitfalls, and hopefully explain why a relational database is a much saner default than the schemaless databases I see most people instinctively reach for nowadays.
Schemas are awesome. More importantly, they’re inevitable. There’s no application that doesn’t use a schema, because there’s no application that only writes data: sooner or later something reads it, and the reader needs to know what kind of data it’s going to get. With a schemaless database, the schema is just implicit, in the application code, instead of explicit, in the datastore, where it should be. This means that schemaless databases aren’t really schemaless. They just kick the can down the road and let you get away with not defining a schema early on, which invariably comes back to bite you in the ass later.
When you use a schemaless database, you’re essentially saying “I don’t want to deal with the schema now, just let me write the data, my future self can deal with it when reading”. However, when would you rather get an error? When you write the data to the database and can still retry, or a year later, when you read it and it turns out that one entry escaped notice and is still using the old format?
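To make the "a year later" failure concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the `documents` list stands in for whatever a schemaless store might return, and the field names (`name`, `first_name`, `last_name`) are made up for illustration. The point is only that the schema exists anyway, inside the reader, and that the stale entry is discovered at read time.

```python
# Hypothetical documents, standing in for the contents of a schemaless store.
documents = [
    {"first_name": "Ada", "last_name": "Lovelace"},
    {"first_name": "Alan", "last_name": "Turing"},
    {"name": "Grace Hopper"},  # written before the format changed, never migrated
]


def full_name(doc: dict) -> str:
    # The implicit schema lives here: the reader assumes both keys exist.
    return f"{doc['first_name']} {doc['last_name']}"


def read_all(docs: list) -> tuple:
    """Return the names that could be read and (index, missing key) for the rest."""
    names, failures = [], []
    for index, doc in enumerate(docs):
        try:
            names.append(full_name(doc))
        except KeyError as exc:
            # Nothing complained when this document was written;
            # the problem only surfaces now, at read time.
            failures.append((index, exc.args[0]))
    return names, failures


names, failures = read_all(documents)
print(names)
print(failures)
```

The write of the old-format document succeeded silently. The only place that knows it is wrong is the reading code, long after anyone could have retried the write.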
In almost all cases, the choice is clear: you want your application to produce an error while writing, when you can still do something about it, so that your datastore always contains “clean” data. That’s impossible to do without a datastore that enforces a schema.
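As a rough illustration of failing at write time, the sketch below uses Python's built-in `sqlite3` module with an in-memory database. The `users` table, its columns, and the `try_insert` helper are assumptions made up for this example, not anything from a real application. SQLite is also fairly lenient about column types by default, so the sketch relies only on `NOT NULL` and `CHECK` constraints, which it does enforce.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    # In-memory database with an explicit schema (hypothetical table).
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE users (
            id    INTEGER PRIMARY KEY,
            email TEXT    NOT NULL UNIQUE,
            age   INTEGER NOT NULL CHECK (age >= 0)
        )
        """
    )
    return conn


def try_insert(conn: sqlite3.Connection, email, age) -> bool:
    """Attempt a write; return False if the datastore rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO users (email, age) VALUES (?, ?)", (email, age)
            )
        return True
    except sqlite3.IntegrityError:
        # The error arrives now, while the caller can still fix or retry.
        return False


conn = make_db()
ok = try_insert(conn, "a@example.com", 30)
rejected_null = try_insert(conn, None, 30)                # violates NOT NULL
rejected_negative = try_insert(conn, "b@example.com", -1)  # violates CHECK
row_count = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
print(ok, rejected_null, rejected_negative, row_count)
```

Both bad writes are refused on the spot, and only the valid row ends up stored, so any later reader can rely on every row having an email and a non-negative age.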