[A Guide to] Object Storage - Cut Storage Costs and Improve Availability
If you don't know what object storage is, you've probably at least heard of AWS S3, or of AWS itself. In short, object storage is an alternative to the traditional file system that allows for fast, redundant storage at scale. It fits nicely into modern serverless architectures and can be much cheaper than traditional storage solutions while offering a lot of other benefits. I don't want to make this too long, so this article covers a bit about how it works and how you can use object storage to save money and take advantage of those benefits.
Traditional file storage
Before we get to the details of object storage, it's important to understand the more traditional storage solutions, so here's a brief overview. Most people are familiar with how file storage works: data is stored as files, each with some metadata, and those files are organized into folders.
Imagine a sort of warehouse where you place items in boxes which then, in turn, can be organized into bigger boxes. That's file storage in a nutshell, and to retrieve an item you have to know what boxes it's in and which item you're looking for.
Problems arise when we need to prevent data loss. Backing everything up nightly isn't practical, especially once you have more than a few terabytes of data. One popular solution is a RAID array, which introduces redundancy across multiple hard drives. The simplest RAID level that adds redundancy is RAID 1, which mirrors all of the data across two or more drives. In the event of a drive failure there is no data loss, because you have another exact copy. The issue is that you now need to pay for at least twice as much storage. It's also not easily scalable, since you're dealing with drives that have fixed amounts of storage space, and file storage systems run into performance bottlenecks past a certain size.
Object storage takes a different approach. This time, imagine valet parking, where you leave your car and get a ticket in return. When you go to retrieve it, you don't need to know where your car is parked; you simply present your ticket and the car is brought to you. Oh, and when your data is stored (the car is parked), it's actually split into a bunch of pieces, with some extra redundant pieces added, and spread out across multiple hard drives or even regions. So instead of storing data as files in folders, it's stored as "objects". Retrieving your object is as simple as picking up a car from valet parking: you query the REST API provided by the storage system and it hands you back your data.
This solves our problem of protecting against data loss without relying on nightly backups: the redundancy is built in! Your object might be split into 20 parts, and you only need, let's say, 16 of those to reassemble the file. That means you could lose up to 4 parts without suffering any data loss. You also don't have to worry about scalability, because you can use an object storage provider that simply allocates space to you.
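To make the "lose a few parts and still rebuild the file" idea concrete, here's a small Python sketch. It is a deliberately simplified stand-in: it uses a single XOR parity shard, so it only survives the loss of one part, whereas a real 16-of-20 scheme relies on stronger erasure codes that I'm not implementing here. The function names (`storage_overhead`, `encode`, `decode`) are made up for this example.

```python
from functools import reduce
from typing import Optional


def storage_overhead(total_parts: int, required_parts: int) -> float:
    """Raw bytes stored per byte of user data."""
    return total_parts / required_parts


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int) -> list:
    """Split data into k equal shards (zero-padded) plus one XOR parity shard."""
    shard_len = -(-len(data) // k)  # ceiling division
    padded = data.ljust(shard_len * k, b"\0")
    shards = [padded[i * shard_len:(i + 1) * shard_len] for i in range(k)]
    parity = reduce(xor_bytes, shards)
    return shards + [parity]


def decode(shards: list, original_len: int) -> bytes:
    """Rebuild the data when at most one shard (marked None) is missing."""
    missing = [i for i, shard in enumerate(shards) if shard is None]
    if len(missing) > 1:
        raise ValueError("a single parity shard can only recover one lost shard")
    shards = list(shards)
    if missing:
        # The XOR of all surviving shards equals the missing one.
        present = [shard for shard in shards if shard is not None]
        shards[missing[0]] = reduce(xor_bytes, present)
    # Drop the parity shard and the zero padding.
    return b"".join(shards[:-1])[:original_len]


if __name__ == "__main__":
    data = b"hello object storage"
    parts = encode(data, k=4)
    lost: Optional[bytes] = None
    parts[2] = lost  # simulate a failed drive
    assert decode(parts, len(data)) == data
    print(storage_overhead(20, 16))  # 16-of-20 scheme: 1.25
    print(storage_overhead(2, 1))    # RAID 1 mirroring: 2.0
```

The overhead numbers are the interesting part: a 16-of-20 layout stores 1.25 bytes for every byte of data and tolerates four lost parts, while a two-drive mirror stores 2 bytes per byte and tolerates one lost drive.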
Now, this is all great, but how can you actually make use of object storage? Keep reading.
Some of you may be wondering how you can organize your objects if there aren't any folders or directories. The answer is that you can simulate a hierarchy by using prefixes in object names, while still retaining the benefits of built-in redundancy.
For example, instead of cat.png, you could use the name animals/cat.png, which looks as if the object were in a folder.
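Here's a rough Python sketch of how a "folder listing" can be derived from a flat set of object names using nothing but a prefix and a delimiter. The `list_prefix` helper and the sample keys are made up for illustration; real providers offer their own listing calls with similar prefix and delimiter options.

```python
def list_prefix(keys, prefix="", delimiter="/"):
    """Return (objects, subfolders) that sit directly under the given prefix."""
    objects, folders = [], set()
    for key in keys:
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if delimiter in rest:
            # Something deeper in the "tree": report only the next level down.
            folders.add(prefix + rest.split(delimiter, 1)[0] + delimiter)
        else:
            objects.append(key)
    return sorted(objects), sorted(folders)


if __name__ == "__main__":
    keys = [
        "animals/cat.png",
        "animals/dog.png",
        "animals/birds/owl.png",
        "readme.txt",
    ]
    print(list_prefix(keys, "animals/"))
    print(list_prefix(keys))
```

Nothing is actually nested here. The store only ever sees four flat names, and the hierarchy exists purely in how the names are filtered.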
One of the great things about paying for object storage is that you only pay for what you actually use. The best-known object storage provider is Amazon, with S3. However, well-known ≠ best or cheapest.
My personal favourite is Backblaze B2. It has both a native and an S3-compatible API, along with zero fees to upload data. It's also less than a quarter of the price of S3 for storing data, and you pay zero download/egress fees when it's used in conjunction with Cloudflare or any other "Bandwidth Alliance" member.
Another great option is Wasabi which, although slightly more expensive than Backblaze for storing data, has zero fees for download/egress or API requests.
Setting it up
I'll use Backblaze as an example, but they're all fairly similar. You start by creating an account, and then you can upload and download data as objects through the dashboard, the same as with any cloud storage.
To use it with your application, there is a set of API endpoints that let you handle authentication and interact with the storage. For example, Backblaze's B2 storage has the b2_download_file_by_id endpoint to retrieve data, and you can find the rest of the endpoints listed here.
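As a rough idea of what that looks like in code, here's a minimal Python sketch using only the standard library. It assumes version 2 of B2's native API, where b2_authorize_account returns an `authorizationToken` and a `downloadUrl` that are then used to call b2_download_file_by_id. Treat the URL paths and response field names as my assumptions and check them against the current Backblaze documentation, since the API is versioned. The helper names and the key and file ID values are placeholders.

```python
import base64
import json
import urllib.parse
import urllib.request

# Assumed v2 authorization endpoint; verify against the current B2 docs.
AUTH_URL = "https://api.backblazeb2.com/b2api/v2/b2_authorize_account"


def authorize(key_id: str, application_key: str) -> dict:
    """Call b2_authorize_account with HTTP Basic auth and return the JSON reply."""
    credentials = base64.b64encode(f"{key_id}:{application_key}".encode()).decode()
    request = urllib.request.Request(
        AUTH_URL, headers={"Authorization": f"Basic {credentials}"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def build_download_url(download_url: str, file_id: str) -> str:
    """Build the b2_download_file_by_id URL for a given file ID."""
    query = urllib.parse.urlencode({"fileId": file_id})
    return f"{download_url}/b2api/v2/b2_download_file_by_id?{query}"


def download_file_by_id(auth: dict, file_id: str) -> bytes:
    """Fetch an object's bytes using the token from authorize()."""
    request = urllib.request.Request(
        build_download_url(auth["downloadUrl"], file_id),
        headers={"Authorization": auth["authorizationToken"]},
    )
    with urllib.request.urlopen(request) as response:
        return response.read()


if __name__ == "__main__":
    # Placeholder credentials and file ID: replace with your own values.
    auth = authorize("YOUR_KEY_ID", "YOUR_APPLICATION_KEY")
    data = download_file_by_id(auth, "YOUR_FILE_ID")
    print(len(data), "bytes downloaded")
```

In a real application you would probably reach for an official SDK or the S3-compatible API instead, but the raw calls show how little is involved: authenticate once, then fetch the object by its ID.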
Thanks for sticking around! There's a lot I left out of this article, so if you have any questions, comment below and I'll get back to you.
If you enjoyed this article, feel free to subscribe to my blog's newsletter. You can also learn more about me or get in touch here.