Being a developer today requires a working understanding of major computer technologies, and storage is one of them. Yet "storage" is an overloaded term, used by different people in many different contexts, which can make it confusing for people new to the field. In this post, we'll take a look at the major kinds of storage platforms and understand when and where each is generally used. I'll divide the post into three parts: first we discuss data protection mechanisms, then the major storage types and their use cases, and finally the APIs these systems offer. To avoid making this post too long, I'll include links for further reading and understanding wherever relevant.
Before we get into storage platforms, it is important to understand that physical storage (drives) is the common denominator across all of them. It is the storage applications running on top of the physical drives, e.g. file systems, block stores, object storage, that truly define the use case and behavior of a storage platform.
But the drives are already there, so why can't you just write your data to them and be done with it? Why do you need storage systems on top?
The highest-capacity drives commonly available today hold around 14 TiB. Even larger drives exist, but 14 TiB is a typical ceiling. More often than not, however, application storage requirements are much higher than that.
Additionally, drives can fail at any time, so a redundancy mechanism is required to ensure the data is not gone when a drive dies. This is why most storage systems, low level and modern high level alike, build their own mechanisms to pool multiple drives and protect data against drive failures. There is much more to a storage system than the physical drives. Let's explore further!
Protection against drive and server failures is one of the most important jobs of a storage system. There are three major data protection mechanisms in practice today: replication, RAID, and erasure coding. Storage systems sometimes use a combination of these.
💾 Replication: Replication essentially means keeping multiple copies of the data. If a copy is lost, the system uses another copy to ensure data availability. Replication needs at least 3 copies of the data to protect against a single drive failure: with only two copies, if one is lost there is no way to determine whether the remaining copy is the latest. So while replication is simple, it needs at least 3x the usable capacity, which makes it costly from a hardware perspective.
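The majority-read idea above can be sketched in a few lines. This is a toy in-memory model (the `ReplicatedStore` class is illustrative, not a real system); production stores replicate across drives or servers and wait for a quorum of acknowledgements:

```python
# Minimal sketch of 3-way replication with majority reads.
# Each "replica" is just an in-memory dict standing in for a drive.
from collections import Counter

class ReplicatedStore:
    def __init__(self, copies=3):
        self.replicas = [dict() for _ in range(copies)]

    def write(self, key, value):
        # Write every copy; real systems wait for a quorum of acks.
        for replica in self.replicas:
            replica[key] = value

    def read(self, key):
        # Read all copies and take the majority value, so a single lost
        # or stale copy cannot win the vote.
        values = [r.get(key) for r in self.replicas]
        value, votes = Counter(values).most_common(1)[0]
        return value if votes >= 2 else None

store = ReplicatedStore()
store.write("k", "v1")
store.replicas[0].pop("k")   # simulate one drive failure
print(store.read("k"))       # still returns "v1"
```

With only two copies, a failure leaves one vote against one, which is exactly why three copies are the minimum for this scheme.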
📂 RAID: RAID (Redundant Array of Independent Disks) is a mechanism that combines multiple physical disks into a logical unit to provide redundancy. RAID has multiple levels catering to different data protection and performance requirements, and it has been the backbone of SAN and NAS systems for many years. However, RAID typically needs dedicated hardware, e.g. a RAID controller, and rebuild times after a drive failure are very high.
🔢 Erasure Code: Erasure coding is a relatively modern approach to data protection across a pool of drives. An erasure code is a mathematical error correction code that transforms a message of k symbols into a longer message (code word) of n symbols, such that the original message can be recovered from any k of the n symbols. This means n - k symbols (drives) can be lost and the data is still available. Since erasure coding is completely software driven, it is very flexible in terms of data and parity shards, it doesn't need any special hardware, and unlike RAID, rebuild times are much faster. All this has led to widespread adoption of EC as the standard across modern storage systems.
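To make the k-of-n idea concrete, here is the simplest possible erasure code: two data shards plus one XOR parity shard (k=2, n=3), so any single shard can be lost and rebuilt from the other two. This is a toy sketch; production systems use Reed-Solomon codes with configurable k and n:

```python
# Toy erasure code: 2 data shards + 1 XOR parity shard (k=2, n=3).
# Losing any one shard is survivable, since XOR lets us rebuild it
# from the remaining two.

def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def encode(data):
    half = len(data) // 2
    d0, d1 = data[:half], data[half:half * 2]
    return [d0, d1, xor_bytes(d0, d1)]   # two data shards + parity

def decode(shards):
    d0, d1, _parity = shards
    if d0 is None:
        d0 = xor_bytes(shards[1], shards[2])  # rebuild from d1 ^ parity
    elif d1 is None:
        d1 = xor_bytes(shards[0], shards[2])  # rebuild from d0 ^ parity
    return d0 + d1

shards = encode(b"hello world!")
shards[1] = None                  # simulate a failed drive
print(decode(shards))             # b'hello world!'
```

With k=2, n=3 the overhead is only 1.5x of the usable capacity, versus 3x for replication; real deployments commonly run configurations like 8 data + 4 parity shards.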
Apart from pooling drives (sometimes thousands) into a larger capacity and protecting data from drive failures, storage systems also expose a set of APIs and guarantees that define the relationship between the application storing the data and the storage system itself. These APIs decide how applications interact with the storage system and what assumptions they can make. For example, some systems are strictly consistent, while others are eventually consistent, i.e. even after a successful write, the data may become readable only after an indeterminate duration.
💽 Block Storage: Block storage systems take a pool of drives and expose the storage capacity as multiple individual block devices. Each block device is essentially a storage volume that can be treated as an independent disk drive and controlled by an external server operating system. This block device can be mounted by the guest operating system (over a network) as if it were a physical disk. Once mounted, each guest OS can format the block device with whatever file system it sees fit. The most common example of block storage is SAN, which itself may be based on iSCSI or Fibre Channel.
Underneath, block storage breaks data up into individual blocks and stores those blocks as separate pieces, each with a unique identifier. The SAN places those blocks wherever it is most efficient, which means it can spread them across different systems, and each block device can be configured to grow later as needed.
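The mechanics described above can be sketched in a few lines of Python. The `pool` dict here stands in for the SAN's placement layer (the names `write_volume` and `read_volume` are illustrative, not a real API); a volume is nothing more than an ordered list of block identifiers:

```python
# Sketch of how a block store chops data into fixed-size blocks, each
# stored under its own identifier. The placement-agnostic "pool" dict
# stands in for the SAN putting blocks wherever is most efficient.
import uuid

BLOCK_SIZE = 4   # tiny, for illustration; real block sizes are e.g. 4 KiB

pool = {}        # block_id -> block bytes

def write_volume(data):
    block_ids = []
    for i in range(0, len(data), BLOCK_SIZE):
        block_id = str(uuid.uuid4())
        pool[block_id] = data[i:i + BLOCK_SIZE]
        block_ids.append(block_id)
    return block_ids      # the volume is just an ordered list of ids

def read_volume(block_ids):
    return b"".join(pool[b] for b in block_ids)

ids = write_volume(b"blocks of data")
print(read_volume(ids))   # b'blocks of data'
```

Growing the volume later is cheap under this model: appending data just mints more block ids, without moving the existing blocks.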
Traditional databases like PostgreSQL, MySQL, etc. leverage block stores as the underlying storage system. This way they get the expected throughput while achieving the required scale and protection against drive failures.
Large virtualization deployments using VMware, KVM, or Microsoft Hyper-V typically use block storage to attach virtual drives to each virtual machine.
SAP and other large ERP or CRM environments have also traditionally used block storage as their local yet distributed storage.
📂 File Storage: Another approach to pooling drives into one large storage system is the distributed file system. Like block stores, distributed file systems make several drives, attached to computers over a network, behave like a single storage system, but with one very important difference. Block stores expose block devices that attach to a guest OS and are formatted with a file system (exactly like a local drive); file systems attach to the guest OS as network drives, already formatted and ready to use. Like SAN for block storage, NAS is the most common distributed file system around, and both need dedicated hardware. There are also several software defined distributed file systems and block stores that work with commodity hardware.
Shared file servers for a given LAN.
The block storage use cases are also valid for file storage systems; the choice is generally a trade-off among budget, management overhead, and performance, with file systems being cheaper and easier to manage, yet less performant than block stores.
☁️ Object Storage: Object storage systems pool the capacity across distributed drives and expose it via REST APIs, e.g. the S3 API. Since the API is based on HTTP(S), the application storing the data can be anywhere in the world. This is a big differentiator compared to file and block systems, where the application needs to be on the same machine or LAN. The lightweight REST API allows these systems to scale to much higher capacities and truly become Internet-scale storage.
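The object model itself is simple enough to sketch in memory: a flat bucket/key namespace where objects are written and read whole, with no seek or partial update. The class and method names below mirror the shape of S3-style operations but are illustrative, not a real SDK:

```python
# In-memory sketch of the object storage model: a flat bucket/key
# namespace with whole-object PUT and GET. Each call is self-contained,
# like one HTTP round trip, with no per-client state in between.

class ObjectStore:
    def __init__(self):
        self.buckets = {}

    def put_object(self, bucket, key, body):
        # Objects are written whole; there is no open/seek/close cycle.
        self.buckets.setdefault(bucket, {})[key] = bytes(body)

    def get_object(self, bucket, key):
        return self.buckets[bucket][key]

    def list_objects(self, bucket, prefix=""):
        # "Directories" are just key prefixes, not real filesystem paths.
        return sorted(k for k in self.buckets.get(bucket, {})
                      if k.startswith(prefix))

s3 = ObjectStore()
s3.put_object("logs", "2024/01/app.log", b"hello")
print(s3.list_objects("logs", prefix="2024/"))   # ['2024/01/app.log']
```

Note that the `/` in the key is just a naming convention; the namespace stays flat, which is part of what lets object stores avoid the metadata bottlenecks of hierarchical file systems.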
Modern, high volume, high throughput storage requirements like data analytics.
Archival of data for longer term.
File and block systems offer POSIX-style storage APIs, e.g. XFS, EXT, NFS, SMB, while modern object storage offers REST APIs based on HTTP(S). POSIX and similar APIs provide a granular approach with seeking and Linux-style per-file ownership, but this makes even a simple operation like reading a file a multi-step process, leaving the overall interaction chatty and resource hungry. And as the scale grows to multiple PBs, POSIX semantics essentially hold the system back from performing to its true capability.
These issues with POSIX led to object APIs like the S3 API, which are based on HTTP(S) and geared towards higher scale. Object APIs don't carry the typical overhead of open(), close(), or even seek(); the system doesn't remember the state of each operation. This allows object storage systems to grow well beyond typical file or block systems.
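The contrast is easy to see side by side. Below, the POSIX path uses real stdlib syscall wrappers and takes four stateful steps to read five bytes; the object-style `get` is a toy stand-in (not a real SDK call) for a single stateless range GET:

```python
# POSIX read path: several syscalls, with the kernel tracking per-fd
# state (the current offset) between them.
import os
import tempfile

with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"hello world")
    path = f.name

fd = os.open(path, os.O_RDONLY)      # 1. open
os.lseek(fd, 6, os.SEEK_SET)         # 2. seek
data = os.read(fd, 5)                # 3. read
os.close(fd)                         # 4. close
print(data)                          # b'world'
os.remove(path)

# Object-style path: one self-contained request carries the key and
# byte range; the server keeps no per-client state between calls.
objects = {"bucket/key": b"hello world"}

def get(path, range_start=0, length=None):
    body = objects[path][range_start:]
    return body[:length] if length is not None else body

print(get("bucket/key", range_start=6, length=5))   # b'world'
```

Four round trips versus one is the "chattiness" the previous paragraphs describe, and statelessness is what lets an object endpoint be load-balanced across many servers.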
We have discussed the major storage systems, the underlying approaches to data protection, and their APIs, but you may be wondering where cloud storage fits in all this. Cloud storage is essentially like cloud compute: just as cloud compute runs your processes on someone else's servers, cloud storage lets you use someone else's storage, but all the storage basics you learnt here remain exactly the same.