
Tiering of Storage: Why Do It and How to Realize the Greatest Benefits

Introduction

In a typical modern datacenter infrastructure, storage centralization is a common sight. In fact, more and more organizations are moving storage off individual machines and into a centralized segment, often administered by dedicated groups focused solely on delivering storage reliably, efficiently and rapidly, on demand, to the rest of the organization. Typical centralization efforts, at least during the initial phases of consolidation, are not granular enough to really dig into which parts of the environment are most critical, which are least so, and what these different parts of the greater machine require from storage. Most consolidation efforts start by looking at overall capacity requirements, often forgetting that capacity is only one check-box on a lengthy checklist of requirements. Because most environments have varying levels of system criticality, availability and performance, the same notions should carry over to centralized storage. Far too often we overlook these critical elements. This document attempts to address them in some detail and, hopefully, to provide some food for thought about whether tiering is something that a customer reading this document would benefit from in their environment.

Typical Model of Today's Complex Systems

Few if any organizations today have a single environment that they consider critical from first server to last, and nothing else. It is far more common to observe multiple environments and several different applications with varying service levels and performance requirements. It is difficult to imagine an environment that does not have at least a Production, a Staging and some development areas, as well as typical reporting, analytics and other non-real-time systems. These same environments will also contain absolutely business-critical systems: those comprising the core of the business, with far tighter SLAs and greater requirements for availability. For example, an outage in the development environment may prevent staff from carrying out development and perhaps debugging activities, but it is unlikely to cause severe hardship for an organization whose primary business is running a SaaS platform that hosts software for numerous high-profile clients. On the other hand, an outage of production components that affects the availability of that SaaS platform will be critical and, depending on the outage, could constitute an emergency leading to an all-hands-on-deck situation at any time of day or night.

Understanding differences in availability requirements is key, and most organizations do this very well; however, they do not always architect their systems with this tiered perspective. Instead, architectures are often monolithic, commonly resulting in designs that are less scalable, more costly than necessary and more complex. This leads to a spiral in which complexity breeds more complexity, and fixing it only makes things more difficult.

De-centralized Storage is Naturally Tiered

When systems architects and sysadmins work together to build systems for a particular application, they focus on the various aspects of that application, such as its memory requirements, CPU requirements, capacity, IOPS, etc. Most environments have some subset of applications, each with a system designed for it, be it a single machine or a datacenter packed with machines. When we build a system for a specific application, we look at storage specifically from the standpoint of that application, and as long as we follow a reference design while we continue to build it out, our local storage will be designed specifically for this app and will behave very predictably.

It is a natural and organic process by which local storage tied to specific applications ends up being tiered without our realizing it, because we never consider this storage as being used by anything other than the app for which we are designing it. If we had, say, three environments running the same app, with Production being critical and the other two, Development and Staging, still important but less so, chances are we would spec internal storage differently for Production than for the other two. This is an organic tiering method: Production is seen as more critical and will likely have storage that is more robust and more highly available than the storage used for the two non-Production environments.

The key point of this example is that we already naturally tier non-shared storage on isolated systems, with the same expectation we have when tiering storage in a Storage Area Network: to better match storage to specific workloads.

Service Tiers and Why

When planning storage, especially if it is the first centralization effort undertaken by the company, service tiers have to be very seriously considered. The terms tier and tiering are common in the IT industry in general and typically suggest different levels of something. In this context, storage tiering is an approach to architecting storage systems so that they are more closely tied to the needs and expectations of the various consumers (client systems) in the environment. Just as the development environment may not be as critical as the SaaS platform infrastructure, neither is the storage that serves it. If, for example, we have environments that are less critical than others, and we consciously choose to build less robust networks and use lower-performance hardware for them to cut costs, why would we build our storage in a way that treats these environments as equivalent to our critical production tiers?

To extend the point further, consider an environment where a critical production application shares storage resources with a backup application. From the perspective of the SAN, both applications are equally important, but from the standpoint of the business, real-time applications are vastly more important than back-end systems such as backups. Yet far too often the same storage system, and in ZFS terms the same storage pool, is built both to support the backup component and to service critical production applications. While some environments can operate this way without observing severe performance penalties on the most critical apps, others cannot.

It is not uncommon to have multiple, very different requirements for storage. For example, the backup environment does not benefit much from a lot of IOPS potential, but it benefits greatly from maximum throughput. On the other hand, the persistence layer of a typical web application is very unlikely to benefit from throughput, but benefits greatly from a low-latency, high-IOPS-capable system. When we build to satisfy all parts of the environment, we end up having to compromise in one part of the overall design or another. A common example with ZFS is a pool built to maximize capacity for certain apps rather than designed for IOPS: it provides plenty of space, but the IOPS available to everything else on it are low.
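To put rough numbers on this capacity-versus-IOPS trade-off, the following Python sketch compares two hypothetical layouts of the same 24 drives: two wide RAIDZ2 vdevs versus twelve 2-way mirrors. It relies on a common rule of thumb that each top-level vdev delivers roughly the small random write IOPS of a single member disk. The 3TB drive size and 100 IOPS per drive are assumed figures. Caching, compression and ZFS space overheads are ignored, and mirrors can serve random reads from every disk. Treat the output as an order-of-magnitude illustration, not a sizing tool.

```python
from dataclasses import dataclass

DISK_TB = 3             # hypothetical 3TB nearline drives
DISK_RANDOM_IOPS = 100  # hypothetical small random IOPS per 7,200 RPM drive


@dataclass(frozen=True)
class PoolLayout:
    name: str
    vdevs: int                # number of top-level vdevs in the pool
    disks_per_vdev: int
    data_disks_per_vdev: int  # disks' worth of usable space per vdev

    @property
    def disks(self) -> int:
        return self.vdevs * self.disks_per_vdev

    @property
    def usable_tb(self) -> int:
        # Ignores metadata, slop space, RAIDZ padding and compression.
        return self.vdevs * self.data_disks_per_vdev * DISK_TB

    @property
    def random_write_iops(self) -> int:
        # Rule of thumb: roughly one disk's worth of random IOPS per top-level vdev.
        return self.vdevs * DISK_RANDOM_IOPS


# RAIDZ2 gives up two disks per vdev to parity; a 2-way mirror keeps one disk of data.
capacity_pool = PoolLayout("2 x 12-wide RAIDZ2", vdevs=2, disks_per_vdev=12, data_disks_per_vdev=10)
iops_pool = PoolLayout("12 x 2-way mirror", vdevs=12, disks_per_vdev=2, data_disks_per_vdev=1)

for pool in (capacity_pool, iops_pool):
    print(f"{pool.name:20} disks={pool.disks} usable={pool.usable_tb} TB "
          f"random write IOPS ~{pool.random_write_iops}")
```

Under these assumptions, the capacity-oriented layout yields about 60 TB and roughly 200 random write IOPS. The mirrored layout yields about 36 TB and roughly 1,200 IOPS. That is the compromise a single shared pool forces on a mix of backup and latency-sensitive workloads.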

One of ZFS's strengths, and perhaps also its weakness, is its ability to cope with extremely different concurrent workloads. This is a strength because it allows for less constrained designs and more flexibility, potentially cutting costs and getting the best performance per dollar. However, this strength also far too often leads to monolithic designs that are not scalable, are not robust and end up with middle-of-the-road, mediocre performance. That performance may suffice for a period of time, but it will degrade rapidly and become unacceptable as the environment grows.

Is Tiered Storage the Right Choice?

Deciding whether an environment should implement storage tiering is not simple, and every environment is different: reasons that lead to tiering in one environment may instead support a single-tier configuration in another. Without understanding the needs and growth of the environment, making this decision with any degree of certainty is impossible. It is critical to consider how the environment is architected today and which architectures are currently not working and will be redesigned. Contrasting parts of the environment and analyzing patterns, such as the capacity versus performance requirements of the apps, is essential. One should build a chart of the systems that comprise the environment and attempt to group them into a few categories. These categories will again depend on the business, but should be centered around the core requirements of each application.
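The chart of systems described above can be sketched in Python as a small inventory grouped into categories. Everything here is hypothetical and only illustrates the grouping step: the system names, the figures, the two-axis scheme (workload profile and service level) and the 500 IOPS-per-TB cut-off all stand in for whatever criteria the business actually uses.

```python
from collections import defaultdict
from dataclasses import dataclass

IOPS_PER_TB_THRESHOLD = 500  # hypothetical cut-off between "performance" and "capacity"


@dataclass(frozen=True)
class System:
    name: str
    capacity_tb: float       # must be > 0
    peak_iops: int
    latency_sensitive: bool
    business_critical: bool


def categorize(system: System) -> str:
    """Return a 'profile/service' label for one system."""
    iops_per_tb = system.peak_iops / system.capacity_tb
    if system.latency_sensitive or iops_per_tb >= IOPS_PER_TB_THRESHOLD:
        profile = "performance"
    else:
        profile = "capacity"
    service = "critical" if system.business_critical else "standard"
    return f"{profile}/{service}"


def build_chart(systems: list[System]) -> dict[str, list[str]]:
    """Group system names by category, preserving inventory order."""
    chart: dict[str, list[str]] = defaultdict(list)
    for system in systems:
        chart[categorize(system)].append(system.name)
    return dict(chart)


# Hypothetical inventory.
inventory = [
    System("saas-db", 4, 20_000, latency_sensitive=True, business_critical=True),
    System("dev-db", 2, 3_000, latency_sensitive=True, business_critical=False),
    System("reporting", 50, 3_000, latency_sensitive=False, business_critical=False),
    System("backup", 200, 2_000, latency_sensitive=False, business_critical=False),
]

chart = build_chart(inventory)
for category, names in sorted(chart.items()):
    print(f"{category:22} {', '.join(names)}")
```

Each resulting group is a candidate storage tier. A chart where nearly everything lands in one group argues for a single tier, and a chart with widely separated groups argues for tiering.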

For example, a massive OLTP system may require a lot of capacity for data in general, but may only be actively working on small datasets. In such a case, breaking the app up into two tiers may make the most sense. Data that is actively being worked on requires storage that is low-latency and capable of handling the small, high-frequency I/O from the OLTP engine. At the same time, data that is aging and accessed very infrequently, if ever, is likely to be used for reporting purposes and will require a lot of capacity but few IOPS. This data may be best suited to capacity-optimized storage, perhaps configured with a large number of relatively slow but high-capacity 3TB drives. The active working set of the OLTP system, on the other hand, may be so heavily used that placing it on all-SSD or hybrid storage is the right way to go. Working-set analysis is a long discussion that this document does not adequately address, but it plays a significant role in the design of the system and in the tiering decisions being made.
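Here is a minimal Python sketch of this two-tier split. It assumes that how recently data was accessed is a usable stand-in for the working set, which is a simplification, since real working-set analysis is more involved. The dataset names, sizes, reference date and 30-day window are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta

HOT_WINDOW = timedelta(days=30)  # hypothetical: data touched in the last 30 days is "active"


@dataclass(frozen=True)
class Dataset:
    name: str
    size_gb: int
    last_access: date


def place(datasets: list[Dataset], today: date) -> dict[str, list[Dataset]]:
    """Assign each dataset to a low-latency tier or a capacity tier by access recency."""
    tiers: dict[str, list[Dataset]] = {"performance": [], "capacity": []}
    for ds in datasets:
        tier = "performance" if today - ds.last_access <= HOT_WINDOW else "capacity"
        tiers[tier].append(ds)
    return tiers


today = date(2024, 1, 31)  # fixed date so the example is reproducible
datasets = [
    Dataset("orders_current", 400, today - timedelta(days=1)),
    Dataset("sessions", 120, today - timedelta(days=3)),
    Dataset("orders_2022", 6_000, today - timedelta(days=200)),
    Dataset("audit_archive", 9_000, today - timedelta(days=400)),
]

tiers = place(datasets, today)
for tier, members in tiers.items():
    total = sum(d.size_gb for d in members)
    print(f"{tier:12} {total:>6} GB  {[d.name for d in members]}")
```

In this made-up case, only 520 GB of a roughly 15.5 TB application needs low-latency storage. That imbalance is what makes an all-SSD or hybrid tier for the hot data affordable.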

In general, consider the following to be good reasons for some level of tiering in the environment:

  • Very different application footprints, where some applications may be built exclusively around capacity, and others are extremely sensitive to latency and/or require very high IOPS.

  • Applications that combine a need for small amounts of highly responsive storage for real-time data processing with low-performance, high-capacity storage for long-term data retention.

  • Service levels differ significantly, and non-business-critical components have much lower availability requirements than business-critical components.

  • The environment consists of extremely large systems that place very significant and rapidly shifting demands on storage; good examples are HPC systems in research labs and universities.

Summary

A lot could be written about tiering, and a lot has been already. There is no single right or wrong answer, and the decision will ultimately be driven by the requirements of the organization. It is, however, very important to keep this architectural building block in mind and to apply it whenever system design and scaling initiatives are undertaken. The biggest factor in deciding whether to tier is how much compromise is being made to achieve the performance required by some of the applications in the environment. Suppose 90% of your storage needs to be low-performance, capacity-optimized storage, and 10% needs to be low-latency, random-access-optimized storage. In that case, you should not make your entire storage system performant enough to meet the requirements of the 10%; instead, consider tiering. Designing a single level of capacity and performance to satisfy multiple very different applications often does not make sense. Targeting diverse requirements more specifically results in more optimal storage that is easier to maintain and expand, both in performance and in capacity.
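The 90/10 arithmetic can be made concrete with a quick Python cost comparison. Only the 90/10 split comes from the text. The 100 TB total and the per-TB prices are hypothetical placeholders, and real figures depend on media, redundancy, controllers and support costs.

```python
# Hypothetical inputs: only the 90/10 split comes from the text.
PERFORMANCE_TB = 10  # low-latency, random-access-optimized share
CAPACITY_TB = 90     # capacity-optimized share
COST_PER_TB = {"performance": 1_000, "capacity": 100}  # placeholder $/TB


def single_tier_cost(performance_tb: int, capacity_tb: int) -> int:
    """Build all storage to the performance spec demanded by the smaller share."""
    return (performance_tb + capacity_tb) * COST_PER_TB["performance"]


def tiered_cost(performance_tb: int, capacity_tb: int) -> int:
    """Buy each share on storage matched to its requirements."""
    return (performance_tb * COST_PER_TB["performance"]
            + capacity_tb * COST_PER_TB["capacity"])


single = single_tier_cost(PERFORMANCE_TB, CAPACITY_TB)
tiered = tiered_cost(PERFORMANCE_TB, CAPACITY_TB)
print(f"single tier: ${single:,}  tiered: ${tiered:,}  ratio: {tiered / single:.2f}")
```

With these placeholder prices, the tiered design costs about a fifth as much as the single-tier one. The exact ratio means nothing, but the tiered design comes out cheaper whenever the performance tier costs more per TB than the capacity tier, and the gap widens as the capacity-optimized share grows.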