Breaking the Million IOPS Barrier on AWS with NVMe for HPC

Introduction

NVMe (non-volatile memory express) technology is now available as a service in the AWS cloud. Coupled with 100 Gbps networking, NVMe SSDs open new frontiers for HPC and transactional workloads in the cloud. And because it’s available “as a service,” powerful HPC storage and compute clusters can be spun up on demand, without the capital investments, time delays and long-term commitments usually associated with High Performance Computing (HPC).

In this post, we look at how to leverage AWS i3en instances for performance-sensitive workloads that heretofore would only run on specialized, expensive hardware in on-premises datacenters. This technology enables both commercial HPC and research HPC, where capital budgets have limited an organization’s ability to leverage traditional HPC. It also bodes well for demanding AI and machine learning workloads, which benefit from HPC storage connected directly to GPU clusters over 100 Gbps networking.

According to AWS, “i3en instances can deliver up to 2 million random IOPS at 4 KB block sizes and up to 16 GB/s of sequential disk throughput.” These instances come outfitted with up to 768 GB memory, 96 vCPUs, 60 TB NVMe SSD, and they support up to 100 Gbps low-latency networking within cluster placement groups.

 

Challenges to Solve

NVMe SSDs are directly attached to Amazon EC2 instances and provide maximum IOPS and throughput, but they suffer from being ephemeral; that is, whenever the EC2 instance is shut down and started back up, a new set of empty NVMe disks is attached, possibly on a different host. The original NVMe devices are no longer accessible, so their data is lost. Unfortunately, this behavior limits the applicability of these powerful instances.

Customers need a way to bring demanding enterprise SQL database workloads for SAP, OLAP, data warehousing and other applications to the cloud. These business-critical workloads require both high availability (HA) and high transactional throughput, along with storage persistence and disaster recovery capabilities.

Customers also want to run commercial HPC and research HPC in the cloud, along with AI/deep learning, 3D modeling and simulation workloads.

Unfortunately, most EC2 instances top out at 10 Gbps (1.25 GB/sec) network bandwidth. Moderate HPC workloads require 5 GB/sec or more read and write throughput. Higher-end HPC workloads need 10 GB/sec or more throughput. Even the fastest class of EBS, provisioned IOPS, can’t keep up.

Mission-critical application and transactional database workloads also require non-stop operation, data persistence and high-performance storage throughput of several GB/sec or more.

NVMe solves the IOPS and throughput problems, but its non-persistence is a showstopper for most enterprise workloads.

So, the ideal solution must address NVMe data persistence, without slowing down the HPC workloads, and for mission-critical applications, must also deliver high-availability with at least a 99.9% uptime SLA.

 

SoftNAS Labs Objectives

SoftNAS Labs™ recently decided to put these AWS hotrod NVMe-backed instances to the test, with an eye toward solving the data persistence and HA issues, without degrading their high-performance benefits.

Solving the persistence problem paves the way for many interesting use cases to run in the cloud, including:

  • Commercial HPC workloads
  • Deep learning workloads based upon Python-based ML frameworks like TensorFlow, PyTorch, Keras, MxNet, Sonnet and others that require feeding massive amounts of data to GPU compute instances
  • 3D modeling and simulation workloads

A secondary objective was to add high-availability with high-performance for Windows Server and SQL Server workloads, SAP HANA, OLTP and data warehousing. High Availability (HA) makes delivering an SLA for HPC workloads possible in the cloud.

SoftNAS Labs developed two solutions, one for Linux-based workloads and another for Windows Server and SQL Server use cases.

 

SoftNAS for HPC Linux Workloads

This solution leverages the Elastic Fabric Adapter (EFA) and AWS clustered placement groups with i3en family instances and 100 Gbps networking. SoftNAS Labs testing measured up to 15 GB/second random read and 12.2 GB/second random write throughput. We also observed more than 1 million read IOPS and 876,000 write IOPS from a Linux client, all running FIO benchmarks.
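For readers who want to set up something similar, here is a minimal sketch – not the exact configuration SoftNAS Labs used – of creating a cluster placement group and launching EFA-enabled i3en instances with boto3. The AMI, subnet and security group IDs are placeholders.

```python
# Sketch only: a cluster placement group plus EFA-enabled i3en instances via
# boto3. The AMI, subnet and security group IDs below are placeholders, not
# values from the SoftNAS Labs test environment.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Cluster placement groups keep instances on the same network spine, which is
# required to reach the low-latency 100 Gbps figures discussed above.
ec2.create_placement_group(GroupName="hpc-cluster-pg", Strategy="cluster")

ec2.run_instances(
    ImageId="ami-0123456789abcdef0",          # placeholder AMI
    InstanceType="i3en.24xlarge",             # 96 vCPU, 768 GiB RAM, 60 TB NVMe
    MinCount=2,
    MaxCount=2,
    Placement={"GroupName": "hpc-cluster-pg"},
    NetworkInterfaces=[{
        "DeviceIndex": 0,
        "InterfaceType": "efa",                   # Elastic Fabric Adapter
        "SubnetId": "subnet-0123456789abcdef0",   # placeholder subnet
        "Groups": ["sg-0123456789abcdef0"],       # placeholder security group
    }],
)
```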

The following block diagram shows the system configuration used to attain these results.


The NVMe-backed instance contained storage pools and volumes dedicated to HPC read and write tasks. Another SoftNAS Persistent Mirror instance leveraged SoftNAS’ patented SnapReplicate® asynchronous block replication to EBS provisioned IOPS for data persistence and DR.

In real-world HPC use cases, one would likely deploy two separate NVMe-backed instances – one dedicated to high-performance read I/O traffic and the other to HPC log writes. In our testing, we used 8 or more synchronous iSCSI data flows from a single HPC client node. It’s also possible to use NFS across a cluster of HPC client nodes, provided there are 8 or more client threads accessing storage. Each “flow,” as it’s called in placement group networking, delivers 10 Gbps of throughput, so maximizing use of the available 100 Gbps network requires 8 to 10 or more such parallel flows.
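To give a concrete feel for driving that many parallel flows, here is a hedged sketch of an FIO run from a Linux client; the mount point, file sizes and runtime are placeholders rather than the actual job parameters used in these tests.

```python
# Sketch: eight parallel 4 KB random-write jobs with fio from a client node,
# roughly mirroring the "8 or more flows" guidance above. The mount point,
# sizes and runtime are placeholders.
import subprocess

cmd = [
    "fio",
    "--name=hpc-randwrite",
    "--directory=/mnt/hpc",        # NFS or iSCSI-backed mount (placeholder)
    "--rw=randwrite",
    "--bs=4k",
    "--size=10G",                  # per-job file size
    "--numjobs=8",                 # 8+ parallel flows to exercise the 100 Gbps fabric
    "--iodepth=32",
    "--ioengine=libaio",
    "--direct=1",
    "--time_based",
    "--runtime=120",
    "--group_reporting",
]
subprocess.run(cmd, check=True)
```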

Persistence of the NVMe SSDs runs in the background, asynchronously to the HPC job itself. Provisioned IOPS is the fastest EBS persistent storage on AWS. Once per minute, SoftNAS’ underlying OpenZFS filesystem takes a storage snapshot that aggregates the groups of I/O transactions occurring at 10 GB/second or faster across the NVMe devices, and these snapshots are then persisted to EBS over 8 parallel SnapReplicate streams, trailing the near real-time NVMe HPC I/O slightly. When the HPC job settles down, the asynchronous persistence writes to EBS catch up, ensuring data recoverability in the event the NVMe instance is powered down or must move to a different host for maintenance or patching.
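SnapReplicate itself is SoftNAS’s patented technology, but the general pattern – take a snapshot every minute, then ship only the changed blocks to the persistence node – can be illustrated with standard OpenZFS commands. Everything below (pool, volume and host names) is invented for the sketch and is not SoftNAS code.

```python
# Generic illustration of once-per-minute, snapshot-based asynchronous
# replication using standard OpenZFS commands. This is NOT SoftNAS's
# SnapReplicate implementation; all names are placeholders.
import subprocess
import time

POOL_VOL = "nvmepool/hpcvol"       # placeholder pool/volume
TARGET = "persist-node"            # placeholder EBS-backed replica host

def pipe_send(send_args):
    """Pipe a 'zfs send' stream over SSH into the replica dataset."""
    send = subprocess.Popen(["zfs", "send", *send_args], stdout=subprocess.PIPE)
    subprocess.run(["ssh", TARGET, "zfs", "receive", "-F", POOL_VOL],
                   stdin=send.stdout, check=True)
    send.stdout.close()
    send.wait()

# Seed the replica with a full copy of the first snapshot.
prev = "t0"
subprocess.run(["zfs", "snapshot", f"{POOL_VOL}@{prev}"], check=True)
pipe_send([f"{POOL_VOL}@{prev}"])

while True:
    time.sleep(60)                 # persistence trails the NVMe I/O by about a minute
    new = f"t{int(time.time())}"
    subprocess.run(["zfs", "snapshot", f"{POOL_VOL}@{new}"], check=True)
    pipe_send(["-i", prev, f"{POOL_VOL}@{new}"])   # send only the changed blocks
    prev = new
```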

Here’s a sample CloudWatch screen grab taken from the SoftNAS instance after one of the FIO random write tests. We see more than 100 Gbps Network In (writes to SoftNAS) and close to 900,000 random write IOPS. The reads (not shown here) clocked in at more than 1,000,000 IOPS – less than the 2 million IOPS AWS says the NVMe can deliver, because it would take more than 100 Gbps of networking to reach the full potential of the NVMe.
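The same figures can be pulled programmatically rather than from the console; here is a small sketch using boto3’s CloudWatch client against a hypothetical instance ID.

```python
# Sketch: pulling the NetworkIn metric shown in the screen grab via boto3,
# for a hypothetical instance ID.
from datetime import datetime, timedelta

import boto3

cw = boto3.client("cloudwatch", region_name="us-east-1")
resp = cw.get_metric_statistics(
    Namespace="AWS/EC2",
    MetricName="NetworkIn",        # bytes received by the instance per period
    Dimensions=[{"Name": "InstanceId", "Value": "i-0123456789abcdef0"}],
    StartTime=datetime.utcnow() - timedelta(minutes=15),
    EndTime=datetime.utcnow(),
    Period=60,
    Statistics=["Sum"],
)
for point in sorted(resp["Datapoints"], key=lambda p: p["Timestamp"]):
    gbps = point["Sum"] * 8 / 60 / 1e9      # bytes per minute -> gigabits per second
    print(point["Timestamp"], f"{gbps:.1f} Gbps in")
```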


One thing that surprised us is that there’s virtually no observable difference between random and sequential performance with NVMe. Because NVMe SSDs are built from high-speed flash attached directly to the system bus, the usual latency gap between random-seek and sequential workloads disappears – everything runs at the same speed over NVMe.

The level of performance delivered over EFA networking to and from NVMe for Linux workloads is impressive – the fastest SoftNAS Labs has ever observed in the AWS cloud: a million read IOPS at 15 GB/second and 876,000 write IOPS at 12.2 GB/second.

This HPC storage configuration for Linux can be used to satisfy many use cases, including:

  • Commercial HPC workloads
  • Deep learning workloads based upon Python-based ML frameworks like TensorFlow, PyTorch, Keras, MxNet, Sonnet and others that require feeding massive amounts of data to GPU compute clusters
  • 3D modeling and simulation workloads
  • HPC container workloads.

SoftNAS for HPC Windows Server and SQL Server Workloads

This solution leverages the Elastic Network Adapter (ENA) and AWS clustered placement groups with i3en family and 25 Gbps networking. SoftNAS Labs testing measured up to 2.7 GB/second read and 2.9 GB/second write throughput on Windows Server running Crystal Disk benchmarks. We did not have time to benchmark SQL Server in this mode, something we plan to do later.

Unfortunately, Windows Server on AWS does not support the 100 Gbps EFA driver, so at the time of these tests, placement group networking with Windows Server was limited to 25 Gbps via ENA only.

The following block diagram shows the system architecture used to attain these results.


In order to provide high availability and high performance, which SoftNAS Labs calls High-Performance HA (HPHA), it’s necessary to combine two SoftNAS NVMe-backed instances deployed into an iSCSI mirror configuration. The mirrors use synchronous I/O to ensure transactional integrity and high availability.

SnapReplicate uses snapshot-based block replication to persist the NVMe data to provisioned IOPS EBS (or any EBS class or S3) for DR. The DR node can be in a different zone or region, as indicated by the DR requirements. We chose provisioned IOPS to minimize persistence latency.

Windows Server supports a broad range of applications and workloads. We increasingly see SQL Server, Postgres and other SQL workloads being migrated into the cloud. It’s common to see various large-scale enterprise applications like SAP, SAP HANA and other SQL Server and Windows Server workloads that require both high-performance and high availability.

The above configuration leveraging NVMe-backed instances enables AWS to support more demanding enterprise workloads for data warehousing, OLAP, and OLTP use cases. SoftNAS HPHA enables high performance, synchronous mirroring across NVMe instances with high availability and a level of data persistence and DR required by many business-critical workloads.

 

Conclusions

AWS i3en instances deliver a massive amount of punch, in terms of CPU horsepower, cache memory and up to 60 terabytes of NVMe storage. The EFA driver, coupled with clustered placement group networking, delivers high-performance 100 Gbps networking and HPC levels of IOPS and throughput. The addition of SoftNAS makes data persistence and high availability possible in order to more fully leverage the power these instances provide. This situation works well for Linux-based workloads today.

However, the lack of Elastic Fabric Adapter support for full 100 Gbps networking with Windows Server is certainly a sore spot – one we hope the AWS and Microsoft teams are working to resolve.

The future for HPC in AWS looks bright. We can imagine a day when more than 100 Gbps networking becomes available, enabling customers to take full advantage of the 2 million IOPS these NVMe SSDs are poised to deliver.

SoftNAS for HPC solutions operate very cost-effectively on a single node for workloads that do not require high availability, or on as few as two nodes with HA. Unlike other storage solutions that require a minimum of six (6) i3en nodes, the SoftNAS solution provides cost-effectiveness, HPC performance, high availability and persistence with DR options across all AWS zones and regions.

SoftNAS and AWS are well-positioned today with commercially off-the-shelf products that, when combined, clear the way to move numerous HPC, Windows Server and SQL Server workloads from on-premises datacenters into the AWS cloud. And since SoftNAS is available on-demand via the AWS Marketplace, customers with these types of demanding needs are just minutes away from achieving HPC in the cloud. SoftNAS is available to assist partners and customers in quickly configuring and performance-tuning these HPC solutions.

About SoftNAS®

SoftNAS Labs is the advanced R&D team within SoftNAS. SoftNAS Labs is responsible for much of the innovation and advanced capabilities SoftNAS has delivered to cloud customers since 2013.

SoftNAS builds upon the industry-leading OpenZFS filesystem on Linux. Customers can manage petabytes of file storage in the cloud. Since 2013, only SoftNAS delivers a true Unified NAS in the AWS cloud, including NFS v4.1, CIFS/SMB with Active Directory and iSCSI protocols in a single cloud NAS filer software image, launched in just minutes on-demand via the AWS Marketplace. And only SoftNAS can tier NVMe SSDs with all forms of EBS block storage across all global regions and zones in AWS.

With SoftNAS, customers get dedicated, predictable and sustainable performance in the cloud, unlike multi-tenant, shared filesystems that suffer from noisy-neighbors and bottlenecks when you least expect them.

To learn more or discuss your HPC and SQL Server migration needs, please contact SoftNAS or an authorized partner.

How to Maintain Control of Your Core in the Cloud

For the past 7 years, SoftNAS has helped customers in 35 countries globally to successfully migrate thousands of applications and petabytes of data out of on-premises data centers into the cloud. Over that time, we’ve witnessed a major shift in the types of applications and the organizations involved.

The move to the cloud started with simpler, low risk apps and data until companies became comfortable with cloud technologies. Today, we see companies deploying their core business and mission-critical applications to the cloud, along with everything else as they evacuate their data centers and colos at a breakneck pace.

At the same time, the mix of players has also shifted from early adopters and DevOps to a blend that includes mainstream IT.

The major cloud platforms make it increasingly easy to leverage cloud services, whether building a new app, modernizing apps or migrating and rehosting the thousands of existing apps large enterprises run today.

Whatever cloud app strategy is taken, one of the critical business and management decisions is where to maintain control of the infrastructure and where to turn control over to the platform vendor or service provider to handle everything, effectively outsourcing those components, apps and data.

So how can we approach these critical decisions to either maintain control or outsource to others when migrating to the cloud? This question is especially important to carefully consider as we move our most precious, strategic, and critical data and application business assets into the cloud.

One approach is to start by determining whether the business, applications, data and underlying infrastructure are “core” vs. “context”, a distinction popularized by Geoffrey Moore in Dealing with Darwin.

He describes Core and Context as a distinction that separates the few activities that a company does that create true differentiation from the customer viewpoint (CORE) from everything else that a company needs to do to stay in business (CONTEXT).

Core elements of a business are the strategic areas and assets that create differentiation and drive value and revenue growth, including innovation initiatives.

Context refers to the necessary aspects of the business that are required to “keep the lights on”, operate smoothly, meet regulatory and security requirements and run the business day-to-day; e.g., email should be outsourced unless you are in the email hosting business (in which case it’s core).

It’s important to maintain direct control of the core elements of the business, focusing employees and our best and brightest talents on these areas. In the cloud, core elements include innovation, business-critical and revenue-generating applications and data, which remain central to the company’s future.

Certain applications and data that comprise business context can and should be outsourced to others to manage. These areas remain important as the business cannot operate without them, but they do not warrant our employees’ constant attention and time in comparison to the core areas.

The core demands the highest performance levels to ensure applications run fast and keep customers and users happy. It also requires the ability to maintain SLAs around high availability, RTO and RPO objectives to meet contractual obligations. Core demands the flexibility and agility to quickly and readily adapt as new business demands, opportunities and competitive threats emerge.

Many of these same characteristics matter for business context areas as well, but they are less critical there, since context can simply be moved from one outsourced vendor to another as needed.

Increasingly, we see the business-critical, core applications and data migrating into the cloud. These customers demand control of their core business apps and data in the cloud, as they did on-premises. They are accustomed to managing key infrastructure components, like the network attached storage (NAS) that hosts the company’s core data assets and powers the core applications. We see customers choose a dedicated Cloud NAS that keeps them in control of their core in the cloud.

Example core apps include revenue-generating e-discovery, healthcare imaging, 3D seismic oil and gas exploration, financial planning, loan processing, video rental and more. The most common theme we see across these apps is that they drive core subscription-based SaaS business revenue. Increasingly, we see both file and database data being migrated and hosted atop of the Cloud NAS, especially SQL Server.

For these core business use cases, maintaining control over the data and the cloud storage is required to meet performance and availability SLAs, security and regulatory requirements, and to achieve the flexibility and agility to quickly adapt and grow revenues. The dedicated Cloud NAS meets the core business requirements in the cloud, as it has done on-premises for years.

We also see much of the less critical business context data being outsourced and stored in cloud file services such as Azure Files and AWS EFS. In other cases, the Cloud NAS’s ability to handle both core and context use cases is appealing. For example, leveraging both SSD for performance and object storage for lower-cost bulk storage and archival, with unified NFS and CIFS/SMB, makes the Cloud NAS more attractive in certain cases.

There are certainly other factors to consider when choosing where to draw the lines between control and outsourcing applications, infrastructure and data in the cloud.

Ultimately, understanding which applications and data are core vs context to the business can help architects and management frame the choices for each use case and business situation, applying the right set of tools for each of these jobs to be done in the cloud.

 

 

Choosing the Right Type of AWS Storage for your Use-Case: Object Storage

When choosing data storage, what do you look for? AWS offers several storage types and options, and each is better suited to a certain purpose than the others. For instance, if your business only needs to store data for compliance and with little need for access, Amazon S3 volumes are a good bet. For enterprise applications, Amazon EBS SSD-backed volumes offer a provisioned IOPS option to meet the performance requirements. And then there are concerns about the cost. Savings in cost usually come at the price of performance. However, the array of options from the AWS platform for storage means there usually is a type that achieves the balance of performance and cost that your business needs.

In this series of posts, we are going to look at the defining features of each AWS storage type. By the end, you should be able to tell which type of AWS storage sounds like the right fit for your business’ storage requirements. This post focuses on the features of AWS Object storage, or S3 storage. You may also read our post that explains all about AWS block storage.

Amazon S3 storage

Amazon Object storage has been designed to be the most durable storage layer, with all offerings stated to provide 99.999999999%, or 11 nines, of durability of objects over a given year. This durability equates to an average annual expected loss of 0.000000001% of objects – or, said more practically, a loss of a single object once every 10,000 years.

Given the advantages that Object storage brings to the table, why wouldn’t you want to use it for every scenario that includes data? This is a question put to solution architects at SoftNAS almost every day. Object storage excels in durability, but its design makes it unsuitable for some use cases. When thinking about utilizing Object storage, the questions you need to have answers for are:
    1. What is the data life cycle?
    2. What is the data access frequency?
    3. How latency-aware is your application?
    4. What are the service limitations?
The hallmark of S3 storage has always been high throughput – with high latency. But AWS has refined its offerings by adding several S3 storage classes that address different needs. These include:
    • AWS S3 Standard
    • S3 Intelligent Tiering
    • S3 Standard IA (Infrequent Access)
    • S3 One Zone IA
    • S3 Glacier
    • S3 Glacier Deep Archive
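As a quick illustration of how these classes are chosen in practice, here is a sketch that writes objects directly into different storage classes with boto3; the bucket and key names are placeholders.

```python
# Sketch: selecting an S3 storage class per object at write time with boto3;
# the bucket and key names are placeholders.
import boto3

s3 = boto3.client("s3")

# Frequently accessed data stays in Standard.
s3.put_object(Bucket="example-bucket", Key="hot/report.csv",
              Body=b"...", StorageClass="STANDARD")

# Data read rarely but needed quickly goes to Infrequent Access.
s3.put_object(Bucket="example-bucket", Key="cool/archive-2019.csv",
              Body=b"...", StorageClass="STANDARD_IA")

# Let S3 move the object between tiers based on observed access patterns.
s3.put_object(Bucket="example-bucket", Key="auto/dataset.parquet",
              Body=b"...", StorageClass="INTELLIGENT_TIERING")
```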
These types are listed in order of increasing latency and decreasing cost/GB. The access time across these S3 storage classes ranges from milliseconds to hours. How frequently you will need to access your data dictates the type of S3 storage you should choose for your backend.

The cost for S3 tiers is determined not only by the amount of storage but also by access to the storage. Billing incorporates storage used, network data transferred in, network data transferred out, data retrieval and the number of data requests (PUT, GET, DELETE).

Workloads with random read and write operations, low latency and high IOPS requirements are not suitable for S3 storage. Use cases and workloads that are not latency-sensitive and require high throughput are good candidates. AWS S3 is object storage, so remember that all data will be stored as objects in their native format, with no hierarchies as there are when using a file system. On the other hand, objects may be stored across several machines and can be accessed from anywhere.

Read our post all about block storage here; it also includes tips on designing your AWS storage for optimum performance.

Need More Help or Information?

Even with all the above information, identifying the right data storage type and instance sizes and setting up custom architectures to suit your business performance requirements can be tricky. SoftNAS has assisted thousands of businesses with their AWS VPC configurations, and our in-house experts are available to answer queries and provide guidance free of charge. Request a complimentary professional consultation.

Choosing the Right Instance Type and Instance Size for AWS and Azure

In this post, we’re sharing an easy way to determine the best instance type and appropriate instance size to use for specific use cases in the AWS and Azure clouds.

To help you decide, there are some considerations to keep in mind. Let’s go through each of these determining factors in depth.

Decision Point 1 – What use case are you trying to address?

  • A. Migration to the Cloud
    • Migrating existing applications into the cloud should not be complex, expensive, time-consuming or resource-intensive, nor should it force you to rewrite your application to run it in the public cloud.

      If your existing applications access storage using CIFS/SMB, NFS, AFP or iSCSI storage protocols, then you will need to choose a NAS filer solution that will allow your applications to access cloud storage (block or object) using the same protocols it already does.

  • B. SaaS-Enabled Applications
    • For revenue-generating SaaS applications, high performance, maximum uptime and strong security with access control are critical business requirements.

      Running your SaaS apps in a multi-tenant, public cloud environment while simultaneously fulfilling these requirements can be challenging. An enterprise-grade cloud NAS filer may help you cope with these challenges, even in a public cloud environment. A good NAS solution provider will assure high availability with no downtime, high levels of performance, and strong security integrated with industry-standard access control, making it easier to SaaS-enable apps.

  • C. File Server Consolidation
    • Over time, end users and business applications create more and more data – usually unstructured – and rather than waiting for the IT department, these users install file servers wherever they can find room to put them, close to their locations. At the same time, businesses either get acquired or acquire other companies, inheriting all their file servers in the process. Ultimately, it’s the IT department that must manage this “server sprawl,” dealing with OS and software patches, hardware upkeep and maintenance, and security. With limited IT staff and resources, the task becomes impossible. The best long-term solution is using the cloud, of course, and a NAS filer to migrate files to the cloud.

      This strategy allows for scalable storage that is accessed the same way by users as they have always accessed their files on the local files servers.

  • D. Legacy NAS Replacement
    • With a limited IT staff and budget, it’s impractical to keep investing in legacy NAS systems and purchase more and more storage to keep pace with the rapid growth of data. Instead, investment in enterprise-grade cloud NAS can help businesses avoid burdening their IT staff with maintenance, support and upkeep, and pass those responsibilities on to a cloud platform provider. Businesses also gain the advantages of dynamic storage scalability to keep pace with data growth, and the flexibility to map performance and cost to their specific needs.
  • E. Backup/DR/Archive in the Cloud
    • Use tools to replicate and back up your data from your VMware datacenter to the AWS or Azure public clouds. Eliminate physical backup tapes by archiving data in inexpensive S3 storage or in cold storage like AWS Glacier for long-term retention. For stringent Recovery Point Objectives, cloud NAS can serve as an on-premises backup or primary storage target for local area network (LAN) connected backups as well.

      As a business’ data grows, the backup window can become unmanageable and tie up precious network resources during business hours. Cloud NAS with local disk-based caching reduces the backup window by streaming data in the background for better WAN optimization.

Decision Point 2 – What Cloud Platform do you want to use?

No matter which cloud provider is selected, there are some basic infrastructure details to keep in mind. The basic requirements are:

  • Number of CPUs
  • Size of RAM
  • Network performance
  • A. AWS
    • Standard: r5.xlarge is a good starting point in regard to memory and CPU resources. This category is suited to handle processing and caching with minimal requirements for network bandwidth. It comprises 4 vCPU, 16 GiB RAM, 1GbE network.

      Medium: r5.2xlarge is a good choice for workloads that are read-intensive, and will benefit from the larger memory-based read cache for this category. The additional CPU will also provide better performance when deduplication, encryption, compression and/or RAID is enabled. Composition: 8 vCPU, 32 GiB RAM, 10GbE network.

      High-end: r5.24xlarge can be used for workloads that require a very high-speed network connection due to the amount of data transferred over a network connection. In addition to the very high-speed network, this level of instance gives you a lot more storage, CPU and memory capacity. Composition: 96 vCPU, 768 GiB RAM, 25GbE network.

  • B. Azure
    • Dsv3-series support premium storage and are the latest, hyper-threaded general-purpose generation running on both the 2.4 GHz Intel Xeon® E5-2673 v3 (Haswell) and the 2.3 GHz Intel Xeon® E5-2673 v4 (Broadwell) processor. With the Intel Turbo Boost Technology 2.0, the Dsv3 can achieve up to 3.5 gigahertz (GHz). The Dsv3-series sizes offer a combination of vCPU, memory and temporary storage best suited for most production workloads.


    • Standard: D4s Standard v3 4 vCPU, 16 GiB RAM with moderate network bandwidth
    • Medium: D8s Standard v3 8 vCPU, 32 GiB RAM with high network bandwidth
    • High-end: D64s Standard v3 64 vCPU, 256 GiB RAM with extremely high network bandwidth

Decision Point 3 – What type of storage is needed?
Both AWS and Azure offer block as well as object storage. Block storage is normally used with file systems while object storage addresses the need to store “unstructured” data like music, images, video, backup files, database dumps and log files. Selecting the right type of storage to use also influences how well an AWS Instance or Azure VM will perform.

Other resources:

About AWS S3 object storage: https://aws.amazon.com/s3/

About AWS EBS block storage: https://aws.amazon.com/ebs/

Types of Azure storage: https://docs.microsoft.com/en-us/azure/storage/common/storage-introduction

 

Decision Point 4 – Need a “No Storage Downtime Guarantee”?

No matter which cloud platform is used, look for a filer that offers High Availability. A robust set of HA capabilities protects against data center, availability zone, server, network and storage subsystem failures to keep the business running without downtime. HA monitors all critical storage components, ensuring they remain operational. In case of an unrecoverable failure in a system component, another storage controller detects the problem and automatically takes over, ensuring no downtime or business impacts occur. So, companies are protected from lost revenue when access to their data resources and critical business applications would otherwise be disrupted.

If you’re looking for more personal guidance or have any technical questions, get in touch with our team of cloud experts who have done thousands of VPC and VNet configurations across AWS and Azure. Even if you are not considering SoftNAS, our free professional services reps will be happy to answer your questions.

Learn more about free professional services, environment setup and cloud consultations.

The Secret to Maximizing Returns on Auto-Optimized Tiered Storage

Intelligent tiering in storage can save money, but you can additionally save up to 75% if you opt for dedupe and data compression first

Businesses deal with large volumes of data every day and continue to add to this data at a rate that’s often difficult to keep up with. Data management is a continuous challenge, and data storage is an exponential expense. While the cost of storage/GB may not be significant, it adds up quickly, over days and months, and becomes a significant chunk of ongoing expenses.

More often than not, it is not feasible to delete or erase old data. Data must be stored for various reasons such as legal compliance, building databases, machine learning, or simply because it may be needed later. But a large portion of data often goes untouched for months at a time, with no need for access, yet continues to rack up the storage disk bills.

Cutting costs with Automated Tiered Storage

As a solution to this problem faced by most business owners, many storage providers and NAS filers offer auto tiering for storage. With automated tiering, data is stored across various levels of disks to save on storage costs. This tiering means data that’s accessed less frequently is stored on disks with lower performance and higher latency—disks that are much cheaper, usually 50-60% cheaper than high performance disks.

It is often difficult to identify and isolate data that is less likely to be accessed, so policies are set in place to identify and shift data automatically. For instance, you may set the threshold at 6 weeks. Then, once 6 weeks have passed without accessing a certain block of data, that block is automatically moved down to a lower tier, where it is not as expensive to continue to store it.
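Conceptually, the policy boils down to an age check against that threshold. The toy sketch below shows the idea with invented block IDs; it is illustrative pseudologic, not SoftNAS’s SmartTiers implementation.

```python
# Toy sketch of the age-based tiering policy described above: any block not
# accessed within the configured threshold (6 weeks here) is demoted to the
# cheaper tier. Illustrative pseudologic only, not SoftNAS SmartTiers code.
from datetime import datetime, timedelta

THRESHOLD = timedelta(weeks=6)

def plan_demotions(hot_tier, now=None):
    """hot_tier maps block_id -> last access time on the hot (SSD) tier."""
    now = now or datetime.utcnow()
    return [block_id for block_id, last_access in hot_tier.items()
            if now - last_access > THRESHOLD]

hot_tier = {
    "blk-001": datetime.utcnow() - timedelta(weeks=10),   # stale -> demote
    "blk-002": datetime.utcnow() - timedelta(days=3),     # recent -> keep
}
print(plan_demotions(hot_tier))    # ['blk-001']
```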

So intelligent tiering of data helps reduce storage costs significantly without really impacting day-to-day operations.

The Big Guns: Dedupe and Data Compression

While tiered storage helps you save on storage costs by optimizing the storage location, the expense is still proportionate to the amount of data. After all, you do pay for storage/GB. But before you even store data, have you considered, is this data streamlined? Am I saving data that can be pared down?

Deduplication

Unnecessarily bulky data is more common than you’d expect. Every time an old file or project is pulled out from storage for updates, to ramp up, or to make any changes, a new file is saved. So even if changes are made to only 1 or 2 MB of data, a new copy of the entire 4 TB file is made and saved. Now imagine this being done with several files each day. With this replication happening over and over again, the total amount of data quickly multiplies, occupying more storage, spiking storage costs, and even affecting IOPS. This is where inline deduplication helps.

With inline deduplication, files are compared block by block for redundancies, which are then eliminated. Instead, a reference count of the copies is saved. In most cases, data is reduced by 20-30% by making inline dedupe a part of the storage efficiency process delivered by a NAS filer.
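The toy sketch below illustrates that block-level idea: hash each fixed-size block, store a block only the first time it is seen, and keep a reference count for the copies. It is illustrative only, not the filer’s actual implementation.

```python
# Toy sketch of inline, block-level deduplication: hash each fixed-size block,
# store it only the first time it is seen, and track a reference count for the
# copies. Illustrative only, not the NAS filer's actual implementation.
import hashlib

BLOCK_SIZE = 4096

def dedupe(data: bytes):
    store, refcounts, layout = {}, {}, []
    for i in range(0, len(data), BLOCK_SIZE):
        block = data[i:i + BLOCK_SIZE]
        digest = hashlib.sha256(block).hexdigest()
        if digest not in store:
            store[digest] = block            # first copy is physically stored
        refcounts[digest] = refcounts.get(digest, 0) + 1
        layout.append(digest)                # the file becomes a list of block references
    return store, refcounts, layout

data = b"A" * BLOCK_SIZE * 3 + b"B" * BLOCK_SIZE     # 3 identical blocks + 1 unique block
store, refcounts, layout = dedupe(data)
print(f"{len(store)} unique blocks stored for {len(layout)} logical blocks")
```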

[Figure: deduplication of data]

Data compression

Reducing the number of bits needed to represent the data through compression is a simple process and can be highly effective – data can be reduced by 50-75%. The extent of compression depends on the nature of the data and how compressed it is at the outset. For instance, an mp4 file is already a highly compressed format. But, in our experience, data usually offers good opportunities for reduction through compression.
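A quick way to gauge compressibility is to measure it on a sample of your own data. Here is a small sketch using Python’s zlib on an invented, highly repetitive payload; real-world ratios depend entirely on the data.

```python
# Sketch: measuring how much a sample payload shrinks under lossless
# compression. The payload here is invented and highly repetitive, so it
# compresses very well; your data will behave differently.
import zlib

payload = b"2019-07-01 12:00:00 INFO request served in 12ms\n" * 10_000
compressed = zlib.compress(payload, 6)      # level 6 is a balanced default

ratio = len(compressed) / len(payload)
print(f"original: {len(payload)} bytes, compressed: {len(compressed)} bytes")
print(f"reduction: {(1 - ratio) * 100:.0f}%")
```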

By compressing data, the amount of storage space needed is reduced, and the costs associated with storage come down too.

[Figure: data storage compression]

When we combine the effects of deduplication and compression, we find that customers reap savings of up to 90%! If this new streamlined data is now stored using automated tiering, the savings are amplified because:

1. The amount of data to be stored is reduced, thus saving on storage capacity needed across ‘hot’ and ‘cool’ tiers
2. Input/Output is reduced, leading to better performance

Data deduplication and compression explained in a use case

Let’s say we have 1 TB of actual data to store. On average, cloud SSD storage costs $0.10 per GB/month, so $100 per TB/month.

If the data can be reduced by 80% using deduplication and compression, which is likely, the effective cost per TB is just 20% of the original projection, or $20 per TB/month. Now add in auto-tiering, which cuts the cost in half again by using a combination of SSD and lower-cost HDD, and you have a $10/TB cost basis.

If we estimate your storage needs grow to 10 TB over time, you will pay $100 per month – that’s the amount you would have been paying for basic file storage services for 1TB of data before dedupe and data compression.

The net effect of these combined storage efficiency capabilities delivered by SoftNAS, for example, is to reduce the effective cost per GB from $0.10 per GB/month to $0.02 per GB/month by combining tiering, compression and deduplication – without sacrificing performance. With the rapidly increasing amount of data that must be managed, who doesn’t want to cut their cloud storage bill by 80%?
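Here is the arithmetic from this example written out as a small sketch; the prices and reduction percentages are the illustrative figures used above, not quoted rates.

```python
# Worked version of the cost example above; prices and percentages are the
# illustrative figures from this post, not quoted rates.
GB_PER_TB = 1000
SSD_PRICE_PER_GB = 0.10            # $/GB-month for cloud SSD storage

baseline = 1 * GB_PER_TB * SSD_PRICE_PER_GB          # $100/month for 1 TB

after_reduction = baseline * (1 - 0.80)              # 80% dedupe + compression -> $20
after_tiering = after_reduction * 0.5                # auto-tiering halves it -> $10

print(f"baseline:          ${baseline:.0f}/month per TB")
print(f"dedupe + compress: ${after_reduction:.0f}/month per TB")
print(f"plus auto-tiering: ${after_tiering:.0f}/month per TB")
```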

Auto-Optimized Tiered Storage with SoftNAS

SoftNAS offers SmartTiers, auto-tiered storage with the added advantage of flexible operations. After deduplication and compression, data is stored in tiers – with tiering that you can configure and control, according to policies to suit your usage patterns, and optimize further as your usage evolves. Our goal is to achieve the price/performance equation that suits your business, so even when data stored away in a low-cost tier is accessed, only the particular block accessed migrates up to the hot tier, not the entire file. With a user-friendly interface, you can continue to manage the policies and thresholds, and the capacity of the tiers, and the rest of the cost-saving optimizations happen automatically and transparently.

Want to know what kind of savings you can achieve with SoftNAS SmartTiers? Try our storage cost-savings calculator.

Cloudberry Backup – Affordable & Recommended Cloud Backup Service on Azure & AWS

Let me tell you about a CIO I knew from my days as a consultant. He was even-keeled most of the time and could handle just about anything that was thrown at him. There was just the one time I saw him lose control – when someone told him the data backups had failed the previous night. He got so flustered, it was as if his career was flashing before him, and the ship might sink and there were no remaining lifeboats.

If You Don’t Backup Data, What Are the Risks?

Backing up one’s data is a problem as old as computing itself. We’ve all experienced data loss at some point, along with the pain, time and costs associated with recovering from the impacts caused in our personal or business lives. We make backups to avoid these problems – insurance you hope you never have to use – but, as Murphy’s Law goes, if anything can go wrong, it will.

Data storage systems include various forms of redundancy and the cloud is no exception. Though there are multiple levels of data protection within cloud block and object storage subsystems, no amount of protection can cure all potential ills. Sometimes, the only cure is to recover from a backup copy that has not been corrupted, deleted or otherwise tampered with.

SoftNAS provides additional layers of data protection atop cloud block and object storage, including storage snapshots, checksum data integrity verification on each data block, block replication to other nodes for high availability and file replication to a disaster recovery node. But even storage snapshots rely upon the underlying cloud storage block and object storage, which can and does fail occasionally.

These cloud native storage systems tout anywhere from 99.9% up to 11 nines of data durability. What does this really mean? It means there’s a non-zero probability that your data could be lost – it’s never 100%. So, when things do go wrong, you’d do best to have at least one viable backup copy. Otherwise, in addition to recovering from the data loss event, you risk losing your job too.

Why Companies Must Have a Data Backup

Let me illustrate this through an in-house experience.

In 2013, when SoftNAS was a fledgling startup, we had to make every dollar count and it was hard to justify paying for backup software or the storage it requires.

Back then, we ran QuickBooks for accounting. We also had a build server running Jenkins (still do), domain controllers and many other development and test VMs running atop of VMware in our R&D lab. However, it was going to cost about $10,000 to license Veeam’s backup software and it just wasn’t a high enough priority to allocate the funds, so we skimped on our backups. Then, over one weekend, we upgraded our VSAN cluster. Unfortunately, something went awry and we lost the entire VSAN cluster along with all our VMs and data. In addition, our makeshift backup strategy had not been working as expected and we hadn’t been paying close attention to it, so, in effect, we had no backup.

I describe the way we felt at the time as the “downtime tunnel”. It’s when your vision narrows and all you can see is the hole that you’re trying to dig yourself out of, and you’re overcome by the dread of having to give hourly updates to your boss, and their boss. It’s not a position you want to be in.

This is how we scrambled out of that hole. Fortunately, our accountant had a copy of the QuickBooks file, albeit one that was about 5 months old. And thankfully we still had an old-fashioned hardware-based Windows domain controller. So we didn’t lose our Windows domain. We had to painstakingly recreate our entire lab environment, along with rebuilding a new QuickBooks file by entering all the past 5 months of transactions, and recreate our Jenkins build server. After many weeks of painstaking recovery, we managed to put Humpty Dumpty back together again.

Lessons from Our Data Loss

We learned the hard way that proper data backups are much less expensive than the alternatives. The week after the data loss occurred, I placed the order for Veeam Backup and Recovery. Our R&D lab has been fully backed up since that day. Our Jenkins build server is now also versioned and safely tucked away in a Git repository so it’s quickly recoverable.

Of course, since then we have also outsourced accounting and no longer require QuickBooks, but with a significantly larger R&D operation now, we simply cannot afford another such event with no backups ever again. The backup software is the best $10K we’ve ever invested in our R&D lab. The cost of this protection is trivial compared with the cost of another data loss.

Backup as a Service

Fortunately, there are some great options available today to back up your data to the cloud, too. And they cost less to acquire and operate than you may realize. For example, SoftNAS has tested and certified the CloudBerry Backup product for use with SoftNAS. CloudBerry Backup (CBB) is a cloud backup solution available for both Linux and Windows. We tested the CloudBerry Backup for Linux, Ultimate Edition, which installs and runs directly on SoftNAS. It can run on any SoftNAS Linux-based virtual appliance, atop of AWS, Azure and VMware. We have customers who prefer to run CBB on Windows and perform the backups over a CIFS share. Did I forget to mention this cloud backup solution is affordable at just $150, and not $10K?

Here’s a block diagram of one example configuration. CBB performs full and incremental file backups from the SoftNAS ZFS filesystems and stores the data in low-cost, highly durable object storage – S3 on AWS, and Azure Blobs on Azure.

CBB supports a broad range of backup repositories, so you can choose to back up to one or more targets, within the same cloud or across different clouds as needed for additional redundancy. It is even possible to back up your SoftNAS pool data deployed in Azure to AWS, and vice versa. Note that we generally recommend creating a VPC-to-S3 or VNET-to-Blob service endpoint in your respective public cloud architecture to optimize network storage traffic and speed up backup timeframes.

To reduce the costs of backup storage even further, you can define lifecycle policies within the Cloudberry UI that move the backups from object storage into archive storage. For example, on AWS, the initial backup is stored on S3, then a lifecycle policy (managed right in CBB) kicks in and moves the data out of S3 and into Glacier archive storage. This reduces the backup data costs to around $4/TB (or less in volume) per month. You can optionally add a Glacier Deep Archive policy and reduce storage costs even further down to $1 per TB/month. There is also an option to use AWS S3 Infrequent Access Storage.
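For reference, an equivalent lifecycle policy can also be applied directly to the backup bucket with boto3 instead of from within CBB; the bucket name, prefix and day counts below are placeholders.

```python
# Sketch: an S3 lifecycle rule applied directly to the backup bucket with
# boto3, as an alternative to managing the policy inside CBB. The bucket
# name, prefix and day counts are placeholders.
import boto3

s3 = boto3.client("s3")
s3.put_bucket_lifecycle_configuration(
    Bucket="example-backup-bucket",
    LifecycleConfiguration={
        "Rules": [{
            "ID": "age-out-backups",
            "Filter": {"Prefix": "cbb/"},
            "Status": "Enabled",
            "Transitions": [
                {"Days": 30, "StorageClass": "GLACIER"},         # roughly $4/TB-month
                {"Days": 180, "StorageClass": "DEEP_ARCHIVE"},   # roughly $1/TB-month
            ],
        }]
    },
)
```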

There are similar capabilities available on Microsoft Azure that can be used to drive your data backup costs down to affordable levels. Bear in mind that the current version of CloudBerry for Linux has no native Azure Blob lifecycle management integration; those functions need to be performed via the Azure Portal.

Personally, I prefer to keep the latest version in S3 or Azure hot blob storage for immediate access and faster recovery, along with several archived copies for posterity. In some industries, you may have regulatory or contractual obligations to keep archive data much longer than with a typical archival policy.

Today, we also use CBB to back up our R&D lab’s Veeam backup repositories into the cloud as an additional DR measure. We use CBB for this because there are no excessive I/O costs when backing up into the cloud (Veeam performs a lot of synthetic merge and other I/O, which drives up I/O costs based upon our testing).

In my book, there’s no excuse for not having file level backups of every piece of important business data, given the costs and business impacts of the alternatives: downtime, lost time, overtime, stressful calls with the bosses, lost productivity, lost revenue, lost customers, brand and reputation impacts, and sometimes, lost jobs, lost promotion opportunities – it’s just too painful to consider what having no backup can devolve into.

To summarize, there are 5 levels of data protection available to secure your SoftNAS deployment:

1. ZFS scheduled snapshots – “point-in-time” meta-data recovery points on a per-volume basis
2. EBS / VM snapshots – snapshots of the Block Disks used in your SoftNAS pool
3. HA replicas – block replicated mirror copies updated once per minute
4. DR replica – file replica kept in a different region, just in case something catastrophic happens in your primary cloud datacenter
5. File system backups – CloudBerry or equivalent file-level backups to Blob or S3.

So, whether you choose to use CloudBerry Backup, Veeam®, native Cloud backup (ex. Azure Backup) or other vendor backup solutions, do yourself a big favor. Use *something* to ensure your data is always fully backed up, at the file level, and always recoverable no matter what shenanigans Murphy comes up with next. Trust me, you’ll be glad you did!

To learn more and get started backing up your SoftNAS data today, download the PDF to get all the details on how to use CBB with SoftNAS.

Disclaimer: SoftNAS is not affiliated in any way with CloudBerry Lab. As a CloudBerry customer, we trust our business’ data to CloudBerry. We also trust our VMware Lab and its data to Veeam. As a cloud NAS vendor, we have tested with and certify CloudBerry Backup as compatible with SoftNAS products. Your mileage may vary.

This post is authored by Rick Braddy, co-founder and CTO at SoftNAS. Rick has over 40 years of IT industry experience and contributed directly to the formation of the cloud NAS market.