Replacing EMC® Isilon®, VNX and NetApp® with AWS® and Microsoft® Azure
Customers continually ask us for solutions that will allow them to migrate from their existing on-premises storage systems to the AWS and Azure clouds. We hear this often from EMC VNX and Isilon customers, and increasingly from NetApp customers as well.
The request is usually prompted by the latest storage array maintenance bill, or by a strategic decision to close existing datacenters and replace them with the public cloud. In other cases, customers have many remote sites, offices, or factories where there’s simply not enough space to maintain all the data at the edge in smaller appliances. The convenience and appeal of the public cloud has become the obvious answer to these problems.
Customers want to trade out hardware ownership for the cloud subscription model and get out of the hardware management business.
Assessing the Environment
One of Isilon’s primary advantages that originally drove broad adoption is its scale-out architecture, which aggregates large pools of file storage and allows more disk drives and storage racks to be added as needed to scale what are effectively monolithic storage volumes. Isilon has been a hugely successful workhorse for many years. As that gear ages, customers face a fork in the road: continue to pay high maintenance costs (vendor license fees, replacement of failing disks, and ongoing expansion with more storage arrays over time), or take a different direction. In other cases, the aged equipment has reached, or is approaching, the end of its life and/or support period, so something must be done.
The options available today are pretty clear: 1) forklift upgrade and replace the on-premises storage gear with new storage gear, 2) shift storage and compute to a hyperconverged alternative, or 3) migrate to the cloud and get out of the hardware business.
Increasingly, customers are ditching not just the storage gear but the entire datacenter, and at an accelerating pace. Interestingly, some are using their storage maintenance budget to fund the migration out of their datacenter into the cloud (and many report they have money left over after the migration is completed).
This trend started as far back as 2013 for SoftNAS®, when The Street ditched their EMC VNX gear and used the 3-year maintenance budget to fund their entire cloud migration to AWS®. You can read more about The Street’s migration here.
Like Isilon, the public cloud provides its own forms of scale-out, “bottomless” storage. The problem is that this cloud storage is not designed to be NFS or file-based but instead is block and object-based storage. The cloud providers are aware of this deficit, and they do offer some basic NFS and CIFS file storage services, which tend to be far too expensive (e.g., $3,600 per TB/year) and lack enterprise NAS features customers rely upon to protect their businesses and data and minimize cloud storage costs. These cloud file services sometimes deliver good performance, but are also prone to unpredictable performance due to their shared storage architectures and the multi-tenant access overload conditions that plague them.
How Customers Replace EMC VNX®, Isilon® and NetApp® with Buurst™ SoftNAS®
As we see in the example below, SoftNAS includes a “Lift and Shift” feature that makes data migration into the cloud as simple as point, click, configure and go. This approach is excellent for up to 100 TB of storage, where data migration timeframes are reasonable over a 1 Gbps or 10 Gbps network link.
The SoftNAS Lift and Shift feature continuously syncs data from the source NFS or CIFS mount points to the SoftNAS storage volumes hosted in the cloud, as shown below. Migrating data over the wire is convenient for projects where production data is constantly changing: it keeps both sides in sync throughout the workload migration process, making it faster and easier to move applications and their dependencies from on-premises VMs and servers into their new cloud-based counterparts.
In other cases, there could be petabytes of data that needs to be migrated, much of it inactive data, archives, backups, etc. For large-scale migrations, the cloud vendors provide what we historically called “swing gear” – portable storage boxes designed to be loaded up with data, then shipped from one data center to another – in this case, shipped to one of the hyperscale cloud vendor’s regional datacenters, where the data gets loaded into the customer’s account. For example, AWS provides its Snowball appliance and Microsoft provides Azure Data Box.
If the swing gear path is chosen, the data lands in the customer account in object storage (e.g., AWS S3 or Azure Blobs). Once loaded into the cloud, the SoftNAS team assists customers by bulk loading it into appropriate SoftNAS storage pools and volumes. Sometimes customers keep the same overall volume and directory structure – other times this is viewed as an opportunity to do some badly overdue reorganization and optimization.
Why Customers Choose SoftNAS vs. Alternatives
SoftNAS is an ideal choice for customers looking to host data in the public cloud because it delivers the most cost-effective storage management available. SoftNAS delivers the lowest costs for several reasons, including:
No Storage Tax – Buurst SoftNAS does not charge for your storage capacity. You read that right. You get unlimited storage capacity and only pay for additional performance. Learn more.
Unique Storage Block Tiering – SoftNAS provides multi-level, automatic block tiering that balances peak performance on NVMe and SSD tiers, coupled with low-cost bulk storage for inactive or lazy data in HDD block storage.
Superior Storage Efficiencies – SoftNAS includes data compression and data deduplication, reducing the actual amount of cloud block storage required to host your file data.
The net result is customers save up to 80% on their cloud storage bills by leveraging SoftNAS advanced cost savings features, coupled with unlimited data capacity.
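How a block-tiering engine decides what stays on the fast tier is vendor-specific; SoftNAS’s implementation is proprietary. As a toy sketch of the general recency-based policy (demote blocks idle too long to cheap storage, promote them back on access; class and method names here are hypothetical):

```python
import time

class TieredBlockStore:
    """Toy two-tier block store: hot (fast, expensive) and cold (cheap, slow).

    Blocks not read or written within max_hot_age_s are demoted to the
    cold tier on the next aging pass; reading a cold block promotes it.
    """

    def __init__(self, max_hot_age_s: float = 3600.0):
        self.max_hot_age_s = max_hot_age_s
        self.hot = {}    # block_id -> (data, last_access_time)
        self.cold = {}   # block_id -> data

    def write(self, block_id, data):
        self.cold.pop(block_id, None)
        self.hot[block_id] = (data, time.monotonic())

    def read(self, block_id):
        if block_id in self.hot:
            data, _ = self.hot[block_id]
            self.hot[block_id] = (data, time.monotonic())  # refresh recency
            return data
        data = self.cold[block_id]   # slow-tier read
        self.write(block_id, data)   # promote on access
        return data

    def age_out(self, now=None):
        """Demote blocks idle longer than max_hot_age_s; returns demoted ids."""
        now = time.monotonic() if now is None else now
        demoted = [b for b, (_, t) in self.hot.items()
                   if now - t > self.max_hot_age_s]
        for b in demoted:
            data, _ = self.hot.pop(b)
            self.cold[b] = data
        return demoted
```

Production tiering engines add more levels (NVMe, SSD, HDD), batch migrations, and I/O statistics, but the economics come from exactly this: pay for fast media only for the working set.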
Summary and Next Steps
Isilon is a popular, premises-based file server that has served customers well for many years. VNX is end of life. NetApp 7-mode appliances are long in the tooth with end of full NetApp 7-mode support on 31 December 2020 (6 months from this posting). Customers running other traditional NAS appliances from IBM®, HP, Quantum and others are in similar situations.
Since 2013, as the company who created the Cloud NAS category, Buurst SoftNAS has helped migrate thousands of workloads and petabytes of data from on-premises NAS filers of every kind across 39 countries globally. So you can be confident that you’re in good company with the cloud data performance experts at Buurst guiding your way and migrating your data and workloads into the cloud.
There are many compelling reasons to migrate applications and workloads to the cloud, from scalability and agility to easier maintenance. But anytime IT systems or applications go down it can prove incredibly costly to the business. Downtime costs between $100,000 and $450,000 per hour, depending upon the applications affected. And these costs do not account for the political costs or damage to a company’s brand and image with its customers and partners, especially if the outage becomes publicly visible and newsworthy.
“Through 2022, at least 95% of cloud security failures will be the customer’s fault,” says Jay Heiser, research vice president at Gartner. If you want to avoid being in that group, then you need to know the pitfalls to avoid.
To that end here are seven traps that companies often fall into and what can be done to avoid them.
1. No data-protection strategy
It’s vital that your company data is safe at rest and in transit. You need to be certain that it’s recoverable when (not if) the unexpected strikes. The cloud is no different than any other data center or IT infrastructure in that it’s built on hardware that will eventually fail. It’s managed by humans, who are prone to occasional error; in my experience, human error has caused most of the major cloud outages of the past five years.
Consider the threats of data corruption, ransomware, accidental data deletion due to human error, or a buggy software update, coupled with unrecoverable failures in cloud infrastructure. If the worst should happen, you need a coherent, durable data protection strategy. Put it to the test to make sure it works.
Most native cloud file services provide limited data protection (other than replication) and no protection against corruption, deletion or ransomware. For example, if your data is stored in EFS on AWS® and files or a filesystem get deleted, corrupted or encrypted and ransomed, who are you going to call? How will you get your data back and business restored? If you call AWS Support, you may well get a nice apology, but you won’t get your data back. AWS and all the public cloud vendors provide excellent support, but they aren’t responsible for your data (you are).
As shown below, a Cloud NAS with a copy-on-write (COW) filesystem, like ZFS, does not overwrite data. In this oversimplified example, data blocks A – D represent the current filesystem state. These data blocks are referenced via filesystem metadata that connects each file or directory to its underlying data blocks, as shown in (a). As a second step, a Snapshot is taken, which is simply a copy of the pointers, as shown in (b). This is how “previous versions” work, much like the ability on a Mac to use Time Machine to roll back and recover files or an entire system to an earlier point in time.
Anytime we modify the filesystem, instead of a read/modify/write of existing data blocks, new blocks are added, as shown in (c). Block D has been modified (copied, then modified and written), so the filesystem pointers now reference block D+, along with two new blocks, E1 and E2. And block B has been “deleted” by removing its filesystem pointer from the current filesystem tip, yet the actual block B continues to exist unmodified because it is still referenced by the earlier Snapshot.
Copy on write filesystems use Snapshots to support rolling back in time to before a data loss event took place. In fact, the Snapshot itself can be copied and turned into what’s termed a “Writable Clone”, which is effectively a new branch of the filesystem as it existed at the time the Snapshot was taken. A clone contains a copy of all the data block pointers, not copies of the data blocks themselves.
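The snapshot-and-clone mechanics described above can be modeled in a few lines of Python. This is a greatly simplified sketch of ZFS-style copy-on-write, not ZFS itself: blocks are never overwritten, a snapshot copies only the pointer table, and a clone is a new writable pointer table sharing the same blocks:

```python
import itertools

class COWFileSystem:
    """Toy copy-on-write filesystem: data blocks are immutable once written;
    the live filesystem, each snapshot, and each clone are just pointer
    tables mapping a name to a block id."""

    def __init__(self):
        self.blocks = {}                 # block_id -> data, never overwritten
        self.live = {}                   # name -> block_id (current tip)
        self.snapshots = {}              # snap_name -> {name: block_id}
        self._ids = itertools.count()    # block id source

    def write(self, name, data):
        # COW: allocate a fresh block and repoint, instead of modifying the
        # old block in place; snapshots holding the old pointer keep it alive.
        bid = next(self._ids)
        self.blocks[bid] = data
        self.live[name] = bid

    def delete(self, name):
        # Only the pointer goes away; the block survives if any
        # snapshot still references it.
        self.live.pop(name, None)

    def snapshot(self, snap_name):
        # Cost is O(number of pointers), not O(amount of data).
        self.snapshots[snap_name] = dict(self.live)

    def clone(self, snap_name):
        """Writable clone: a new filesystem sharing the same block store,
        starting from the snapshot's pointer table."""
        fs = COWFileSystem()
        fs.blocks = self.blocks      # shared, immutable blocks
        fs._ids = self._ids          # shared counter avoids id collisions
        fs.live = dict(self.snapshots[snap_name])
        return fs

    def read(self, name, snap_name=None):
        table = self.snapshots[snap_name] if snap_name else self.live
        return self.blocks[table[name]]
```

Deleting or modifying a file after a snapshot never disturbs the snapshot’s view, which is exactly why rollback and clone operations are near-instant: they manipulate pointers, not data.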
Enterprise Cloud NAS products use COW filesystems and then automate management of scheduled snapshots, providing hourly, daily and weekly Snapshots. Each Snapshot provides a rapid means of recovery, without rolling a backup tape or other slow recovery method that can extend an outage by many hours or days, driving the downtime costs through the roof.
With COW, snapshots, and writable clones, it’s a matter of minutes to recover and get things back online, minimizing the outage impact and costs when it matters most. Use a COW filesystem that supports snapshots and previous versions. Before selecting a filesystem, make sure you understand what data protection features it provides. If your data and workload are business-critical, ensure the filesystem will protect you when the chips are down (you may not get a second chance if your data is lost and unrecoverable).
2. No data-security strategy
It’s common practice for the data in a cloud data center to be commingled and collocated on shared devices with countless other unknown entities. Cloud vendors may promise that your data is kept separately, but regulatory concerns demand that you make certain that nobody, including the cloud vendor, can access your precious business data.
Think about access that you control (e.g., Active Directory), because basic cloud file services often fail to provide the same user authentication or granular control as traditional IT systems. The Ponemon Institute puts the average global cost of a data breach at $3.92 million. You need a multi-layered data security and access control strategy to block unauthorized access and ensure your data is safely and securely stored in encrypted form wherever it may be.
Look for NFS and CIFS solutions that provide encryption for data both at rest and in flight, along with granular access control.
3. No rapid data-recovery strategy
With storage snapshots and previous versions managed by dedicated NAS appliances, rapid recovery from data corruption, deletion or other potentially catastrophic events is possible. This is a key reason there are billions of dollars’ worth of NAS appliances hosting on-premises data today.
But few cloud-native storage systems provide snapshotting or offer easy rollback to previous versions, leaving you reliant on current backups. And when you have many terabytes or more of filesystem data, restoring from a backup will take many hours to days. Obviously, restoring from backup is not a rapid recovery strategy; it should be the path of last resort, because it is slow enough to extend the outage by hours or days and push losses into six figures or more.
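The back-of-the-envelope math makes the point. Even over a dedicated 10 Gbps link running at 80% efficiency, restoring 50 TB from backup takes roughly 14 hours at best (the figures here are illustrative assumptions, not measurements):

```python
def restore_hours(data_tb: float, link_gbps: float, efficiency: float = 0.8) -> float:
    """Hours to transfer data_tb terabytes over a link_gbps link at a given
    effective efficiency (protocol overhead, contention); 1 TB = 8e12 bits."""
    bits = data_tb * 8e12
    return bits / (link_gbps * 1e9 * efficiency) / 3600

# restore_hours(50, 10) is roughly 14 hours; at 1 Gbps it is nearly 6 days.
```

And that assumes the backup system can actually feed the link at line rate, which tape or cold object storage often cannot.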
You need flexible, instant storage snapshots and writable clones that provide rapid recovery and rollback capabilities for business-critical data and applications. Below we see previous version snapshots represented as colored folders, along with auto pruning over time. With the push of a button, an admin can clone a snapshot instantly, creating a writable clone copy of the entire filesystem that shares all the same file data blocks using a new set of cloned pointers. Changes made to the cloned filesystem do not alter the original snapshot data blocks; instead, new data blocks are written via the COW filesystem semantics, as usual, keeping your data fully protected.
Ensure your data recovery strategy includes “instant snapshots” and “writable clones” using a COW filesystem. Note that what cloud vendors typically call snapshots are actually deep copies of disks, not consistent instant snapshots, so don’t be confused as they’re two totally different capabilities.
4. No data-performance strategy
Shared, multi-tenant infrastructure often leads to unpredictable performance. We hear the horror stories of unpredictable performance from customers all the time. Customers need “sustained performance” that can be counted on to meet SLAs.
Most cloud storage services lack the facilities to tune performance, other than adding more storage capacity, along with corresponding unnecessary costs. Too many simultaneous requests, network overloads, or equipment failures can lead to latency issues and sluggish performance in the shared filesystem services offered by the cloud vendors.
Look for a layer of performance control for your file data that enables all your applications and users to get the level of responsiveness that’s expected. You should also ensure that it can readily adapt as demand and budgets grow over time.
Cloud NAS filesystem products provide the flexibility to quickly adjust the right blend of (block) storage performance, memory for caching read-intensive workloads, and network speeds required to push the data at the optimal speed. There are several available “tuning knobs” to optimize the filesystem performance to best match your workload’s evolving needs, without overprovisioning storage capacity or costs.
Look for NFS and CIFS filesystems that offer the full spectrum of performance tuning options that keep you in control of your workload’s performance over time, without breaking the bank as your data storage capacity ramps and accelerates.
5. No data-availability strategy
Hardware fails, people commit errors, and occasional outages are an unfortunate fact of life. It’s best to plan for the worst, create replicas of your most important data and establish a means to quickly switch over whenever sporadic failure comes calling.
Look for a cloud or storage vendor willing to provide an SLA guarantee that matches your business needs and supports the SLA you provide to your customers. Where necessary, create a failsafe option with a secondary storage replica, so that a rapid HA failover occurs instead of an outage.
In the cloud, you can get five-nines (99.999%) high availability from solutions that replicate your data across two availability zones; i.e., 5 minutes or less of unplanned downtime per year. Ask your filesystem vendor to provide a copy of their SLA and uptime guarantee to ensure it’s aligned with the SLAs your business team requires to meet its own obligations.
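The 5-minute figure falls straight out of the arithmetic:

```python
def max_downtime_minutes_per_year(availability: float) -> float:
    """Unplanned downtime budget per year for a given availability level,
    using an average year of 365.25 days."""
    return (1.0 - availability) * 365.25 * 24 * 60

# 99.999% ("five nines") allows about 5.3 minutes of downtime per year;
# 99.99% ("four nines") allows about 53 minutes.
```

Each additional “nine” cuts the allowable downtime by a factor of ten, which is why the jump from four nines to five nines usually requires replication across availability zones rather than a single instance.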
6. No multi-cloud interoperability strategy
As many as 90% of organizations will adopt a hybrid infrastructure by 2020, according to Gartner analysts. There are plenty of positive driving forces as companies look to optimize efficiency and control costs, but you must properly assess your options and the impact on your business. Consider the ease with which you can switch vendors in the future and any code that may have to be rewritten. Cloud platforms entangle you with proprietary APIs and services, but you need to keep your data and applications multi-cloud capable to stay agile and preserve choice.
You may be delighted with your cloud platform vendor today and have no expectations of making a change, but it’s just a matter of time until something happens that causes you to need a multi-cloud capability. For example, your company acquires or merges with another business that brings a different cloud vendor to the table and you’re faced with the need to either integrate or interoperate. Be prepared as most businesses will end up in a multi-cloud mode of operation.
7. No disaster-recovery strategy
A simple mistake where a developer accidentally pushes a code drop into a public repository and forgets to remove the company’s cloud access keys from the code could be enough to compromise your data and business. It definitely happens. Sometimes the hackers who gain access are benign, other times they are destructive and delete things. In the worst case, everything in your account could be affected.
Maybe your provider will someday be hacked and lose your data and backups. You are responsible and will be held accountable, even though the cause is external. Are you prepared? How will you respond to such an unexpected DR event?
It’s critically important to keep redundant, offsite copies of everything required to fully restart your IT infrastructure in the event of a disaster or a full-scale break-in.
The temptation to cut corners and keep costs down with data management is understandable, but it is dangerous, short-term thinking that could end up costing you a great deal more in the long run. Take the time to craft the right DR and backup strategy and put those processes in place, test them periodically to ensure they’re working, and you can mitigate these risks.
For example, should your cloud root account somehow get compromised, is there a fail-safe copy of your data and cloud configuration stored in a second, independent cloud (or at least a different cloud account) you can fall back on? DR is like an insurance policy – you only get it to protect against the unthinkable, which nobody expects will happen to them… until it does. Determine the right level of DR preparedness and make those investments. DR costs should not be huge in the cloud since most everything (except the data) is on-demand.
We have seen how putting the right data management plans in place ahead of an outage will make the difference between a small blip on the IT and business radars vs. a potentially lengthy outage that costs hundreds of thousands to millions of dollars – and more when we consider the intangible losses and career impacts that can arise. Most businesses that have operated their own data centers know these things, but are these same measures being implemented in the cloud?
The cloud offers us many shortcuts to quickly get operational. After all, the cloud platform vendors want your workloads running and billing hours on their cloud as soon as possible. Unfortunately, naive upfront choices may get your workloads migrated faster and running on schedule, but cost you and your company dearly in the long run.
Use the above cloud file data management strategies to avoid the 7 most common pitfalls.
Learn more about how SoftNAS Cloud NAS helps you address all 7 of these data management areas.
In today’s world, where virtually everything has been automated, our businesses produce and store untold amounts of data. A subset of that data is leveraged to create business value or at least insights that can help inform decision making. Much, if not most, of the data remains untapped and has been termed “dark data” by Gartner and others.
Gartner defines dark data as “the information assets organizations collect, process and store during regular business activities, but generally fail to use for other purposes.” It includes all data objects and types that have yet to be analyzed for any business or competitive intelligence or aid in business decision making.
Most businesses are sitting on a proverbial goldmine of dark data today…and creating even more each year as new systems, SaaS application islands, IoT devices and automation get deployed or acquired. This is in addition to a virtual archaeological dig of legacy IT systems most businesses still run on today.
The keys to future success and business growth are right there under our noses, yet few companies go out and tap this dark data and become enlightened and enriched by it.
It’s What You Do with Dark Data that Creates Business Value
For some, dark data is stored and maintained for regulatory compliance. For others, “data hoarding” is company policy…keep it around in case it’s needed for something someday. Meanwhile, it’s taking up space and driving latent costs underneath the business.
For most, dark data is a cost center today. It’s an unused asset that’s been overlooked and is for the most part simply ignored. For others who are aware of and concerned about the ongoing storage costs, dark data gets periodically pruned and purged, which reduces its costs but destroys potential future value.
With the advances of deep learning, there are now machine learning algorithms and techniques that can turn what has historically become dark data into real business insights and value.
Why Isn’t Dark Data Being Leveraged Today?
There are many challenges that keep data in the dark, including:
Nobody clearly responsible for inventorying, tracking and leveraging dark data at the company
Lack of specific business objectives that can leverage dark data
Technical barriers, such as widely varying data formats and data sitting in physically and geographically dispersed, incompatible systems
High costs and scarcity of data scientists who know how to acquire, process, clean and filter raw dark data into usable enlightened data
Lack of vision by management team to transform the company’s vast dark data troves into business value, perpetuating dark data’s ongoing fate at most companies.
Why Do Something with Dark Data and Why Now?
The recent surge in Artificial Intelligence and Deep Learning advances make it both convenient and cost-effective to deploy the tools required to automate the analysis of data to generate competitive advantage and new business value.
Deep learning models often entail processing vast amounts of data for proper training. Today there’s a plethora of proven machine learning inference engines, models and algorithms available to quickly generate insights and business value from data. There are also off-the-shelf cognitive services that make tapping into the power of AI/ML quick and cost-effective, including forecasting, image recognition, speech recognition and many others.
It’s now more cost-effective than ever to migrate, store and archive large datasets in the cloud; i.e., it no longer entails massive capital investments that must be justified to the CFO and executive team to create custom data warehouse projects in on-premises datacenters.
However, most of the dark data continues to lurk around the edges of the business or is squirrelled away within various data islands (e.g., Salesforce, accounting systems, e-commerce systems, security systems, warehouse systems, etc.), making it challenging and expensive to mine and leverage dark data quickly and easily to test and prove out its business value.
Getting to a Single Source of Truth
Today, business leaders recognize the enormous value cloud computing delivers. Once senior management also recognize that leveraging dark data to feed AI and machine learning should be a higher priority, this strategic decision sets the table for what comes next – creating a “single source of truth” for that data.
Fortunately, there are tools that make automating data extraction, filtering, cleansing and transporting it to a central location (usually the cloud these days) faster, easier and more cost effective.
These tools can extract dark data from virtually any data repository and format in which it is stored today globally, then filter and migrate the relevant data to where it can be centrally processed and enlightened. Once the dark data has been migrated into the cloud, it can also be compressed and archived cost-effectively for future use, regulatory compliance, etc.
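Whatever tooling performs it, the extract-filter-normalize step itself is conceptually simple. A sketch that parses a legacy CSV export, drops incomplete records, and normalizes field names into JSON-ready form (the field names and sample data here are hypothetical):

```python
import csv
import io
import json

def extract_and_normalize(csv_text: str, required=("id", "timestamp")) -> list:
    """Parse a CSV export, drop rows missing required fields, normalize
    field names to lowercase, and return records ready for JSON encoding."""
    records = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Normalize: strip whitespace, lowercase the field names.
        clean = {k.strip().lower(): v.strip() for k, v in row.items() if k}
        # Filter: keep only complete records.
        if all(clean.get(f) for f in required):
            records.append(clean)
    return records

# Hypothetical legacy export: the second row is missing its ID and is dropped.
raw = "ID,Timestamp,Reading\n1,2020-06-01T00:00Z,42\n,2020-06-01T01:00Z,17\n"
print(json.dumps(extract_and_normalize(raw)))
```

The output of a step like this is what lands in object storage as the cleansed, queryable copy of the formerly dark data.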
Data Warehouse, Data Lake, or What?
Over the years, many attempts have been made to create the holy grail of data management and analytics: a single source of truth that houses all the company’s analytics and decision-support needs. Each of these data storage and analytics approaches has demonstrated its strengths and weaknesses in practice.
Usually, the costs of creating and maintaining data warehouses have turned out to be too high to be sustainable, and such systems are certainly far too complex and expensive to also house all that dark data.
Fortunately, leading cloud providers are delivering solutions to these challenges. For example, Microsoft Azure provides Azure Data Lake Storage (ADLS), a flexible data management solution that accepts data in the Common Data Model (CDM) format. The data is stored cost-effectively on underlying Azure Blob object storage, supporting petabyte-scale files and trillions of objects that can be efficiently queried and leveraged. ADLS provides a compelling foundation for quickly creating a single source of truth in the cloud today. And because ADLS stores data using the CDM common schema, a broad range of popular tools are instantly compatible, including Power BI, Azure Machine Learning Studio, and more. Numerous Azure Cognitive Services also make it easy to put dark data to use.
AWS provides a rich set of AI, ML and analytics capabilities, with S3 typically serving as the low-cost, highly durable data lake. Once data is extracted, filtered and transformed from its source(s) and moved into S3 buckets, it is well positioned for use by most AWS services, including AWS Forecast, Rekognition, SageMaker and Redshift, among others.
Bridging the Islands of Dark Data with the Cloud to Create Business Gold
Now that we have a cost-effective place to create a single source of truth for our data in the clouds, how can we bridge the gaps that exist between where the dark data is created and stored today with these cloud services?
Buurst recently announced a new product named Fuusion that is coming soon. As shown below, Fuusion bridges disparate data sources from existing on-prem, SaaS, legacy and new IoT/edge sources with Cloud Services.
Fuusion uses Connectors to data sources and cloud services to quickly bridge incompatible data formats, with the right levels of filtering, routing and data cleansing.
Fuusion eliminates the need, in most cases, for a data scientist specialist to manually deal with all these data sources and formats. It also eliminates the need to invest in expensive, time-consuming DevOps projects to custom develop these integrations.
Instead, a Fuusion user visually drags and drops a Fuusion Template, a set of data connectors and other processing blocks, onto a web-based canvas in the Fuusion Editor and configures it to quickly create custom Fuusion data flows.
Each data flow acquires, filters, transforms and moves dark data from its present location into a suitable, centralized cloud data lake. Once the data is in the data lake, it’s available for further processing for AI/ML training, inferencing and cognitive services, analytics and/or display.
Using the Azure Data Lake example introduced above, Fuusion connects data from one or more disparate sources, which can be any type of structured or unstructured data, with ADLS by normalizing the data into CDM format. At that point, Power BI can be used to perform advanced analytics, machine learning and dashboard display, creating business value from what was previously unleveraged dark data.
Another common use case revolves around taking time-series data and creating forecasts. For example, AWS Forecast delivers highly accurate forecasts and, as an off-the-shelf cognitive service, requires no DevOps or specialized AI/ML data-science skills when connected via Fuusion. Fuusion gathers time-series data from one or more Excel spreadsheets (or CSV files), transforms the data, moves it into S3, and then executes AWS Forecast jobs to generate the forecast, which is returned to the user as Excel-formatted spreadsheet files. Once Fuusion is deployed and configured, users with only spreadsheet skills can generate any number of forecast jobs, extending the power of AI and machine learning to where much dark data lives today: in users’ filesystems and desktops.
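AWS Forecast itself is a managed service with its own API and far more sophisticated models. As a stand-in that shows the shape of the job (a time series goes in, a forecast horizon comes out), here is a naive moving-average forecaster; it is an illustration of the workflow, not the service’s algorithm:

```python
def moving_average_forecast(series, window=3, horizon=4):
    """Forecast `horizon` future points by repeatedly averaging the last
    `window` observed (and previously forecast) values."""
    history = list(series)
    out = []
    for _ in range(horizon):
        nxt = sum(history[-window:]) / window
        out.append(nxt)      # emit the forecast point...
        history.append(nxt)  # ...and feed it back for the next step
    return out
```

In the Fuusion flow described above, a block like this is replaced by a call to the managed forecasting service; the surrounding steps (gather, transform, deliver results back as spreadsheets) stay the same.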
Fuusion provides an alternative to expensive, time-consuming custom DevOps projects that are typically required to connect dark data and edge data with powerful cloud services. It also minimizes the amount of data science required to deploy and leverage dark data using off the shelf cloud-based cognitive services and other applications.
Buurst Fuusion will bridge off-cloud and dark data with major cloud services in 2H 2020. Stay tuned for more details on Fuusion, including some demos of cloud service integrations mentioned in this post. To learn more about Fuusion and its availability or to register for the beta program, please contact the Buurst team.
According to Gartner, by 2025 80% of enterprises will shut down their traditional data centers. Today, 10% already have. We know this is true because we have helped thousands of these businesses migrate workloads and business-critical data from on-premises datacenters into the cloud since 2013. Most of those workloads have been running 24 x 7 for 5+ years. Some of them have been digitally-transformed (code for “rewritten to run natively in the cloud”).
The biggest challenge in adopting cloud isn’t the technology shift – it’s finding the right balance of cost vs. performance and availability that justifies moving to the cloud. We all have a learning curve as we migrate major workloads into the cloud. That’s to be expected as there are many choices to make – some more critical than others.
Many of our largest customers operate mission-critical, revenue-generating applications in the cloud today. Business relies on these applications and their underlying data for revenue growth, customer satisfaction, and retention. These systems cannot tolerate unplanned downtime. They must perform at expected levels consistently… even under increasingly heavy loads, unpredictable interference from noisy cloud neighbors, occasional cloud hardware failures, sporadic cloud network glitches, and other anomalies that just come with the territory of large scale datacenter operations.
In order to meet customer and business SLAs, cloud-based workloads must be carefully designed. At the core of these designs is how data will be handled. Choosing the right file service component is one of the critical decisions a cloud architect must make.
For customers to remain happy, application performance must be maintained. Easier said than done when you no longer control the IT infrastructure in the cloud…
So how does one negotiate these competing objectives around cost, performance, and availability when you no longer control the hardware or virtualization layers in your own datacenter? And how can these variables be controlled and adapted over time to keep things in balance? In a word – control. You must correctly choose where to give up control and where to maintain control over key aspects of the infrastructure stack supporting each workload.
One allure of the cloud is that it’s (supposedly) going to simplify everything into easily managed services, eliminating the worry about IT infrastructure forever. For non-critical use cases, managed services can, in fact, be a great solution. But what about when you need to control costs, performance, and availability? Unfortunately, managed services must be designed and delivered for the “masses”, which means tradeoffs and compromises must be made. And to make these managed services profitable, significant margins must be built into the pricing models to ensure the cloud provider can grow and maintain them.
In the case of public cloud shared file services like AWS® Elastic File System (EFS) and Azure NetApp® Files (ANF), performance throttling is required to prevent thousands of customer tenants from overrunning the limited resources actually available. To get more performance, you must purchase and maintain more storage capacity, whether you actually need that additional storage or not. As your storage capacity inevitably grows, so do the costs. To make matters worse, much of that data is inactive most of the time, so you pay every month to store data you rarely, if ever, access. And the cloud vendors have no incentive to help you reduce these excessive storage costs, which keep climbing as your data grows each day.
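To see how capacity-linked performance plays out, here is a rough sketch in the style of EFS bursting mode, where baseline throughput scales with the amount of data stored. The scaling rate and the per-GiB price below are illustrative assumptions, not quoted vendor figures:

```python
# Rough model of capacity-linked performance in a shared file service.
# ASSUMPTIONS (illustrative, not vendor quotes): baseline throughput
# scales at 50 MiB/s per TiB stored, at a flat $/GiB-month price.
BASELINE_MIBPS_PER_TIB = 50
PRICE_PER_GIB_MONTH = 0.30  # hypothetical list price

def capacity_needed_for_throughput(target_mibps: float) -> float:
    """TiB you must store (and pay for) to reach a baseline throughput."""
    return target_mibps / BASELINE_MIBPS_PER_TIB

def monthly_cost_for_tib(tib: float) -> float:
    """Monthly storage bill for the capacity you were forced to buy."""
    return tib * 1024 * PRICE_PER_GIB_MONTH

# To sustain 500 MiB/s baseline you would need to store 10 TiB,
# even if your actual dataset is far smaller.
tib = capacity_needed_for_throughput(500)
print(f"{tib:.0f} TiB stored -> ${monthly_cost_for_tib(tib):,.2f}/month")
```

The point of the sketch: under this pricing model, throughput and capacity are coupled, so buying performance means buying (and paying monthly for) capacity.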
After watching this movie play out for many years, working closely with businesses of every size across 39 countries, we at Buurst™ decided to address these issues head-on. Instead of charging customers what is effectively a “storage tax” on their growing cloud storage capacity, we changed the model by offering Unlimited Capacity. That is, with Buurst SoftNAS® you can store an unlimited amount of file data in the cloud at no extra cost (aside from the underlying cloud block and object storage itself).
SoftNAS has always offered both data compression and deduplication, which when combined typically reduces cloud storage by 50% or more. Then we added automatic data tiering, which recognizes inactive and stale data, archiving it to less expensive storage transparently, saving up to an additional 67% on monthly cloud storage costs.
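The combined effect of compression/deduplication and tiering can be sketched with simple arithmetic. The figures below reuse the “up to” numbers above (50% reduction, 67% tiering savings); the dataset size, price, and the 80% inactive-data fraction are hypothetical:

```python
# Back-of-envelope savings model using the "up to" figures above:
# 50% reduction from compression + dedup, then up to 67% lower cost
# on the inactive share of data that is tiered to cheaper storage.
def monthly_cost(raw_tb: float, price_per_tb: float,
                 reduction=0.50,          # compression + dedup
                 inactive_fraction=0.80,  # hypothetical share of stale data
                 tier_discount=0.67) -> float:
    stored = raw_tb * (1 - reduction)        # after compression/dedup
    hot = stored * (1 - inactive_fraction)   # stays on primary storage
    cold = stored * inactive_fraction        # tiered automatically
    return hot * price_per_tb + cold * price_per_tb * (1 - tier_discount)

baseline = 100 * 100.0               # 100 TB at a hypothetical $100/TB-month
optimized = monthly_cost(100, 100.0)
print(f"baseline ${baseline:,.0f} -> optimized ${optimized:,.0f}/month")
```

Under these assumptions, the monthly bill drops from $10,000 to roughly $2,320; actual savings depend on how compressible and how active your data really is.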
Just like when you managed your file storage in your own datacenter, SoftNAS keeps you in control of your data and application performance. Instead of turning control over to the cloud vendors, you maintain total control over the file storage infrastructure. This gives you the flexibility to keep costs and performance in balance over time.
To put this in perspective, without taking data compression and deduplication into account yet, look at how Buurst SoftNAS costs compare:
Buurst SoftNAS vs. NetApp ONTAP, Azure NetApp Files and AWS EFS
These monthly savings really add up. And if your data is compressible and/or contains duplicates, you will save up to 50% more on cloud storage because the data is compressed and deduplicated automatically for you.
Fortunately, customers have alternatives to choose from today:
GIVE UP CONTROL – use cloud file services like EFS or ANF, pay for both performance and capacity growth, and give up control over your data and your ability to deliver on SLAs consistently
KEEP CONTROL – keep control of your data and business with Buurst SoftNAS, balance storage costs and performance to meet your SLAs, and grow more profitably
Sometimes cloud migration projects are so complex and daunting that it pays to take shortcuts just to get everything up and running as a first step. We commonly see customers choose cloud file services as an easy first stepping stone in a migration. Then these same customers proceed to the next step, optimizing costs and performance to operate the business profitably in the cloud, and they contact Buurst to take back control, reduce costs, and meet SLAs.
As you contemplate how to reduce cloud operating costs while meeting the needs of the business, keep in mind that you face a pivotal decision: keep control or give up control of your data, its costs, and its performance. For some use cases, the simplicity of cloud file services is attractive, the data capacity is small enough, and performance demands are low enough that the convenience of files-as-a-service is the best choice. As you move business-critical workloads, where costs, performance, and control matter, or where the datasets are large (tens to hundreds of terabytes or more), keep in mind that Buurst SoftNAS never charges a storage tax on your data and keeps you in control of your business destiny in the cloud.
NVMe (non-volatile memory express) technology is now available as a service in the AWS cloud. Coupled with 100 Gbps networking, NVMe SSDs open new frontiers for HPC and transactional workloads in the cloud. And because it’s available “as a service,” powerful HPC storage and compute clusters can be spun up on demand, without the capital investments, time delays, and long-term commitments usually associated with High Performance Computing (HPC).
In this post, we look at how to leverage AWS i3en instances for performance-sensitive workloads that heretofore would only run on specialized, expensive hardware in on-premises datacenters. This technology enables both commercial HPC and research HPC, where capital budgets have limited an organization’s ability to leverage traditional HPC. It also bodes well for demanding AI and machine learning workloads, which can benefit from HPC storage coupled via 100 Gbps networking directly to GPU clusters.
According to AWS, “i3en instances can deliver up to 2 million random IOPS at 4 KB block sizes and up to 16 GB/s of sequential disk throughput.” These instances come outfitted with up to 768 GB memory, 96 vCPUs, 60 TB NVMe SSD, and they support up to 100 Gbps low-latency networking within cluster placement groups.
Challenges to Solve
NVMe SSDs are directly attached to Amazon EC2 instances and provide maximum IOPS and throughput, but they suffer from being pseudo-ephemeral; that is, whenever the EC2 instance is stopped and started again, a fresh set of empty NVMe disks is attached, possibly on a different host. The original NVMe devices are no longer accessible, so their data is lost. Unfortunately, this behavior limits the applicability of these powerful instances.
Customers need a way to bring demanding enterprise SQL database workloads for SAP, OLAP, data warehousing, and other applications to the cloud. These business-critical workloads require both high availability (HA) and high transactional throughput, along with storage persistence and disaster recovery capabilities.
Customers also want to run commercial HPC and research HPC in the cloud, along with AI/deep learning, 3D modeling and simulation workloads.
Unfortunately, most EC2 instances top out at 10 Gbps (1.25 GB/sec) network bandwidth. Moderate HPC workloads require 5 GB/sec or more read and write throughput. Higher-end HPC workloads need 10 GB/sec or more throughput. Even the fastest class of EBS, provisioned IOPS, can’t keep up.
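The bandwidth figures above follow from straight unit conversion (8 bits per byte, ignoring protocol overhead), which this small helper makes explicit:

```python
# Network line rate (Gbps) to usable storage throughput (GB/s),
# ignoring protocol overhead: 8 bits per byte.
def gbps_to_gbytes_per_sec(gbps: float) -> float:
    return gbps / 8

print(gbps_to_gbytes_per_sec(10))   # 10 Gbps -> 1.25 GB/s
print(gbps_to_gbytes_per_sec(100))  # 100 Gbps -> 12.5 GB/s
```

So a standard 10 Gbps instance caps out at 1.25 GB/s, well below the 5 to 10 GB/s that moderate-to-high-end HPC workloads demand, while a 100 Gbps link offers 12.5 GB/s of headroom.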
Mission-critical application and transactional database workloads also require non-stop operation, data persistence and high-performance storage throughput of several GB/sec or more.
NVMe solves the IOPS and throughput problems, but its non-persistence is a showstopper for most enterprise workloads.
So, the ideal solution must address NVMe data persistence, without slowing down the HPC workloads, and for mission-critical applications, must also deliver high-availability with at least a 99.9% uptime SLA.
SoftNAS Labs Objectives
SoftNAS Labs™ recently decided to put these AWS hotrod NVMe-backed instances to the test, with an eye toward solving the data persistence and HA issues, without degrading their high-performance benefits.
Solving the persistence problem paves the way for many interesting use cases to run in the cloud, including:
Commercial HPC workloads
Deep learning workloads based upon Python-based ML frameworks like TensorFlow, PyTorch, Keras, MxNet, Sonnet and others that require feeding massive amounts of data to GPU compute instances
3D modeling and simulation workloads
A secondary objective was to add high-availability with high-performance for Windows Server and SQL Server workloads, SAP HANA, OLTP and data warehousing. High Availability (HA) makes delivering an SLA for HPC workloads possible in the cloud.
SoftNAS Labs developed two solutions, one for Linux-based workloads and another for Windows Server and SQL Server use cases.
SoftNAS for HPC Linux Workloads
This solution leverages the Elastic Fabric Adapter (EFA) and AWS clustered placement groups with i3en family instances and 100 Gbps networking. SoftNAS Labs testing measured up to 15 GB/second random read and 12.2 GB/second random write throughput. We also observed more than 1 million read IOPS and 876,000 write IOPS from a Linux client, all running FIO benchmarks.
The following block diagram shows the system configuration used to attain these results.
The NVMe-backed instance contained storage pools and volumes dedicated to HPC read and write tasks. Another SoftNAS Persistent Mirror instance leveraged SoftNAS’ patented SnapReplicate® asynchronous block replication to EBS provisioned IOPS for data persistence and DR.
In real-world HPC use cases, one would likely deploy two separate NVMe-backed instances: one dedicated to high-performance read I/O traffic and the other to HPC log writes. In our testing, we used 8 or more synchronous iSCSI data flows from a single HPC client node. It’s also possible to leverage NFS across a cluster of HPC client nodes, provided there are 8 or more client threads each accessing storage. Each “flow,” as it’s called in placement group networking, delivers 10 Gbps of throughput, so maximizing use of the available 100 Gbps network requires 8 to 10 or more such parallel flows.
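The flow sizing described above reduces to simple arithmetic. A minimal sketch, assuming the 10 Gbps per-flow ceiling of cluster placement group networking mentioned above:

```python
# Placement-group "flows": each flow tops out at ~10 Gbps, so filling a
# 100 Gbps link requires enough parallel flows (iSCSI sessions or NFS
# client threads). A minimal sizing sketch under that assumption.
import math

FLOW_GBPS = 10  # per-flow ceiling in a cluster placement group

def flows_needed(link_gbps: float) -> int:
    """Parallel flows required to saturate the link."""
    return math.ceil(link_gbps / FLOW_GBPS)

def aggregate_gbytes_per_sec(num_flows: int, link_gbps: float) -> float:
    """Achievable throughput in GB/s, capped by the link rate."""
    return min(num_flows * FLOW_GBPS, link_gbps) / 8

print(flows_needed(100))                 # 10 flows to fill 100 Gbps
print(aggregate_gbytes_per_sec(8, 100))  # 8 flows -> 10.0 GB/s
```

This is why a single-threaded client can never saturate the link: with one flow you get roughly 1.25 GB/s no matter how fast the storage behind it is.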
Persistence of the NVMe SSDs runs in the background, asynchronously to the HPC job itself. Provisioned IOPS is the fastest class of persistent EBS storage on AWS. SoftNAS’ underlying OpenZFS filesystem takes a storage snapshot once per minute, aggregating the groups of I/O transactions occurring at 10 GB/second or faster across the NVMe devices. These snapshots are then persisted to EBS using 8 parallel SnapReplicate streams, trailing the near real-time NVMe HPC I/O slightly. When the HPC job settles down, the asynchronous persistence writes to EBS catch up, ensuring data recoverability in the event the NVMe instance is powered down or must move to a different host for maintenance or patching.
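The lag-and-catch-up behavior described above can be modeled with a simple backlog calculation. The 2 GB/s aggregate persistence rate below is purely an illustrative assumption, not a measured SnapReplicate figure:

```python
# Sketch of the async persistence lag described above.
# During an HPC burst the NVMe tier absorbs writes faster than the
# EBS mirror persists them; the backlog drains after the burst ends.
# ASSUMPTION: an illustrative 2.0 GB/s aggregate persistence rate.
def catchup_seconds(burst_gb_written: float, burst_seconds: float,
                    persist_gb_per_sec: float = 2.0) -> float:
    """Seconds after the burst until EBS is fully caught up."""
    persisted_during_burst = persist_gb_per_sec * burst_seconds
    backlog = max(0.0, burst_gb_written - persisted_during_burst)
    return backlog / persist_gb_per_sec

# A 10 GB/s burst for 60 s writes 600 GB; 120 GB persists during the
# burst, leaving a 480 GB backlog that drains in 240 s.
print(catchup_seconds(600, 60))
```

The takeaway is that persistence trails the job only temporarily; as long as bursts are shorter than the drain capacity allows, the EBS copy converges to a recoverable state between runs.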
Here’s a sample CloudWatch screen grab taken off the SoftNAS instance after one of the FIO random write tests. We see more than 100 Gbps Network In (writes to SoftNAS) and approaching 900,000 random write IOPS. The reads (not shown here) clocked in at more than 1,000,000 IOPS (less than the 2 million IOPS AWS says the NVMe can deliver – it would take more than 100 Gbps networking to reach the full potential of the NVMe).
One thing that surprised us is that there’s virtually no observable difference between random and sequential performance with NVMe. Because NVMe SSDs are built from high-speed flash attached directly to the system bus over PCIe, with no mechanical seek penalty, we don’t see the usual latency gap between random and sequential workloads; everything performs at the same speed over NVMe.
The level of performance delivered over EFA networking to and from NVMe for Linux workloads is impressive, the fastest SoftNAS Labs has ever observed in the AWS cloud: more than one million read IOPS at 15 GB/second and 876,000 write IOPS at 12.2 GB/second.
This HPC storage configuration for Linux can be used to satisfy many use cases, including:
Commercial HPC workloads
Deep learning workloads based upon Python-based ML frameworks like TensorFlow, PyTorch, Keras, MxNet, Sonnet and others that require feeding massive amounts of data to GPU compute clusters
3D modeling and simulation workloads
HPC container workloads
SoftNAS for HPC Windows Server and SQL Server Workloads
This solution leverages the Elastic Network Adapter (ENA) and AWS clustered placement groups with i3en family instances and 25 Gbps networking. SoftNAS Labs testing measured up to 2.7 GB/second read and 2.9 GB/second write throughput on Windows Server running Crystal Disk benchmarks. We did not have time to benchmark SQL Server in this mode, something we plan to do later.
Unfortunately, Windows Server on AWS does not support the 100 Gbps EFA driver, so at the time of these tests, placement group networking with Windows Server was limited to 25 Gbps via ENA only.
The following block diagram shows the system architecture used to attain these results.
In order to provide high availability and high performance, which SoftNAS Labs calls High-Performance HA (HPHA), it’s necessary to combine two SoftNAS NVMe-backed instances deployed into an iSCSI mirror configuration. The mirrors use synchronous I/O to ensure transactional integrity and high availability.
SnapReplicate uses snapshot-based block replication to persist the NVMe data to provisioned IOPS EBS (or any EBS class or S3) for DR. The DR node can be in a different zone or region, as indicated by the DR requirements. We chose provisioned IOPS to minimize persistence latency.
Windows Server supports a broad range of applications and workloads. We increasingly see SQL Server, Postgres and other SQL workloads being migrated into the cloud. It’s common to see various large-scale enterprise applications like SAP, SAP HANA and other SQL Server and Windows Server workloads that require both high-performance and high availability.
The above configuration leveraging NVMe-backed instances makes it possible to run more demanding enterprise workloads on AWS for data warehousing, OLAP, and OLTP use cases. SoftNAS HPHA enables high-performance, synchronous mirroring across NVMe instances with high availability and the level of data persistence and DR required by many business-critical workloads.
AWS i3en instances deliver a massive amount of punch, in terms of CPU horsepower, cache memory and up to 60 terabytes of NVMe storage. The EFA driver, coupled with clustered placement group networking, delivers high-performance 100 Gbps networking and HPC levels of IOPS and throughput. The addition of SoftNAS makes data persistence and high availability possible in order to more fully leverage the power these instances provide. This situation works well for Linux-based workloads today.
However, the lack of Elastic Fabric Adapter support for full 100 Gbps networking with Windows Server is certainly a sore spot, one we hope the AWS and Microsoft teams are working to resolve.
The future for HPC in AWS looks bright. We can imagine a day when more than 100 Gbps networking becomes available, enabling customers to take full advantage of the 2 million IOPS the NVMe SSDs are poised to deliver.
SoftNAS for HPC solutions operate very cost-effectively on a single node for workloads that do not require high availability, or on just two nodes with HA. Unlike other storage solutions that require a minimum of six (6) i3en nodes, the SoftNAS solution provides cost-effectiveness, HPC performance, high availability, and persistence with DR options across all AWS zones and regions.
SoftNAS and AWS are well-positioned today with commercially off-the-shelf products that, when combined, clear the way to move numerous HPC, Windows Server and SQL Server workloads from on-premises datacenters into the AWS cloud. And since SoftNAS is available on-demand via the AWS Marketplace, customers with these types of demanding needs are just minutes away from achieving HPC in the cloud. SoftNAS is available to assist partners and customers in quickly configuring and performance-tuning these HPC solutions.
SoftNAS Labs is the advanced R&D team within SoftNAS. SoftNAS Labs is responsible for much of the innovation and advanced capabilities SoftNAS has delivered to cloud customers since 2013.
SoftNAS builds upon the industry-leading OpenZFS filesystem on Linux, allowing customers to manage petabytes of file storage in the cloud. Since 2013, only SoftNAS has delivered a true Unified NAS in the AWS cloud, including NFS v4.1, CIFS/SMB with Active Directory, and iSCSI protocols in a single cloud NAS filer software image, launched in just minutes on-demand via the AWS Marketplace. And only SoftNAS can tier NVMe SSDs with all forms of EBS block storage across all global regions and zones in AWS.
With SoftNAS, customers get dedicated, predictable and sustainable performance in the cloud, unlike multi-tenant, shared filesystems that suffer from noisy-neighbors and bottlenecks when you least expect them.