The following is a recording and full transcript from the webinar, “Using AWS Disaster Recovery to Close Your DR Datacenter”. You can download the full slide deck on Slideshare.
Maintaining physical Disaster Recovery (DR) datacenters grows more cost-prohibitive each year. By moving your DR data center to the AWS cloud, you enable faster disaster recovery and greater resiliency without the cost of a second physical data center. In this webinar, we covered:
- Architecting “pilot light” to “hot standby” DR environments
- Multi AWS Availability Zone DR strategies
- What cloud DR offers that on-premises options can’t
- Lessons learned from DR implementations on AWS
- Demo: Building a hot standby DR environment
Full Transcript: Using AWS Disaster Recovery to Close Your DR Datacenter
Taran Soodan: Good morning, good afternoon, and good evening everyone. Welcome to a SoftNAS webinar today on how to use AWS for disaster recovery and close your DR data center.
In today’s webinar, we’ll be discussing how you can use AWS to manage your disaster recovery. We will give a brief overview of how it works, some of the benefits of using AWS, and an overview of some of the architectures that you can build to give you a better sense of just how easy it is to manage your DR with AWS.
Before we begin today’s webinar, we just want to confirm a couple of housekeeping items out there. Webinar audio can be done through either your computer speakers or through the telephone.
By clicking the telephone option, you’re able to go in and dial-in using your cell phone, or desk phone, or whatever phones you may have available.
We will also be answering questions at the end of the webinar. If you have any questions that pop up during the webinar, post them in the questions pane and we’ll go ahead and answer them at the end.
Finally, the session is being recorded. For those of you who want to share this recording with some of your colleagues or you just want to watch it on-demand, we’ll go ahead and send you a link to the recording of the webinar and a link to the slides shortly after today’s webinar is over.
Also, to thank all of you for joining us today, we are offering $100 AWS credits. Later on at the end of the webinar, we are going to give you a link which you can click and go to earn a free $100 AWS credit as thanks for attending today’s webinar.
On the agenda today, we’ll be talking about AWS’s disaster recovery.
We’ll give you a brief overview of how it works and some of the benefits of using AWS for your disaster recovery.
We are going to demo how to actually build a Hot Standby DR environment on AWS. We will also give a brief overview of some DR architectures. We’ll talk a bit about HA on AWS. The main reason is, the more prepared you are for a disaster the better it will work out for you.
Finally, at the end of the webinar, we will be doing a Q&A. Again, if you have any questions during today’s webinar, please post them in the questions pane here on GoToWebinar and we’ll answer your questions at the end.
With that, let’s go ahead and get this party started. Briefly talking a bit about AWS’s DR, we want to knock out some terminology here in the beginning. There are four terms that you’re going to be hearing throughout this webinar – business continuity, disaster recovery, recovery point objectives or RPO, and recovery time objectives, otherwise known as RTO.
Business continuity is basically just ensuring that your organization’s mission-critical business functions are continuing to operate or they recover pretty quickly from serious incidents.
Disaster recovery is all about preparing for and recovering from a disaster, meaning any event that has a negative impact on your business. That covers things like hardware failures, software failures, power outages, physical damage to your buildings like fire, flooding, or hurricanes, or even human error. Disaster recovery is all about planning for those incidents.
The recovery point objective is the acceptable amount of data loss measured in time. For example, if your disaster hits at 12:00 PM, noon, and your RPO is one hour, basically, your system should recover all the data that was in the system before 11:00 AM, so your data loss will only span from 11:00 AM to 12:00 PM.
Finally on to the recovery time objective. That’s basically the time it takes after a disruption to restore your business processes back to the agreed upon service levels. For example, if your disaster occurs at noon and your RTO is eight hours, you should be back up and running by no later than 8:00 PM.
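The two worked examples above reduce to simple date arithmetic. The following is a minimal Python sketch of that arithmetic; the function name `recovery_targets` and the calendar date are assumptions made for illustration and are not part of the webinar.

```python
from datetime import datetime, timedelta


def recovery_targets(disaster_time, rpo, rto):
    """Return (oldest acceptable recovery point, latest acceptable restore time)."""
    recovery_point = disaster_time - rpo    # data written before this must survive
    restore_deadline = disaster_time + rto  # services must be back by this time
    return recovery_point, restore_deadline


# Hypothetical date; the noon disaster, 1-hour RPO and 8-hour RTO come from the examples.
disaster = datetime(2024, 1, 15, 12, 0)
point, deadline = recovery_targets(disaster, timedelta(hours=1), timedelta(hours=8))
print(point.strftime("%H:%M"), deadline.strftime("%H:%M"))  # 11:00 20:00
```

A shorter RPO pulls the recovery point closer to the disaster, and a shorter RTO pulls the deadline closer, which is why both numbers drive how much standby infrastructure a DR design needs.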
What we want to stress during today’s webinar is we’re not saying that you need to shut down all of your datacenters and migrate them to AWS. All we are saying today is that you can keep your primary datacenter. That DR data center that you’re currently paying for, we’re saying you can close that down and migrate those workloads to AWS.
Here you can see on the far left, we have our traditional DR architecture. You have your primary data center, and you have your DR data center.
With the DR data center, there’s replication between the primary and the DR so that it can recover as soon as some kind of disaster happens in the primary datacenter – power outage, some kind of hardware failure, a fire, or flood. This way, your users are still able to be up and running without too much of an impact.
With your traditional DR, you have, at a minimum, the infrastructure that’s required to support the duplicate environment. These are things like the physical location, including the power and the cooling; security to ensure the place is protected; enough physical space; procured storage; and enough server capacity to run all those mission-critical services, including user authentication, DNS, monitoring, and alerting.
What you’re seeing here on the far right with the AWS DR, what we’re saying is you have your main data center but you can set up replication to AWS using a couple of their services that we’ve listed out here: S3, Route 53, EC2, and SoftNAS.
You can use those to create your cloud-based disaster recovery. You’ll get all the benefits of your current DR data center but without having to maintain any additional hardware or have to worry about overprovisioning your current data center.
Here, we’ll give a brief overview comparing a DR data center to AWS. With your DR data center here on the left, there is a high cost to build it. You are responsible for storage, power, networking, and internet connection. There are a lot of capital expenditures involved in maintaining that DR data center.
Storage is expensive, backup tools are expensive, and retrieval tools are expensive as well, and backup can take time. It often takes weeks to add in more capacity because planning, procurement, and deployment just take time with your DR data centers.
It’s also challenging to verify your DR plans. Testing for DR on-site is very time-consuming and it takes a lot of effort to make sure that it’s all working correctly.
Here on the far right, we have the benefits of using AWS for your DR. There are not a lot of capital expenditures in going to AWS. The beauty of using AWS to manage DR is it’s all on-demand, so you’re only going to be paying for what you use.
There’s also a consistent experience across the AWS environments. AWS is highly durable. It’s highly available, so there’s a lot of protection in making sure that your disaster recovery is going to be up and ready to go.
You can also automate your recovery, as well, and we’ll talk a bit about that later on during today’s webinar. Finally, you can even set up disaster recovery per application or business unit.
Different business units within your organization can have different disaster recovery objectives and goals.
Here, we’ll just talk a bit about the benefits of using the cloud for disaster recovery. You’ve got the automated protection and replication of your virtual machines. You can monitor the health of your data center through AWS.
The recovery plans are pretty customizable. You can also do no-impact recovery plan testing. You can test without having to worry about messing up any of your production data. You can orchestrate recovery when needed. Finally, you can do replication to and recovery in AWS.
Here, we’ll talk a bit about managing your DR infrastructure. Right now, on the left, with your current DR data center, you have to manage the routers, the firewalls, the network, the operating system, your SAN or NAS or whatever you may be using, your backup, and your archive tools.
The beauty of using AWS to manage your DR is you’re really only responsible for your snapshot storage. AWS is handling the routers, the firewalls, your backup, your archiving, and your operating systems. AWS just takes all that off your hands, so you don’t have to worry about these things, and you can focus on more important tasks and projects.
To give you a sense of just how AWS is able to map your DR data center, what this slide is showing is your services and then AWS’s services on the far right.
For example, with DNS, AWS has Route 53. For load balancers, AWS has Elastic Load Balancing. Your web servers can be EC2 or Auto Scaling. Data centers are managed by Availability Zones. Finally, disaster recovery can be multi-region.
Because everything is on AWS, there are some enterprise security standards that are definitely met. AWS is always up to date with certifications, whether it’s ISO, HIPAA, Sarbanes-Oxley, or other compliance standards that are constantly getting updated.
There is also the physical security aspect of it as well. AWS data centers are very secure, they are nondescript facilities, the physical access is strictly controlled, and they are logging all the physical access to the data centers.
Even with the hardware and software network, they have systematic change management, updates are phased, they also do safe storage decommission, there is also automated monitoring and self-audit, and then finally, advanced network protection.
Before we move on to the demo, we do want to have you all do just a quick poll here. Give me just a second to load that up.
To get a better understanding of everyone here in the audience, we currently just want to know, are you managing your disaster recovery right now with a DR data center or are you doing it with AWS?
If you’re not sure, that’s okay. It’s one of those things that you can probably easily find out by talking to your IT admins. Let’s go ahead. It looks like about half of you have voted, so we’ll give that about another 10 seconds before we close it up.
I’m going to go ahead and close the poll now. Looking at the results here, it looks like about half of you here are using DR data centers. For those of you who are using the DR data centers, one thing that we would love to know is exactly why you joined today’s webinar.
For those of you who are on DR data centers, if you’re interested in moving your DR to AWS or you’re just curious, go ahead and let us know in the questions pane. The more information we have, the better we can assist you in the future.
It looks like a couple of you are not sure about how you’re managing disaster recovery. Again, that’s perfectly fine. What we recommend you do is reach out to your IT admins and get some feedback from them to understand whether you have a DR data center or you’re on AWS.
It also looks like a few of you are on AWS. For those of you who are on AWS, we’re going to give you a couple of tips and tricks, during today’s demo, on how to further leverage AWS’s DR capabilities.
That being said, now, we’re ready to go ahead and demo how to build a standby environment on AWS. I’m going to go ahead and turn it over to one of my colleagues, Joey Wright, who is a solutions architect here at SoftNAS.
Joey, I’m going to give you screen control.
Joey Wright: Thanks, Taran. Hey everyone. Joey Wright, solutions architect. Give me one moment to share my screen. We’ve painted this picture of why AWS is a good fit for your DR data center.
I’d like to illustrate how SoftNAS fits in the process and how we make this transition to a cloud-based DR data center much more attainable, much more manageable, and much more available above and beyond what AWS natively offers within their system.
SoftNAS, obviously, at its core is a NAS product. What we’re going to do is we’re going to abstract all those modern storage offerings that AWS has.
All those different flavors of EBS Storage, the S3 Object storage, we’re going to abstract all of that from our services. All of those applications and all of the users that need to consume those applications. They won’t know what’s going on in the background. They’ll simply be able to access and consume that data, regardless of where it comes from, via native file system protocols.
What this means as you are progressing from on-prem DR to cloud-based DR, you don’t necessarily have to modernize all of your applications to consume cloud natively.
We can get those applications that aren’t targeted for modernization, be it budget, be it timeframe, be it lack of interest. We can get them to the cloud. We can put the SLA for availability and uptime on the shoulders of AWS and SoftNAS and we can be certain that that application is going to run because its data sits and is accessible natively within the cloud.
To walk you through the soup-to-nuts overview at a high level, this process starts within our AWS console. We’re assuming that you’ve already got your infrastructure laid out and you’ve created your VPCs, your network, your security, etc.
If you haven’t, let us know. We have got some great partners that can assist with that effort. Once that’s created, we want to install Buurst SoftNAS. We want to turn this modern storage into something that’s shared and available to all of our different services.
To that end, we’ve built some EC2 instances. As it relates to SoftNAS, these instances can be built via a self-contained AMI that exists on our marketplace. It also exists within the community AMIs.
We have several different offerings to fit your needs. Within the marketplace, we have a 20 TB SoftNAS Standard offering. We have a 1 TB SoftNAS Express offering. We even have a Pay-As-You-Go SoftNAS Consumption offering.
What’s great about this is they are through the marketplace, which means they are billed through your subscription. You don’t have to deal with SoftNAS from a procurement perspective.
Another great thing about these offerings within the AWS Marketplace is that they all come with a free trial. That means you can evaluate either one of these offerings, based on your capacity needs, for 30-days without any SoftNAS subscription billing.
You’re still subject to any AWS infrastructure charges, but as far as SoftNAS is concerned, we give you 30 days for free. If you leave it up or would like to use it beyond these 30 days, that’s when the subscription starts.
We also have a community AMI offering. The difference between the community AMI and the marketplace offering is that the capacity is determined in 1 TB increments and a license key to expose that capacity to your AMI is provided by SoftNAS.
This is really for those scenarios that don’t fit within the current licensing model of the marketplace, so if you need beyond 20 TB. For example, if you need 100 TB or a PB or whatever it might be, we will work with you to get the appropriate license key for that. That can either be billed directly through SoftNAS, or we can even go through AWS for that as well, but it does require a touch with the SoftNAS organization in order to leverage.
Once you’ve selected the appropriate image, we do go ahead and we select an instance size. We go ahead and build this system and let it create itself. It does take about nine minutes to go through that entire process of initialization.
Once it’s initialized, we’re going to browse to the IP address of that machine. I’m actually going to a public facing IP address because I am not directly connected to my AWS system at this point.
You’re going to login to a web-based HTML5 based storage center GUI. This is the SoftNAS storage center. I’m going to log into that really fast. This is the first experience you will see if you were to walk along this process with me, if you were to evaluate it on your own, etc.
When we finally log into this console, we’re going to be greeted with a “Getting Started” page. What this is going to do is coach us through all the processes necessary to get the system up, so we can start getting data into AWS and start leveraging a lot of these failover processes.
We’ll do some general housekeeping. We’ll make sure it can talk to our network. We’ll update it. We’ll apply license keys, etc. The point is we need to get capacity first because we’ve got applications that need to talk via CIFS or via SMB to a bunch of storage.
We have got Linux machines that need to talk via NFS to do whatever it is they do. Maybe we’ve got some old legacy applications that we need to turn S3 into our own personal SAN. This is where we’re going to do this.
We’re going to start this process after the maintenance is complete by adding storage or provisioning storage to SoftNAS. When you log into this disk/devices applet for the first time, what you’re going to see are any disks that are already associated with your instance.
If you happen to pick an EC2 compute instance that has ephemeral drives, you actually see those ephemeral drives here. If you’re on-premises and you’re doing this in VMware, for example, you might see any disk you have already provisioned to that virtual machine here as well.
The point is, this is where we take the modern cloud storage offerings (the EBS and the S3 offerings within Amazon) and we turn them into a virtual disk that a file system can understand, one that the CIFS protocol, the NFS protocol, the AFP protocol, and the iSCSI protocol can actually consume.
If we want to add S3, it’s as simple as choosing S3 as a cloud disk extender, and we can very quickly assign an S3 bucket as a virtual disk to SoftNAS. These buckets can be sized in either gigabytes or terabytes. We can actually create 500 TB buckets if we want and have multiple buckets aggregate to a pool that we’ll talk about in just a minute.
We can support over a Petabyte on a general size instance, within Amazon if we need to, depending upon the workload. Once you select that bucket, you have the option to turn on Amazon’s bucket level encryption if we need to. We can just turn that on.
We can go ahead that fast and provision 500 TB (half a petabyte in this case) of S3 storage to SoftNAS. Such is the beauty of Amazon that we’re not consuming half a petabyte of S3 at this point. We are actually taking up about 2 MB worth of metadata inside that bucket, but that’s about it. We pay as we go as it relates to S3.
If we need to add EBS disk offerings, we can also add them on the same device. So we can actually create a SoftNAS machine that has multiple different storage types based on capacity, based on performance, however we need to do this, and we can manage all of this through one machine.
I could have general purpose SSDs. I can make some Provisioned IOPS, Throughput Optimized – whatever the offerings are at that particular time within your region inside of AWS.
If we want to create some Throughput Optimized hard drives, we simply select the hard drives, we select the capacity we want, and we create those disks. We can go ahead and just create one of those disks so you can see that in action.
The good thing about this is I don’t have to go back to my AWS console to do this. We can actually assign these capabilities or these responsibilities for managing storage services to someone else within the organization and not actually give them access to the AWS console.
Once we have provisioned AWS storage as virtual disk on this device, we can now start creating these logical groups that represent capacity. We call these pools. We aggregate these disk devices to these pools.
Now, these pools can be created to reflect your business unit — maybe you’ve got a backup business unit; maybe you’ve got HR. However you need to logically define pools, you can.
You could create them based on project. Perhaps we’ve got a cloud migration project or a SAP HANA project or something like that. We can create a pool name that represents that particular project.
The benefit of that shows up if you have to do chargeback or some other scenario that you have to deal with in addition to the app that you’re going through. You could very easily come through and say, “Okay, I have assigned 10 TB worth of capacity to this effort, and now I know this is how much they have left available, this is how much they’re consuming,” etc. We can get creative with how we manage these pools.
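The chargeback idea described above can be sketched as simple per-pool accounting. This is an illustrative Python sketch only: the `Pool` class, the pool names, and the capacity figures are hypothetical and do not represent a SoftNAS API.

```python
from dataclasses import dataclass


@dataclass
class Pool:
    """A named slice of capacity assigned to a business unit or project."""
    name: str
    capacity_tb: float
    used_tb: float = 0.0

    @property
    def free_tb(self):
        return self.capacity_tb - self.used_tb


# Hypothetical pools and usage figures.
pools = [Pool("hr", 10, 3.5), Pool("backup", 20, 12)]

# Per-pool (consumed, remaining) figures that a chargeback report could start from.
report = {p.name: (p.used_tb, p.free_tb) for p in pools}
print(report)
```

Naming pools after the unit or project that owns them is what makes this kind of report possible without any extra tagging.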
Once you have the pool name created, we’re also able to define a RAID level that exists on top of the RAID levels that AWS already has inherent to their system.
One thing we can do is if we’re using multiple disks — maybe we’re using several 100 Terabyte buckets to make up a Petabyte — we can stripe across those buckets using a software level RAID zero at the SoftNAS server and extend the performance characteristics of that.
To do the same thing with block devices, etc, obviously, you just need to have multiple devices if you’re selecting RAID.
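Striping is what lets several disks or buckets share the load. The sketch below shows the generic RAID 0 address mapping in Python; it illustrates the general technique under the assumption of one block per stripe unit, and is not a description of how SoftNAS implements it. The function name `stripe_location` is hypothetical.

```python
def stripe_location(logical_block, num_devices):
    """Map a logical block to (device index, block offset on that device)."""
    if num_devices < 1:
        raise ValueError("RAID 0 needs at least one device")
    # Consecutive logical blocks rotate across devices, so sequential
    # reads and writes are spread over all of them.
    return logical_block % num_devices, logical_block // num_devices


# Six consecutive blocks over four devices (for example, four buckets).
layout = [stripe_location(block, 4) for block in range(6)]
print(layout)  # [(0, 0), (1, 0), (2, 0), (3, 0), (0, 1), (1, 1)]
```

Because RAID 0 keeps no parity or mirror copy, any durability in this arrangement comes from the underlying storage, which matches the point above about layering on top of what AWS already provides.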
In this particular case, I’m choosing no RAID because I don’t necessarily want to have multiple disks in this pool. One other thing I want to cover on this particular exercise is that we also have an additional option to turn on encryption.
We can turn on block-level encryption at the file pool level. Now you have the option to turn on encryption at the disk level, and you have the option to turn on file-level at-rest encryption.
Now, if someone were to hypothetically grab a disk at your DR data center, and it happened to come from a rack and it has your data on it, they’ve got AWS encryption to get through before they can get to your data. And once they get to your data, they have at-rest encryption to get through as well. We can satisfy a lot of different requirements with how we configure this.
I’ve already got a pool created, and that’s simply because I want to show you some things that require historical information in just a minute. One of the frequent use cases that we see is organizations using storage for home drives, so I have created pools specifically for that effort.
In your DR scenario, if you’re using Windows Active Directory and your users have home drives, and they’ve got those pictures of the animals on the background that are critical to getting their job done efficiently, it’s obviously important that when we fail over, this information is still available. We have a pool here for that.
The interesting thing about this particular pool that I built is that, if I look at the details, I’m making this pool with S3. S3 has some performance limitations. That’s the way it’s architected.
We can scale to monumental levels here, but it is limited in throughput and it is limited in IOPS, and that’s one of the reasons it is so inexpensive to leverage.
What we can do since we do have to read and we have to write to this system is we can actually leverage some of the additional features of SoftNAS to change the performance characteristics of the backend storage, and this isn’t specific to S3. I’m just using it as an example. We can do this to block level storage as well.
We can augment the way reads and writes occur. When we read from a system, we grab that object, that file, that data, or whatever it might be from this remote system. If this is a DR data center, the backend storage might be different than your production storage.
You might have a nice fast on-prem Isilon, NetApp, or whatever it might be, but you might actually have a Throughput Optimized magnetic or cold HDD or maybe even S3 in your DR for purposes of cost savings.
We still need to be able to use this. The understanding in this scenario is performance is going to be different, but we need access to our data. For a DR scenario, chances are we’re going to be reading a lot of that data pretty heavily. The way we read by default is we copy things and we put them in memory. So when somebody else needs to access that bit of data another time, it comes from RAM rather than that backend storage. That’s great because RAM is really fast, but we run out of it.
We can actually take another EBS offering, maybe it’s general SSD or maybe it’s one of those local ephemeral disks that come with an instance – those guys that run 100,000 IOPS – and we can use that to buffer these reads. Now all the recently used data and the most frequently used data is coming from the super high-speed SSD drive rather than S3.
Now your users are none the wiser that they’re on a performance-limited system. We’re not hammering S3 and having S3 come back and tell the system to leave us alone and putting up walls. We’re still allowing things to be performant.
We can also do the same thing with writes. SoftNAS treats your data and the durability of your data as paramount. It’s important that our data has integrity, that it’s correct, and that it’s available.
When we write something, we by default write it to two places at the same time. We write it to what’s called a write log that sits in memory, and we write it to that backend storage.
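The read path described above, with RAM first, then a fast SSD buffer, then the slow backend, can be sketched as a two-tier cache. This Python sketch is an illustration under stated assumptions: the class `TieredReadCache`, the least-recently-used eviction policy, and the slot counts are all hypothetical, and plain dictionaries stand in for RAM, SSD, and S3.

```python
from collections import OrderedDict


class TieredReadCache:
    """RAM cache backed by a larger SSD cache in front of slow storage."""

    def __init__(self, backend, ram_slots, ssd_slots):
        self.backend = backend      # dict standing in for slow storage such as S3
        self.ram = OrderedDict()    # small, fastest tier
        self.ssd = OrderedDict()    # larger second tier
        self.ram_slots = ram_slots
        self.ssd_slots = ssd_slots
        self.backend_reads = 0

    def read(self, key):
        if key in self.ram:                 # hit in RAM
            self.ram.move_to_end(key)
            return self.ram[key]
        if key in self.ssd:                 # hit in the SSD tier
            value = self.ssd.pop(key)
        else:                               # miss: go to the slow backend
            value = self.backend[key]
            self.backend_reads += 1
        self._put_ram(key, value)
        return value

    def _put_ram(self, key, value):
        self.ram[key] = value
        if len(self.ram) > self.ram_slots:
            # RAM is full: demote the least recently used item to SSD.
            old_key, old_value = self.ram.popitem(last=False)
            self.ssd[old_key] = old_value
            if len(self.ssd) > self.ssd_slots:
                self.ssd.popitem(last=False)


backend = {"a": 1, "b": 2, "c": 3}
cache = TieredReadCache(backend, ram_slots=1, ssd_slots=2)
for key in ["a", "b", "a", "c", "b"]:
    cache.read(key)
print(cache.backend_reads)  # 3 backend reads for 5 requests
```

With only one RAM slot, the repeated reads of "a" and "b" would all have gone back to the backend; the second tier is what absorbs them.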
The minute we get that object written to memory on SoftNAS since it is a storage controller, we can tell your application to go ahead and send something else because we’ve taken care of what you’ve already sent us previously.
In reality, we don’t delete or remove that object from memory until that backend storage not only says they’ve written it to disk but we’ve checked and we’ve verified that it is correct and not corrupted in any way, shape, or form.
This is great because that means we’ve got two instances of your data at any given point. We’re not going to get rid of the copy in memory until we verify that everything is correct.
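The write path above (write to the log and the backend, acknowledge, and release the log copy only after the backend copy is verified) can be sketched as follows. This is a simplified Python illustration, not the product’s implementation: the `WriteLog` class is hypothetical, a dictionary stands in for backend storage, the backend write is shown synchronously for brevity, and SHA-256 is an assumed choice of checksum.

```python
import hashlib


class WriteLog:
    """Keep each write in memory until the backend copy is verified."""

    def __init__(self, backend):
        self.backend = backend   # dict standing in for backend storage
        self.pending = {}        # in-memory write log: key -> bytes

    def write(self, key, data):
        self.pending[key] = data   # copy 1: the write log
        self.backend[key] = data   # copy 2: the backend storage
        return "ack"               # the client can send its next write

    def verify_and_release(self, key):
        expected = hashlib.sha256(self.pending[key]).hexdigest()
        stored = self.backend.get(key)
        if stored is not None and hashlib.sha256(stored).hexdigest() == expected:
            del self.pending[key]  # verified, so the log copy can go
            return True
        # The backend copy is missing or corrupt: rewrite it from the log copy.
        self.backend[key] = self.pending[key]
        return False


backend = {}
log = WriteLog(backend)
log.write("invoice", b"v1")
backend["invoice"] = b"corrupt"             # simulate corruption on the backend
first = log.verify_and_release("invoice")   # False: repaired from the log
second = log.verify_and_release("invoice")  # True: verified and released
print(first, second)
```

The sketch also makes the capacity problem visible: every unverified write occupies space in `pending`, so a slow backend makes the log grow.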
The problem is memory is finite. Again, we have a certain amount of RAM associated with these EC2 instances. If by chance we have no more room in memory to accommodate additional writes that are coming to the machine, we’ve got IO congestion; we’ve got a problem.
In the case of something like S3, where we can only write so much to it before S3 literally sends us a message that says “Hey, back off,” we have a problem. We can either scale up our EC2 instances to give us a whole bunch of RAM or we can augment that write log process with another high-speed SSD.
We could take some of those general SSDs or those Provisioned IOPS drives that EBS has available, we could assign them as a virtual disk, and now we can buffer that RAM with these high-speed offerings.
The good thing is, now, the write characteristics are at the speed of memory and this SSD, not the backend storage. So if we are going to experience congestion on the backend because we’re write limited, or perhaps the block offering is not necessarily performant enough to keep up with it, the storage controller is going to eat that overhead. Your users, your servers, your applications actually leveraging the storage don’t have to.
This is very important especially when we’re considering DR, and we are potentially looking at scenarios whereby we are trying to save on cost and we are using different tiers or different performance levels of service from the backend.
Once we have the capacity defined — so we create these pools based on our projects, business units, customers, or whatever it might be — we need to give them access to it and we give them access by creating volumes and LUNs.
This is where we’re actually going to build that Windows CIFS or SMB share, or that NFS mount, or the Apple Filing Protocol share, or even the iSCSI LUN. This is where we’re either creating shares or we’re turning AWS Storage offerings into our personal SAN.
We do that very simply by going to another wizard that should look very similar to what you’ve seen thus far. What we’re going to do here is we create volumes.
This is our root mount for NFS. This is our root share for SMB, etc. Once you create that volume name, you then assign it to one of those storage pools you’ve created. I can see my storage pool home drives here. I can see how much free space is available. I’ll just go ahead and assign.
The next step is where we define which protocol we’re going to use to access this capacity. Are we going to use a file-level protocol (NFS, CIFS, or AFP), or are we going to use a block-level protocol, iSCSI? It’s as simple as checking the boxes or switching to a block-level device.
As it relates to these file systems, we do allow you to offer multiple protocols on the same volume. We can allow access to this root share via both NFS and SMB (CIFS in this case).
Obviously, we need to make sure security is going to permit how we configure this. The benefit is that now, if you have to fail over Linux machines that write to a specific share (maybe they are dumping logs, or maybe they are dumping invoices, or whatever it might be) and you also have to fail over Windows devices that need to get that information to process it through the ERP system, web [Logix 0:34:14], or something along those lines, you can house that data in the same volume, rather than having mechanisms that have to copy it from one place to another just so multiple systems can access and consume the same data. Yes, we can put both of them together if we need to.
Once we choose the file system, then we can choose how we actually provision the capacity. Are we going to allow this to dynamically grow based on the space available within that pool or are we going to put a quota against this? Are we going to thick provision?
As it relates to thick provisioning, this means we’re going out and we’re pre-allocating whatever you define here as space that no one else can use within that pool. This is important because thick provisioning equals utilization, as far as AWS billing is concerned.
If you provision block devices here, obviously, you as the customer pay for those EBS devices once you provision them; but with S3, you pay-as-you-consume. If you thick provision something, you consume it. If I do 10 GB here, you’re going to see your consumption go up by 10 GB.
The same thing applies with iSCSI. If I provision object storage as a block device to you and I thin provision that, it could be 100 TB. But since we more than likely have to format that block device so your computer can access it through a virtual machine or whatever it might be, we then consume that capacity, so once you format that 100 TB, we consume 100 TB. Just be mindful of that, but we do give you the option to manage towards either scenario.
Beyond that, we get to choose whether or not we enable in-line compression and deduplication. This is important because, since we are talking about cloud, the amount of bits that go across the wire and the amount of bits that sit at rest equal money.
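The difference between thick and thin provisioning on pay-as-you-consume storage comes down to which number counts as consumed. The following Python sketch models only that rule as described above; the function name `billable_gb` and the figures are hypothetical, and it says nothing about actual AWS pricing.

```python
def billable_gb(provisioned_gb, written_gb, thick):
    """Capacity counted as consumed on pay-as-you-consume storage."""
    if written_gb > provisioned_gb:
        raise ValueError("cannot write more than was provisioned")
    # Thick provisioning pre-allocates everything; thin counts only what is written.
    return provisioned_gb if thick else written_gb


thick_volume = billable_gb(10, 0, thick=True)                  # 10 GB reserved up front
thin_volume = billable_gb(10, 2, thick=False)                  # only the 2 GB written
formatted_lun = billable_gb(100_000, 100_000, thick=False)     # formatting touched it all
print(thick_volume, thin_volume, formatted_lun)  # 10 2 100000
```

The last case mirrors the iSCSI warning above: a thin-provisioned 100 TB device that gets fully formatted ends up consuming the same as a thick one.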
If we can reduce those bits that are travelling and sitting somewhere, we can reduce your cost, so compression is a good thing even if it’s not very effective against your particular dataset. Obviously, the better the compression, the more money we will save; but the note is, here, to always compress especially when you are in the cloud.
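How much compression saves depends heavily on the data. As a rough illustration, the Python sketch below uses the standard library `zlib` module on a made-up, highly repetitive log sample and on random bytes; the sample contents are assumptions, and the compressor SoftNAS uses in-line is not specified in the webinar.

```python
import os
import zlib

# Hypothetical, highly repetitive data such as application logs.
repetitive = b"2024-01-15 INFO request served\n" * 1000
compressed = zlib.compress(repetitive)

# Random bytes stand in for data that is already compressed or encrypted.
random_data = os.urandom(len(repetitive))
compressed_random = zlib.compress(random_data)

print(len(repetitive), len(compressed))          # repetitive data shrinks a lot
print(len(random_data), len(compressed_random))  # random data typically does not shrink
```

Fewer bytes stored and transferred means a smaller bill, which is the reasoning behind the advice above to leave compression on in the cloud.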
Deduplication is really dependent upon your type of data. If you have structured data, or if we’re migrating your backups from Veeam to the cloud as a component of this DR exercise and Veeam is not handling dedupe, we can turn dedupe on and we can save you a lot of space in that effort.
You can even turn it on in addition to what you’re already doing, but you really need to make sure that your EC2 compute instance has enough memory to handle those tables that are operating, etc.
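The memory requirement mentioned above comes from the lookup table that deduplication has to keep. A minimal block-level sketch in Python shows the idea; the fixed 4096-byte block size, the SHA-256 hash, and the function names are assumptions chosen for illustration rather than details of the product.

```python
import hashlib


def dedupe(data, block_size=4096):
    """Split data into fixed-size blocks and store each unique block once."""
    table = {}   # hash -> block; this is the table that has to fit in memory
    refs = []    # ordered list of hashes needed to rebuild the original data
    for start in range(0, len(data), block_size):
        block = data[start:start + block_size]
        digest = hashlib.sha256(block).digest()
        table.setdefault(digest, block)
        refs.append(digest)
    return table, refs


def restore(table, refs):
    """Rebuild the original data from the block table and reference list."""
    return b"".join(table[digest] for digest in refs)


# Four blocks of input, three of them identical.
data = b"A" * 4096 * 3 + b"B" * 4096
table, refs = dedupe(data)
print(len(refs), len(table))  # 4 2
```

The table grows with the number of unique blocks, so a large dataset with little duplication costs memory without saving much space.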
But once you’ve configured your volume, since we’re talking the continuity of data here — you’ve just experienced a DR event if you’re using this — we want to configure how we give you access to these objects within this volume, should you need them, from a prior point in time.
To paint a picture for you, SoftNAS provides to you that file system access to that volume and the file system we use on the backend is what’s called a copy-on-write system.
That means if you were to write an Excel file to SoftNAS, the first time you send that Excel file, we save it as a persistent object. The next time you modify that Excel file, we save those modifications separate to the first object, and this just keeps building on and on as you change this file — this Excel object.
If you open that Excel object, what you’re seeing are all those different pieces of that file. Look down from the top. You’re seeing the current version. What we can do that’s interesting is we can place these snapshots or these bookmarks or whatever you want to call them in between all these different points of modification.
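To picture the copy-on-write idea, here is a toy sketch in which every write is kept as a separate entry and a snapshot is only a bookmark into that history. The class and method names are hypothetical, and a real file system stores changed blocks rather than whole file states.

```python
from typing import Optional


class CowFile:
    """Toy copy-on-write file: every write is appended, nothing is overwritten."""

    def __init__(self) -> None:
        self.versions = []    # every saved state of the file, oldest first
        self.snapshots = {}   # snapshot name -> index into self.versions

    def write(self, content: str) -> None:
        self.versions.append(content)

    def snapshot(self, name: str) -> None:
        if not self.versions:
            raise ValueError("nothing written yet")
        # A snapshot is only a bookmark; no data is copied.
        self.snapshots[name] = len(self.versions) - 1

    def read(self, snapshot: Optional[str] = None) -> str:
        index = -1 if snapshot is None else self.snapshots[snapshot]
        return self.versions[index]


f = CowFile()
f.write("budget v1")
f.snapshot("jan-09")
f.write("budget v2")
print(f.read())           # budget v2
print(f.read("jan-09"))   # budget v1
```

Reading without a snapshot name is the "look down from the top" view of the current version; reading through a bookmark returns the file as it existed at that point.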
This is where we define where we’re putting those snapshots or those bookmarks. By default, we do this every three hours between 6:00 AM and 6:00 PM, Monday through Friday. You can build a schedule that does it every hour on the hour, or every day of the week if you want, or any combination thereof.
We then go ahead and we say, of all those snapshots we’re now taking, how many do I actually want to retain and how long do I want to retain them? Once we define that, we’ve basically set that bar that says, “Now I’ve failed over to this new data center; now that I have seen my data, how far back in time do I want to be able to give our users or services access to different versions of this data?”
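The default schedule and a simple retention rule can be sketched as follows. This is an illustration only: the function names are made up, the 6:00 PM snapshot is assumed to be included in the window, and retention is modeled as "keep the newest N".

```python
from datetime import datetime, timedelta


def snapshot_times(start: datetime, end: datetime, every_hours: int = 3,
                   first_hour: int = 6, last_hour: int = 18,
                   weekdays=(0, 1, 2, 3, 4)) -> list:
    """Return the on-the-hour snapshot times in [start, end)."""
    times = []
    t = start.replace(minute=0, second=0, microsecond=0)
    if t < start:
        t += timedelta(hours=1)
    while t < end:
        in_window = first_hour <= t.hour <= last_hour
        on_cycle = (t.hour - first_hour) % every_hours == 0
        if t.weekday() in weekdays and in_window and on_cycle:
            times.append(t)
        t += timedelta(hours=1)
    return times


def prune(snapshots: list, keep: int) -> list:
    """Retention: keep only the newest `keep` snapshots."""
    return sorted(snapshots)[-keep:] if keep > 0 else []


# One week starting Monday 9 January 2017: 5 snapshots on each of 5 weekdays.
week = snapshot_times(datetime(2017, 1, 9), datetime(2017, 1, 16))
print(len(week))             # 25
print(prune(week, keep=2))   # the two most recent snapshots
```

Changing `every_hours`, the hour window, or `weekdays` gives the hourly or every-day variants mentioned above.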
That means once we define the snapshot cycle like, “This has been running since December and we were maintaining three or four months worth of snapshots,” we could pick a snapshot from January and instantly give our user read/write access to that file as it existed in January.
The reason we can do it instantly is because the data is already pre-existing. It’s not a redundant backup copy. It’s already there. I’ll explain more about that in just a second.
The last tab is for when we choose iSCSI. It allows us to set up an iSCSI LUN target so we can actually connect to and consume that block-level device.
I’m not going to create that right now because I already have one in place. I have this user volume called users that sits on my home drives folders. This will be where the actual users sit – John, Bill, Mary, or whoever it might be.
The reason I have this pre-existing is because I wanted you to see what the snapshotting process looks like. If I click on this snapshotting tab, you can see I’m maintaining all the way back to December 1st.
If I need access to January 9th 2017, from 1200 GMT, I simply select that snapshot from the list and I hit “Create Snapclone.” What we’re doing here is we’re creating a brand new volume exactly like the original volume.
In this case that’s users, and we’re changing the name to reflect that it’s a clone from a certain date in time. Now your users or services or whomever can simply connect to that volume, via NFS or CIFS in this case, and they can see that data from that date.
If they want, they can modify that data and it will be saved in this brand new volume. We can even create a branch snapshotting routine for this so we could branch our data and keep it proper, today, if we need to.
Or if we need to override production, we simply copy information out of that directory, paste it into the users’ volume and now we’ve affected that change in production. Since we’re copy-on-write, we’ve got a snapshot that allows us to go back from that if we need to.
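The clone-then-restore workflow can be sketched with plain dictionaries. Everything here is hypothetical (the `create_snapclone` helper, the volume layout, the file names), and the toy copies data with `deepcopy`, whereas a copy-on-write clone would simply reference data that already exists.

```python
import copy


def create_snapclone(volumes: dict, source: str, snapshot: str) -> str:
    """Create a new writable volume from one snapshot of `source`."""
    clone_name = f"{source}-clone-{snapshot}"
    frozen = volumes[source]["snapshots"][snapshot]
    volumes[clone_name] = {"files": copy.deepcopy(frozen), "snapshots": {}}
    return clone_name


volumes = {
    "users": {
        "files": {"report.xlsx": "v3"},
        "snapshots": {"2017-01-09": {"report.xlsx": "v1"}},
    }
}

clone = create_snapclone(volumes, "users", "2017-01-09")
volumes[clone]["files"]["report.xlsx"] = "v1-edited"   # the clone is read/write

# To override production, copy the file from the clone back into "users".
volumes["users"]["files"]["report.xlsx"] = volumes[clone]["files"]["report.xlsx"]

print(clone)                                          # users-clone-2017-01-09
print(volumes["users"]["files"])                      # {'report.xlsx': 'v1-edited'}
print(volumes["users"]["snapshots"]["2017-01-09"])    # {'report.xlsx': 'v1'}
```

The original snapshot is untouched by edits to the clone, which is what makes it safe to experiment before copying anything back into production.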
We can get very creative. From a durability perspective, we can maintain a lot of durability with regards to our file system. Somebody accidentally deletes something, a virus corrupts something, whatever the scenario; we can recover from that.
We are talking DR here. What if we want to have a redundant copy of this? Maybe we are considering closing that on-prem data center and we’re looking for redundancy for the cloud.
We do give you the ability to take all this information that’s now stored on EBS or S3 and managed by SoftNAS and replicate it to a second device. I’ve actually got a second device configured in a different availability zone of the same region.
I’m going to login to that device right now. I’ll let that login while I go back here. What I want to illustrate is how easy it is to start replicating data to an entirely separate data center and this can be on-prem to the cloud, it can be within the cloud (region to region), it could be cloud to cloud.
Replication is just a process of tunneling through one port and we’ll handle all the transfer back-and-forth at the block level using delta-based replication every 60 seconds.
You can see I’ve got no replication defined right now. I need to set up that second machine to handle the acceptance of all this data. The way we do that is we decide which storage pools we want to migrate.
In this case, I want to migrate home-drives. It’s 50 gigs of storage. I need to go to this second machine. I need to make sure it has a disk capable of consuming that 50 gigs so I need 50 gigs or higher of virtual disk.
Since this is a replicate partner, the best case scenario is it’s the same exact disk type. That way, if we ever have to use it, you’re going to get the same performance.
Again, this is a DR scenario and the technology permits us to use a different disk type. I could use S3 here even though I might be using General Purpose SSD on the other side.
Obviously, your workload needs to be compatible with the performance characteristics of what I’m choosing. But the point is, here, we’re not limited to using the same type of storage. A General Purpose SSD on the front side and maybe magnetic on the back side; however we choose to do it, we can get creative.
This instance is just like my first. I have that ephemeral storage. We don’t want to use that. Ephemeral storage is not persistent. If this instance is stopped and started again, we can land on a different machine that’s not the same device, and we’d lose all our data. Great for read caching, not great for storing persistent objects, so I need to add some disk storage here.
I’ll go through and I’ll add a 50 gig S3 bucket. Let me remove one of these zeros and hit “Create” for that. Now I’ve got the capacity I need to replicate home drives. Now I need to configure the pool for home drives.
This is how we control what replicates. You might have a dozen pools but not all of them warrant the SLA or the overhead of replication. What we will do is we will allow you to define which pools you want to replicate simply by recreating that pool.
I need to recreate home drives. I need to give it 50 gigs of space, and I need to create that. I need to choose the correct RAID options. Once I do that, we’ll create.
It’s going to warn me. I’m going to erase all this information that’s already existing, etc., and then it’s going to build. That’s all I need to do. I don’t need to worry about the volumes that are on that, etc. All of that metadata will come across. All of that will get recreated for us.
Once that’s created, I’ll actually show you that my volumes and my LUNs, they are empty. I haven’t pre-staged any of this. This is live so hopefully everything goes well and there are no hiccups.
Once this Volumes and LUNs applet loads, you’ll see there is nothing there. That will automatically come over.
Subsequent to this tab, what I’m going to do is I’m going to open up the “SnapReplicate” tab. You can see how the SoftNAS Storage Center GUI actually illustrates or shows you replication occurring.
You’ll see the same graphic we saw on the other system in just a second. It basically says nothing’s defined. We’re not really doing anything. I’m going to go back over to our primary machine.
This is the machine that’s hypothetically already up. This is going to be the primary file services in the system and we want to replicate it. I go over to my Snap replication tab and I go through a very simple process.
I am assuming the person sitting in front of this keyboard is not a storage expert, he is a code or Linux expert, and he might not be a networking expert. I can’t assume he understands the nuances associated with replication or the nuances associated with configuring things with AWS to allow it to occur.
I can assume he is somewhat familiar with IT and he knows the IP address of the second system, 10.0.2.197. I can assume he can type better than I can. Then I’m also going to assume he knows the credentials of the user that has the ability to login to that SoftNAS Storage Center.
Once I do that, we’ll go ahead and set up replication. What’s going to happen right now is we will mirror all the data from those pools that we’ve selected. We’ll send that over to that secondary machine – that target.
Once we complete that mirror, what’s going to happen is, every 60 seconds thereafter we’ll ship a delta. We’ll ship just the blocks that have changed in those 60 seconds. If you don’t have a high data change rate, it’s not going to be much to change. It’s going to be very fast to send it.
Obviously if you have a higher data change rate, then it takes a little longer to change it. Once that’s there, we’re maintaining a 60-second recovery point objective between these two machines. That means, worst case scenario, we lose the primary, we’ve got a copy that’s maintained within 60 seconds sitting in another availability zone in another data center wherever it might be.
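The mirror-then-delta cycle can be illustrated with a toy block comparison. This is a sketch under stated assumptions: the block size, function names, and the direct comparison of old and new blocks are all invented for the example, and a real replication engine tracks changed blocks rather than rescanning the data.

```python
BLOCK_SIZE = 4  # bytes; unrealistically small so the example is easy to read


def split_blocks(data: bytes) -> list:
    return [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]


def compute_delta(previous: bytes, current: bytes) -> dict:
    """Return {block_index: block} for every block that changed or is new."""
    old, new = split_blocks(previous), split_blocks(current)
    return {i: block for i, block in enumerate(new)
            if i >= len(old) or old[i] != block}


def apply_delta(replica: bytes, delta: dict, new_length: int) -> bytes:
    """Patch the replica with the shipped blocks, then trim to the new size."""
    blocks = split_blocks(replica)
    for i in sorted(delta):
        if i < len(blocks):
            blocks[i] = delta[i]
        else:
            blocks.append(delta[i])
    return b"".join(blocks)[:new_length]


primary = b"AAAABBBBCCCC"
replica = primary                      # initial full mirror
primary = b"AAAAXXXXCCCCDD"            # changes made during the next 60 seconds
delta = compute_delta(replica, primary)
replica = apply_delta(replica, delta, len(primary))
print(delta)                # {1: b'XXXX', 3: b'DD'}
print(replica == primary)   # True
```

Only two of the four blocks cross the wire in this cycle, which is why a low change rate makes each 60-second delta quick to send.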
That’s great because it’s not only a redundant set of data; it’s actually a redundant SoftNAS system. Which means if I refresh these volumes and LUNs, I have the same exact volumes already created and available via NFS, CIFS, so on and so forth.
The only thing I need to do is I need to redirect all my users to this DNS, this new IP address, whatever it might be. You work your networking magic however long it takes to propagate across your network and you’re back up and running.
The manual process for failover could be pretty efficient. If your RTO allows you to take that time to manually bring it up, then this is a great solution. But if your RTO dictates that, “Hey, I need high availability,” we can add that here as well.
High availability is a little different from SnapReplicate from a network requirement perspective. The two machines have to be within the same network for HA. In the case of AWS, they sit within the same VPC. They are simply separated by availability zone. They are sitting in two different physical locations but they are within the same Virtual Private Cloud within AWS.
I’m sitting in US, West, right now, Northern California, and my machines are separated by availability zones 2A and 2B. In order to add SnapHA, all I need to do is to enter a notification email. I’ll enter my SoftNAS address and then we are going to be off to the races.
The first thing I’m going to do here is I’m going to choose whether or not I want this SoftNAS machine to be available through the public interface, meaning over the internet, or only within my AWS network.
I’m going to choose a virtual IP. I want this thing contained privately. I don’t need to expose it to the internet. Once I choose that, then the only rule we have is that we need to choose a virtual IP that all of our users are going to use to consume this storage that’s outside of what’s called a CIDR block – outside the network of what you have.
The net here is that it can’t be a 10. IP address like the ones I have for all the other machines. Since it’s internal only, I can choose any address that’s outside that networking scheme. We’ll take care of all the heavy lifting to make this function, i.e. the modification of route tables and the infrastructure within AWS to make it happen for you.
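The "outside the CIDR block" rule is easy to check with Python's standard `ipaddress` module. The VPC range `10.0.0.0/16` and the candidate addresses below are assumptions for illustration; only the 10.0.2.197 node address comes from the demo.

```python
import ipaddress


def is_outside_vpc(virtual_ip: str, vpc_cidr: str) -> bool:
    """True when the candidate virtual IP falls outside the VPC CIDR block."""
    ip = ipaddress.ip_address(virtual_ip)
    network = ipaddress.ip_network(vpc_cidr)
    return ip not in network


print(is_outside_vpc("10.0.2.50", "10.0.0.0/16"))      # False: inside the VPC range
print(is_outside_vpc("172.16.99.99", "10.0.0.0/16"))   # True: usable as a virtual IP
```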
I simply enter that virtual IP address and then SoftNAS does everything else behind the scenes to make HA actually function. We’re doing a lot of complex exercises behind the scenes. We are creating buckets as witnesses.
We are installing services. We’re making sure these two machines have the appropriate communication going back and forth so we can provide HA service. There is a lot going on in the background that you don’t have to worry about, which means you don’t have to manage it.
You don’t have to go through lengthy documentation of this is how failovers actually occur, etc. You simply light up HA, turn it on via this exercise, and you’re off to the races.
Once this service is installed and running, the graphics that you see up here that right now say “Current status, source node primary” will change to let you know that you are now in an HA relationship.
You’ll be able to tell at a glance, between the source and the target, who is actually the primary node. You’ll be able to see that virtual IP address, so we know at a glance this is the IP address that we assigned to that DNS name in networking, and that’s how all of our users consume storage.
SoftNAS will take care of what server is providing service to that DNS name or that IP address. But the point is we don’t have to.
It just takes a few moments to install. Once it installs, again, you will see the graphics change. But essentially what’s going on now is, behind the scenes, we’re checking the health of this machine.
We’re not just pinging this machine and hoping everything is okay. We are actually going in and we’re checking the health of those virtual disks. We are checking the health of those pools and the volumes.
Obviously, we are also checking whether things are actually down in the event of a ping failure. Once we fail, though, what we’ll do is we’ll automatically update route tables in AWS, so this target node becomes the primary provider of file services.
We’ll break the replication so there’s no split-brain, and within some 30 seconds, your users are now consuming data within a 60-second RPO. That’s us at a very high level. There are a lot of features.
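The failover sequence described here (deep health checks, route table update, replication break) can be modeled as a small simulation. All class and field names are hypothetical, the route table is a plain dictionary rather than an AWS call, and the virtual IP is an assumed value.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    checks: dict = field(default_factory=lambda: {
        "ping": True, "disks": True, "pools": True, "volumes": True})

    def healthy(self) -> bool:
        # Health is more than a ping: disks, pools, and volumes must be fine too.
        return all(self.checks.values())


@dataclass
class HaPair:
    primary: Node
    target: Node
    virtual_ip: str
    route_table: dict = field(default_factory=dict)
    replicating: bool = True

    def __post_init__(self) -> None:
        self.route_table[self.virtual_ip] = self.primary.name

    def monitor(self) -> str:
        """Run one health check cycle; fail over if the primary is unhealthy."""
        if not self.primary.healthy() and self.target.healthy():
            self.replicating = False                        # avoid split-brain
            self.route_table[self.virtual_ip] = self.target.name
            self.primary, self.target = self.target, self.primary
        return self.route_table[self.virtual_ip]


pair = HaPair(Node("node-a"), Node("node-b"), "172.16.99.99")
print(pair.monitor())                  # node-a
pair.primary.checks["pools"] = False   # a pool fails even though ping still works
print(pair.monitor())                  # node-b
print(pair.replicating)                # False
```

Clients keep using the same virtual IP throughout; only the route behind it changes.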
I know I talked about a lot, but you can definitely follow on to this as you see fit. I’m going to hand this back over to Taran right now.
Taran: Fantastic. Thanks for that detailed overview, Joey. Let me share my screen here. Joey just talked a lot about what SoftNAS is capable of doing on AWS. Now, we want to talk through a bit of some DR architectures and scenarios for disaster recovery on AWS.
Now we’re going to talk about four main DR architectures. The first is your backup and restore scenario, where your data center goes down and you have to pull your backup from AWS.
We’re going to be talking about Pilot Light, your Hot Standby, and then finally your Multi-Site DR architecture. Let me give you a sense of what AWS services are involved with these architectures and scenarios.
For the backup and restore, you’re not using too many of AWS’s core services. You’re going to be using S3 Glacier and SoftNAS specifically for the replication that Joey talked about.
You’ll also be using their Route 53 service and their VPN service. Then as you move on to the Pilot Light, you’re going to add in CloudFormation, EC2, maybe a couple of EBS volumes, along with their VPCs and Direct Connect.
Once you move over to Hot Standby, you add in Auto Scaling and Elastic Load Balancing, or ELB for short, and then you set up multiple Direct Connects.
Finally for the Multi-Site, you’ll be using a whole host of AWS services for that. Talking a bit about the backup and restore architecture, the way it works here is, on the screen, we’ve got your on-premises data center here on the left. Then on the right, we’ve got the AWS infrastructure as well.
Over here on the left, you’ve got a data center, you’ve got a physical SAN using the iSCSI storage protocol, and then you’ve got a virtual appliance on top of that managing your storage.
What we are saying that you can do with AWS’s DR is you can use a combination of SoftNAS, S3, EC2, or EBS to basically go and manage your backup architecture.
In the traditional environment, data is backed up to tape. You have to send it off quite regularly. If something fails, it’s going to take a long time to restore your system because you have to pull those tapes and pull the data from them. That’s what makes Amazon’s S3 Glacier really good for this.
You can transfer your data to S3 and back it up for one cent per gigabyte per month. Using a service like SoftNAS enables you to take snapshots of your on-premises data and copy them into S3 for your backup.
The beauty of this is you can also snapshot those data volumes to give you a highly durable backup. Again, backing up is only half the equation here. The other half is actually doing the restore.
We move on to the next slide here. The way the backup and restore works with AWS for your DR is it’s simple to get started. It is pretty cost-effective. You’re not paying a lot of money for it.
Then in case of disaster, what happens is you’re going to retrieve your backups from S3. You bring up your required infrastructure – these are the EC2 instances with prepared AMIs, Load Balancing, etc.
You restore the system from a backup. Basically, the objectives here for RTO is it’s as long as it takes to bring up your infrastructure and restore it from backups. Your RPO is the time since your last backup.
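The two objectives for backup and restore reduce to simple time arithmetic. The sketch below uses made-up timestamps and durations purely as an example; the function names are not from any product.

```python
from datetime import datetime, timedelta


def recovery_point(last_backup: datetime, disaster: datetime) -> timedelta:
    """RPO exposure: everything written since the last backup is lost."""
    return disaster - last_backup


def recovery_time(infra_bringup: timedelta, restore: timedelta) -> timedelta:
    """RTO: time to bring up infrastructure plus time to restore from backup."""
    return infra_bringup + restore


rpo = recovery_point(datetime(2017, 1, 9, 0, 0), datetime(2017, 1, 9, 15, 30))
rto = recovery_time(timedelta(hours=1), timedelta(hours=5))
print(rpo)   # 15:30:00
print(rto)   # 6:00:00
```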
With the backup and restore architecture, it’s a little bit more time consuming and it’s not instant, but there is a workaround for that. The reason we keep bringing up SoftNAS is because you can set up that replication with SnapReplicate so your data is instantly available.
Instead of you having to wait for it to download and back it up, it’s now all instantly available to you so your RTO and your RPO go from hours or days into minutes or just one or two hours.
Moving on to the Pilot Light architecture, this is basically a scenario in which a minimal version of the environment is always running in the cloud. The idea of this is you can think of it as a gas heater.
On a gas heater, a small flame is always on that can quickly ignite the entire furnace to heat up your home. It’s probably the best analogy I can come up with. It’s pretty similar to a backup and restore scenario.
For example, what happens with AWS is you can maintain that Pilot Light by configuring and running your most critical core elements of your system in AWS; so when the time comes for recovery, you can rapidly provision a full-scale production environment around that critical core.
Again, it’s very cost-effective. In order to prepare for the Pilot Light phase, you replicate all of your critical data to AWS. You prepare all of your required resources for your automatic starts – that’s the AMI, the network settings, load balancing. Then we even recommend reserving a few instances as well.
What happens in case of disaster is you automatically bring up those resources around the replicated core data set. You can scale the system as needed to handle your current production traffic.
Again, the beauty of AWS is you can scale higher or lower based on your current need. You also need to switch over to the new system. Point your DNS records to go from your on-premises data center to AWS.
Moving on to the Hot Standby architecture, this is a DR scenario which is a scaled down version of a fully-functional environment that’s always running in the cloud. A warm standby extends the Pilot Light elements.
It further reduces the recovery time because some of your services are always running in AWS – they are not idle and there is no downtime with them. By identifying your business-critical systems, you can fully duplicate them on AWS and have them always on.
One advantage is that it handles production workloads pretty well. In order to prepare for it, you replicate all of your critical data to AWS. You prepare all your required resources and your reserved instances.
In case of disaster, what’s going to happen is you automatically bring up the resources around the replicated core data set. You scale the system as the need be to handle your current production traffic.
The objectives of this is Hot Standby is meant to get you up and running almost instantly. So your RTO can be about 15 minutes, and your RPO can vary from one to four hours. It’s meant to get you up and running for your tier 2 applications or workloads.
Finally we have the Multi-Site architecture. This is basically where your AWS DR infrastructure is running alongside your existing on-site infrastructure. So instead of it being active and inactive, it’s going to be an active/active configuration.
The way this works is this is the most instantaneous architecture for your DR needs. The way it works is, at any moment, once your on-premises data center goes down, AWS will go ahead and pick up the workload almost immediately. You’ll be running your full production load without any kind of decrease in performance.
It immediately fails over all your production load; all you have to do is adjust your DNS records to point to AWS. Basically, your RTO and your RPO are within minutes, so there’s no need to worry about spending time re-architecting everything. You are up and running again within minutes.
To give you an example of how our customers are using disaster recovery on AWS. One of our customers right now is using AWS to manage their business applications and they’ve broken them down into tier 1, tier 2, and tier 3 apps.
For the tier 1 apps that need to be up and running 24/7, 365 days a year, what they are doing is they’ve got their EC2 instances for all services running at all times.
Their in-house and their AWS infrastructure are load balanced and configured for auto failover and they do the initial data synchronization using in-house backup software or FTP. And finally, they set up replication with SoftNAS.
So in case a disaster happens, they automatically go ahead and fail over in minutes and they don’t lose any productivity, data, or anything like that. With the tier 2 apps, what they’re doing is they go ahead and configure the critical core elements of the system. They don’t configure everything.
Again, they’ve got their EC2 instances running only for the critical services, so not all services. They go ahead and they’ve preconfigured their AMIs for the tier 2 apps that can be quickly provisioned. Their cloud infrastructure is load balanced and configured for automatic failover. Again, they did the initial data sync with the backup software and they did replication with SoftNAS.
Finally for the tier 3 apps where the RPO and the RTO isn’t too strict, they’ve basically replicated all their data into S3 using SoftNAS. They did, again, the data sync with the backup software. They went ahead and preconfigured their AMIs. Then their EC2 instances are spun up from objects within S3 buckets. It’s a manual process but they are able to get there quickly.
Again, using SoftNAS’s SnapReplicate feature, their backup and restore is a lot quicker than it would normally be just using AWS by itself.
Here, we’ll talk a bit about our highly-available architecture. I know we’re running past our scheduled time here. Joey, if you can cover this in about two minutes, I think we should be good to go.
Joey: Certainly. We definitely talked about this in our demo. One of the great things about our HA solution is if you architect it per our standards, we will offer you a five nines uptime guarantee, so there’s an additional SLA there on top of the SLA that AWS already provides.
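For a sense of scale, "five nines" (99.999% availability) can be converted into allowed downtime with a short calculation. The helper name is invented for this example, and the result says nothing about how any particular SLA is measured or enforced.

```python
def allowed_downtime_minutes(availability_percent: float, days: float = 365) -> float:
    """Maximum downtime, in minutes, permitted over `days` at a given availability."""
    return (1 - availability_percent / 100) * days * 24 * 60


print(round(allowed_downtime_minutes(99.999), 2))   # 5.26 minutes per year
print(round(allowed_downtime_minutes(99.9), 1))     # 525.6 minutes per year
```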
We’ll go ahead and forward. This is a very high-level architecture, but again, the point is, here, we’re going to give you the ability to have cross-zone HA available either via an elastic IP which this illustrates or on the very next slide a virtual IP.
The notation is there. We can keep everything private. Or if you need to scale out and offer storage services to services that exist outside of your VPC, we do give you the ability to leverage that.
All the replication between these two machines, again, is block level replication and it is delta-based. And we do give you the ability to effectively have some 30 seconds failover between two machines that have data independent of one another separated by availability zone.
I hand it back to Taran. If you have any additional questions and you want to dive deeper into HA, definitely let me know and we can reach out and schedule a conversation.
Taran: Thanks for that, Joey. We’ll go ahead and move on to our next poll question here. To get a sense of which DR architectures you all intend to build…
I’m sorry. I clicked on the wrong link there. There we go. We’ll go ahead and launch this poll. Of the four DR architectures that we just talked about, which ones do you intend to use with AWS? Are you going to do the Backup and Restore, the Pilot Light, the Hot Standby, or the Multi-Site DR architecture?
We will go ahead and give that probably about 10 more seconds. It looks like nearly half of you have voted. Is that okay. We’re going to close the poll now.
Let’s go ahead and share the results. It looks like most of you are not sure of which DR architecture you want to use, and that’s perfectly fine. DR can be complicated for your potential use cases.
For those of you who aren’t sure, we recommend that you reach out to us. Go to softnas.com/contact or email firstname.lastname@example.org and we’ll go ahead and reserve some time to talk to you about how you are using disaster recovery and how we can help you best pick the use case that you can be using it for.
Then it looks like a lot of you are interested in the Pilot Light architecture, which is great. Pilot Light gets you up and running quickly at definitely a much more reduced cost than having a DR data center.
Moving on, we covered SoftNAS quickly here. What we want to cover also is to give you guys an idea of our technology ecosystem. We do partner with a lot of well-known technology partners. AWS is one of our partners.
You are able to go in and download SoftNAS, on AWS, free for 30 days to try it out. Then also, we do partner with companies like Talon, SwiftStack, 2ndWatch, NetApp, and a couple of other companies as well.
To give you a sense of the companies that are using SoftNAS: large, well-known brands are using SoftNAS to manage their storage in the cloud. You’ve got Nike, Boeing, Coca Cola, Palantir, Symantec. All these names on the screen are managing hundreds of terabytes of data on AWS using SoftNAS.
In order to thank everyone for joining today’s webinar, we are offering $100 AWS credits that I’m going to go ahead and post in the chat window here on GoToWebinar.
If you click on that link, it will let you go in and register for your free $100 AWS credit. All we need is your email address. That’s the only information that we need from you. Once you put in your email address, you’ll receive a code for a free $100 AWS credit from one of our colleagues.
Finally, before we get to the Q&A here, we do want to let you know about a couple of more things. For those of you who are curious about how SoftNAS works on AWS or you’re just interested in learning more, go and click on this first link here. It will take you to our website and you’ll be able to learn more about how SoftNAS works on AWS.
You’ll learn about the use cases, some technical specifications, and you can also download a couple of whitepapers as well.
We do also offer a free 30-day trial of SoftNAS on AWS. For those of you who liked what you saw or you’re curious about a couple of things, just go ahead and click on that “Try Now” button and you’ll be able to go in and start using SoftNAS and get up and running in less than 30 minutes.
We know we’ve covered a lot of content here today. For those of you who have any questions or you want things explained further, just go ahead and contact us. Our solutions architects like Joey are happy to sit down, talk with you, and answer any questions that you might have about disaster recovery or anything else on AWS.
Finally, if you are using SoftNAS and you have a couple of questions or you need some help, just go ahead and reach out to our support team and they’ll go ahead and answer any questions that you may have and they are also readily available to help you out.
With that, let’s go ahead and get on to the questions here. It looks like we have a lot of questions coming in today so we’ll go ahead and answer just a few of them here today.
The first question we have here is, “How do you recommend moving tier 1 applications like SAP to AWS?”
Joey: What we’re going to do is we’re going to look at the performance needs or requirements of these particular applications. How many connections are they making? How many in-flight files do they have in any given point in time? What’s the average file size?
We need to look at IOPS, Throughput, etc. The whole point of this exercise is to create a storage controller that can accommodate or exceed those expectations from IOPS, throughput, latency, etc.
The one caveat is that we are network-attached storage, so we are always subject to the network, which has usually been the slowest link within the system. Provided the networking is not the issue, it’s just a matter of architecting a system that can meet your data needs for both capacity and performance.
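The sizing exercise Joey describes can be sketched as a comparison between what the application needs and what a candidate storage design offers. The numbers, dictionary keys, and function names below are assumptions for illustration, not published SoftNAS sizing guidance.

```python
def throughput_mib_s(iops: float, avg_io_kib: float) -> float:
    """Throughput implied by an IOPS figure and an average I/O size."""
    return iops * avg_io_kib / 1024


def meets_needs(required: dict, offered: dict) -> bool:
    """True when the offered design meets or exceeds every requirement."""
    return (offered["iops"] >= required["iops"]
            and offered["throughput_mib_s"] >= required["throughput_mib_s"]
            and offered["latency_ms"] <= required["latency_ms"])


required = {"iops": 3000,
            "throughput_mib_s": throughput_mib_s(3000, 16),
            "latency_ms": 5}
offered = {"iops": 5000, "throughput_mib_s": 100, "latency_ms": 2}

print(required["throughput_mib_s"])    # 46.875
print(meets_needs(required, offered))  # True
```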
Taran: Thanks so much, Joey. The next question that we have here is, “What is AWS running in the backend as a hypervisor?”
Joey: They are running the Xen hypervisor.
Taran: The next question that we have here is, “Is there any whitepaper that discusses the performance of SoftNAS on AWS specifically? I’m looking for a reference architecture.”
Joey: I definitely have a reference architecture. I don’t believe it is published as of yet, but it does cover SoftNAS and all of its components within a multi-AZ infrastructure with HA so you can see how everything is configured and running within that architecture. I can definitely provide that to you. Reach out to email@example.com if you’d like or even my personal email address and we’ll get back to you.
As far as the performance numbers are concerned, we do have some general, very high-level recommendations available on our website. More granularity is coming very shortly to that list. Beyond that, we have various different sizes and instances for different metrics, and that’s something we can share with you if you’d like to continue this conversation.
We also have some recommendations for how we size instances based on your needs, so for anything that would deviate from the prescribed guidance that we have out there now, that’s going to be published very soon.
Taran: Thanks for that, Joey. It looks like we don’t have any more questions. What we also want to recommend is that if you go to softnas.com/aws, you’re able to go and access some of our resources that provide more technical information on how SoftNAS works on AWS.
Before we end this webinar, we do want to thank everyone for attending. We also want to let you know that there is a survey at the end of this webinar. Please go ahead and fill that out. It only takes about two minutes.
The main reason is that once we get your feedback, that gives us a better sense of how to prepare for our future webinars. If you’re happy with today’s webinar, great. If you’re unhappy with today’s webinar, just let us know. That will give us a better sense of what we need to do better in the future.
With that, we want to thank everyone again for joining today’s webinar. We look forward to seeing you at our future webinars that we’ll be doing throughout the rest of the year. Thanks again everyone and have a great day.