Verifying Snapshots on SoftNAS for Compliance at Halliburton
Kai Poynting
March 15, 2021

As we grow and evolve our Fuusion product, we are constantly finding new use cases, and learning to better respond to customer requirements. Halliburton is one of Buurst’s earliest customers, using our SoftNAS product long before the advent of Fuusion. So, when Halliburton came to us looking for a solution related to our SoftNAS product, we rose to the challenge. And because our solution to Halliburton’s problem involved both of our products, it seemed the perfect opportunity to illustrate at a high level how Fuusion was used to meet their needs.

Halliburton operates over 200 SoftNAS servers across their infrastructure, leveraging both AWS and Azure cloud storage. For compliance purposes, Halliburton needed to ensure, and document, that snapshots were being performed as scheduled on each SoftNAS instance or VM. This needed to happen in an automated fashion, with minimal manual effort, across all 200 virtual machines. The solution was of double interest to us because it allowed us not only to prove the flexibility of our Fuusion product, but to verify that SoftNAS’ built-in snapshot solution operated as intended across a large deployment.

To show how our solution works, we created a proof of concept (POC) on a smaller scale.

We hope this smaller-scale deployment proves an ideal introduction to our Fuusion product in operation.

Identifying Hosts

The first step in verifying that each of the 200 SoftNAS deployments operates as intended is, of course, to identify the hosts. This proved an easy task, as all it required was compiling a CSV list of all IP addresses or hostnames. To verify our flow, we started with a small sample size of five IP addresses: two working addresses based on a sample environment, and three non-existent addresses to simulate failures. In Halliburton’s production environment, this list would contain the IPs or hostnames of every SoftNAS deployment in their environment.
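A host list of this kind is about as simple as input gets. The addresses below are invented for illustration, but the shape is the point: one host per line, the two reachable hosts first and the three deliberately bad ones after.

```
10.0.1.11
10.0.1.12
10.0.9.101
10.0.9.102
10.0.9.103
```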

This CSV is fed into the first processor of the Fuusion flow. The processor, named Get_host_list, reaches out to each host via the IP addresses or hostnames provided, using a Python script running in the background. The script grabs snapshot details from each live instance.
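A minimal sketch of what such a collection script might look like is below. The HTTPS endpoint, path, and error handling here are our own assumptions for illustration, not a documented SoftNAS API; the actual script bundled into the processor queries SoftNAS through its own interface.

```python
# Illustrative sketch only -- the endpoint path and fields are hypothetical,
# not a documented SoftNAS API. Adjust to whatever interface your
# deployment exposes.
import csv
import json
import sys

import requests  # third-party: pip install requests


def fetch_snapshot_details(host: str, timeout: int = 10) -> dict:
    """Pull snapshot details from one host, or record the failure."""
    try:
        resp = requests.get(f"https://{host}/api/snapshots",  # placeholder path
                            timeout=timeout, verify=False)
        resp.raise_for_status()
        details = resp.json()
        details["execution_status"] = 1  # success: snapshot data available
        return details
    except requests.RequestException:
        # Unreachable or invalid host: flag it so the flow can route failures.
        return {"host": host, "execution_status": 0}


def main(csv_path: str) -> None:
    with open(csv_path, newline="") as f:
        hosts = [row[0] for row in csv.reader(f) if row]
    for host in hosts:
        print(json.dumps(fetch_snapshot_details(host)))


if __name__ == "__main__":
    main(sys.argv[1])
```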

Splitting the Records 

Next, the records need to be split on a per-instance/VM basis, into successful connections and failures. This is done via a processor we called, simply, “SplitRecord”. This processor takes the data from these instances, existent and non-existent, and creates records with the same file name, each carrying a different host (UUID) record.

Looking into one of the flowfiles and its attributes, we can see many different attributes that can be called upon in later steps, as necessary. For our purposes, the attribute we are most interested in is the fragment count. The fragment count attribute tells Fuusion that there are five different fragments to this single record. Knowing this, and the UUIDs, allows Fuusion to determine which fragments belong to the record and to re-assemble them on request.
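For illustration, the attributes on one of the five split flowfiles might look like the sketch below. The values are invented; the attribute names follow the standard NiFi fragment conventions, which Fuusion inherits.

```python
# Invented values; attribute names follow NiFi's standard fragment conventions.
flowfile_attributes = {
    "uuid": "3f2a9c1e-0000-0000-0000-000000000001",  # unique per fragment
    "filename": "host_list.csv",        # the same name on every fragment
    "fragment.identifier": "b7d4c2aa",  # shared id tying the five together
    "fragment.index": "2",              # this fragment's position in the set
    "fragment.count": "5",              # total fragments -- one per host
}
```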

Execute String 

This step verifies the output of each SoftNAS connection, and the data requested from each by the Python script in the processor mentioned earlier. As you can see, this record has successfully pulled the hostname, platform, volume name, snapshot count, the age of the last snapshot (LAST_AGE), whether snapshots are enabled, on what schedule, and how long they are retained, for each volume on the instance. The volumes are listed below, with values provided for each of the variables above.

If the connection fails, on the other hand, this data is not available. So, for the three non-existent IP addresses (created deliberately to show how failed connections are handled, as you will recall), we instead see an execution_status of 0 in the flowfile attributes, indicating a failed connection. This attribute allows us to sort the failures separately, as we will see shortly.
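Putting the two cases side by side, the records might take a shape like the following. The field values are invented; only the execution_status convention and the field list above come from the flow itself.

```python
# Illustrative record shapes; all values are invented.
success_record = {
    "execution_status": 1,
    "hostname": "softnas-demo-01",  # hypothetical hostname
    "platform": "AWS",
    "volumes": [
        {
            "vol_name": "vol01",
            "snap_count": 48,
            "last_age": "1h",       # LAST_AGE: age of the newest snapshot
            "enabled": True,
            "schedule": "hourly",
            "retention": "14d",
        },
    ],
}

failure_record = {
    "execution_status": 0,          # no connection, so no snapshot data
    "host": "10.0.9.101",           # one of the deliberately bad addresses
}
```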

Update Attribute Step

The Update Attribute processor’s job is simply to rename the files based on success or failure. Successful connections, with the data pulled from valid servers, are renamed AWS_SNAPSHOT_RESULTS.

The failed nodes (based on the execution_status of 0 mentioned earlier) are renamed AWS_ERROR_HOSTS. Remember, even though the same filename is applied to every file in a category, the files are still differentiated by separate UUIDs, and can still be recompiled based on the fragment count attribute.
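In the flow itself this is a one-line property on the Update Attribute processor; the Python below is just a sketch that mirrors the decision.

```python
# Sketch of the renaming logic applied by the Update Attribute processor.
def rename_fragment(attributes: dict) -> dict:
    """Rename a fragment based on whether its connection succeeded."""
    if attributes.get("execution_status") == 1:
        attributes["filename"] = "AWS_SNAPSHOT_RESULTS"  # valid server data
    else:
        attributes["filename"] = "AWS_ERROR_HOSTS"       # failed connection
    return attributes
```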

“Notify” and “Wait”

In typical configurations, Fuusion flows are not configured to perform batch operations. But as you will see, with some creativity and ingenuity, Fuusion is flexible enough to manage just about anything. To meet this requirement, we needed to leverage some pre-existing controller services in a creative fashion, notably a distributed map cache, similar to services such as DynamoDB or Redis: anything with a key-value store. To put it simply, the key is something we need to count, and the value is that count. The Notify processor tells us about that count (the count being the successes or failures to be sorted). The signal identifier is a made-up value simply called ‘release’. The signal counter is a key called ‘process_record’, and each processed record increases the count by an increment of 1. Each increment is then stored in the Distributed Map Cache.

The coolest part of this is that there was no need to set up a separate service such as DynamoDB or Redis. We were able to leverage the rich variety of controller services already present to create our own solution.

With the “Wait” processor, we are essentially telling the flow when to proceed further, i.e., when to run the batch process, by listening for the “release” signal identifier. The Wait processor finds the fragment_count attribute mentioned earlier and watches the records from Notify until the counter reaches the value specified by the fragment_count, which we know to be five. Five fragments come into Notify, and five go out, split based on defined variables, all automated in the flow to this point.
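The gating logic is easier to see in miniature. In this sketch a plain dictionary stands in for the Distributed Map Cache controller service; the names mirror the flow, but the code itself is our illustration, not Fuusion internals.

```python
# A dict standing in for the Distributed Map Cache key-value store.
cache: dict = {}


def notify(signal_id: str, counter_name: str) -> None:
    """Notify: increment the named counter under the signal identifier."""
    key = f"{signal_id}:{counter_name}"
    cache[key] = cache.get(key, 0) + 1


def wait(signal_id: str, counter_name: str, fragment_count: int) -> bool:
    """Wait: release the batch only once the counter reaches fragment_count."""
    return cache.get(f"{signal_id}:{counter_name}", 0) >= fragment_count


# Five fragments arrive; each one bumps the 'process_record' counter.
for _ in range(5):
    notify("release", "process_record")

print(wait("release", "process_record", fragment_count=5))  # True: proceed
```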

Route on Attribute 

So, with Notify and Wait, we tell the flow when to proceed. We’ve split the fragments apart and have attributes labelling them as successes or failures, and now we are ready to begin putting them back together. The first thing we need to do is merge the successes together and the failures together. We do this by sending the fragments in different directions within the flow, using the Route on Attribute processor. This is done quite simply by routing the fragments in either direction based on a value we’ve seen before, the execution_status. With a simple NiFi Expression Language command, the files are sorted and routed to two essentially identical processors, each called MergeRecord.

Fragments (flowfiles) with an execution_status of 1 are sent to the right (successes). As you can see, two files have been sent to the MergeRecord processor, and as we know, two of the five IP addresses corresponded to live SoftNAS virtual machines.

Those with an execution_status of 0 (failed connections) are sent to the left. As you can see, three fragments are sent to the MergeRecord processor, corresponding to the three invalid IP addresses.

Each of these MergeRecord processors compiles its fragments together, successes in one and failures in the other. Finally, the PutFile processor creates a single file out of the merged records, preparing it to be sent to a shared storage location, in this case an S3 repository.
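Condensed into a few lines, the route-and-merge step looks something like the sketch below. The routing condition in the actual flow is a NiFi Expression Language property, something along the lines of ${execution_status:equals('1')}; the Python here is only an analogy, with invented fragment bodies.

```python
# Invented fragment bodies; the routing key (execution_status) is the real one.
fragments = [
    {"execution_status": 1, "body": "host 10.0.1.11 snapshot data"},
    {"execution_status": 1, "body": "host 10.0.1.12 snapshot data"},
    {"execution_status": 0, "body": "host 10.0.9.101 unreachable"},
    {"execution_status": 0, "body": "host 10.0.9.102 unreachable"},
    {"execution_status": 0, "body": "host 10.0.9.103 unreachable"},
]

# Route on Attribute: successes one way, failures the other.
successes = [f["body"] for f in fragments if f["execution_status"] == 1]
failures = [f["body"] for f in fragments if f["execution_status"] == 0]

# MergeRecord + PutFile equivalent: one merged file per group.
with open("AWS_SNAPSHOT_RESULTS", "w") as out:
    out.write("\n".join(successes))
with open("AWS_ERROR_HOSTS", "w") as out:
    out.write("\n".join(failures))
```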

So, with this flow, we were able to pull raw data up from a given source (in this case SoftNAS), assign attributes to split and separate the desired data, tell it where to go, and then put it back together in the desired format, all before hitting a central repository.

Once sent out to S3, the file can be retrieved by another flow to apply additional formatting if necessary, such as converting it into a CSV or another file format ready for consumption, like the sample below.
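A hypothetical CSV of this shape, built from the fields pulled earlier (all values invented), might look like:

```
HOSTNAME,PLATFORM,VOL_NAME,SNAP_COUNT,LAST_AGE,ENABLED,SCHEDULE,RETENTION
softnas-demo-01,AWS,vol01,48,1h,yes,hourly,14d
softnas-demo-02,AWS,vol02,24,2h,yes,hourly,14d
```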

This CSV output, fully automated and delivered in a standardized format tailored to the needs of the client, or a considerably larger one containing all 200 of Halliburton’s servers and their current snapshot status, can then be fed directly into any business intelligence tool you specify.

Remember that while this is a very simple example, the same principles can apply to data from any source, which can be split and recompiled in the same manner based on any attribute defined. That’s powerful stuff when you’re defining a data flow. Any number of use cases can benefit from just the basic principles illustrated here.

More Information

Get a Fuusion Demo to find out how we can automate your biggest dataflow challenges, or even just take the hassle out of some of your smaller ones. 
