Islands in the Stream
Mary Roberts
January 13, 2021

Data is everywhere, flowing through our networks like the rivers and streams of the physical world. And as in the physical world, it accumulates wherever conditions allow. With water, this results first in ponds, then lakes, and eventually oceans. The same can happen in the data world: vast reservoirs of data form that could easily become as hard to plumb as the ocean itself. These pools of data can occur anywhere, but particularly in locations where the means to gather the data are not naturally present, or where natural barriers prevent its flow. Information technology calls these locations the edge, and the information that resides in these difficult-to-access reservoirs, outside the traditional datacenter, edge data.

Buurst’s goal, whether with our storage management solution, SoftNAS, or our new edge data consolidation and orchestration product, Fuusion, is to bridge the divide between useful, usable data and the potentially inaccessible oceans of data your organization might currently generate.

Because of the difficulties in transporting multiple large and small streams of data and orchestrating these flows in real time, a more agile infrastructure ideal is being proposed, one where processing occurs increasingly at these remote locations. Gartner tells us that the infrastructure of the future will be “anywhere the business needs it”, and that by 2022, 60% of enterprise IT infrastructures will focus on centers of data, rather than traditional data centers.

Increasingly, we’re seeing the advantages of processing at the edge locally. You can reduce your cloud cost and cloud spend quite a bit, and give yourself much faster turnaround times at the edge. Or in some cases, it may be the only thing feasible because you need maybe millisecond or even sub-millisecond responsiveness at the edge.

Rick Braddy, CTO, Buurst

This does not mean that centralized data management will go away entirely. As Adam Burden, Accenture’s technology lead for North America, puts it: “The biggest issue of creating ‘centres of data’ is the underlying architecture and technology ‘ballet’ needed to ensure there is a consistent version of the truth – no matter the data lake, the system or data stream being interrogated.” In other words, however many loci of data gathering you have, you must ensure that the data remains consistent, without duplication or differing results caused by different gathering methods or reporting criteria. An article by ITPro identifies duplicated effort as one of the primary pitfalls facing the industry: “Many organisations are paying for resources and tools across multiple centres – HR might be building processes and storing data one place, while legal and finance are each doing the same elsewhere.”

For this reason, the ideal solution is one that is centrally managed, yet can easily extend across multiple locations and scale accordingly. Essentially, we want to create islands in the data stream. On these islands, depending on size and scope, we can create the foundations of bridges that will allow unfettered access to verifiable source data, or create dams to control or redirect the flow, as well as parse and filter out the desired data. This actionable data could be considered analogous to hydroelectric energy, powering the datacenter.

Buurst’s Fuusion consists of two parts. The Fuusion Controller orchestrates the dataflows from the edge and handles centralized management. Fuusion Edge runs at the edge, often in containers managed by Kubernetes or in small virtual machines hosted on-site, where it gathers, parses, and delivers pre-defined data flows to where they are needed. To ensure that the data gathered is accurate and unchanged, Fuusion provides clearly defined tracking and provenance information at every step of its journey, at every processor it touches.

Connect Off-Cloud Data to On-Cloud Services

One of the key challenges in gathering data generated at the edge is network connectivity: increased latency, schedule-reliant links (such as satellite uplinks), and other network difficulties. Whether it’s an oil rig at the far reaches of the prairies or a ship reporting locational data only when satellite connectivity is available, the organization in question has to make the most of the windows of connectivity available. Fuusion handles this with its patented UltraFast feature, which uses the full available bandwidth by pushing the flow of data across UDP instead of TCP, whose throughput is limited by latency.

Another key challenge to overcome is the number of formats that data generated at the edge can take: Word documents, Excel spreadsheets, SQL and NoSQL data, JSON, Salesforce, or any combination of these and more. Any solution provided must be flexible enough to handle the data generated, and either parse it into a usable format (process it at the edge) or transfer it to a location where it can be processed in a clean, unchanged format. On the processing side, Fuusion’s Apache NiFi-based processors natively handle multiple common file formats out of the box, building on NiFi’s longstanding open-source efforts. In addition to these pre-configured processors, Fuusion offers custom processor capabilities, allowing our professional services team to create a solution where there was none before.
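To make the idea of parsing mixed formats into one usable shape concrete, here is a minimal Python sketch. It is an illustration only and does not use Fuusion or NiFi; the `PARSERS` registry and the `normalize` function are hypothetical names, and only CSV and JSON are covered.

```python
import csv
import io
import json


def parse_csv(text):
    """Parse CSV text with a header row into a list of dicts."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


def parse_json(text):
    """Parse JSON text holding either one object or a list of objects."""
    data = json.loads(text)
    return data if isinstance(data, list) else [data]


# Hypothetical registry mapping a format name to its parser.
# A real deployment would register one entry per supported format.
PARSERS = {"csv": parse_csv, "json": parse_json}


def normalize(fmt, text):
    """Return records as a list of dicts, whatever the source format."""
    try:
        parser = PARSERS[fmt]
    except KeyError:
        raise ValueError(f"no parser registered for format: {fmt}")
    return parser(text)
```

Calling `normalize("csv", "id,temp\n1,20\n")` and `normalize("json", '{"id": 1}')` both return a list of records, so downstream steps can treat every source the same way. Note that CSV values arrive as strings, so any type conversion would be a separate step.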

For those dataflows where the data needs to be kept intact for future processing, Fuusion offers clear provenance, tracking the flow of data from the beginning of its journey to the end, no matter how many stops along the way. Fuusion also ensures that if a flow is stopped, it will resume the moment connectivity is re-established, right where it left off, by comparing data to the last processor touched. Each processor can rely on the one before it.
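The resume-where-it-left-off behavior can be pictured with a short Python sketch. This is not Fuusion’s implementation; it is a simplified model that assumes data moves in ordered chunks, that the receiver records a SHA-256 digest for each chunk as a stand-in for provenance, and that a hypothetical `link_up` callback reports whether connectivity is available.

```python
import hashlib


def chunk_digest(chunk):
    """Return the SHA-256 hex digest of a chunk of bytes."""
    return hashlib.sha256(chunk).hexdigest()


class Receiver:
    """Hypothetical receiving side that keeps chunks and a provenance log."""

    def __init__(self):
        self.chunks = []
        self.provenance = []  # one (index, digest) entry per accepted chunk

    def next_index(self):
        """Index of the next chunk expected, i.e. the resume point."""
        return len(self.chunks)

    def accept(self, index, chunk, digest):
        """Accept a chunk only if it is the expected one and is unchanged."""
        if index != len(self.chunks):
            return False  # out of order or already received
        if chunk_digest(chunk) != digest:
            return False  # content changed in transit
        self.chunks.append(chunk)
        self.provenance.append((index, digest))
        return True


def send(chunks, receiver, link_up):
    """Send chunks from the receiver's resume point while the link is up.

    Returns the number of chunks delivered during this window.
    """
    sent = 0
    index = receiver.next_index()
    while index < len(chunks) and link_up(index):
        if not receiver.accept(index, chunks[index], chunk_digest(chunks[index])):
            break
        sent += 1
        index = receiver.next_index()
    return sent
```

If the link drops after two of four chunks, a second call to `send` starts at chunk index 2 rather than 0, because the sender asks the receiver for its resume point before transmitting. The provenance log then holds one verified entry per chunk.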

Finally, we have another key problem at the edge: support infrastructure. In the remote locations where edge data is generated, access to server infrastructure is limited, if it exists at all. Fuusion’s flexibility is not just about the data formats we can handle; it is also about deployment and scalability. As Rick Braddy told us in a recent webinar, “We can deploy on physical machinery, VMs or containers. We can live within a Kubernetes cluster if you are already running Kubernetes out at the edge. Or even on different cottage type of devices, like a Snowball edge, or Azure Stack. And then also we can of course run on hyper-converged, which is still just a virtualized infrastructure, and all of this (can be) centrally managed”.

Buurst’s Fuusion is uniquely equipped to handle the rivers and streams of data meandering across your landscape. Rather than let them become oceans to sift through, we can help ensure that islands of actionable, real-time data solutions are within your organization’s reach. Contact Buurst Professional Services to learn how.
