Buurst the DataOps Bubble

Every company is looking for the next big thing when it comes to gathering, using, and manipulating data. In a world of ever-faster change, data-driven transformation is often cast as the hill on which a business may die. The search may be for the next big application, the next technological innovation, or the next procedural paradigm shift. For the latter, many are heralding DataOps as that next big thing.

According to Gartner, DataOps “applies the traditional DevOps concepts of agility, continuous integration and deployment, and end-user feedback to data and analytics efforts.” However, it goes on to clarify that it isn’t simply a technical competency, but rather a “lever for organizational change, to steer behavior and enable agility.”

Regardless of how it is defined, DataOps has captured the attention of the marketplace to the point that many technology vendors talk about it, yet their scope is limited to the product they hope to sell. It is important to note that even though a product such as Fuusion might assist with DataOps, DataOps is a set of practices, not a technology, and no product currently offers DataOps in a box.

Many simplify DataOps as applying DevOps principles to data workflows. The idea behind this simplification is that as data workflows become increasingly cloud-native, the relationship between data engineering and operations looks more and more like the one between development and operations. A data product must be monitored, patched, and updated, just like a software product.

DataOps is more than this, however. Similarities in relationships aside, it is not simply a matter of bringing operations and data engineering together, or of managing the flow of data; it is “also about ensuring everyone involved knows why the data workload is running and what the desired business outcome is.” To be successful, it must be supported not just by the data engineers, their tools, and their processes, but also by connecting with those who will actually use the data, and those who can define the business needs behind the data generated.

This distinction is important to understand, especially if selling a tool to assist with DataOps. Claims that any tool delivers DataOps are hollow at best. The best one can offer is flexibility and the ability to deliver the data where needed. Defining these needs and implementing the workflows is the province of the client, who in turn needs to recognize that simply purchasing additional technologies, or speeding up the flow of data is not enough.

As Ted Friedman of Gartner puts it, “Rather than simply throwing data over the virtual wall, where it becomes someone else’s problem, the development of data pipelines and products becomes a collaborative exercise with a shared understanding of the value proposition.”

It is in the value proposition that you will find success for your organizational DataOps efforts. Without defining what you want your data to do for you, simply having more of it faster will not net you the results promised by the hype behind DataOps. So, before contacting a vendor like us, Buurst recommends a bit of due diligence. With Fuusion, as an example, we most certainly can aid you in defining your workflows. We can get you the data you need in formats that are inherently more usable, and ensure its accuracy via provenance tracking. What we cannot do is define your goals for you, or decide what your needs are.

So what do you need? Where will your data processes benefit most from a little added speed or automation?

Is it a utility value proposition?

This means you wish to treat your data as a utility, removing silos and reducing the manual effort involved in accessing and managing data. This approach makes data available to multiple relevant roles.

Is it a business enabler value proposition?

If so, then data and analytics are focused on specific use cases, such as supply chain optimization or fraud detection. DataOps collaboration with business unit stakeholders is key in this scenario.

Or maybe it’s a data and analytics driver value proposition?

Do you need the data and analytics to help create new products or services? Break into a new market? Generate a new revenue stream?

It may well be a combination of any or all of these. But identifying an initial focus will certainly improve your chances of success. Applying processes or tools without considering the endgame will keep the full benefits of DataOps out of reach. But if you know what you need, or want help in determining those needs, Buurst is ready and willing to assist.

For more information on Buurst, DataOps and Fuusion:

Cloud to Edge Data Distribution Over High-Latency and Satellite Networks

Background

Over the past 8 years, we have seen more than an exabyte of data migrated onto the leading public cloud platforms. This makes public cloud a new center of the business data universe for many enterprises and government agencies today.

Now we see the rapid rise of edge computing coming next, with up to 50% of all new data creation expected to take place at the edge over the next several years. Indeed, traditional on-premises storage, server, and network vendors are turning their attention to becoming the arms providers fueling the growth of the edge and its IoT cousins, hoping to restore growth and health to IT infrastructure businesses that have suffered as cloud migrations supplant traditional datacenters.

When we introduced the Buurst Fuusion™ product to the market in 2020, we focused at first on what came naturally to us after helping customers migrate thousands of applications and petabytes of data from on-prem into the leading clouds: moving data into the cloud. What we discovered came as a bit of a surprise. Of course, there are still massive data transfers and migrations to the cloud, but now we see the rise of the edge creating a gravitational pull on the data stored centrally in the clouds to fuel edge computing.

Edge Is Data Hungry

We have heard plenty about the intensity of data creation at the edge, fueled largely by IoT sensor data feeding edge computing systems. What we don’t hear as much about is that these often standalone, headless edge nodes require care and feeding from a central command-and-control system, which is increasingly hosted in the cloud.

Edge systems require data for:

  • Software updates
  • Configuration settings
  • Inference engine updates
  • Code deployments and updates
  • Container deployments

Edge computing systems need to be installed, configured, updated and cared for like any other IT system. What are some examples of edge systems that need centralized control?

  • Offshore Rigs
  • Shipping Vessels
  • Military Systems
  • Electric Vehicles
  • Commercial Drones

And since many of these edge systems exist at remote locations, the networks connecting edge and cloud are often less than ideal. In fact, many remote edge systems must rely upon satellite, radio, DSL, and in the future, 5G networks for their connection with the rest of the world. These edge networks often bring high levels of latency, packet loss, and sometimes intermittent connectivity, unlike the pristine, redundant network conditions we see in the cloud and traditional datacenters.

To successfully deploy and maintain edge computing systems remotely, over challenging high-latency, lossy network conditions and between systems using incompatible data types, several solutions are needed:

  1. Store and forward with guaranteed delivery
  2. Optimizations to overcome TCP/IP issues with high-latency and packet loss
  3. Data format flexibility across disparate systems

Let’s examine each of these data distribution challenges in more detail.

Store and Forward with Guaranteed Delivery

Particularly with satellite networks using mobile access points (e.g., ships at sea, military personnel and equipment), connectivity is intermittent and can be lost entirely for unpredictable periods of time. During network outages, it’s critical that edge systems can queue up data that needs to be transmitted so it isn’t lost. It’s equally important in many cases to ensure guaranteed delivery to maintain transactional integrity.
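
In code, the store-and-forward pattern looks roughly like the sketch below. This is a minimal illustration, not Fuusion’s implementation: records are persisted to a local queue before any transmission attempt and removed only once delivery is acknowledged, and the `send_to_cloud` transport function is a hypothetical stand-in.

```python
import json
import sqlite3

# Minimal store-and-forward sketch (illustrative only, not Fuusion's
# implementation). Records are persisted locally *before* any send attempt
# and deleted only after the remote side acknowledges receipt, so a network
# outage cannot lose data.
db = sqlite3.connect("edge_outbox.db")
db.execute("CREATE TABLE IF NOT EXISTS outbox (id INTEGER PRIMARY KEY, payload TEXT)")

def enqueue(record: dict) -> None:
    """Persist a record to the local queue before transmission."""
    db.execute("INSERT INTO outbox (payload) VALUES (?)", (json.dumps(record),))
    db.commit()

def flush(send_to_cloud) -> None:
    """Drain the queue in order; on failure, stop and retry on the next pass."""
    for row_id, payload in list(db.execute("SELECT id, payload FROM outbox ORDER BY id")):
        try:
            send_to_cloud(payload)  # hypothetical transport; returns only after the peer acks
        except OSError:
            return  # link is down; unsent records simply stay queued
        db.execute("DELETE FROM outbox WHERE id = ?", (row_id,))
        db.commit()
```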

Optimizations to overcome TCP/IP issues with high-latency and packet loss

It’s common knowledge that TCP/IP throughput falls off dramatically in the face of network latency and packet loss. Latency is commonly introduced by satellite and other radio (e.g., 4G/5G) communications systems. Packet loss exacerbates the effects of latency on TCP/IP by causing TCP’s window size to shrink. The results can be readily seen in the following chart.

It’s easy to see how TCP’s throughput becomes unusable with any appreciable amount of packet loss and latency, limiting the effectiveness of cloud-to-edge data distribution and edge-to-cloud data consolidation efforts in many real-world use cases.
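
The shape of that curve is captured by the well-known Mathis model, which bounds steady-state TCP throughput at roughly MSS / (RTT × √loss). A quick back-of-the-envelope calculation (simplified, with the constant factor omitted) shows the collapse:

```python
from math import sqrt

def tcp_throughput_mbps(mss_bytes: int, rtt_ms: float, loss_rate: float) -> float:
    """Simplified Mathis et al. bound: throughput <= MSS / (RTT * sqrt(p))."""
    return (mss_bytes * 8 / 1e6) / ((rtt_ms / 1000) * sqrt(loss_rate))

# A 1460-byte MSS over a geostationary satellite hop (~600 ms RTT) with 1% loss:
print(f"{tcp_throughput_mbps(1460, 600, 0.01):.2f} Mbps")    # ~0.19 Mbps
# The same MSS over a clean, low-latency link (1 ms RTT, 0.01% loss):
print(f"{tcp_throughput_mbps(1460, 1, 0.0001):.0f} Mbps")    # ~1168 Mbps
```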

Data Formats – Object Storage, File Storage and SQL/NoSQL from the Clouds

Data is stored in the cloud in many different formats. Object storage is the most common, such as S3 on AWS® and Azure Blobs on Microsoft® Azure. Unstructured data continues to be managed in filesystems, and increasingly by NoSQL databases. Structured data can be found in various SQL databases, as usual.

Different edge devices operate upon and create data in their own proprietary formats, often accessed via REST APIs.

Some means of extracting the data, combining fields of related data, and transforming the data format at the proper place in the edge/cloud continuum is required.

How Fuusion Addresses Data Distribution between Cloud and Edge

Guaranteed Delivery. Fuusion leverages Apache NiFi technology to manage data flows. Data gets queued between each data flow processing block, as well as when sent across the network, and delivery is guaranteed even at very high scale.

UltraFast® Data Acceleration. Fuusion includes a key feature that optimizes end-to-end data transfer over high-latency, lossy networks. It does this by intercepting TCP traffic via a SOCKS proxy and redirecting it through a proprietary UDP channel that ensures reliable delivery and overcomes the effects of latency and packet loss using patented technology. For contrast, consider this performance chart reflecting UltraFast throughput in the face of latency and packet loss.

As we can see from the above chart, throughput remains reasonably constant, at 90% or better, regardless of the network link’s latency and packet loss characteristics. UltraFast automatically detects latency and packet loss, constantly adjusting and optimizing to maximize throughput.
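
To make the interception mechanism concrete, the snippet below is a rough sketch of the SOCKS side of the pattern, using the PySocks library to point an ordinary TCP connection at a local proxy. The proxy address and the accelerated channel behind it are assumptions for illustration; UltraFast’s actual transport is proprietary.

```python
import socks  # PySocks: pip install PySocks

# Illustration of the SOCKS-interception pattern only. The application opens
# what looks like a normal TCP connection, but the socket is routed through a
# local SOCKS5 proxy, which is then free to carry the bytes over an
# accelerated, reliable UDP channel (as UltraFast does with its own
# proprietary transport). Host and port values here are hypothetical.
s = socks.socksocket()
s.set_proxy(socks.SOCKS5, "127.0.0.1", 1080)   # local accelerator proxy
s.connect(("cloud.example.com", 443))          # transparent to the application
s.sendall(b"edge telemetry batch\n")
s.close()
```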

How Fuusion Addresses Data Format Flexibility Challenges

Fuusion supports dozens of common data formats, including files in various formats (e.g., XML, JSON, etc.), SQL, NoSQL, and many cloud services, via either specific “connectors” or custom REST API integrations. Instead of involving DevOps for coding, Fuusion uses a powerful drag-and-drop visual data flow configuration approach, based originally on Apache NiFi, as a means of quickly configuring data flows and achieving data format flexibility.
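
As a rough illustration of what a single format-conversion step does (in Fuusion this mapping is configured visually rather than hand-coded, and the field names below are invented): an XML sensor record can be re-emitted as JSON in a few lines.

```python
import json
import xml.etree.ElementTree as ET

# Toy example of one data-format transformation step: parse an XML record,
# then re-emit it as JSON. Field names are invented for illustration.
xml_record = """
<reading>
  <sensor>pump-7</sensor>
  <temperature units="C">81.4</temperature>
  <timestamp>2021-03-15T09:30:00Z</timestamp>
</reading>
"""

root = ET.fromstring(xml_record)
record = {
    "sensor": root.findtext("sensor"),
    "temperature_c": float(root.findtext("temperature")),
    "timestamp": root.findtext("timestamp"),
}
print(json.dumps(record))
# {"sensor": "pump-7", "temperature_c": 81.4, "timestamp": "2021-03-15T09:30:00Z"}
```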

Take Action

Schedule a Demo with our Fuusion Team to learn how Fuusion can prepare your organization for the coming edge and data transformation.

Building an Edge Data Platform for Centralized AI/ML Processing

Buurst Fuusion is a decentralized solution with components running at the edge, on a centralized cloud (AWS, Azure), and a data transfer accelerator in the network. Using a visual data flow tool built on Apache NiFi and data connector templates from the Fuusion Toolbox, customers can rapidly lay out a complete data flow to deliver information to an AI/ML solution for analysis and insight.

To get a better understanding of this new product, Buurst engineering recently recorded a 30-minute episode with the L8istSh9y podcast community. The recording offers a behind-the-scenes look at Buurst Fuusion’s technology components: the open source Apache NiFi platform, the challenges of edge data usage, and data transfer performance over a wide variety of networks.

A key component of Buurst Fuusion is our patented UltraFast technology, designed to overcome the significant network latency challenges that will certainly exist in edge deployments. This critical feature unlocks data that would typically be considered too difficult to obtain for data flow processing.

For more information on this new product: