Over the past 8 years, we have seen more than an exabyte of data migrated onto the leading public cloud platforms. This makes public cloud a new center of the business data universe for many enterprises and government agencies today.
Now we see the rapid rise of Edge Computing coming next, where up to 50% of all new data creation is expected to take place at the edge over the next several years. Indeed, traditional on-premises storage, server and network vendors are turning their attention to becoming the arms providers for the edge and its IoT cousins, hoping to restore growth and health to their IT infrastructure businesses, which have suffered as cloud migrations supplanted traditional datacenters.
As we introduced the Buurst Fuusion™ product into the market in 2020, we focused at first on what came naturally to us after helping customers migrate thousands of applications and petabytes of data from on-prem into the leading clouds: moving data into the cloud. What we discovered came as a bit of a surprise. Of course, there are still massive data transfers and migrations to the cloud, but now we see the rise of the edge creating a gravitational pull on the data stored centrally in the clouds to fuel edge computing.
Edge Is Data Hungry
We have heard about edge data creation intensity, fueled largely by IoT sensor data feeding many edge computing systems. What we don’t hear as much about is the fact that these often standalone, headless edge nodes require care and feeding from a central command and control system, which is increasingly hosted in the cloud.
Edge systems require data for:
- Software updates
- Configuration settings
- Inference engine updates
- Code deployments and updates
- Container deployments
Edge computing systems need to be installed, configured, updated and cared for like any other IT system. What are some examples of edge systems that need centralized control?
- Offshore Rigs
- Shipping Vessels
- Military Systems
- Electric Vehicles
- Commercial Drones
And since many of these edge systems exist at remote locations, the networks connecting edge and cloud are often less than ideal. In fact, many remote edge systems must rely upon satellite, radio, DSL, and in the future, 5G networks for their connection with the rest of the world. These edge networks often bring high levels of latency, packet loss and sometimes intermittent connectivity, unlike the pristine, redundant network conditions we see in the cloud and traditional data centers.
Deploying and maintaining edge computing systems remotely, over high-latency, lossy networks and between systems that use incompatible data formats, requires several capabilities:
- Store and forward with guaranteed delivery
- Optimizations to overcome TCP/IP issues with high-latency and packet loss
- Data format flexibility across disparate systems
Let’s examine each of these data distribution challenges in more detail.
Store and Forward with Guaranteed Delivery
Particularly with satellite networks using mobile access points (e.g., ships at sea, military personnel and equipment), connectivity is intermittent and can be lost entirely for unpredictable periods of time. During network outages, it’s critical that edge systems can queue up data that needs to be transmitted so it isn’t lost. It’s equally important in many cases to ensure guaranteed delivery to maintain transactional integrity.
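As a rough illustration of the store-and-forward idea described above (not Fuusion's or Apache NiFi's actual implementation), the sketch below persists each outbound message to a local SQLite spool and deletes it only after the send is confirmed. The class name `StoreAndForwardQueue`, the `outbox` table, and the `send` callback are hypothetical names chosen for this example. The result is at-least-once delivery, so a receiver would still need to tolerate duplicates.

```python
import sqlite3
from typing import Callable


class StoreAndForwardQueue:
    """Durable outbound queue (hypothetical example, not a product API)."""

    def __init__(self, path: str = ":memory:") -> None:
        # A file path makes the spool survive restarts; ":memory:" is for demos.
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS outbox ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, payload BLOB NOT NULL)"
        )
        self.db.commit()

    def enqueue(self, payload: bytes) -> None:
        # Persist first; the row is removed only after a confirmed send.
        self.db.execute("INSERT INTO outbox (payload) VALUES (?)", (payload,))
        self.db.commit()

    def pending(self) -> int:
        return self.db.execute("SELECT COUNT(*) FROM outbox").fetchone()[0]

    def flush(self, send: Callable[[bytes], bool]) -> int:
        """Deliver queued messages in order; stop at the first failure."""
        delivered = 0
        rows = self.db.execute(
            "SELECT id, payload FROM outbox ORDER BY id"
        ).fetchall()
        for msg_id, payload in rows:
            try:
                acknowledged = send(payload)
            except OSError:  # includes ConnectionError and timeouts
                acknowledged = False
            if not acknowledged:
                break  # link is down; keep this and later messages queued
            self.db.execute("DELETE FROM outbox WHERE id = ?", (msg_id,))
            self.db.commit()
            delivered += 1
        return delivered


if __name__ == "__main__":
    queue = StoreAndForwardQueue()
    queue.enqueue(b"sensor reading 1")
    queue.enqueue(b"sensor reading 2")

    def link_down(payload: bytes) -> bool:
        raise ConnectionError("satellite link unavailable")

    # During the outage nothing is delivered and nothing is lost.
    print(queue.flush(link_down), queue.pending())
    # Once the link returns, the backlog drains in order.
    print(queue.flush(lambda payload: True), queue.pending())
```

Stopping at the first failure preserves ordering, which matters when later messages depend on earlier ones, as in the transactional case mentioned above.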
Optimizations to overcome TCP/IP issues with high-latency and packet loss
TCP throughput degrades sharply in the face of network latency and packet loss. Latency is commonly introduced by satellite and other radio (e.g., 4G/5G) communications systems. Packet loss compounds the effects of latency because TCP treats loss as a sign of congestion and shrinks its congestion window, which then takes many round trips to reopen. The results can be readily seen in the following chart.
It’s easy to see how TCP’s throughput becomes unusable with any appreciable amount of packet loss and latency, limiting the effectiveness of cloud-to-edge data distribution and edge-to-cloud data consolidation efforts in many real-world use cases.
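To put rough numbers on this effect, the sketch below uses the well-known Mathis et al. approximation for steady-state TCP throughput, roughly (MSS / RTT) × (C / √p), where p is the packet loss rate and C ≈ 1.22. This is a simplified textbook model, not the data behind the chart, and the scenario values (for example a 600 ms round trip for a geostationary satellite link) are illustrative assumptions. The function name `mathis_throughput_bps` is made up for this example.

```python
import math


def mathis_throughput_bps(mss_bytes: int, rtt_s: float, loss_rate: float) -> float:
    """Approximate ceiling on steady-state TCP throughput, in bits per second.

    Uses the Mathis approximation: (MSS / RTT) * (C / sqrt(p)), C = sqrt(3/2).
    """
    if rtt_s <= 0 or not 0 < loss_rate <= 1:
        raise ValueError("rtt_s must be > 0 and loss_rate must be in (0, 1]")
    c = math.sqrt(3 / 2)
    return (mss_bytes * 8 / rtt_s) * (c / math.sqrt(loss_rate))


if __name__ == "__main__":
    # (label, round-trip time in seconds, packet loss rate): assumed values
    scenarios = [
        ("data center link", 0.002, 0.00001),
        ("terrestrial WAN", 0.05, 0.001),
        ("geostationary satellite", 0.6, 0.01),
    ]
    for name, rtt_s, loss in scenarios:
        mbps = mathis_throughput_bps(1460, rtt_s, loss) / 1e6
        print(f"{name:<24} RTT={rtt_s * 1000:>5.0f} ms  "
              f"loss={loss:.3%}  ~{mbps:,.2f} Mbit/s")
```

The model makes the two scaling rules explicit: doubling the round-trip time halves the achievable throughput, and quadrupling the loss rate halves it again. The result is a ceiling for a single TCP connection, so the real figure is also capped by the link's capacity.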
Data Formats – Object Storage, File Storage and SQL/NoSQL from the Clouds
Data is stored in the cloud in many different formats. Object storage, such as S3 on AWS® and Azure Blobs on the Microsoft® Azure cloud, is the most common. Unstructured data continues to be managed in filesystems, and increasingly in NoSQL databases. Structured data can be found in various SQL databases, as usual.
Different edge devices operate upon and create data in their own proprietary formats, often accessed via REST APIs.
What is required is some means of extracting data, combining fields of related data, and then transforming the data format at the proper place in the edge/cloud continuum.
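A minimal sketch of that extract, combine and transform step is shown below, using only the Python standard library. It pulls fields out of a JSON reading, joins them with device metadata, and emits CSV. The field names (`device_id`, `ts`, `metrics.temp_c`), the metadata lookup, and the function `merge_and_convert` are all hypothetical; real device formats will differ.

```python
import csv
import io
import json


def merge_and_convert(reading_json: str, device_info: dict) -> str:
    """Extract fields from a JSON reading, enrich them, and return CSV text."""
    reading = json.loads(reading_json)                  # extract
    device_id = reading["device_id"]
    record = {                                          # combine related fields
        "device_id": device_id,
        "site": device_info.get(device_id, {}).get("site", "unknown"),
        "timestamp": reading["ts"],
        "temperature_c": reading["metrics"]["temp_c"],
    }
    buffer = io.StringIO()                              # transform to CSV
    writer = csv.DictWriter(buffer, fieldnames=list(record), lineterminator="\n")
    writer.writeheader()
    writer.writerow(record)
    return buffer.getvalue()


if __name__ == "__main__":
    # Hypothetical payload as an edge device might return it from a REST API.
    payload = json.dumps({
        "device_id": "rig-7",
        "ts": "2021-03-01T12:00:00Z",
        "metrics": {"temp_c": 41.5},
    })
    metadata = {"rig-7": {"site": "North Sea"}}
    print(merge_and_convert(payload, metadata), end="")
```

Where this step runs matters on constrained links: flattening and trimming records at the edge before transmission sends fewer bytes than shipping raw payloads to the cloud for conversion.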
How Fuusion Addresses Data Distribution between Cloud and Edge
Guaranteed Delivery. Fuusion leverages Apache NiFi technology to manage data flows. Data gets queued between each data flow processing block, as well as when sent across the network. Delivery remains guaranteed even at very high scale.
UltraFast® Data Acceleration. Fuusion includes a key feature that optimizes end-to-end data transfer over high-latency, lossy networks. It does this by intercepting TCP traffic via a SOCKS proxy and redirecting it through a proprietary UDP channel that ensures reliable delivery and overcomes the effects of latency and packet loss using patented technology. For contrast with standard TCP throughput in the face of latency and packet loss, consider this performance chart, which reflects UltraFast throughput.
As we can see from the above chart, throughput remains reasonably constant, at 90% or better, regardless of the network link’s latency and packet loss characteristics. UltraFast automatically detects latency and packet loss and constantly adjusts and optimizes to maximize throughput.
How Fuusion Addresses Data Format Flexibility Challenges
Fuusion supports dozens of common data formats, including files in various formats (e.g., XML, JSON), SQL, NoSQL and many cloud services, via either specific “connectors” or custom REST API integrations. Instead of involving DevOps for coding, Fuusion uses a powerful drag-and-drop visual data flow configuration approach, based originally on Apache NiFi, to configure data flows and data format conversions quickly.