How UltraFast Uses Reinforcement Learning to Tackle Tough Network Conditions

Latency and packet loss over wide area networks, the Internet, and RF-based networks (e.g., satellite, cellular, packet radio) have long been a barrier to large-scale data transfers. TCP/IP’s windowing algorithm is infamous for reacting poorly to packet loss: each loss shrinks the amount of data TCP is willing to send per round trip, making transfers reliable but extremely slow. Today, ever-larger volumes of data need to move across available networks from where they are created to where they are consumed. Sometimes this data is purely for disaster recovery and backup; other times it feeds analytics and other business processes. Edge computing promises to address some of these issues by moving workloads closer to the point of data creation, but even then, data must often be transferred to centralized locations (data centers, public clouds, SaaS services) to make use of the insights gained across many edge sites.
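To see why loss is so punishing for TCP, consider the well-known Mathis bound, which caps a single flow’s steady-state throughput at roughly MSS/RTT × 1.22/√p for loss probability p. A quick back-of-the-envelope check in Python (the loss rate here is a hypothetical input, not a measurement from this post):

```python
from math import sqrt

def tcp_throughput_bound_mbps(mss_bytes: int, rtt_s: float, loss_rate: float) -> float:
    """Mathis et al. steady-state bound: rate <= (MSS/RTT) * (1.22 / sqrt(p))."""
    return (mss_bytes * 8 / rtt_s) * (1.22 / sqrt(loss_rate)) / 1e6

# On a 250 ms round-trip path, even one lost packet per million caps a
# standard TCP flow far below 1 Gbps.
print(f"{tcp_throughput_bound_mbps(1460, 0.250, 1e-6):.0f} Mbps")  # ~57 Mbps
```

Even tiny loss rates, multiplied by a long round-trip time, leave most of the link idle; this is the gap UltraFast is designed to close.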

Buurst Fuusion’s UltraFast® Machine Learning Approach

Over the years, many different types of algorithms have been devised to address this network throughput optimization problem. Buurst’s Fuusion product includes a feature called UltraFast®, which overcomes the challenges TCP faces over high-latency or lossy networks in a unique way. As we will see in this post, UltraFast uses a form of AI/ML technology to learn, adapt, and optimize data transfers over troublesome network conditions.

The UltraFast Gambler Agent

UltraFast’s machine learning process runs a set of “gamblers,” or data transmission experiments, each placing a different “bet” on what the ideal transmission rate will be. No model of the network is available ahead of time; the agent must run its own experiments to learn how the network behaves.

The main goals are to:

  1. Maximize network throughput by sending as much data as possible
  2. Avoid creating packet loss due to sending data too quickly
  3. Detect when external factors (other network participants, changing IP routes, and other dynamic conditions) are causing congestion or interfering with packet throughput, and use this information to place improved bets.

The Agent creates a set of “Gambler” processes, each running in an independent thread. Each Gambler is given a “Data Transmission Bet” to place: its data transmission rate, expressed as the time delay between sending each packet. The data is sent to a connection at the distant end of the network, and one of three responses may occur:

  1. An ACK, indicating good data receipt
  2. A NAK, indicating bad data receipt
  3. No response at all (a timeout), indicating a lost packet

Each Gambler sends several data packets and records its overall success rate: how many packets were sent, how many succeeded, and how many failed. When the Gamblers finish processing, each is assigned an overall score. The more acknowledged, successful data packets sent, the higher the score; the more NAKs or timeouts (packet losses), the lower the score. As we will soon see in more detail, the Agent uses these scores to reward successful Gamblers, which are allowed to “breed” and multiply during the next generation, or experiment cycle. Less successful and failed Gamblers are pruned and eliminated. The process is similar to natural selection: the strong and successful survive, while the weak and unsuccessful do not propagate.
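A minimal sketch of what a single Gambler trial could look like in Python follows. The transport, the toy network model, and the scoring weights here are illustrative assumptions for exposition, not UltraFast’s actual implementation:

```python
import random
from dataclasses import dataclass

@dataclass
class Gambler:
    inter_packet_delay_s: float  # the "bet": time delay between packets sent
    score: float = 0.0

def simulated_send(delay_s: float) -> str:
    """Toy stand-in for the real network: the faster we send, the more we lose."""
    loss_probability = min(0.95, 0.0005 / max(delay_s, 1e-6))
    if random.random() < loss_probability:
        return random.choice(["nak", "timeout"])
    return "ack"

def run_trial(gambler: Gambler, packets: int = 200) -> None:
    """Send a burst at the gambler's rate, then score ACK/NAK/timeout outcomes."""
    acks = naks = timeouts = 0
    for _ in range(packets):
        outcome = simulated_send(gambler.inter_packet_delay_s)
        if outcome == "ack":
            acks += 1
        elif outcome == "nak":
            naks += 1
        else:
            timeouts += 1
    # Acknowledged throughput raises the score; NAKs and timeouts lower it.
    # The penalty weight of 5.0 is an arbitrary illustrative choice.
    packets_per_second = acks / (gambler.inter_packet_delay_s * packets)
    gambler.score = packets_per_second - 5.0 * (naks + timeouts)

g = Gambler(inter_packet_delay_s=0.002)  # bet: one packet every 2 ms
run_trial(g)
print(f"score for the 2 ms bet: {g.score:.1f}")
```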

UltraFast Reinforcement Learning Process

The chart below depicts the UltraFast learning cycle and each step in the process.

The UltraFast learning loop runs repeatedly, processing these steps:

  1. A Monte Carlo derived genetic algorithm generates random strategies for the initial set of gamblers, then breeds new gamblers based on the previous cycle’s winners.
  2. A new generation of dozens of gamblers is created at the start of each cycle, each with its own rate of sending data packets.
  3. Gamblers send their data packets, measuring ACKs, NAKs, and lost packets.
  4. Each gambler’s win/loss rate is scored: more successfully delivered packets mean a higher score, while lost packets and data transmission errors (NAKs) penalize the score.
  5. Each gambler’s loss rate is compared with the current baseline loss rate (“loss-zero”), which is established separately using regularly timed packets.
  6. Winning gamblers showing the best results are rewarded by being bred, producing similar, successful gamblers for the next cycle. The agent prunes the losers and feeds the learned results forward into the genetic algorithm. Alongside the newly bred “successful” gamblers, new random variants are added to further explore the newly defined boundaries, enabling the system to adapt to changing network conditions.

The above six-step process runs continually, optimizing data throughput while minimizing packet loss and congestion, and adapting to a complex, constantly evolving network environment. Reinforcement learning enables UltraFast to adapt to each unique network topology and navigate its changing traffic and routing conditions.
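Condensed into code, one generation of this loop might look like the sketch below, reusing the Gambler and run_trial definitions from the earlier sketch. The population size, mutation spread, survivor count, and number of random explorers are illustrative assumptions, not UltraFast’s actual parameters:

```python
import random

def evolve(population: list[Gambler], survivors: int = 8, size: int = 32) -> list[Gambler]:
    """One cycle: score every gambler, keep the winners, breed, add explorers."""
    for g in population:
        run_trial(g)                                  # steps 3-5: bet, send, score
    population.sort(key=lambda g: g.score, reverse=True)
    winners = population[:survivors]                  # step 6: prune the losers
    next_gen = []
    for parent in winners:
        next_gen.append(Gambler(parent.inter_packet_delay_s))   # keep the winning bet
        for _ in range(size // survivors - 1):                  # breed mutated offspring
            mutated = parent.inter_packet_delay_s * random.uniform(0.8, 1.2)
            next_gen.append(Gambler(mutated))
    # A few fresh random bets keep exploring in case network conditions shift.
    next_gen.extend(Gambler(random.uniform(0.0001, 0.01)) for _ in range(2))
    return next_gen

# Steps 1-2: seed an initial random generation, then run the loop.
population = [Gambler(random.uniform(0.0001, 0.01)) for _ in range(32)]
for _ in range(10):
    population = evolve(population)
for g in population:                                  # score the final generation
    run_trial(g)
best = max(population, key=lambda g: g.score)
print(f"best bet: {best.inter_packet_delay_s * 1000:.2f} ms between packets")
```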

UltraFast Speed Test

UltraFast includes a speed test feature, which sends “iperf” data through the UltraFast optimizer, first as a download stream and then as an upload test. This is analogous to a typical Internet or broadband speed test, except that it uses UltraFast technology and compares the throughput results against plain TCP/IP. In the following screenshot, the TCP results are displayed in red (mostly hidden behind the blue UltraFast chart). The link being tested runs on AWS between the Ohio region in the USA and the Cape Town region in South Africa. The latency averages around 250 milliseconds round-trip time, with little to no packet loss.

TCP/IP averages just 144 Mbps over this moderate-latency 1 Gbps link. During the initial Download Speed test, the blue (cyan) UltraFast chart slowly increases its throughput over time as the gamblers run and the reinforcement learning algorithm learns the particular characteristics of this network. Once UltraFast has learned the network, it is able to peg the link at nearly 1 Gbps at times. Then the upload test starts. Since UltraFast has already learned this network, it immediately optimizes the throughput, averaging 822 Mbps versus TCP’s 144 Mbps.
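The 144 Mbps TCP figure is consistent with simple bandwidth-delay product arithmetic: filling a 1 Gbps path at 250 ms round-trip time requires roughly 31 MB of data in flight, far more than a typical TCP window. A quick check (illustrative arithmetic only):

```python
# Bandwidth-delay product: bytes that must be "in flight" to fill the pipe.
link_bps, rtt_s = 1e9, 0.250
bdp_bytes = link_bps * rtt_s / 8
print(f"BDP to fill 1 Gbps at 250 ms: {bdp_bytes / 2**20:.1f} MiB")   # ~29.8 MiB

# Conversely, the effective window a 144 Mbps TCP flow implies at this RTT:
tcp_bps = 144e6
print(f"implied TCP window: {tcp_bps * rtt_s / 8 / 2**20:.1f} MiB")   # ~4.3 MiB
```

In other words, even with almost no loss, window limits and the long round trip keep plain TCP well under the link rate, which is exactly the headroom UltraFast exploits.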

As network conditions vary over time, UltraFast’s learning algorithm continues to observe, adapt, and learn in order to continually optimize network throughput. This is very important for long-running bulk data transfer jobs in the terabyte range or larger. Because these long-running jobs occupy large amounts of network bandwidth over time, they are much more likely to encounter competing traffic at different times of day (e.g., backup jobs running overnight, user downloads during daytime hours) as well as other variables, such as route changes that alter the underlying network characteristics over time.

Summary

Optimizing data throughput over challenging network conditions is an age-old problem, and it now has a new type of solution: one that uses reinforcement learning to intelligently optimize throughput while constantly learning and adapting to changing network conditions. To learn more about the UltraFast feature of Fuusion and how it addresses challenging, high-latency, and lossy networks, please visit the Fuusion page. For more detailed insights into UltraFast, its machine learning technology, and its overall architecture, you can download the UltraFast technical white paper.

Fuusion Use Case: Real-Time Local Edge Processing 

As the owner of a mid-size manufacturing company, you are challenged daily with how to cohesively manage data across all your production sites. You want to analyze data at each site to improve its productivity, but you also need to track long-term trends to strategically guide your business into the future.

You have dozens of sites spread across North America, from one coast to the other and everywhere in between. You’re doing brisk business, but you know it could be better. You just need to tap into the data your sites are generating to get to the next level.

That data could help you improve sales, figure out where supply chain issues affect production, and pinpoint inefficiencies at the production-site level. Easier access to real-time (or near-real-time) data and historical batch analytics will help you find and address the gaps and issues that are holding you back.

Generating the data isn’t the problem. You already have terabytes of data being created daily, but it’s just sitting idle for far too long. You need to aggregate and store the data so you can easily get to it when you need it. Additionally, you need to process the data into a format that you can easily analyze with the company software you already have and with cloud services. 

Yes, processes are in place for individual sites, and some analysis has already been done on a one-off basis. But you need an automated process that lets you centrally manage all your data, from a single machine data point up to the entire company production view. Data formats cobbled together for specific locations don’t help the bigger picture, and they slow everything down. You need source data that has not been manipulated, so you can provide an even, unbiased account of your business.

You need consistency across every single site, and you need to process that data automatically, in real time. Aggregating it quarterly, as is done now, isn’t doing you any good. It seems impossible, or at the very least exorbitantly expensive. You want to analyze data right now, across all your sites, without headaches, to improve day-to-day operations and increase productivity.

It’s time to start laying the correct groundwork. Define the datasets you need. Nail down the file formats your company software can use. Start tracking long-term trends to strategically guide future business decisions. This is where Fuusion comes in.

Fuusion can help you achieve your goal of managing data across multiple sites. Plus, it can route your data to cloud services so you can run long-term analytics and understand the business trends that ensure future success.

So how does Fuusion work? Fuusion connects to and ingests data from all your machinery and ERP systems at each site, no matter where they are located. Beyond connecting and ingesting, it can perform pre-defined operations to format and prepare the data for delivery, and define the destinations the data will be sent to. Next, Fuusion processes your data locally, in pre-defined formats, and accelerates its delivery to your central location. The processed data can also be integrated with popular AI/ML frameworks for low-latency inferencing. Lastly, the data is routed to the cloud, where it can be aggregated with cloud services so you can compare it to historical data and observe trends from long-term analysis. With the right plan of attack and the right tools, even the impossible becomes possible!
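In concrete terms, the pattern described above (ingest locally, normalize, then ship centrally) looks something like this sketch. The file layout, field names, and endpoint are hypothetical placeholders for illustration, not Fuusion APIs:

```python
import csv
import json
from pathlib import Path
from urllib import request

def ingest_and_normalize(machine_csv: Path) -> list[dict]:
    """Read raw machine output and normalize it into a site-agnostic record format."""
    with machine_csv.open(newline="") as f:
        return [
            {
                "site": "plant-17",                 # hypothetical site identifier
                "machine": row["machine_id"],       # hypothetical source columns
                "metric": row["metric"],
                "value": float(row["value"]),
            }
            for row in csv.DictReader(f)
        ]

def ship_to_central(records: list[dict], endpoint: str) -> None:
    """Forward normalized records to the central aggregation point."""
    body = json.dumps(records).encode()
    req = request.Request(endpoint, data=body,
                          headers={"Content-Type": "application/json"})
    request.urlopen(req)  # in Fuusion, this hop is the one UltraFast accelerates

records = ingest_and_normalize(Path("press_line_output.csv"))      # placeholder file
ship_to_central(records, "https://central.example.com/ingest")     # placeholder URL
```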

Building an Edge Data Platform for Centralized AI/ML Processing

Buurst Fuusion is a decentralized solution with components running at the edge, on a centralized cloud (AWS, Azure), and a data transfer accelerator in the network. Using a visual data flow tool built on Apache NiFi and data connector templates from the Fuusion Toolbox, customers can rapidly lay out a complete data flow that delivers information to an AI/ML solution for analysis and insight.

To get a better understanding of this new product, Buurst engineering recently recorded a 30-minute episode with the L8istSh9y podcast community. The recording offers a behind-the-scenes look at Buurst Fuusion’s technology components: the open source Apache NiFi platform, the challenges of edge data usage, and data transfer performance over a wide variety of networks.

A key component of Buurst Fuusion is our patented UltraFast technology, designed to overcome the significant network latency challenges that will certainly exist in edge deployments. This critical feature unlocks data that would typically be considered too difficult to obtain for data flow processing.

For more information on this new product: