How UltraFast Uses Reinforcement Learning to Tackle Tough Network Conditions
Rick Braddy
May 21, 2021

Latency and packet loss over wide area networks, the Internet, and RF-based network devices (e.g., satellite, cellular, packet radio) have long been a barrier to large-scale data transfers. TCP/IP’s windowing algorithm is infamous for reacting poorly to packet loss: it shrinks the amount of data TCP is willing to send before waiting for an acknowledgment, which keeps transfers reliable but makes them extremely slow. Today, large amounts of data increasingly need to be moved across available networks from where the data is created to where it is consumed. Sometimes this data is purely for disaster recovery and backup; other times it’s for important analytics and other business processes. Edge computing promises to address some of these issues by moving workloads closer to the point of data creation, but even then, data must often be transferred to centralized locations (data centers, public clouds, SaaS services) to make use of the insights gained across many edge sites.

Buurst Fuusion’s UltraFast® Machine Learning Approach

Over the years, many different types of algorithms have been devised to try to address this network throughput optimization problem. Buurst’s Fuusion product includes a feature called UltraFast®, which overcomes the challenges posed by TCP over highly latent or lossy networks in a unique way. As we will see in this post, UltraFast uses a type of AI/ML technology to learn, adapt, and optimize data transfers over troublesome network conditions.

The UltraFast Gambler Agent

UltraFast uses a machine learning process built around a set of “gamblers,” or data transmission experiments, that place different “bets” on what the ideal transmission rate will be. No model of the network is available in advance; the agent learns about the network only by running its own experiments.

The main goals are to:

  1. Maximize network throughput by sending as much data as possible
  2. Avoid creating packet loss due to sending data too quickly
  3. Detect when external factors, such as other network participants, changing IP routes, and other dynamic conditions, are causing congestion or interfering with packet throughput, and use this information to place improved bets.

The Agent creates a set of “Gambler” processes, each running in an independent thread. Each Gambler is given a “Data Transmission Bet” to place. The bet is the Gambler’s data transmission “rate,” expressed as the time delay between sending each packet. The data is sent to a connection at the distant end of the network, and several types of responses may occur: 1. an ACK indicating good data receipt, 2. a NAK indicating bad data receipt, or 3. no response at all, indicating a lost packet (timeout).

Each Gambler process sends several data packets and records the overall success rate, i.e., how many packets were sent, how many succeeded, and how many failed. Once the Gamblers have finished, each one is assigned an overall score. The more acknowledged and successful data packets sent, the higher the score. The more NAKs or timeouts (packet losses) present, the lower the score.

As we will soon see in more detail, the Agent then uses these Gambler scores to reward successful Gamblers, which are allowed to “breed” and multiply during the next generation or experiment cycle. Less successful or failed Gamblers are pruned and eliminated. This process is similar to natural selection, where the strong and successful survive, and the weak and unsuccessful do not propagate.
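To make the bookkeeping concrete, here is a minimal Python sketch of how a single Gambler's tally and score could be represented. This is an illustration only, not UltraFast's actual code: the `GamblerResult` class, the 1,400-byte packet size, and the loss penalty weight are all assumptions chosen for the example.

```python
from dataclasses import dataclass

PACKET_BYTES = 1400  # assumed payload size; not specified in this post


@dataclass
class GamblerResult:
    """Tally for one gambler (hypothetical structure for illustration)."""
    delay_s: float      # the bet: delay between packets, in seconds
    acks: int = 0
    naks: int = 0
    timeouts: int = 0

    @property
    def sent(self) -> int:
        return self.acks + self.naks + self.timeouts

    def record(self, response: str) -> None:
        """Count one response: 'ack', 'nak', or 'timeout'."""
        if response == "ack":
            self.acks += 1
        elif response == "nak":
            self.naks += 1
        elif response == "timeout":
            self.timeouts += 1
        else:
            raise ValueError(f"unknown response: {response!r}")

    def score(self, loss_penalty: float = 2.0) -> float:
        """ACKs raise the score; NAKs and timeouts lower it.
        The penalty weight is an assumed value."""
        return self.acks - loss_penalty * (self.naks + self.timeouts)


def bet_to_mbps(delay_s: float, packet_bytes: int = PACKET_BYTES) -> float:
    """Offered rate in Mbps implied by an inter-packet delay."""
    return packet_bytes * 8 / delay_s / 1e6


result = GamblerResult(delay_s=0.0001)
for response in ["ack"] * 8 + ["nak", "timeout"]:
    result.record(response)
print(result.sent, result.score(), round(bet_to_mbps(result.delay_s), 1))
```

The `bet_to_mbps` helper shows why a bet expressed as a delay is equivalent to a rate: the shorter the delay between packets, the higher the offered throughput.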

UltraFast Reinforcement Learning Process

The chart below depicts the UltraFast learning cycle and each step in the process.

The UltraFast learning loop runs repeatedly, processing these steps:

  1. A Monte Carlo-derived genetic algorithm generates random strategies for the initial set of gamblers, then subsequently breeds new gamblers based upon the results of the previous cycle’s winners.
  2. A new generation of dozens of gamblers is created at the start of each cycle, each with its own rate of sending data packets.
  3. Gamblers send their data packets, measuring ACKs, NAKs, and lost packets.
  4. Each gambler’s win/loss rate is scored: more packets sent equals a higher score, while lost packets or data transmission errors (NAKs) penalize the score.
  5. Each gambler’s loss-rate is compared with the current loss-zero (separately established with regularly timed packets).
  6. Winning gamblers showing the best results are rewarded by being bred, resulting in similar successful gamblers for the next cycle. The agent prunes the losers and feeds the learned results forward into the genetic algorithm. In addition to the ‘successful’ newly created gamblers, new random variants are added to further explore the newly defined boundaries, enabling the system to adapt to changing network conditions.

The above 6-step process runs continually, optimizing data throughput while minimizing packet loss and congestion, and adapting to the constantly changing and evolving complex network environment. Reinforcement learning enables UltraFast to adapt to each unique network topology and navigate its evolving traffic and routing conditions.
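The loop can be sketched in a few dozen lines of Python. This is a toy model under stated assumptions, not the UltraFast implementation: the network is simulated by a hypothetical `simulate_gambler` function that drops packets in proportion to how far a bet overshoots a fixed link capacity, bets are expressed directly as rates in Mbps, and the scoring weights, survivor fraction, and mutation range are invented for illustration. Step 5 (the loss-zero comparison) is left out because its details are not described here.

```python
import random


def simulate_gambler(rate_mbps, capacity_mbps, packets, rng):
    """Toy link (assumption): loss grows with overshoot above capacity."""
    loss_prob = max(0.0, 1.0 - capacity_mbps / rate_mbps)
    lost = sum(rng.random() < loss_prob for _ in range(packets))
    return packets - lost, lost


def score(rate_mbps, acked, lost, packets, loss_penalty=5.0):
    """Reward delivered packets at this rate; penalize lost ones."""
    return rate_mbps * (acked - loss_penalty * lost) / packets


def next_generation(scored, size, max_rate, rng,
                    keep=0.25, explore=0.2, jitter=0.1):
    """Breed winners with small mutations, prune the rest,
    and add fresh random bets to keep exploring."""
    ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)
    winners = [rate for rate, _ in ranked[:max(1, int(len(ranked) * keep))]]
    n_random = int(size * explore)
    children = [
        min(max_rate, max(1.0, rng.choice(winners)
                          * rng.uniform(1 - jitter, 1 + jitter)))
        for _ in range(size - n_random)
    ]
    explorers = [rng.uniform(1.0, max_rate) for _ in range(n_random)]
    return children + explorers


def learn(capacity_mbps=800.0, max_rate=1000.0, size=24,
          cycles=30, packets=200, seed=1):
    rng = random.Random(seed)
    rates = [rng.uniform(1.0, max_rate) for _ in range(size)]  # random initial bets
    best = rates[0]
    for _ in range(cycles):
        scored = []
        for rate in rates:  # each gambler sends packets and is scored
            acked, lost = simulate_gambler(rate, capacity_mbps, packets, rng)
            scored.append((rate, score(rate, acked, lost, packets)))
        best = max(scored, key=lambda pair: pair[1])[0]
        rates = next_generation(scored, size, max_rate, rng)  # breed and prune
    return best


print(f"learned rate: {learn():.0f} Mbps")
```

In this toy setup the winning bets cluster near the simulated link capacity after a few generations, and the random explorers are what would let the population find a new optimum if the capacity changed mid-run.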

UltraFast Speed Test

UltraFast includes a speed test feature, which sends “iperf” data through the UltraFast optimizer, first as a download test and then as an upload test. This is analogous to your typical Internet or broadband speed test, except it uses UltraFast technology to compare the throughput results vs. plain TCP/IP. In the following screenshot, we see the TCP results displayed in red (mostly hidden behind the blue UltraFast chart). The link being tested is on AWS between the Ohio region in the USA and the Cape Town region in South Africa. The latency averages around 250 milliseconds round trip time, with little to no packet loss.

TCP/IP averages just 144 Mbps over this moderate-latency 1 Gbps link. During the initial download speed test, the blue (cyan) UltraFast chart slowly increases its throughput over time as the gamblers run and the reinforcement learning algorithm learns the particular characteristics of this network. Once UltraFast has learned the network, it is able to peg the link at near 1 Gbps at times. Then the upload test starts. Since UltraFast has already learned this network, it immediately optimizes the throughput, averaging 822 Mbps vs. TCP’s 144 Mbps.
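A quick back-of-the-envelope calculation puts these numbers in context. The Python snippet below uses only the figures quoted above (1 Gbps link, roughly 250 ms round trip, 144 Mbps and 822 Mbps averages). The "implied window" line assumes the common approximation that a windowed protocol's throughput is about one window per round trip; it is an estimate, not a measurement from the test.

```python
link_bps = 1e9          # 1 Gbps link
rtt_s = 0.250           # ~250 ms round trip time
tcp_mbps = 144          # average TCP throughput from the test
ultrafast_mbps = 822    # average UltraFast upload throughput from the test

# Bandwidth-delay product: bytes that must be in flight to keep the link full.
bdp_bytes = link_bps * rtt_s / 8

# Effective window implied by TCP's average, assuming throughput ~ window / RTT.
tcp_window_bytes = tcp_mbps * 1e6 * rtt_s / 8

speedup = ultrafast_mbps / tcp_mbps

print(f"BDP: {bdp_bytes / 1e6:.2f} MB in flight to fill the link")
print(f"TCP implied window: {tcp_window_bytes / 1e6:.2f} MB")
print(f"Link utilization: TCP {tcp_mbps / 10:.1f}%, UltraFast {ultrafast_mbps / 10:.1f}%")
print(f"Speedup: {speedup:.1f}x")
```

Filling this link requires about 31 MB of data in flight at any moment, which helps explain why a conservative window leaves most of the capacity unused.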

As network conditions vary over time, UltraFast’s intelligent learning algorithm continues to observe, adapt, and learn in order to continually optimize network throughput. This is very important for long-running bulk data transfer jobs measured in terabytes or more. Because these jobs occupy large amounts of network bandwidth for long periods, they are much more likely to encounter competing traffic at different times of day (e.g., backup jobs running overnight, user downloads during daytime hours) and many other variables, including route changes that alter the underlying network characteristics over time.
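To get a feel for how long such jobs run, the sketch below estimates transfer time for a hypothetical 10 TB job at the two average rates from the speed test. It assumes decimal terabytes and that the averages hold for the whole transfer, which real networks will not guarantee; the `transfer_hours` helper is illustrative.

```python
def transfer_hours(size_tb: float, rate_mbps: float) -> float:
    """Hours to move size_tb decimal terabytes at a steady rate_mbps."""
    bits = size_tb * 1e12 * 8
    return bits / (rate_mbps * 1e6) / 3600


for label, rate in [("TCP", 144), ("UltraFast", 822)]:
    print(f"{label}: {transfer_hours(10, rate):.1f} hours for 10 TB")
```

At these rates the job spans several days over plain TCP versus a little over one day, so either way it crosses multiple daily traffic cycles, which is where continued adaptation matters.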


Optimizing data throughput over challenging network conditions is an age-old problem, one that now has a new type of solution that uses reinforcement learning to intelligently optimize and constantly learn and adapt to changing network conditions. To learn more about the UltraFast feature of Fuusion and how it addresses challenging, high-latency and lossy network conditions, please visit the Fuusion page. For more detailed insights into UltraFast, its machine learning technology, and overall architecture, you can download the UltraFast technical white paper.
