Ever pick the shortest checkout line in a crowded grocery store, only to get stuck in limbo behind the lady with 100 coupons? The same issue apparently plagues the flow of high-speed data processing and cloud computing.
Instead of always picking the shortest line, for example, how about choosing any line with fewer than five people? On a crowded Sunday afternoon, maybe the threshold could be 10, since nearly all the lines hold more than five. Your odds of getting through checkout faster improve.
This decision-making strategy was assumed to be good practice in the scientific community for decades. A graduate student at The Ohio State University, however, became the first to actually prove it mathematically.
In 1993, professors Frank Kelly and Neil Laws conjectured that, in order to minimize waiting time, this threshold should increase logarithmically with the average number of people in the store. Ever since, engineers, mathematicians, computer scientists, and statisticians have tried to prove it, without success.
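The threshold rule described above can be sketched in a few lines. The logarithmic growth of the threshold follows the Kelly–Laws conjecture, but the exact constants, the `+ 1` offset, and the shortest-queue fallback below are illustrative assumptions, not the policy analyzed in the paper:

```python
import math
import random

def pick_queue(queue_lengths, mean_load):
    """Threshold routing sketch (illustrative, not the paper's exact policy).

    Send the job to any queue whose length is below a threshold that grows
    logarithmically with the average load; fall back to the shortest queue
    when every line is at or above the threshold.
    """
    # Hypothetical threshold rule: logarithm of the mean number in system.
    threshold = max(1, math.ceil(math.log(mean_load + 1)))
    below = [i for i, q in enumerate(queue_lengths) if q < threshold]
    if below:
        return random.choice(below)  # any under-threshold queue will do
    # All queues are crowded: fall back to join-the-shortest-queue.
    return min(range(len(queue_lengths)), key=queue_lengths.__getitem__)
```

The point of the threshold is that any under-threshold queue is "good enough," so the dispatcher need not find the global minimum on every arrival.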
Electrical and Computer Engineering (ECE) student Xingyu Zhou finally resolved the open conjecture in his ACM SIGMETRICS/IFIP Performance 2019 paper, “Heavy-traffic Delay Optimality in Pull-based Load Balancing Systems: Necessary and Sufficient Conditions,” co-authored by ECE Adjunct Assistant Professor Jian Tan and ECE Professor Ness Shroff, both TDAI faculty affiliates.
Zhou then applied his results to a much wider range of problems beyond the original queueing-theory setting. His method can be adapted to data-scheduling problems, in which each server handles jobs from different queues, and to caching problems, in which requests hit different servers.
To explain a typical scenario of the data dilemma, Zhou said, perhaps someone is watching Netflix and sees the new season of House of Cards is finally available. It becomes a free-for-all.
“As a result, many fans binge-watch in order to finish the entire season within a day or two,” Zhou said. “This often causes a very heavy load on the data management system that Netflix is using over a short time. If this scenario is mishandled you might experience a large delay or lag during the most exciting moments of your favorite episode.”
Not only does this negatively affect Netflix, he said, it annoys customers too. Netflix engineers recently began considering replacing their previous static load-balancing scheme with a dynamic one, in which the threshold for load-balancing decisions adapts in real time to the load of the data flow. This is exactly the scenario where Zhou’s results apply.
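A dynamic threshold of the kind described can be sketched as a dispatcher that tracks a running load estimate and recomputes its threshold on every arrival. The exponential-smoothing update, the `alpha` constant, and the first-below-threshold choice are illustrative assumptions for this sketch, not Netflix's actual system:

```python
import math

class AdaptiveThresholdBalancer:
    """Sketch of a threshold that adapts to the measured load in real time."""

    def __init__(self, num_queues, alpha=0.1):
        self.queues = [0] * num_queues
        self.avg_load = 0.0
        self.alpha = alpha  # exponential smoothing weight (assumed value)

    def route(self, job_size=1):
        # Update the running load estimate from the current total backlog,
        # then recompute the logarithmic threshold from it.
        total = sum(self.queues)
        self.avg_load = (1 - self.alpha) * self.avg_load + self.alpha * total
        threshold = max(1, math.ceil(math.log(self.avg_load + 1)))
        below = [i for i, q in enumerate(self.queues) if q < threshold]
        # Prefer any under-threshold queue; otherwise join the shortest.
        target = below[0] if below else min(
            range(len(self.queues)), key=self.queues.__getitem__)
        self.queues[target] += job_size
        return target
```

Because the threshold is recomputed from recent load, a binge-watching spike raises it automatically, keeping routing decisions cheap even when every queue is busy.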
The grad student further explained how he was able to resolve this important open conjecture.
“Almost all of the previous attempts on this problem adopt a method called ‘diffusion approximations,’ which is a common choice for analyzing load balancing systems,” Zhou said. “However, I made entirely novel extensions on a completely different approach that allowed me to not only resolve the original conjecture, but to offer many useful insights from the first principle that can be utilized to solve a wide range of problems in data centers.”
Article by Ryan Horns, ECE/IMR Communications Specialist.