Are Distributed Computing Systems Like Folding@home Becoming Redundant?
In high school, I remember connecting my overheating brick of a laptop to the school’s network and watching an animation of a protein dance across the monitor. Folding@home allowed anyone with a computer to participate in actual scientific research, a revolutionary concept at the turn of the millennium. Fast forward to 2023, and despite more people owning a computer than ever before, such distributed computing systems seem to be on the verge of redundancy.
23 Years of Folding@home: A Review
Protein Folding in Medical Research
Proteins are chains of linked amino acids (there are 20 naturally occurring ones) that carry out the essential biological functions that keep us alive. A protein’s function is linked to its shape: the amino acids in a chain attract and repel each other, so each protein takes on a very specific structure. This process is known as folding.
If you leave proteins alone in a physiological environment, they will naturally ‘fold’ into their favored structure. This information helps researchers predict how altering proteins can improve or alter their function. However, predicting how a particular combination of amino acids might fold is trickier than it sounds.
There are many individual interactions between the atoms in a protein, with each interaction contributing to its final, overall shape. These interactions must be individually calculated and then pieced together if we want to accurately model protein folding, resulting in a computationally intensive process.
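To get a feel for why the cost balloons, here is a toy Python sketch that sums a Lennard-Jones-style interaction over every pair of atoms. The coordinates and force-field constants are placeholders, not real chemistry — the point is that n atoms mean n(n−1)/2 pairwise terms, and real proteins have thousands of atoms simulated over millions of timesteps.

```python
import itertools
import math

def pairwise_energy(coords, epsilon=1.0, sigma=1.0):
    """Sum a Lennard-Jones-style term over every pair of atoms.

    coords: list of (x, y, z) positions; epsilon and sigma are
    placeholder force-field constants, not real values.
    """
    total = 0.0
    for p, q in itertools.combinations(coords, 2):  # n*(n-1)/2 pairs
        r = math.dist(p, q)
        total += 4 * epsilon * ((sigma / r) ** 12 - (sigma / r) ** 6)
    return total

# Three atoms -> 3 pairs; a few thousand atoms -> millions of pairs,
# recomputed at every simulation timestep.
print(pairwise_energy([(0, 0, 0), (1, 0, 0), (0, 1, 0)]))  # ≈ -0.4375
```

Real molecular-dynamics codes use far more elaborate force fields and clever cutoffs, but the quadratic blow-up in interactions is exactly why the problem begs for more processing power.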
An Open (Source) Invitation to Contribute to Science
While such computational modeling is far beyond our personal computers’ capabilities, a Stanford team decided to run an experiment in 2000 to see if a volunteer-run network could work together to solve these protein folding puzzles.
This project was aptly named Folding@home and would become the world’s most powerful distributed computing network, with a combined processing power rivaling existing supercomputers.
Volunteers who want to contribute their computing power download a client through which their machine can communicate with the Folding@home servers. Each contributor receives a set of files with instructions to perform a batch of calculations, known as a ‘job’. Once the job is complete, the contributor sends the results back to the server, where they are collated with the results from other volunteers.
A unique feature of Folding@home is its ability to save checkpoints mid-calculation. Depending on the volunteer’s system, each job can take days or even weeks to complete, and home computers shut down or crash intermittently; users may also pause the Folding@home client to free up processing power for other tasks. Checkpointing saves the partial results so that the calculations can resume when the user reopens the client.
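A minimal sketch of that checkpoint-and-resume loop might look like the following Python. Everything here — the file name, the step counts, the stand-in calculation — is illustrative, not the real Folding@home client.

```python
import json
import os

CHECKPOINT = "wu_checkpoint.json"  # hypothetical local checkpoint file

def run_work_unit(steps=1_000_000, checkpoint_every=100_000):
    """Run a 'job', periodically saving progress so a crash or pause
    loses at most one checkpoint interval of work."""
    state = {"step": 0, "result": 0.0}
    if os.path.exists(CHECKPOINT):          # resume after a shutdown/pause
        with open(CHECKPOINT) as f:
            state = json.load(f)

    while state["step"] < steps:
        state["result"] += 1e-6             # stand-in for real folding math
        state["step"] += 1
        if state["step"] % checkpoint_every == 0:
            with open(CHECKPOINT, "w") as f:
                json.dump(state, f)         # safe point to stop and resume

    if os.path.exists(CHECKPOINT):
        os.remove(CHECKPOINT)               # job done; results go back to the server
    return state["result"]
```

In the real network the returned results also have to be validated server-side, since volunteer hardware cannot be assumed to compute flawlessly.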
Other Distributed Computing Networks
Although Folding@home is the most powerful and well-known distributed computing network, other systems exist that harness the collective processing power of volunteer-run computers to perform research. These include:
- SETI@home: Analyzes radio signals in the search for extraterrestrial life
- LHC@home: Helps physicists working on the Large Hadron Collider (LHC) with extensive calculations
- World Community Grid: Computing platform for research projects that benefit humanity
- Climateprediction.net: Climate modeling projects like predicting global warming
- Rosetta@home: Simulates protein folding, protein design, and molecular docking
More People, More Power
Today, 50% of households worldwide own at least one computer; in developed countries, that figure rises to 80%. This means more people than ever can contribute to projects like Folding@home. Such calculations are typically suited to graphics processing units (GPUs), computer components designed for gaming that are equally adept at general number-crunching.
Processing power is measured in Floating-point Operations Per Second, or FLOPS. FLOPS are mathematical operations (such as addition and subtraction) that a computer can process each second. Consumer GPUs can process trillions of such operations each second, with every trillion FLOPS being called a teraflop.
Supercomputers are thousands of times faster than our home computers at performing such calculations; we need to use petaflops to describe their speed, with every petaflop equal to 1000 teraflops. (1 teraflop = 0.001 petaflops)
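The unit arithmetic is simple enough to sanity-check in a few lines of Python; the 80-teraflop consumer-GPU figure below is an illustrative round number, not a benchmark.

```python
def gpus_needed(target_petaflops, gpu_teraflops):
    """How many GPUs of a given teraflop rating match a petaflop target.
    1 petaflop = 1000 teraflops."""
    return target_petaflops * 1000 / gpu_teraflops

# A hypothetical 80-teraflop consumer GPU contributes 0.08 petaflops...
print(80 / 1000)             # 0.08
# ...so matching a 470-petaflop network would take thousands of them:
print(gpus_needed(470, 80))  # 5875.0
```

Scattered across thousands of volunteers, those teraflops add up — which is exactly what happened next.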
During the height of the COVID-19 pandemic, people stuck at home contributed their processing power to simulations that were helping researchers understand the virus better. This caused a massive spike in the power of Folding@home, which reached an unprecedented 470 petaflops! At the time, this was more than three times the speed of the world’s fastest supercomputer, Summit.
Is the End Near for Distributed Computing Systems?
Comparing Folding@home to Supercomputers (2000 to 2022)
Despite the positive impact of Folding@home and other distributed computing systems, they might be outgrowing their usefulness. Folding@home started in 2000 and quickly gained enough contributors to surpass the processing power of the world’s fastest supercomputers.
For reference, the top 10 supercomputers in the year 2000 are shown in the table below:
| Rank | Name | Max Processing Power | Country |
| --- | --- | --- | --- |
| 1 | ASCI White | 0.0072 petaflops | United States |
| 2 | ASCI Red | 0.0060 petaflops | United States |
| 3 | NEC SX-3 | 0.0039 petaflops | Japan |
| 4 | Earth Simulator | 0.0038 petaflops | Japan |
| 5 | ASCI Blue Pacific | 0.0026 petaflops | United States |
| 6 | IBM SP | 0.0020 petaflops | United States |
| 7 | CRAY T3E-900 | 0.0018 petaflops | United States |
| 8 | SGI Origin2000 | 0.0017 petaflops | United States |
| 9 | SGI Onyx2 | 0.0013 petaflops | United States |
| 10 | CRAY T3E-1200E | 0.0012 petaflops | United States |
Yes, supercomputers in 2000 were much slower than an average computer today. Thanks, technology! The rapid pace at which processing power develops made these supercomputers obsolete within years. When Folding@home broke the 1-petaflop barrier in 2007, the world’s fastest supercomputer was IBM’s Blue Gene/L (0.28 petaflops).
In recent years, however, Folding@home has lost its lead in processing power. Disregarding the uptick in contributors during the COVID-19 pandemic, the network usually boasts around 200 petaflops.
For comparison, the top 10 supercomputers in 2022:
| Rank | Name | Max Processing Power | Country |
| --- | --- | --- | --- |
| 1 | Frontier | 1102 petaflops | United States |
| 2 | Fugaku | 442 petaflops | Japan |
| 3 | LUMI | 309 petaflops | Finland (EU) |
| 4 | Leonardo | 175 petaflops | Italy (EU) |
| 5 | Summit | 149 petaflops | United States |
| 6 | Sierra | 95 petaflops | United States |
| 7 | Sunway TaihuLight | 93 petaflops | China |
| 8 | Perlmutter | 71 petaflops | United States |
| 9 | Selene | 63 petaflops | United States |
| 10 | Tianhe-2A | 61 petaflops | China |
While that still sits comfortably amongst the top 10, Folding@home is far behind the Frontier supercomputer and its whopping 1102 petaflops!
The Artificial Intelligence Dilemma
Folding@home was designed as a number-crunching system: simple instructions are sent to users’ computers, which return numbers corresponding to an accurate protein simulation. This “blind” approach yields many results, but not all the data is helpful for understanding how a protein folds and functions.
This process can be streamlined with the rise of artificial intelligence (AI) and related tools. For example, AI tools can tell us which part of the protein might be of interest and which parts are unlikely to interact, allowing us to allocate our resources more efficiently. Already, such AI tools are being used to discover new drugs and synthesis pathways.
While AI programs are relatively easy to deploy on a single supercomputer, disseminating their decisions and redistributing the workload becomes a real problem when hundreds of non-identical contributors are involved.
As more sophisticated AI tools like unsupervised learning and neural networks become commonplace in various fields, we might see single supercomputers performing such tasks more efficiently than distributed computing systems with higher raw processing output.
Energy and Opportunity Costs
While more and more people now own powerful home computers with advanced GPUs, we have not seen the same growth rate in contributed processing power. This might be due to rising energy costs.
With the current uncertainty in the energy climate causing the price of electricity to rise, ordinary users may not want to leave their power-hungry GPUs on overdrive to support such distributed computing projects.
Furthermore, a new opportunity cost is in play, with cryptocurrency “mining” now prevalent. It is possible to contribute to a decentralized network using home GPUs and CPUs (central processing units), earning rewards in the form of cryptocurrency tokens in the process. These tokens have real monetary value, incentivizing users to allocate their processing power to the network.
Folding@home and other volunteer-driven distributed computing networks offer no such incentives, relying instead on altruism and a sense of purpose to draw in contributors. Combined with the other factors in play, could this slow decline signal the end of such networks?
About the Author
Sean is a consultant for clients in the pharmaceutical industry and is an associate lecturer at La Trobe University, where unfortunate undergrads are subject to his ramblings on chemistry and pharmacology.