Supercomputing for autonomous 5G networks and protein sequencing

The telecoms and biotech industries are following parallel paths to innovation, and those paths are inevitably intertwining.

25 Apr 2019

Charles Gibbons was previously the strategy lead for 5G IT & systems at BT & EE and now serves as Chief IT Architect and Head of Engineering at the Francis Crick Institute, working on innovation and biomedical research platforms. At Digital Transformation World next month Charles will be part of a panel of industry leaders discussing the next major breakthroughs in digital transformation. Register now to join him and thousands of your industry peers from across the world for three immersive days of inspiration, innovation and networking.

In my recent career move from telecoms to biotech, I’ve found recognizable similarities in how both industries approach innovation. Research is critical to the development of new solutions, and progression is not linear. Furthermore, there is an increasing overlap in the services and solutions the two industries are producing; for example, smart watches capable of monitoring heart rate variability as an accurate predictor of heart function issues.

Scientific innovation as a service


Synthetic biology is a $400 billion a year industry, and research is at the very center of its growth. Providing the necessary platforms, skills and expertise allows researchers to concentrate on their domain expertise as they innovate. The responsibility of IT in biomedicine is to enable research translation. This includes the IT functions common to any organization, as well as the high-performance computing necessary for DNA sequencing.

The Francis Crick Institute uses cryogenic electron microscopes to freeze cancer molecules to near zero kelvin, arresting atomic rotation and enabling image capture. Individual microscopes cost around €20 million and generate datasets in the hundreds of terabytes. To process this data efficiently, we use a data streaming architecture similar to those employed in telecommunications for network event monitoring; only with a streaming architecture can we efficiently balance the load of data analysis across our machine learning platforms. We currently rely on GPU-based high-performance computing and on protein databases that leverage blob- and graph-based storage. On top of these platforms, we use tools such as TensorFlow Estimators and Jupyter Notebooks to process datasets and automate adversarial machine learning scenarios.

Providing appropriate tools and platforms such as these helps to enable innovation in the research community, and, as mentioned, research is just as essential to the telecommunications industry; consider research into 5G new radio. Just as physicists researching new radio propagation must rely on the tools of computer science, so too must biologists researching new proteins. Both fields face similar processes for managing petabyte-scale datasets.
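To make the load-balancing idea concrete, here is a minimal sketch of a streaming pipeline in Python, using only the standard library. The worker count, chunk identifiers and the analyse_chunk stub are illustrative placeholders, not the Institute's actual pipeline.

import queue
import random
import threading

WORKERS = 4            # stand-in for GPU analysis nodes

def analyse_chunk(chunk_id):
    """Placeholder for the real inference step (e.g. a TensorFlow model)."""
    return random.random()

def worker(in_q, results):
    while True:
        chunk_id = in_q.get()
        if chunk_id is None:        # sentinel: stream finished
            return
        results.append((chunk_id, analyse_chunk(chunk_id)))

def run_pipeline(n_chunks=32):
    in_q = queue.Queue(maxsize=WORKERS * 2)   # bounded queue gives back-pressure
    results = []
    threads = [threading.Thread(target=worker, args=(in_q, results))
               for _ in range(WORKERS)]
    for t in threads:
        t.start()
    for chunk_id in range(n_chunks):          # "stream" the dataset chunk by chunk
        in_q.put(chunk_id)
    for _ in threads:                         # one sentinel per worker
        in_q.put(None)
    for t in threads:
        t.join()
    return results

print(len(run_pipeline()), "chunks analysed across", WORKERS, "workers")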

Supercomputers


One shared requirement is supercomputing, an area where biomedicine leads in technology. The world’s largest supercomputer at the time of writing, IBM Summit, serves as a biomedical processing engine, because the sheer scale of protein sequencing requires intense computational power. As 5G and autonomic networks develop, they will demand a comparable level of computing power. What makes biomedicine so complex is that “in nature there is no abstraction”: codons code for amino acids, which form proteins, with no OSI-style composition of layers.

As connected devices move into the billions, the number of items connected to the network stretches towards 10^10, and the number of interconnected network events can be significantly higher, up to around 10^18. These numbers are far greater than the expected number of proteins coded in the human genome, about 3.5 × 10^4, but importantly they are much smaller than the total number of possible protein combinations. There are roughly 1.3 × 10^130 possible amino-acid sequences for a protein composed of 100 amino acids, since any of 20 amino acids can occupy each of the 100 positions. We can calculate this because of the original work of Crick and Watson in discovering the structure of DNA.
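The 1.3 × 10^130 figure follows directly from that counting argument, and a few lines of Python confirm the order of magnitude:

from math import log10

sequences = 20 ** 100                  # 20 amino acids at each of 100 positions
exponent = int(log10(sequences))
mantissa = sequences / 10 ** exponent
print(f"20^100 ≈ {mantissa:.1f} x 10^{exponent}")   # 1.3 x 10^130

# For comparison with the network-side numbers quoted above:
# connected items ~ 10^10, network events ~ 10^18, human protein-coding genes ~ 3.5 x 10^4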

Processing the permutations of different amino-acid sequences requires significant supercomputing, which in turn necessitates quantum computing technologies. IBM Summit, at the Oak Ridge National Laboratory in Tennessee, performs around 1.4 × 10^17 floating-point operations per second, roughly 140 petaflops. Whilst this is an incredible amount of power, it is still insufficient for calculating all possible protein combinations. What is required is a quantum computing engine capable of something on the order of 10^100 operations per second.
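A back-of-envelope calculation shows why even Summit falls short. Assuming, purely for illustration, a single floating-point operation per candidate sequence:

from math import log10

summit_flops = 1.4e17        # Summit's approximate throughput, ~140 petaflops
sequences = 20 ** 100        # possible 100-residue proteins, ~1.3 x 10^130

seconds = sequences / summit_flops           # time to touch each sequence once
years = seconds / (3600 * 24 * 365)
print(f"~10^{int(log10(years))} years")      # on the order of 10^105 years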

Quantum computing, when viable, will surely be a central tenet of a fully autonomous network capable of managing billions of connected devices. Mission-critical systems, such as connected cars, will require a distributed computing model with immense processing power and extremely low latency. The high-performance compute power for such machines could be at the 10^100 scale. Researching such challenges is a true motivator for both biomedicine and cutting-edge telecommunications.