TransREporter

Microsoft’s ‘Planet-Scale’ AI Infrastructure Packs 100,000-Plus GPUs


Microsoft, one of the world’s largest software companies, is building out a planet-scale AI infrastructure. A recently published paper on Singularity, the service underpinning Azure AI, outlines the system’s features and its expected impact.

Microsoft was founded in 1975, and its current CEO is Satya Nadella. Microsoft Azure is building a new AI infrastructure service called ‘Singularity’, and the engineers working on the platform have published a paper providing detailed information about it.

The new platform is described as a service intended to become a major driver of AI both inside and outside Microsoft. It allows hundreds of thousands of GPUs and AI accelerators to function together: the entire fleet is treated as a single cluster, which exposes the full potential of every device and ensures no resources are wasted.

The Singularity researchers say its workload-aware scheduler can transparently preempt and elastically scale deep learning workloads to drive high utilization, without impacting their performance or correctness, across a global fleet of accelerators. The published paper focuses on the scheduler but does offer some figures describing the system’s architecture. The performance analysis of Singularity mentions tests run on Nvidia DGX-2 servers, each with two-socket Xeon Platinum 8168 processors (20 cores per socket), eight V100 GPUs, 692GB of RAM, and InfiniBand networking. Counting FPGAs and other accelerators, Microsoft has tens of thousands of such servers.
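The preemption idea the researchers describe can be sketched in miniature. This is a toy illustration only, not Singularity’s actual implementation: the `Job` class, the fixed time quantum, and the checkpoint counter are all hypothetical stand-ins for the paper’s transparent checkpoint-and-resume mechanism.

```python
import heapq

class Job:
    def __init__(self, name, priority, steps):
        self.name = name
        self.priority = priority      # lower value = higher priority
        self.steps_left = steps       # remaining training steps
        self.checkpointed_at = 0      # last transparently saved step

def run(jobs, quantum=2):
    """Toy workload-aware scheduler: always runs the highest-priority
    job for `quantum` steps, checkpointing it at the end of each slice
    so a preempted job can later resume exactly where it stopped."""
    log = []
    heap = [(j.priority, i, j) for i, j in enumerate(jobs)]
    heapq.heapify(heap)
    while heap:
        _, i, job = heapq.heappop(heap)
        # resume from the last checkpoint -- invisible to the job itself
        done = min(quantum, job.steps_left)
        job.steps_left -= done
        job.checkpointed_at += done
        log.append((job.name, job.checkpointed_at))
        if job.steps_left > 0:
            heapq.heappush(heap, (job.priority, i, job))
    return log
```

Running a high-priority job A and a low-priority job B shows the key property: B only makes progress once A is done, and every slice ends at a checkpoint the job can transparently resume from.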



The published paper focuses on Singularity’s scaling technology and scheduler, which it presents as the secret sauce for reducing cost and increasing reliability. The software decouples jobs from accelerator resources: the scheduler can simply change the number of physical devices a job’s workers are mapped to, transparently to users. The number of workers in the job stays the same even as the physical devices running it change.
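The decoupling can be pictured as a mapping from a fixed set of logical workers to a variable pool of physical devices. The sketch below is an assumption-laden simplification (the `map_workers` function and device names are invented for illustration), but it captures the invariant: the job’s world size never changes, only the backing hardware does.

```python
def map_workers(world_size, devices):
    """Toy version of decoupling a job from its accelerators: the job
    always sees `world_size` logical workers, while the scheduler is
    free to change how many physical devices back them. Several
    logical workers may time-share one device (scale-down), or each
    worker may get a device of its own (scale-up)."""
    if not devices:
        raise ValueError("need at least one physical device")
    # round-robin the fixed set of logical workers over whatever
    # devices are currently available
    return {rank: devices[rank % len(devices)] for rank in range(world_size)}
```

For example, a four-worker job mapped onto four GPUs gets one device per worker; if the scheduler shrinks it to two GPUs, the same four logical workers simply share them, so from the job’s perspective nothing has changed.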

In 2018, Microsoft launched “Project Brainwave”, which was developed to provide fast AI processing in Azure. Microsoft subsequently previewed Azure Machine Learning models hardware-accelerated by Brainwave in the cloud, a first step toward making FPGA processing for AI workloads available to customers.

Microsoft invested around $1 billion in OpenAI in 2019, and a year later announced that, in collaboration with OpenAI, it had built what would rank as the fifth most powerful publicly recorded supercomputer. The AI supercomputer was built exclusively for OpenAI; Microsoft officials plan to use it to train the company’s large AI models, while the Azure AI service and GitHub create training optimization tools. Microsoft also offers numerous accelerators and services under the Azure AI banner, especially for customers who do not require a dedicated supercomputer.

AI supercomputer line-up with 80GB NVIDIA A100 GPUs in Azure.

Last November, Microsoft announced an AI supercomputer line-up with 80GB NVIDIA A100 GPUs in Azure. Singularity, incidentally, shares its name with an earlier Microsoft Research project: a microkernel operating system and a set of related tools and libraries for developing managed code, which influenced later operating systems including Helios, Midori, Drawbridge, and Barrelfish.

The paper claims that Singularity achieves a significant breakthrough in scheduling deep learning workloads, converting niche features such as elasticity into mainstream features that the scheduler can rely on for implementing stringent SLAs.
