Deep Learning Software for Supercomputers

Background and Motivation

In some of our collaborative projects, we apply Deep Learning to Computer Vision problems, e.g., the segmentation of MRI scans and micrographs. The dataset sizes and the complexity of the neural network models used in these projects demand powerful computing resources for both training and inference. We therefore decided to run these workloads on our supercomputers.

Using supercomputers for Deep Learning requires that the software stack on these machines be kept up to date so that it supports the various Deep Learning frameworks. Using supercomputing resources effectively also requires support for distributed training.

This project brings together activities intended to simplify the use of supercomputing resources for distributed Deep Learning.

Our contributions

In collaboration with colleagues from the JSC, we have been involved in coordinating the system-wide installation and maintenance of Deep Learning frameworks on our production supercomputers, JUWELS and JURECA. Additionally, we have made significant contributions to improving the software stack on JURON, one of the pilot systems for the Human Brain Project. This has not only made the machine well suited to distributed training and inference for Deep Learning applications, but has also made it more accessible to users in general.

Our collaboration partners

Following is a list of our collaborators:

Simlab Contact

  • Institute for Advanced Simulation (IAS)
  • Jülich Supercomputing Centre (JSC)
Building 16.3 / Room 218
+49 2461/61-85277

Last Modified: 28.06.2022