Intro to Supercomputers and my first experience with the World’s Largest Computing Grid

Kalana Wijethunga
4 min read · May 17, 2020

I was recently selected to work with CERN for the Google Summer of Code 2020 program. CERN, the European Organization for Nuclear Research, operates the world’s largest and most powerful particle accelerator, the Large Hadron Collider (LHC). The prodigious volume of data produced by the LHC is handled by CERN’s Worldwide LHC Computing Grid (WLCG), which, as of 2017, incorporated over 170 computing centers in 42 countries. It is also considered the world’s largest computing grid. This article gives an introduction to supercomputers and shows how to submit a job to the WLCG if you are a user.

Source: https://resize.hswstatic.com/w_907/gif/what-is-the-worlds-fastest-supercomputer-used-for.jpg

What is a Supercomputer?

If you are a student in an IT-related field, you must have heard the word supercomputer numerous times without having an exact idea of what it is. You might think it is a single computer built with the most sophisticated hardware; at least that is what I thought at the beginning. Well, the truth is a bit different from that.

A supercomputer is built by connecting a large number of computers so that they act as a single high-performance computer. In fact, if you have a lot of money, you can go to a shop, buy as many computers as you can, and build a mini-supercomputer yourself. That is the basic idea behind a supercomputer. However, building an actual supercomputer takes more than just combining a bunch of computers: special care is taken to ensure high-speed, low-latency communication between the machines, to prevent overheating, and so on. Until the advent of Grid computing, supercomputers were mostly built by combining a large number of homogeneous high-performance computers.

What is Grid Computing?

Grid computing can be seen as a way of building a type of supercomputer. Creating a supercomputer from a bunch of high-performance computers can get really expensive, and Grid computing is a solution to this. A Computing Grid is a widely distributed system that incorporates a group of computers to act as one computer. Sounds a lot like the definition of a supercomputer, right? The difference is that a Computing Grid is built from a huge number of general-purpose computers (the kind we use day to day) instead of special high-performance computers. A Computing Grid makes building a supercomputer much more economical, so users get the chance to use a supercomputer at a much lower cost.

Worldwide LHC Computing Grid (WLCG)

The WLCG is the world’s largest Computing Grid, combining computing resources in 42+ countries. My GSoC project is about automating information retrieval from all the grid nodes in the WLCG. The WLCG consists of a huge number of computers from many universities and institutes in these countries. A single computer on this Grid is referred to as a node. There are two types of nodes in the WLCG.

  • Computing Elements — Specially designed for doing experimental computations
  • Storage Elements — Specially designed for transferring data between nodes and storing data

My Experience with the WLCG

I work under the ALICE experiment, which is one of the eight detector experiments at the Large Hadron Collider at CERN. JAliEn (Java ALICE Environment Grid Framework) is an open source Grid framework that serves as the production environment for the simulation, reconstruction, and analysis of physics data of the ALICE experiment.

Job Submission Architecture of WLCG

How to run a job on the WLCG

To submit a job, you need to be a member of the CERN organization. (It is possible to access the Grid through other experiments as well; since I registered as an ALICE user, I am describing my own experience.) Once you become a member, you get access to a remote machine. This machine is referred to as the lxplus machine, and it is kind of a personal computer for you within the CERN organization. You can access it using SSH. You can either copy your job descriptions and dependencies from your local machine to your lxplus machine or define them directly on it.
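The SSH access and file copy described above could be wrapped in a couple of small shell helpers on your local machine. This is only a sketch under assumptions: it assumes lxplus is reachable at lxplus.cern.ch, and the function names and the account name jdoe are placeholders I made up for illustration.

```shell
#!/bin/sh
# Assumption: lxplus is reached at lxplus.cern.ch (override with LXPLUS_HOST if not).
LXPLUS_HOST="${LXPLUS_HOST:-lxplus.cern.ch}"

# Hypothetical helper: build the user@host string for a CERN account name.
lxplus_target() {
    printf '%s@%s' "$1" "$LXPLUS_HOST"
}

# Hypothetical helper: copy local job files (JDL, scripts) to a directory on lxplus.
# usage: lxplus_upload <account> <remote_dir> <file>...
lxplus_upload() {
    account=$1
    remote_dir=$2
    shift 2
    scp "$@" "$(lxplus_target "$account"):$remote_dir/"
}

# Open an interactive session ("jdoe" is a placeholder account):
#   ssh "$(lxplus_target jdoe)"
# Copy a job description into your lxplus home directory:
#   lxplus_upload jdoe . first_job.jdl
```

Nothing runs until you call the helpers yourself, so you can keep them in a file and source it when needed.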

You will need a valid public certificate and private key, issued to you by the CERN organization, to access the Grid. Copy them to the ~/.globus directory on the lxplus machine so that you can access the Grid.
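Putting the certificate and key in place could look like the sketch below. It assumes the usual grid convention of naming the files usercert.pem and userkey.pem and of keeping the private key readable only by you; the article itself does not state the file names or permissions, and the function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: install a certificate/key pair into ~/.globus.
# Assumptions (not stated in the article): the files are named usercert.pem
# and userkey.pem, and the key should be readable by the owner only.
# usage: install_grid_cert <cert.pem> <key.pem> [target_dir]
install_grid_cert() {
    cert=$1
    key=$2
    target=${3:-"$HOME/.globus"}

    mkdir -p "$target" || return 1
    chmod 700 "$target"

    # -f lets cp replace an older, read-only copy of the key.
    cp -f "$cert" "$target/usercert.pem" || return 1
    cp -f "$key" "$target/userkey.pem" || return 1

    chmod 644 "$target/usercert.pem"   # the public certificate may be world-readable
    chmod 400 "$target/userkey.pem"    # the private key is owner read-only
}

# Example, run on lxplus (the source paths are placeholders):
#   install_grid_cert ./mycert.pem ./mykey.pem
```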

Once you have all the files in place, you can use the following commands to access a Grid shell.

> alienv enter JAliEn
> jalien

This will spin up a Grid shell, which you can use to submit jobs, query jobs, retrieve results, etc. To submit a job to the Grid, you first need to copy the job description file (JDL) from your lxplus machine to the Grid. You can do this with cp file://path/to/job/first_job.jdl /path/on/the/grid . If your JDL refers to any executables, you will have to copy them to the /bin directory on the Grid. Once everything is set, you can submit the job for execution using the command submit first_job.jdl . You will get a success message with your job ID. You can use the MonALISA website to check the status of your job. When the job finishes executing, a new file with the results is generated in your Grid machine’s $HOME, unless you have explicitly set an output directory.
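To make the steps above more concrete, here is a rough sketch of preparing a job description on lxplus before entering the Grid shell. It assumes a JDL can be as small as a single Executable field. The function name, the script name hello.sh, and the path /bin/hello.sh are placeholders, so check the JAliEn documentation for the fields your job actually needs.

```shell
#!/bin/sh
# Hypothetical helper: write a one-field JDL file.
# Assumption: a minimal JDL only needs an Executable entry pointing at a
# file that already exists on the Grid; real jobs usually need more fields.
# usage: write_minimal_jdl <grid_path_to_executable> <output_file>
write_minimal_jdl() {
    cat > "$2" <<EOF
Executable = "$1";
EOF
}

# Example on lxplus (the executable path is a placeholder):
#   write_minimal_jdl /bin/hello.sh first_job.jdl
#
# Then, inside the Grid shell started with the jalien command, following
# the commands quoted in the article:
#   cp file://path/to/job/first_job.jdl /path/on/the/grid
#   submit first_job.jdl
```

Keeping the JDL generation in a function makes it easy to produce one file per job when you want to submit several similar jobs.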

Kalana Wijethunga

Software Engineer @WSO2 @CERN | GSoC Participant | @UOM Grad | Computer Science and Engineering