Site Sonar — WLCG ALICE Grid Sites Config Monitoring Tool

Kalana Wijethunga
Aug 28, 2020
Google Summer of Code 2020 Project

The Worldwide LHC Computing Grid (WLCG) is the largest computing Grid in the world. It connects thousands of computers and many supercomputers into a single powerful computing Grid, which supports research carried out every day by scientists all around the world. My project for Google Summer of Code 2020 was to build a tool to monitor the configuration of the WLCG.

The WLCG combines a large number of Grid sites, each containing its own smaller computing Grid. Each site is maintained by a different university, institute, or team, and hence has its own configuration.

E.g., some sites run CentOS 7 as their OS while others run Ubuntu or another OS; some sites support automatic containerization of jobs while others do not.

Knowing the configuration of each site in the Grid is essential for understanding its capabilities, which in turn helps make the most effective use of each site. At the beginning of GSoC 2020, there was no easy way to do this. Site Sonar was built as my GSoC project to simplify this process by automating the collection and analysis of the configuration of each Grid site.

Intuition

It is possible to collect information about a single node in a Grid site by executing a job on it that reads and echoes the relevant information about the node.

E.g., the command "echo Uname: $(uname -a)" returns the kernel information of the node on which the job is executed.

Submitting such a job with multiple commands provides all the required information about the configuration of that node. By collecting information from all the nodes in the Grid sites in this way, we can analyze the configuration of each Grid site. The collected results can be used to identify the capabilities of the different Grid sites of the WLCG.
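The idea of one job running several commands can be sketched as a small probe script. This is a minimal illustration in Python, not the actual Site Sonar script: the function name `collect_node_info`, the `DEFAULT_PROBES` table, and the chosen labels are assumptions made for the example.

```python
import subprocess

# Hypothetical probe commands: label -> command to run on the node.
DEFAULT_PROBES = {
    "Uname": ["uname", "-a"],
    "Hostname": ["hostname"],
}


def collect_node_info(probes=None, timeout=10):
    """Run each probe command and return {label: output or error note}."""
    probes = DEFAULT_PROBES if probes is None else probes
    results = {}
    for label, command in probes.items():
        try:
            completed = subprocess.run(
                command,
                capture_output=True,
                text=True,
                timeout=timeout,
                check=False,
            )
            results[label] = completed.stdout.strip()
        except (OSError, subprocess.TimeoutExpired) as error:
            # A missing or hanging tool is itself useful information about the node.
            results[label] = f"ERROR: {error}"
    return results


if __name__ == "__main__":
    # Echo the results in the same "Label: value" form as the example above.
    for label, value in collect_node_info().items():
        print(f"{label}: {value}")
```

Keeping the probes in a table means a new check can be added as one more entry, and a command that is missing on a node is recorded rather than aborting the whole job.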

How It Works

Site Sonar consists of a custom script that collects information about the node on which it is executed. Site Sonar submits this script as separate jobs to each Grid site in the Grid. The results of each job are collected and analyzed to identify the configuration of each Grid site.

Because the Grid is abstracted and presented to the user as a single computer, there is no way to bind a job to a given node in a given site. In practice, therefore, jobs cannot be submitted to each individual node in each Grid site. However, it is possible to bind jobs to a given Grid site, and Site Sonar uses this feature to collect the information for each site.

As there is no way to target each node in each site directly, Site Sonar submits a batch of jobs equal in size to twice the number of nodes in the relevant Grid site. It is expected that most of the jobs will be executed on different nodes, and that submitting 2*num_nodes jobs will run jobs on around 90% of the nodes of each site. If the majority of the sampled nodes in a site support, say, Feature A, we conclude that the site supports Feature A.
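To get a feel for these numbers, here is a rough sketch. It assumes, purely for illustration, that each job lands on a uniformly random node independently of the others, which a real batch scheduler does not guarantee. Under that assumption, submitting k*N jobs to N nodes covers an expected fraction 1 - (1 - 1/N)^(k*N) of them. The function names `expected_coverage` and `site_supports` are hypothetical.

```python
def expected_coverage(num_nodes, jobs_per_node=2):
    """Expected fraction of nodes hit at least once by random job placement."""
    if num_nodes <= 0:
        raise ValueError("num_nodes must be positive")
    num_jobs = jobs_per_node * num_nodes
    # Probability that one given node is missed by every job.
    missed = (1 - 1 / num_nodes) ** num_jobs
    return 1 - missed


def site_supports(node_results):
    """Majority vote over sampled nodes for one feature.

    node_results maps a node name to True/False. Returns True only if
    more than half of the sampled nodes support the feature.
    """
    if not node_results:
        return False
    supporting = sum(1 for supported in node_results.values() if supported)
    return supporting > len(node_results) / 2


if __name__ == "__main__":
    for nodes in (10, 100, 1000):
        print(nodes, round(expected_coverage(nodes), 3))
    print(site_supports({"node1": True, "node2": True, "node3": False}))
```

Under this simplified model, the coverage with two jobs per node comes out a little below 90% (it approaches 1 - e^-2, roughly 0.865, for large sites), which is in the same range as the estimate above.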

Data Collection

MonALISA is a tool built to monitor the WLCG sites. Site nodes continuously submit their data to MonALISA for monitoring. In the jobs we submit to the Grid, we use the ApMon library to execute the commands and publish their results to MonALISA. A MonALISA client listens to the Site Sonar topic in MonALISA and, whenever that topic receives an update, inserts the data into the database in a formatted way. The formatted data is then used by a website and a CLI tool to present the collected information to users.
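The "receive an update, format it, store it" step can be sketched as follows. This is only an illustration using Python's standard `sqlite3` module: the message fields (`site`, `node`, `name`, `value`), the table layout, and the function names are all assumptions, and the real client, message format, and database used by Site Sonar may differ.

```python
import sqlite3


def create_store():
    """Create an in-memory table standing in for the Site Sonar database."""
    connection = sqlite3.connect(":memory:")
    connection.execute(
        """
        CREATE TABLE node_params (
            site  TEXT NOT NULL,
            node  TEXT NOT NULL,
            name  TEXT NOT NULL,
            value TEXT NOT NULL,
            PRIMARY KEY (site, node, name)
        )
        """
    )
    return connection


def ingest(connection, message):
    """Normalise one (hypothetical) update message and upsert it."""
    row = (
        message["site"].strip(),
        message["node"].strip().lower(),
        message["name"].strip(),
        str(message["value"]).strip(),
    )
    # INSERT OR REPLACE keeps only the latest value per (site, node, name).
    connection.execute(
        "INSERT OR REPLACE INTO node_params (site, node, name, value) "
        "VALUES (?, ?, ?, ?)",
        row,
    )
    connection.commit()
    return row


if __name__ == "__main__":
    db = create_store()
    ingest(db, {"site": "SiteA", "node": "Node01.example.org",
                "name": "Uname", "value": "Linux 3.10"})
    print(db.execute("SELECT * FROM node_params").fetchall())
```

Keying rows on site, node, and parameter name means a node that runs several jobs simply overwrites its earlier report instead of being counted twice.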

Site Sonar Architecture

Data Representation

Users can query the collected results using a website and the Site Sonar CLI tool. It is possible to search for which sites support a certain feature and which nodes in a given site support a certain feature using these tools.
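The two kinds of query described here can be sketched over a handful of made-up records. The record layout, the sample data, the feature name `containers`, and the function names are all hypothetical; the site-level query reuses the majority rule described earlier.

```python
from collections import defaultdict

# Hypothetical sample records: (site, node, feature, supported).
RECORDS = [
    ("SiteA", "a-node1", "containers", True),
    ("SiteA", "a-node2", "containers", True),
    ("SiteA", "a-node3", "containers", False),
    ("SiteB", "b-node1", "containers", False),
    ("SiteB", "b-node2", "containers", True),
]


def nodes_supporting(records, site, feature):
    """Return the sorted nodes of one site that report support for a feature."""
    return sorted(
        node
        for rec_site, node, rec_feature, supported in records
        if rec_site == site and rec_feature == feature and supported
    )


def sites_supporting(records, feature):
    """Return the sorted sites where a majority of sampled nodes support a feature."""
    yes = defaultdict(int)
    total = defaultdict(int)
    for site, _node, rec_feature, supported in records:
        if rec_feature != feature:
            continue
        total[site] += 1
        yes[site] += int(supported)
    return sorted(site for site in total if yes[site] > total[site] / 2)


if __name__ == "__main__":
    print(sites_supporting(RECORDS, "containers"))
    print(nodes_supporting(RECORDS, "SiteB", "containers"))
```

In this sample, SiteA qualifies because two of its three sampled nodes support the feature, while SiteB does not because only one of two does, even though that one node still shows up in the node-level query.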

Site Sonar Website Screenshots
Site Sonar CLI Screenshots

Kalana Wijethunga

Software Engineer @WSO2 @CERN | GSoC Participant | @UOM Grad | Computer Science and Engineering