BEATS tests computing infrastructure for big data processing

News
28 Apr 2022

Synchrotron X-ray computed tomography works in a similar way to the well-known medical scanner, except that synchrotron X-rays are far more intense than hospital X-rays and can be up to 100 billion times brighter. The resulting images are of very high quality and contrast, enabling scientists to study samples with a resolution of several micrometres. These beautiful and extremely detailed images come at the cost of very large data volumes, with a single experiment often producing up to 2 terabytes of data in 8 hours.

To cope with such massive data production, the tomography beamline BEATS requires computational and network resources on the scale of these ambitions. Users need to obtain fully processed data sets, for example reconstructed 3D volumes of their data, as soon as the experiment is complete. For this, the facility has to be able to store and process a huge amount of raw data in a short space of time, sometimes within minutes or hours.

“Experimental data management is a big issue at synchrotron facilities and especially for micro-tomography beamlines”, says Mustafa Alzu’bi, computing system engineer at SESAME working on the data acquisition and management systems for BEATS. “Here, we are lucky to be able to establish our data policy before collecting and storing the big data. Many synchrotrons have had to implement data policies while managing in parallel exponential increases in data.”

2 TB of data represents approximately 1000 hours of high-resolution film or 620 000 photos. The BEATS infrastructure has parallel file system storage with an effective capacity of 0.5 PB, configured as a single GPFS file system. Sustained aggregate storage performance is at least 5 GB/s (read and write).
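
To put these figures in perspective, the short Python sketch below works through the arithmetic. It is a back-of-the-envelope illustration only, assuming decimal units (1 TB = 10^12 bytes) and treating the quoted values as exact, although the article gives 2 TB as an upper bound and 5 GB/s as a lower bound. The function names are hypothetical and are not part of any BEATS software.

```python
# Back-of-the-envelope figures based on the numbers quoted in the article.
# Assumption: decimal units (1 GB = 1e9 bytes, 1 TB = 1e12, 1 PB = 1e15).
GB = 10**9
TB = 10**12
PB = 10**15

EXPERIMENT_BYTES = 2 * TB      # "up to 2 terabytes" per experiment
EXPERIMENT_HOURS = 8           # produced over 8 hours
STORAGE_BYTES = PB // 2        # 0.5 PB effective capacity
THROUGHPUT_BYTES_S = 5 * GB    # "at least" 5 GB/s sustained aggregate


def average_rate_mb_s(total_bytes, hours):
    """Average data production rate in MB/s over the whole experiment."""
    return total_bytes / (hours * 3600) / 10**6


def experiments_that_fit(capacity_bytes, per_experiment_bytes):
    """Number of full-size experiments the storage can hold."""
    return capacity_bytes // per_experiment_bytes


def transfer_time_s(total_bytes, bytes_per_second):
    """Time in seconds to read or write a data set at the given throughput."""
    return total_bytes / bytes_per_second


if __name__ == "__main__":
    rate = average_rate_mb_s(EXPERIMENT_BYTES, EXPERIMENT_HOURS)
    count = experiments_that_fit(STORAGE_BYTES, EXPERIMENT_BYTES)
    seconds = transfer_time_s(EXPERIMENT_BYTES, THROUGHPUT_BYTES_S)
    print(f"Average acquisition rate: {rate:.1f} MB/s")        # 69.4 MB/s
    print(f"2 TB experiments per 0.5 PB: {count}")             # 250
    print(f"Time to move 2 TB at 5 GB/s: {seconds / 60:.1f} min")  # 6.7 min
```

Under these assumptions, the storage holds about 250 maximum-size experiments, and a full 2 TB data set can be read or written in under seven minutes. The 8-hour average rate understates the peak rate during individual scans, which is not given in the article.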

In spring 2021, BEATS launched a call for tender for the supply, installation and commissioning of a centralised parallel file system and CPU/GPU cluster. In September 2021, the tender board awarded the contract to the Jordanian company General Computers & Electronics Co. The hardware arrived at SESAME in February 2022, and site acceptance tests validated all the specifications in March. Today, the BEATS team from the SESAME computing group is setting up the software needed for BEATS image reconstruction and analysis. The next step will be to integrate all cameras and other sub-systems so that they are ready to serve day-one experiments.

The BEATS data acquisition, processing and storage infrastructure is composed of hardware equipment installed in the BEATS control hutch and/or the SESAME Data Center.

BEATS WP7 (data analysis and management) developed the design of the computing infrastructure for BEATS under the leadership of Charalambos Chrysostomou (The Cyprus Institute). The team comprises Abdalla Ahmad, Mustafa Alzu’bi, Gianluca Iori and Salman Matalgah.

The data pipeline is made up of the data acquisition systems, the beamline control station, a hybrid CPU/GPU reconstruction cluster, a short-term storage server, a data analysis and visualisation workstation, and network connection components.