18–20 Jun 2025
Tohoku Univ. Aobayama-campus
Asia/Tokyo timezone

Parallel reconstruction computing on multiple Hygon GPUs for ptychography at HEPS

19 Jun 2025, 16:10
20m
Center Hall 4F Medium conference room

Speaker

LEI WANG (Institute of High Energy Physics)

Description

When we ported 'Hepsptycho', a ptychography reconstruction program originally built on multiple NVIDIA GPUs and MPI, to the Hygon DCU architecture, we found that the reconstructed object and probe were erroneous, whereas the same reconstructions on NVIDIA GPUs were correct. We profiled the ePIE algorithm using NVIDIA Nsight Systems and Hygon's HIP-compatible profiler (Hipprof). After each batch or iteration completes, the GPUs communicate and share object and probe information: the worker GPUs send their reconstructed results back to GPU 0 using a Reduce or AllReduce operation. With the NVIDIA CUDA toolkit this communication executes successfully. On the Hygon platform, however, DCU 0 encounters a memory corruption error during synchronization, likely due to race conditions when the object/probe buffers are updated. We present the profiling results and describe how we fixed this bug. We also show the computational speedup obtained with other HPC techniques to achieve better reconstruction performance on multiple GPUs. This work is implemented within the Institute of High Energy Physics (IHEP) DAISY framework.
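The object/probe sharing step can be illustrated with a minimal, single-process sketch. It is not Hepsptycho's implementation. It assumes that each GPU ("rank") accumulates an object update over its own batch of scan positions, simulates AllReduce as an element-wise sum written into separate output buffers rather than into any rank's input buffer in place, and averages the reduced update over ranks before applying it. The names `allreduce_sum` and `apply_object_update` are hypothetical, and plain Python lists stand in for device buffers.

```python
from typing import List

Buffer = List[complex]  # stand-in for a flattened object (or probe) buffer


def allreduce_sum(per_rank: List[Buffer]) -> List[Buffer]:
    """Simulated AllReduce (hypothetical helper): every rank receives the element-wise sum.

    The sum is written into freshly allocated buffers instead of overwriting any
    rank's input in place, so no rank reads a buffer that is modified mid-reduction.
    """
    n = len(per_rank[0])
    if any(len(b) != n for b in per_rank):
        raise ValueError("all ranks must contribute buffers of equal length")
    total = [sum(b[i] for b in per_rank) for i in range(n)]
    # Each rank gets its own copy, mimicking a separate receive buffer per GPU.
    return [list(total) for _ in per_rank]


def apply_object_update(obj: Buffer, per_rank_deltas: List[Buffer]) -> Buffer:
    """Average the per-rank object updates and apply them (assumed averaging scheme)."""
    reduced = allreduce_sum(per_rank_deltas)[0]
    n_ranks = len(per_rank_deltas)
    return [o + d / n_ranks for o, d in zip(obj, reduced)]
```

For example, `apply_object_update([1+0j, 1j], [[0.2+0j, 0j], [0.4+0j, 0.2j]])` returns approximately `[1.3+0j, 1.1j]`. Keeping send and receive buffers distinct is one general way to avoid reading a buffer while another rank is still writing to it; it is shown only as an illustration, not as the repair described in this talk.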
