D\_RD\_27: Study of modern FPGA device and associated new technology, and search for possible application in High Energy experiments



#### Yun-Tsung Lai

on behalf of the D\_RD\_27 group

**KEK IPNS** 

ytlai@post.kek.jp





2023 Joint Workshop of TYL/FJPPL and FKPPL

@ Ochanomizu University

11<sup>th</sup> May, 2023



Laboratoire de Physique des 2 Infinis







# Application of FPGA in HEP experiments

- Here we use the Belle II Central Drift Chamber (CDC) as an example.



# Application of FPGA in HEP experiments (cont'd)



- Non-Return-to-Zero (NRZ).
- Different encodings based on protocol design purposes, e.g. 8B/10B and 64B/66B.
  - <10 Gbps for DAQ.
  - <25 Gbps for TRG.
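
As a rough illustration of what the encoding choice costs in bandwidth, the sketch below computes the usable payload rate for a given serial line rate. The function and table names are hypothetical, and the line rates are example inputs rather than measured values.

```python
from fractions import Fraction

# Payload bits per transmitted bit for the two encodings named above.
ENCODING_EFFICIENCY = {
    "8B/10B": Fraction(8, 10),    # 8 data bits carried in 10 line bits
    "64B/66B": Fraction(64, 66),  # 64 data bits carried in 66 line bits
}

def payload_rate_gbps(line_rate_gbps, encoding):
    """Return the usable payload rate (Gbps) for a serial line rate (Gbps)."""
    return float(line_rate_gbps * ENCODING_EFFICIENCY[encoding])

# Example inputs only: a 10 Gbps and a 25 Gbps line.
for line_rate in (10, 25):
    for encoding in ENCODING_EFFICIENCY:
        rate = payload_rate_gbps(line_rate, encoding)
        print(f"{line_rate} Gbps line, {encoding}: {rate:.2f} Gbps payload")
```

The lower overhead of 64B/66B (about 3%) compared with 8B/10B (20%) is one reason it is commonly chosen at higher line rates.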

- Strong **FPGA devices** with:
  - Larger number of cells.
  - Larger data bandwidth.

are critical for the usage in:

- **TRG**: complicated algorithm implementation.
- **DAQ**: collect and process large data.

- **FPGA**-to-server transmission:
  - Data transmission and system slow control.
  - GbE, PCI-Express, VME, etc.
  - PCI-Express is the most popular one nowadays: PCIe40(CRU) in ALICE, LHCb, and Belle II.

### L1 Trigger system (TRG)

- For TRG purposes, complicated algorithms are implemented to process detector raw data in real time.
- Larger number of cells: improves the logic itself and the trigger resolution, and reduces the background rate.



# **DAQ** system

- Readout: PCI-Express has been the most popular solution for the FPGA $\rightarrow$ server interface.





#### PCIe40: PCIe Gen3





2023/05/11


#### Yun-Tsung Lai (KEK IPNS) @ 2023 TYL/FJPPL and FKPPL workshop

#### Versal project

- KEK, together with the Japanese HEP community, purchased a few evaluation kits of the Xilinx Versal series FPGA.
  - Plan: common and general studies of the new technologies for the R&D of future electronics devices.
- The features of different Versal series:
  - AI/DSP engine: interface to implement ML core into firmware.
  - High Bandwidth Memory (HBM).
  - Larger number of cells + High transmission bandwidth.





source: Xilinx website

### Members for our proposed project

- IJCLab and KEK have been collaborating on the PCIe40 upgrade in Belle II DAQ for a long time.
  - Technical support for the new generation of PCI-Express.
- Experts on Belle II and ATLAS, TRG and DAQ at KEK participate in this project.
  - R&D for future electronics devices based on Versal FPGA, universal for different experiments.

| France         |              |                                               | Japan                |           |                 |
|----------------|--------------|-----------------------------------------------|----------------------|-----------|-----------------|
| Name           | Institute    |                                               | Name                 | Institute |                 |
| Daniel Charlet | IJCLab Orsay | PCIe readout<br>device for<br>Belle II / LHCb | <u>Yun-Tsung Lai</u> | KEK IPNS  | E-sys, Belle II |
| Patrick Robbe  |              |                                               | Manobu<br>Tanaka     |           | E-sys           |
| Tak-Shun Lau   |              |                                               | Makoto<br>Tomoto     |           | ATLAS           |
| Emi Kou        |              |                                               | Satoru<br>Yamada     |           | Belle II        |
|                |              |                                               | Yutaka<br>Ushiroda   |           | Belle II        |
|                |              |                                               | Kunihiro<br>Nagano   |           | ATLAS           |
|                |              |                                               | Taichiro Koga        |           | Belle II        |
|                |              |                                               | Yu Nakazawa          |           | Belle II        |

#### **Evaluation kits for Versal**



- Features the VC1902 Versal AI Core series
- For using AI and DSP engines with greater compute performance than current server-class CPUs.



- Features the Versal AI Core Series
- For (AI) Engine development with Vitis and AI Inference development



- Features the VM1802 Versal<sup>™</sup> Prime series
- The world's first adaptive compute acceleration platform (ACAP)
- A software programmable infrastructure and connectivity



KEK has purchased it for collaborative studies.

- Features Versal<sup>™</sup> Premium series VP1202
- Multiple high-speed connectivity options
- Massive serial bandwidth, security, and compute density.

# New technology in Versal FPGA: PAM-4

- Present HEP devices are mainly based on Non-Return-to-Zero (NRZ): limited to 25~30 Gbps.
  - Belle II UT4 (UltraScale GTY) can operate with 25 Gbps stably using 64B/66B.
  - ATLAS muon trigger board reaches ~16 Gbps.
- Pulse Amplitude Modulation (PAM-4):
  - Four distinct voltage levels.
  - Xilinx UltraScale+: up to 58 Gbps. Versal premium (GTM): up to 112 Gbps.
  - PCIe Gen6: based on PAM-4.
  - A pioneering study of it in the HEP community.
- Work plan: With VPK120 GTM
  - Higher error rate: firmware treatment in clock, check-sum, etc.
  - Gray code and different encoding scheme.
  - QSFP-DD loopback $\rightarrow$ 4\*100 Gbps QSFP.
    - An optical oscilloscope could help to further understand the signal properties.
  - Develop a general-purpose protocol for TRG.
  - DAQ: may still be early, but studying the irradiation effect on it is also important.
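
To make the Gray-code item in the work plan concrete, the following sketch maps bit pairs onto the four PAM-4 amplitude levels so that neighbouring levels differ in exactly one bit. It is a minimal illustration in Python, not the GTM transceiver configuration; the level numbering 0 to 3 and all function names are assumptions made for the example.

```python
# Gray-coded bit pairs for amplitude levels 0..3 (lowest to highest).
# Assumed mapping for illustration: adjacent levels differ in one bit only.
GRAY_BITS = [(0, 0), (0, 1), (1, 1), (1, 0)]
LEVEL_OF = {bits: level for level, bits in enumerate(GRAY_BITS)}

def pam4_encode(bits):
    """Map a flat bit sequence (even length) to PAM-4 levels, two bits per symbol."""
    if len(bits) % 2:
        raise ValueError("PAM-4 carries two bits per symbol; need an even bit count")
    return [LEVEL_OF[(bits[i], bits[i + 1])] for i in range(0, len(bits), 2)]

def pam4_decode(levels):
    """Map PAM-4 levels back to the flat bit sequence."""
    out = []
    for level in levels:
        out.extend(GRAY_BITS[level])
    return out

def bit_errors(a, b):
    """Count positions where two equal-length bit sequences differ."""
    return sum(x != y for x, y in zip(a, b))

data = [1, 0, 0, 1, 1, 1, 0, 0]
symbols = pam4_encode(data)

# Simulate the most likely error: one symbol slips to a neighbouring level.
corrupted = list(symbols)
corrupted[2] -= 1
print("symbols:", symbols)
print("bit errors after one adjacent-level slip:",
      bit_errors(data, pam4_decode(corrupted)))
```

With this mapping, a slip between neighbouring voltage levels corrupts one bit instead of two, which is the usual motivation for Gray coding on a link with reduced level spacing.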













Figure: PAM-4 signaling with four distinct voltage levels, two bits per clock cycle (source: Xilinx).


#### New technology in Versal FPGA: PCIe Gen5

- ALICE, LHCb, and Belle II have been using the PCIe40 (CRU), which is based on PCIe Gen3.
- A new generation of PCI-Express comes out every few years with doubled bandwidth.
  - Using a proper device to study the properties of newer PCI-Express generations is beneficial for the development of future readout devices.
- Work plan: with VPK120, where the PCIe Gen5 interface is implemented with its GTY transceiver.
  - A host PC is also prepared, and the test bench is under preparation.
  - Start from a simple register R/W function.
  - Build a protocol (using DMA or similar) for continuous readout, and study the throughput performance.
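
For orientation on the "doubled bandwidth" per generation, the sketch below computes the theoretical one-direction link bandwidth for PCIe Gen3 to Gen5 from the per-lane transfer rate and the 128b/130b encoding. The x16 lane count is an assumption chosen for the example, and the numbers are upper limits before packet and protocol overhead, not measured throughput.

```python
# Per-lane raw rate in GT/s; Gen3, Gen4, and Gen5 all use 128b/130b encoding.
PCIE_GT_PER_S = {3: 8.0, 4: 16.0, 5: 32.0}
ENCODING_EFFICIENCY = 128 / 130

def pcie_bandwidth_gbytes(gen, lanes):
    """Theoretical one-direction bandwidth in GB/s, before protocol overhead."""
    return PCIE_GT_PER_S[gen] * ENCODING_EFFICIENCY * lanes / 8

LANES = 16  # assumed link width for this example
for gen in sorted(PCIE_GT_PER_S):
    print(f"PCIe Gen{gen} x{LANES}: {pcie_bandwidth_gbytes(gen, LANES):.1f} GB/s")
```

A throughput study with DMA readout would then be compared against these ceilings to see how much of the link the firmware and driver actually use.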




- An intern with an engineering background will stay at KEK for ~4 months to work on this.
  - Also for the preparation of other technical work.



### PCIe40 → PCIe400: ongoing work in IJCLab

- Reference: An ongoing project in France for a future readout device: PCIe400
- Aim for 4 times larger bandwidth in 2026
  - More than 40 bi-directional links up to 26 Gbps (NRZ)
  - Use server memory or HBM2e
  - PCIe Gen5
  - Also reserved ports (QSFP112) for PAM-4 testing.
- Plan for the far future:
  - An 800 GbE interface for 2032



- We expect that the communication between KEK and IJCLab will be helpful for the PCIe technical study.

# L1 TRG algorithm: Tracking in Belle II

- L1 Trigger: Implementation of complicated physics algorithm in FPGA
  - How Belle II TRG does tracking in FPGA:
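
As a rough illustration of Hough-style track finding, the sketch below fills a two-dimensional accumulator for circular tracks that start at the origin and reads off the peak. It is a generic textbook-style example written in Python with NumPy, not the Belle II firmware; the bin counts, curvature range, toy hit radii, and function names are all assumptions.

```python
import numpy as np

def hough_vote(hits, n_phi=64, n_omega=32, omega_max=0.05):
    """Fill a (curvature, direction) accumulator from hits given as (r, phi).

    A circle through the origin with signed curvature omega = 1/R and
    direction phi0 at the origin satisfies omega = 2 * sin(phi - phi0) / r.
    """
    acc = np.zeros((n_omega, n_phi), dtype=int)
    phi0 = (np.arange(n_phi) + 0.5) * 2 * np.pi / n_phi  # bin centres
    cols = np.arange(n_phi)
    for r, phi in hits:
        omega = 2.0 * np.sin(phi - phi0) / r
        rows = np.floor((omega + omega_max) / (2 * omega_max) * n_omega).astype(int)
        # Keep outgoing solutions only (removes the phi0 + pi mirror image)
        # and drop curvatures outside the accumulator range.
        ok = (np.cos(phi - phi0) > 0) & (rows >= 0) & (rows < n_omega)
        acc[rows[ok], cols[ok]] += 1
    return acc, phi0

def find_peak(acc, phi0, omega_max=0.05):
    """Return (omega, phi0, votes) at the accumulator maximum."""
    row, col = np.unravel_index(np.argmax(acc), acc.shape)
    d_omega = 2 * omega_max / acc.shape[0]
    omega_est = -omega_max + (row + 0.5) * d_omega
    return omega_est, phi0[col], int(acc[row, col])

# Toy track: four hits on one circle (arbitrary length units).
true_phi0 = 10.5 * 2 * np.pi / 64  # placed at a bin centre for a clean peak
true_omega = 0.02
hits = [(r, true_phi0 + np.arcsin(r * true_omega / 2))
        for r in (20.0, 40.0, 60.0, 80.0)]

acc, phi0 = hough_vote(hits)
omega_est, phi0_est, votes = find_peak(acc, phi0)
print("votes:", votes, "phi0:", round(phi0_est, 3), "omega:", round(omega_est, 4))
```

Each hit votes once per direction bin, so the loop body is a fixed pattern of lookups and increments, which is the kind of regular structure that maps well onto FPGA logic.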




#### L1 TRG algorithm: More new ideas

- Belle II Trigger board: Xilinx UltraScale XCVU080, XCVU160.
- ATLAS Muon Trigger processor: Xilinx UltraScale+ XCVU13P.





KEK: T. Koga, R. Nomaru.



There are also many other new ideas: GNN-based clustering, tracking, etc.

# Consideration for the new trigger device

- To improve the performance of our HDL algorithms:
- Go with the same way (finding/fitting):
  - Better precision or granularity
  - More info / less constraint e.g. ADC waveform.
  - Extra dimension. e.g.  $2D \rightarrow 3D$  Hough.

- Use machine-learning:
  - NN for parameter calculation.
  - GNN on the global map.
  - NN for topological output.

- Similar level of latency.
- More FPGA resource and higher data transmission rate are desired.
- Utilize BRAM as neuron's LUT: hls4ml.
  - FPGA resource
  - Or HBM series
- Implement ML core into **AI engine** or **DSP engine** in new FPGA series.
- AI Core, AI Edge, Prime series.
- Premium series with larger bandwidth (PAM-4).

Evaluation kits: VPK120, VCK190, VMK180.
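
The idea of using memory as a neuron's lookup table can be sketched as follows: the activation function is sampled once into a fixed-size table, and evaluation becomes an index calculation plus one read. This is a plain Python illustration of the concept only, not the hls4ml API; the table size, input range, and names are hypothetical.

```python
import math

TABLE_SIZE = 1024          # hypothetical number of table entries
X_MIN, X_MAX = -8.0, 8.0   # hypothetical input range covered by the table
STEP = (X_MAX - X_MIN) / TABLE_SIZE

# Sample the sigmoid once at the centre of each bin; in firmware this
# table would be stored in on-chip memory instead of a Python list.
SIGMOID_LUT = [
    1.0 / (1.0 + math.exp(-(X_MIN + (i + 0.5) * STEP)))
    for i in range(TABLE_SIZE)
]

def sigmoid_lut(x):
    """Approximate sigmoid(x) by a table lookup, saturating outside the range."""
    index = int((x - X_MIN) / STEP)
    index = max(0, min(TABLE_SIZE - 1, index))
    return SIGMOID_LUT[index]

def sigmoid_exact(x):
    """Reference implementation used only to check the approximation."""
    return 1.0 / (1.0 + math.exp(-x))

worst = max(abs(sigmoid_lut(x / 100) - sigmoid_exact(x / 100))
            for x in range(-1000, 1001))
print(f"worst-case error over [-10, 10]: {worst:.5f}")
```

The table size trades memory for precision: halving the step roughly halves the worst-case error, which is why larger on-chip memory or HBM is attractive for wider networks.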

### Trigger device for Belle II and ATLAS

- After comprehensive studies on the Versal FPGA series, we expect to eventually develop a general and universal trigger device for different experiments.

- **Belle II UT3**: Xilinx Virtex-6 XC6VHX380T, XC6VHX565T; 11.2 Gbps with 64B/66B.
- **Belle II UT4**: Xilinx UltraScale XCVU080, XCVU160; 25 Gbps with 64B/66B.
- **ATLAS Muon Trigger processor**: Xilinx UltraScale+ XCVU13P, XCZU5EV; GTH, GTY: 16.8 Gbps with 64B/66B.


#### Roadmap of the project

- 1<sup>st</sup> year:
  - VCK5000, VPK120, and the host server have been set up at KEK.
  - Study the properties of the fundamental functionalities: GTM (PAM-4), PCIe Gen5, AI/DSP engine, CPU acceleration, etc.
  - Prepare a basic application for each of them for other members.
  - Will purchase more kits (VMK180, VCK190) depending on other budgets.
- 2<sup>nd</sup> year:
  - Make general transmission protocols for GTM (PAM-4) and PCIe Gen5, and do performance studies.
  - Implement various L1 Trigger algorithms (Belle II and ATLAS) in the kits.
  - Connect to the existing Belle II or ATLAS system to take real-time data and check the trigger performance.
- 3<sup>rd</sup> year:
  - Future universal device: readout or trigger.
  - Discussion.
  - Schematic/PCB design for the prototype boards.
  - Test with people from the experiments.
- IJCLab: hardware technical support, application at LHCb.


- We started a project using evaluation kits of the Xilinx Versal series FPGA together with different experimental groups.
- For applications in experimental high energy physics, the new technologies associated with the FPGA devices will be studied in order to develop general-purpose firmware/software modules.
- Based on the research results, the development of a universal device (such as a trigger or readout device) for different experiments will be considered, using these Versal FPGAs or a newer generation in the future.
- The collaboration between the IJCLab and KEK groups will be helpful for the hardware technical R&D, and the exchange of experience among E-sys, Belle II, ATLAS, and LHCb can have a great impact on the common development of future devices and future experiments.

# Backup


#### Resource of FPGA chips

| FPGA              | # of cells (K) |
|-------------------|----------------|
| XC6VHX380T (UT3)  | 382            |
| XC6VHX565T (UT3)  | 566            |
| XCVU080 (UT4)     | 780            |
| XCVU160 (UT4)     | 1621           |
| Versal AI Edge    | 44-1139        |
| Versal AI Core    | 540-1968       |
| Versal Prime      | 329-2233       |
| Versal Premium    | 833-7352       |
| Versal HBM        | 3837-5631      |
| UltraScale+ HBM   | 962-2852       |
| UltraScale+ 58G   | 2252-3870      |
| UltraScale+ VU19P | 8938           |
