

# NanoBridge FPGA

#### Toshitsugu Sakamoto \*NanoBridge Semiconductor Inc. (NBS)

## Content

- Introduction & NanoBridge-FPGA
  - Energy efficient computing
  - Operation principle of NanoBridge
  - Schematic diagram of NanoBridge-FPGA
- Nonvolatile SoC-FPGA
  - Provides low power and high functionality for IoT devices
  - Accelerator of convolutional neural network (CNN)
  - 28nm NB-FPGA as accelerator
  - NanoBridge for accelerator and code ROM
- Radiation tolerance of NanoBridge
  - Demonstration of NanoBridge-FPGA on orbit
- Conclusion

## **Enhancement by logic architecture**




## Switch over logic

• Integrate in BEOL (Backend-of-line) of LSI



## NanoBridge<sup>®</sup> (or Atom switch)

- Resistive change memory (ReRAM)
- Nanometer-scale Cu bridge forms via electrochemical reaction
  - High ON/OFF resistance ratio: ~2 k$\Omega$ (ON) / 200 M$\Omega$ (OFF)
  - Weak temperature dependence, small capacitance (~0.15 fF)
  - Reprogram cycle > 10<sup>3</sup>
  - No soft error (radiation tolerant)
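As a quick check on the figures above, the resistance ratio and the ON-state RC product can be worked out directly. This is a minimal sketch: the three constants are the values quoted on the slide, while the RC product is only an illustrative figure of merit added here, not a result from the talk.

```python
# Values quoted on the slide.
R_ON_OHM = 2e3          # ON-state resistance, ~2 kOhm
R_OFF_OHM = 200e6       # OFF-state resistance, ~200 MOhm
C_SWITCH_F = 0.15e-15   # switch capacitance, ~0.15 fF

# Ratio of OFF to ON resistance.
on_off_ratio = R_OFF_OHM / R_ON_OHM

# Illustrative assumption: R_on * C as a rough time constant of one closed switch.
rc_on_seconds = R_ON_OHM * C_SWITCH_F

print(f"ON/OFF resistance ratio: {on_off_ratio:.0e}")
print(f"R_on * C: {rc_on_seconds * 1e12:.2f} ps")
```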



## **Programming properties**

- Program/Erase evaluation using 128kb-TEG
  - Program/Erase pulse: 1 µs or 200 ns
  - Voltage: 2 V to 2.5 V



## **Origin of current path**

• Resistive switching is attributed to the formation of a Cu bridge.



**KEK workshop 2022** 


## **Temperature dependence**

• nFET (red) vs. NanoBridge (blue)



## **FPGA** structure

• FPGA fabric consists of configurable logic blocks (CLBs)




## **CLB for NB-FPGA**

- 4 Basic logic elements (BLE)\*
- BLE has 4-input LUT and DFF
- NanoBridge crossbar occupies large portion in CLB layout
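To make the BLE structure concrete, here is a minimal behavioral sketch of a 4-input LUT followed by a D flip-flop. It is a generic textbook model, not the NB-FPGA circuit: the class name `BLE`, the `use_ff` output select, and the bit ordering of the truth table are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class BLE:
    """Generic basic logic element: 4-input LUT plus D flip-flop (illustrative model)."""
    truth_table: int       # 16 configuration bits; bit i is the output for input pattern i
    use_ff: bool = False   # assumed output select: True = registered, False = combinational
    q: int = 0             # current flip-flop state

    def lut(self, a: int, b: int, c: int, d: int) -> int:
        """Look up the configured output for inputs (a, b, c, d), with a as the LSB."""
        index = a | (b << 1) | (c << 2) | (d << 3)
        return (self.truth_table >> index) & 1

    def clock(self, a: int, b: int, c: int, d: int) -> None:
        """Rising clock edge: the flip-flop captures the LUT output."""
        self.q = self.lut(a, b, c, d)

    def output(self, a: int, b: int, c: int, d: int) -> int:
        """BLE output: flip-flop state if registered, otherwise the LUT output."""
        return self.q if self.use_ff else self.lut(a, b, c, d)


# Truth tables are plain 16-bit integers.
XOR4 = 0x6996   # bit i is the parity of i, i.e. a ^ b ^ c ^ d
AND4 = 0x8000   # only pattern 0b1111 produces 1

xor_ble = BLE(XOR4)
print(xor_ble.output(1, 0, 1, 1))
```

Changing the function of the element only means changing the 16-bit `truth_table` value, which is the sense in which a LUT is configurable.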



SB:switch box, CB:Connection block

\*X.Bai et al., VLSI Symp., 2015.

## **EDA tools for NB-FPGA**

- Generates configuration data from RTL code
  - STA predicts NB-FPGA performance, allowing circuits to be optimized






### **Comparison with commercial low-power FPGA**

A 16-bit ALU and a signal generator are mapped using 332 LUTs



40nm SRAM-FPGA\*

\*http://www.latticesemi.com/Products/FPGAandCPLD/iCE40.aspx


## **Power comparison**




## **Summary of comparison**

|                                  | SRAM-FPGA     | NB-FPGA    |
|----------------------------------|---------------|------------|
| <b>Configuration switch</b>      | SRAM/Pass Tr. | NanoBridge |
| Process node                     | 40nm          | 40nm       |
| # of LUTs                        | 12.8k         | 6.4k       |
| Max. speed@1.1V                  | 28MHz         | 56MHz      |
| Active energy<br>per cycle@33MHz | 66pJ          | 33pJ       |
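The energy-per-cycle row can be converted to active power with P = E × f. The sketch below uses only the two energies and the 33 MHz clock from the table; the function name `active_power_w` is introduced here for illustration.

```python
FREQ_HZ = 33e6  # clock frequency at which the energies in the table were reported

# Active energy per cycle from the table, in joules.
ENERGY_PER_CYCLE_J = {"SRAM-FPGA": 66e-12, "NB-FPGA": 33e-12}


def active_power_w(energy_per_cycle_j: float, freq_hz: float) -> float:
    """Active power in watts: energy per cycle times cycles per second."""
    return energy_per_cycle_j * freq_hz


for name, energy in ENERGY_PER_CYCLE_J.items():
    print(f"{name}: {active_power_w(energy, FREQ_HZ) * 1e3:.2f} mW")
```

At the same clock frequency, the power ratio equals the energy-per-cycle ratio, so the factor of two in the table carries over directly.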

## System-on-Chip (SoC) with ASFPGA

- Provide advanced functionality and computing power in IoT
  - Flexibility: user functions are assigned to either the CPU or the accelerator, depending on their computational demands and functionality.
  - Reduced design complexity by using high-level logic synthesis

#### • Components

- Processor & Code ROM
- RAM for working memory (frame data, weights, etc.)
- Input/output serial ports (SPI, LVDS, GPIO, etc.)
- Bus interconnect
- NanoBridge-FPGA
- PMU (Power Management Unit)

Block diagram (Non-volatile SoC-FPGA): CPU, PMU, NB-NVM, NB-FPGA, SPI/UART, GPIO, and DP-SRAM attached to the bus.

## Deep Learning (DL) in IoT

- DL applications to IoT
  - Image/Video (Classification, object detection, scene understanding, traffic sign, etc.)
  - Natural Language (speech recognition, translation, etc.)
- Convolutional Neural Network (CNN) requires computational power
  - CNN training in the cloud, CNN inference on IoT devices
- Challenges for CNN inference in IoT devices
  - Arithmetic operations / memory access
  - Power consumption / heat dissipation
  - → Accelerator with high energy efficiency



## **CNN** accelerator comparison

- FPGA is more energy-efficient for CNN inference.
  - Benchmark : Binarized CNN (VGG16)
- Advantages of FPGA
  - No bottleneck in memory access
  - Highly energy-efficient bitwise operation
  - Massive parallel operations

| Architecture                       | CPU (Baseline)      | GPGPU           | SRAM-FPGA       |
|------------------------------------|---------------------|-----------------|-----------------|
| Device                             | ARM Cortex-A57      | Maxwell GPU     | Zynq7020        |
| Clock Freq.                        | 1.9 GHz             | 998 MHz         | 144 MHz         |
| Memory                             | 16 GB<br>eMMC Flash | 4GB<br>LPDDR4   | 4.9Mb<br>BRAM   |
| Execution time (ms)<br>(fps)       | 4,210<br>(0.23)     | 27.23<br>(36.7) | 2.37<br>(421.9) |
| Power (W)                          | 7                   | 17              | 2.3             |
| Efficiency (fps/W)                 | 0.032               | 2.2             | 182.6           |

H. Nakahara, Journal of the Japanese Society for Artificial Intelligence, Vol. 33, No. 1 (2018)
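The throughput and efficiency rows follow from the execution time and power rows: fps = 1000 / (execution time in ms) and efficiency = fps / power. The sketch below recomputes them from the table inputs. The helper names are introduced here, and recomputed values can differ slightly from the published rows because of rounding in the source.

```python
# (execution time in ms, power in W) taken from the table above.
ROWS = {
    "CPU (ARM Cortex-A57)": (4210.0, 7.0),
    "GPGPU (Maxwell GPU)": (27.23, 17.0),
    "SRAM-FPGA (Zynq7020)": (2.37, 2.3),
}


def frames_per_second(exec_time_ms: float) -> float:
    """Throughput for one inference per execution time."""
    return 1000.0 / exec_time_ms


def efficiency_fps_per_w(exec_time_ms: float, power_w: float) -> float:
    """Energy efficiency in frames per second per watt."""
    return frames_per_second(exec_time_ms) / power_w


for device, (time_ms, power_w) in ROWS.items():
    fps = frames_per_second(time_ms)
    eff = efficiency_fps_per_w(time_ms, power_w)
    print(f"{device}: {fps:.2f} fps, {eff:.3f} fps/W")
```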

## **Power Reduction on IoT Devices**

- Issue 1: Processing on the CPU has high latency
  - → Processing time is shortened by executing on the FPGA
- Issue 2: <u>High leakage in the standby state</u>
  - → Nonvolatile FPGA is power-gated by the MPU
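The effect of power gating on a duty-cycled device can be sketched with a simple average-power model. Every number below is a hypothetical placeholder chosen for illustration, not a measurement from the talk; the point is only that standby power dominates when the active time is a small fraction of the period, and that a nonvolatile FPGA can be switched off in standby without losing its configuration.

```python
def average_power_w(p_active_w: float, t_active_s: float,
                    p_standby_w: float, period_s: float) -> float:
    """Average power over one wake/sleep period of a duty-cycled device."""
    if not 0.0 <= t_active_s <= period_s:
        raise ValueError("active time must lie within the period")
    t_standby_s = period_s - t_active_s
    return (p_active_w * t_active_s + p_standby_w * t_standby_s) / period_s


# Hypothetical placeholder values (not from the slides).
P_ACTIVE_W = 10e-3        # power while processing
T_ACTIVE_S = 1e-3         # processing time per period
PERIOD_S = 1.0            # one wake-up per second
P_STANDBY_ON_W = 1e-3     # standby leakage when the fabric stays powered
P_STANDBY_GATED_W = 1e-6  # residual standby power when the fabric is power-gated

always_on = average_power_w(P_ACTIVE_W, T_ACTIVE_S, P_STANDBY_ON_W, PERIOD_S)
gated = average_power_w(P_ACTIVE_W, T_ACTIVE_S, P_STANDBY_GATED_W, PERIOD_S)
print(f"always powered: {always_on * 1e3:.3f} mW, power-gated: {gated * 1e3:.3f} mW")
```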



## NanoBridge in FPGA & NVM

• NanoBridge for memory cell in NVM and routing switch in FPGA



## Low power NVM

- Low-power read operation
  - Large ON/OFF conductance ratio: sense-amplifier-free read at low operating voltage (down to 0.45 V)
  - Read energy below 0.58 pJ at 1.05 V
  - 11 ns read access at 1.05 V





## **CLB** area shrink

- Crossbar area : -75%
  - Depopulated crossbar achieves area saving
- Logic block area : -70%



R.Nebashi et al., VLSI Symposium, 2018

## **FPGA** specification

• Largest number of LUTs among FPGAs using novel nonvolatile memories and switches



| Process      | 28nm CMOS<br>with 9 metal layers |
|--------------|---------------------------|
| # of LUTs    | 171k                      |
| # of ASs     | 173 Mb                    |
| Block RAM    | 3.2 Mb                    |
| PLL          | 5                         |
| DSP          | 648                       |
| FPU          | 2                         |
| GPIO         | 240                       |
| LVDS         | 16                        |
| Core voltage | 1.05 V                    |
| IO voltage   | 1.8 V                     |

R.Nebashi et al., FPL, 2020

## **Radiation Environments**



\*https://pc.watch.impress.co.jp/docs/news/event/168219.html (D.C. Matthews et al., "NSEU impact on commercial avionics", IRPS 2009.)


## **Radiation Induced Effect of Si-LSI**

- Single Event Effect (SEE)
  - Soft error caused by a single high-energy particle (heavy ion, neutron, proton)
  - Single Event Upset (SEU): bit flip in SRAM or flip-flop
  - Single Event Latch-up (SEL) → hard error
  - Single Event Transient (SET)
- Total Ionizing Dose Effect (TID)
  - Hard error caused by accumulated gamma-ray dose
  - Induced fixed charge in MOS
  - CMOS, DRAM
- Displacement Damage Dose (DDD)
  - Hard error caused by a large number of neutrons
  - Displacement of lattice atoms



Figure labels: SEE (single event effect); large amount of radiation; TID (total ionizing dose effect).

## **Radiation hardening**

- SRAM/Flash:
  - Data 0/1 is defined by charge
  - Affected by charge induced in the Si substrate
- NanoBridge:
  - ON/OFF states are defined by a physical metallic wire
  - Not affected by induced charge





## Summary of experimental results

| Effect | Radiation | Source          | Energy / LET    | Dose   | DUT        | Active  | Result                  |
|--------|-----------|-----------------|-----------------|--------|------------|---------|-------------------------|
| SEU    | Neutron   | Acc./Be         | 2 MeV           | 3.6e11 | NB-FPGA    | Dynamic | OK                      |
| SEU    | Neutron   | Acc./Be         | 2 MeV           | 1.2e12 | NB-FPGA    | Static  | OK                      |
| SEU    | Heavy ion | Cf-252          | 30 MeV·cm²/mg   | 2.7e5  | NB-FPGA    | Dynamic | OK                      |
| SEU    | Heavy ion | Acc./Xe         | 68.9 MeV·cm²/mg | 2e7    | NB-FPGA    | Dynamic | OK                      |
| SEU    | α-ray     | Am              | 5.4 MeV         | 1.1e8  | NB-FPGA    | Dynamic | OK                      |
| TID    | Gamma-ray | Co60            | 1 MeV           | 5 kGy  | NanoBridge | Static  | OK                      |
| TID    | Gamma-ray | Co60            | 1 MeV           | 10 kGy | NB-FPGA    | Static  | Static current increase |
| DDD    | Neutron   | Nuclear reactor | <2 MeV          | 1e14   | NanoBridge | Static  | OK                      |

K.Ueno et al., IEEE NSS 2021

T.Sakamoto et al., Memrisys 2021

## **Experimental Results in RAPIS-I**

#### • No error detected

- Soft error detection circuit: operation time > 3,300 hours
- Images in total: 280





| DUT Type     | Scale    | Duration (hours) | SEU counts |
|--------------|----------|------------------|------------|
| BRAM         | 2kbit    | 859              | 0          |
| AS-Chain1    | 96 CLBs  | 2,254            | 0          |
| AS-Chain2    | 368 CLBs | 230              | 0          |
| DFF/AS-Chain | 468 CLBs | 45               | 0          |
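The durations in the table can be summed to check the quoted operation time, and a zero-count observation can be turned into an upper bound on the event rate. The sketch below is an illustration added here, not an analysis from the talk: it assumes a Poisson process and pools all DUT types into one exposure, which is a simplification, and the function name `zero_event_upper_rate` is hypothetical.

```python
import math

# Test durations in hours, taken from the table above.
DURATION_HOURS = {
    "BRAM": 859,
    "AS-Chain1": 2254,
    "AS-Chain2": 230,
    "DFF/AS-Chain": 45,
}

total_hours = sum(DURATION_HOURS.values())


def zero_event_upper_rate(exposure: float, confidence: float = 0.95) -> float:
    """Upper bound on a Poisson event rate when zero events were seen over `exposure`.

    Solves exp(-rate * exposure) = 1 - confidence for rate.
    """
    if exposure <= 0 or not 0.0 < confidence < 1.0:
        raise ValueError("exposure must be positive and confidence in (0, 1)")
    return -math.log(1.0 - confidence) / exposure


print(f"total operation time: {total_hours} hours")
print(f"95% upper bound: {zero_event_upper_rate(total_hours):.2e} events per hour")
```

The bound scales inversely with exposure, so longer on-orbit operation with zero counts tightens it proportionally.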

## Conclusion

- Low-power operation of NanoBridge-FPGA demonstrated
- NB-FPGA with 171k LUTs for CNN accelerator
  - Depopulated crossbar achieves 75% area saving
  - CNN can be applied to 28nm NB-FPGA
- Radiation tolerance of NanoBridge-FPGA
  - Free from single event upset (SEU) and total ionizing dose effect (TID)
  - Successful one-year operation in space

## Acknowledgements

This work was supported by the New Energy and Industrial Technology Development Organization (NEDO), Project No. JPNP16007.

This work was also supported by Tsukuba Innovation Arena (TIA) and the National Institute of Advanced Industrial Science and Technology (AIST).

Radiation evaluation was carried out by K. Takeuchi of JAXA and K. Ueno of KEK.

Project members: R. Nebashi, N. Banno, M. Miyamura, X. Bai, K. Funahashi, K. Okamoto, N. Iguchi, H. Numata, T. Sugibayashi, and M. Tada

