Lattice QCD is one of the major scientific workloads on supercomputer installations. Most of the compute time is spent iteratively solving a large, sparse system of linear equations. One of the simplest iterative solvers for such systems is the conjugate gradient algorithm. In this talk, we present an optimized implementation of this algorithm in the context of Lattice QCD for Xilinx Alveo U280 accelerator cards. We compare its performance with that obtained on a CPU architecture and highlight the advantages of an FPGA-based implementation.
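
For orientation, the conjugate gradient iteration mentioned above can be sketched in a few lines of Python. This is a minimal illustration and not the FPGA implementation presented in the talk. The operator `apply_operator` is a hypothetical stand-in: a one-dimensional periodic Laplacian plus a mass term, chosen only because it is sparse, symmetric, and positive definite, as conjugate gradient requires. The names `apply_operator`, `dot`, `conjugate_gradient`, and the parameters `mass`, `tol`, and `max_iter` are all assumptions made for this sketch.

```python
import math


def apply_operator(x, mass=0.5):
    """Hypothetical stand-in operator (not a Lattice QCD Dirac operator).

    Applies (A x)_i = (2 + mass) * x_i - x_{i-1} - x_{i+1} with periodic
    boundaries. The matrix is never stored; only its action is computed.
    """
    n = len(x)
    return [(2.0 + mass) * x[i] - x[(i - 1) % n] - x[(i + 1) % n]
            for i in range(n)]


def dot(u, v):
    """Euclidean inner product of two real vectors."""
    return sum(a * b for a, b in zip(u, v))


def conjugate_gradient(apply_a, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for symmetric positive-definite A given as a function.

    Returns the approximate solution and the number of iterations used.
    """
    x = [0.0] * len(b)
    r = list(b)          # residual b - A x for the initial guess x = 0
    p = list(r)          # first search direction
    rr = dot(r, r)
    b_norm = math.sqrt(dot(b, b))
    if b_norm == 0.0:
        return x, 0
    for k in range(1, max_iter + 1):
        ap = apply_a(p)  # the only operator application per iteration
        alpha = rr / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rr_new = dot(r, r)
        if math.sqrt(rr_new) <= tol * b_norm:
            return x, k
        beta = rr_new / rr
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rr = rr_new
    return x, max_iter


if __name__ == "__main__":
    source = [1.0 if i == 0 else 0.0 for i in range(16)]
    solution, iterations = conjugate_gradient(apply_operator, source)
    print("iterations:", iterations)
```

Each iteration needs one operator application, two inner products, and three vector updates, so the cost of the solver is dominated by the sparse operator application and by memory traffic for the vectors.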