Highlights from FCCM 2020

We witnessed the first-ever virtual FCCM this year due to COVID-19. I hope everyone is staying safe and enjoying working from home. Below, I summarise my personal thoughts after attending the FCCM 2020 virtual conference, including the paper sessions and workshops. All the papers and workshops were of good quality; here, I highlight some that relate to my personal research interests.

Paper sessions:

Comparison of Arithmetic Number Formats for Inference in Sum-Product Networks on FPGAs, a collaboration between TU Darmstadt and Fulda University of Applied Sciences, won the best paper award this year. The paper presents an interesting combination of hardware arithmetic and a non-"standard" machine learning domain. Sum-product networks (SPNs) are implemented using custom floating-point arithmetic (generated with FloPoCo), the Posit format, and a logarithmic number system. The authors found that the hardware implementations of all three formats deliver similar performance, although the Posit implementation requires more power and hardware resources. I think a mixed-precision SPN implementation would also be interesting to look at; we might need an analytical error model for the operators to determine the right number of bits for each layer. I had some fruitful discussions with Lukas Sommer, the lead author of this paper, and he pointed me to some prior work on this in the space of probabilistic models.
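To see why per-layer precision matters, here is a minimal sketch (my own toy example, not the paper's implementation): a hand-built two-layer SPN evaluated at full and reduced precision, using NumPy's float16 as a stand-in for a narrow custom format.

```python
import numpy as np

def eval_spn(leaf_probs, dtype=np.float64):
    # Toy SPN: two product nodes over leaf pairs, combined by a
    # weighted sum node (mixture weights sum to 1).
    p = leaf_probs.astype(dtype)
    prod1 = p[0] * p[1]
    prod2 = p[2] * p[3]
    w = np.array([0.3, 0.7], dtype=dtype)
    return w[0] * prod1 + w[1] * prod2

leaves = np.array([0.9, 0.2, 0.5, 0.6])
exact = float(eval_spn(leaves, np.float64))   # 0.3*0.18 + 0.7*0.30 = 0.264
narrow = float(eval_spn(leaves, np.float16))
rel_err = abs(narrow - exact) / exact
```

An analytical error model would bound how `rel_err` grows with network depth, since products of many probabilities shrink towards the underflow region of a narrow format.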

Low-Cost Approximate Constant Coefficient Hybrid Binary-Unary Multiplier for DSP Applications, from the University of Minnesota, Twin Cities, is an interesting paper using a relatively new encoding scheme for multiplier design. The authors applied unary representation to constant-coefficient multipliers (CCMs), as unary coding suits single-input monotonic functions nicely. The proposed architecture differs from both standard LUT-based and shift-and-add-based CCMs.
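To illustrate why unary coding suits monotonic functions, here is a small sketch (my own illustration, not the paper's architecture): in thermometer code, multiplying by a constant c reduces to pure wiring, because output bit j is 1 exactly when the input is at least ceil((j+1)/c), i.e. it is a copy of one input bit.

```python
def unary_encode(x, n):
    # Thermometer code: x ones followed by (n - x) zeros.
    return [1] * x + [0] * (n - x)

def unary_decode(bits):
    return sum(bits)

def unary_const_mult(bits, c):
    # y = c * x. Output bit j is 1 iff x >= ceil((j+1)/c), which is
    # simply a copy (a wire, in hardware) of input bit j // c.
    n = len(bits)
    return [bits[j // c] for j in range(c * n)]
```

For example, `unary_const_mult(unary_encode(3, 4), 2)` produces six leading ones, decoding to 6 = 2 × 3; no adders or LUT logic are needed, only routing.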

Researchers from Imperial College London, Intel, and Corerain Technologies Ltd propose a novel latency-hiding RNN hardware architecture in Optimizing Reconfigurable Recurrent Neural Networks. The paper addresses two bottlenecks. First, data dependencies in RNN computation cause FPGA systems to stall. Second, an inefficient tiling strategy leaves hardware resources idle. Column-wise matrix-vector multiplication is used to eliminate the data dependency, and a flexible checkerboard tiling strategy is introduced to support large weight matrices while enabling both element-based and vector-based parallelism. Together, these optimizations improve the exploitation of the available parallelism, increasing run-time hardware utilization and boosting inference throughput.
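The column-wise dataflow can be sketched as follows (a software illustration of the idea only, not the paper's hardware design): instead of computing each output as a full dot product, partial sums over all outputs are accumulated as each input element arrives, so computation can start before the whole recurrent state vector is available.

```python
import numpy as np

def mv_columnwise(W, x):
    # Accumulate one column's contribution per arriving input element.
    # Partial results for every output are updated immediately, which
    # is what lets the hardware hide the recurrent data dependency.
    y = np.zeros(W.shape[0], dtype=W.dtype)
    for j, xj in enumerate(x):
        y += W[:, j] * xj
    return y
```

The result is numerically identical to the usual row-wise `W @ x`; only the order of operations, and hence the opportunity for overlap, changes.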

High-Throughput Convolutional Neural Network on an FPGA by Customized JPEG Compression, a collaboration between Tokyo Institute of Technology and Imperial College London, finds a way to alleviate the data transfer bandwidth bottleneck between the host and the FPGA. The authors compress the transferred images using customised JPEG coding and implement a customised image decoder architecture on the FPGA. The trade-off between the data transfer speed-up and the drop in recognition accuracy is analysed. This paper was one of the best paper candidates.

HBM on an FPGA provides up to 425 GB/s of memory bandwidth, which can be used to significantly accelerate bandwidth-critical applications. However, no existing research work, white paper, or user guide accurately exposes the performance characteristics of HBM on FPGAs; nor is there similar work on CPUs/GPUs. Researchers from Zhejiang University, China and ETH Zurich present Shuhai: Benchmarking High Bandwidth Memory on FPGAs. With Shuhai, before implementing a concrete application with a particular memory access pattern on the FPGA, users can benchmark that access pattern to make sure that the memory side will not be the bottleneck.
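The style of measurement is easy to illustrate on the host side (a rough analogue of the idea, not Shuhai itself, which issues access patterns from FPGA logic): time a parameterised access pattern, here a strided read, and report the effective bandwidth of the data actually touched.

```python
import time
import numpy as np

def measure_read_bandwidth(stride, n_words=1 << 22):
    # Strided read over an array of 8-byte words (32 MiB by default).
    # Larger strides waste cache lines / memory bursts, so effective
    # bandwidth typically drops as the stride grows.
    a = np.zeros(n_words, dtype=np.int64)
    view = a[::stride]
    t0 = time.perf_counter()
    _ = view.sum()                  # the strided read pattern under test
    dt = time.perf_counter() - t0
    return view.nbytes / dt / 1e9   # GB/s of useful data moved
```

Sweeping `stride` (and, on real hardware, burst size and bank mapping) yields exactly the kind of characterisation curve Shuhai produces for HBM channels.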

I am also quite interested in papers using 100Gb/s links on modern FPGA boards. Researchers from the University of California, San Diego presented Corundum: An Open-Source 100-Gbps NIC. Corundum is designed to support high-precision transmit scheduling, something that most NICs – even most smart NICs – can only do in a very limited way. More complex functionality, such as match-action rules, can be implemented outside the main Corundum modules if necessary; in that case, Corundum serves as a high-performance host interface for a smart NIC design. The authors have kindly open-sourced Corundum on GitHub and welcome discussion in their Google Group if you find the project interesting.
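Conceptually, high-precision transmit scheduling means each packet carries its own departure timestamp and is released onto the wire only when that time arrives. A toy software model of that behaviour (my own illustration; Corundum implements this in hardware, with far finer timing) might look like:

```python
import heapq
import itertools

class TxScheduler:
    """Toy per-packet transmit scheduler: release packets strictly
    in departure-timestamp order, never before their timestamp."""
    def __init__(self):
        self._q = []
        self._seq = itertools.count()  # FIFO tie-break for equal timestamps

    def enqueue(self, ts, pkt):
        heapq.heappush(self._q, (ts, next(self._seq), pkt))

    def release(self, now):
        # Emit every packet whose departure time has been reached.
        out = []
        while self._q and self._q[0][0] <= now:
            out.append(heapq.heappop(self._q)[2])
        return out
```

Rate limiting and pacing then fall out naturally: spacing the timestamps of a flow's packets caps that flow's transmit rate without any per-flow token buckets.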

Workshops:

Christophe Bobda and Peter Hofstee organised a workshop on The Future of FPGA-Acceleration in Cloud and Datacenters. Academic presenters from the NSF, Yale University, University of Toronto, Boston University, Northeastern University, University of Massachusetts Amherst and University of Florida presented the state of the art in FPGA-cloud research grants, achievements, applications and potential collaborations. In addition, industrial partners from IBM Research Europe, Algo-Logic, Xilinx and Microsoft presented their FPGA cloud products. I share the sentiment of the workshop abstract: "current FPGA cloud developments are taking place in closed door and company and institution disclose very little on the challenges they encounter as well as the approach currently used to tackle those challenges." This workshop is a good start at bringing together experts in various fields, such as cloud computing, FPGAs, computer architecture and applications, to share the current state of FPGA acceleration in the cloud and to discuss the future and challenges of FPGA-based datacenters.

Xilinx has recently provided the open-source, freely downloadable Vitis unified software platform; datacenter-centric development boards (Alveo) suitable for application acceleration; and another open-source project, PYNQ, which builds on Vitis and Python. In the two-day hands-on workshop Compute Acceleration Workflow using Vitis and PYNQ, over 100 participants followed materials prepared by the Xilinx University Program. I really enjoyed it, as it was my first time using AWS, Vitis and PYNQ for compute acceleration.

Now, it would be a good idea to have a coffee break…

Published by DrHelicopter

He Li is now with the University of Cambridge as a Research Associate in Quantum Communication and Computing. He serves on the editorial board of Frontiers in Electronics and on the technical programme committees of top-tier EDA and reconfigurable computing conferences (DAC, ICCAD, FCCM, FPL, FPT, ASP-DAC and SOCC). Dr. Li has served as a reviewer for dozens of IEEE/ACM transactions and conferences. He is the recipient of the FPT'17 best paper presentation award, served as publicity co-chair of IEEE FPT'20, and is the publicity chair of FPT'21.

