hls4ml is an open-source tool that enables the deployment of machine learning (ML) models on FPGAs using High-Level Synthesis (HLS). It automatically converts pre-trained models from popular deep learning frameworks (e.g., Keras, PyTorch, and ONNX) into optimized firmware for FPGA-based inference.
Traditionally, the entire ML model is synthesized as a monolithic graph, which can lead to long synthesis times and complicated debugging, especially for large model topologies. Splitting the model graph at specified layers into independent subgraphs allows for parallel synthesis and step-wise optimization. However, finding the ‘optimal’ splitting points and optimizing FIFO buffers in between the subgraphs remains a challenge, especially when dealing with dynamic streaming architectures.
This project aims to investigate optimal splitting strategies for complex ML models in hls4ml, focusing on efficient FIFO depth optimization across multi-graph designs. The goal is to develop methodologies that can be integrated into hls4ml to enable automated and optimal graph splitting for improved performance.
The contributor will start by familiarizing themselves with hls4ml and building ML models using multi-graph designs. They will implement profiling techniques (e.g., VCD logging) to measure FIFO occupancy and backpressure in order to develop a FIFO optimization strategy for multi-graph designs. They will also investigate multi-objective optimization algorithms to determine optimal splitting points based on subgraph resource usage or dataflow patterns. Finally, they will integrate these methodologies with hls4ml and run benchmarks to validate improvements in latency, resource utilization, etc.