Model fusion

Multi-task learning, inference

Many AI-based applications rely on multiple deep neural networks (DNNs) to perform predictions for different tasks. For example, home robotics and augmented reality applications use DNNs for image classification, object detection, semantic segmentation, and face detection on the same input vision stream. These DNNs are computationally expensive and often run on resource-constrained devices. Multi-task learning (MTL) addresses this problem by using a multi-task model with a single backbone DNN whose parameters are shared across tasks. This sharing can reduce both the computational cost and the latency of inference.
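
To make the shared-backbone structure concrete, here is a minimal PyTorch sketch (the class name, layer shapes, and task heads are illustrative assumptions, not a specific model from our work): a single backbone pass produces features that two task-specific heads consume.

```python
import torch
import torch.nn as nn

class SharedBackboneMTL(nn.Module):
    """Illustrative multi-task model: one shared backbone, two task heads."""

    def __init__(self, num_classes=10, num_seg_classes=21):
        super().__init__()
        # Shared feature extractor, computed once per input.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((8, 8)),
        )
        # Task-specific heads reuse the shared features.
        self.cls_head = nn.Linear(64 * 8 * 8, num_classes)            # classification
        self.seg_head = nn.Conv2d(64, num_seg_classes, kernel_size=1)  # segmentation

    def forward(self, x):
        feats = self.backbone(x)  # one backbone pass serves both tasks
        return self.cls_head(feats.flatten(1)), self.seg_head(feats)

# Usage: both task outputs from a single forward pass.
cls_logits, seg_logits = SharedBackboneMTL()(torch.randn(2, 3, 64, 64))
```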

In several applications, however, developers need to combine separately pre-trained, task-specific DNNs that have heterogeneous architectures and no shared backbone. Rather than retraining a new MTL model from scratch, our GMorph work proposes a novel technique called model fusion, which fuses the pre-trained DNNs into a single multi-task model. The fused model is then fine-tuned on task-specific training datasets. The fundamental idea is to break complex DNNs into a sequence of computation blocks, such as convolutional layers, and let different DNNs share intermediate features across blocks, so that shared features are computed only once rather than separately by each DNN. The GMorph framework implements model fusion and introduces techniques such as search-space sampling and predictive filtering to reduce the high cost of searching the space of possible model fusions. Our evaluation shows that GMorph can outperform MTL baselines, reducing the inference latency of multi-DNN workloads by 1.11-2.23x while meeting the target task accuracy.
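
The sketch below illustrates just the feature-sharing mechanism on two block-decomposed models (the FusedPair class and block shapes are hypothetical; GMorph's actual graph mutations, search-space sampling, and predictive filtering are not shown): when sharing is enabled, the second model consumes the first model's first-block features instead of recomputing its own, and the fused model would then be fine-tuned to recover task accuracy.

```python
import torch
import torch.nn as nn

class FusedPair(nn.Module):
    """Illustrative fusion of two block-decomposed models for two tasks."""

    def __init__(self, blocks_a, blocks_b, share_first_block=True):
        super().__init__()
        self.blocks_a = nn.ModuleList(blocks_a)
        self.blocks_b = nn.ModuleList(blocks_b)
        # One candidate point in the fusion search space: does model B
        # reuse model A's first-block features?
        self.share = share_first_block

    def forward(self, x):
        # Run model A block by block, keeping intermediate features.
        a_feats, feat = [], x
        for blk in self.blocks_a:
            feat = blk(feat)
            a_feats.append(feat)
        out_a = feat
        # Model B either reuses A's first-block output (fused; assumes
        # compatible shapes) or computes its own (unfused baseline).
        feat_b = a_feats[0] if self.share else self.blocks_b[0](x)
        for blk in self.blocks_b[1:]:
            feat_b = blk(feat_b)
        return out_a, feat_b

# Usage: two small conv stacks whose first blocks produce compatible shapes.
blocks_a = [nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU()),
            nn.Sequential(nn.Conv2d(16, 16, 3, padding=1), nn.ReLU())]
blocks_b = [nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU()),
            nn.Sequential(nn.Conv2d(16, 8, 3, padding=1), nn.ReLU())]
out_a, out_b = FusedPair(blocks_a, blocks_b)(torch.randn(1, 3, 32, 32))
```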

References

2024

  1. GMorph: Accelerating Multi-DNN Inference via Model Fusion
    Qizheng Yang, Tianyi Yang, Mingcan Xiang, Lijun Zhang, Haoliang Wang, Marco Serafini, and Hui Guan
    In Proceedings of the 19th ACM European Conference on Computer Systems (EuroSys), 2024