Document details

Distilled large language model-driven dynamic sparse expert activation mechanism

Author(s): Chen, Qinghui ; Zhang, Zekai ; Zhang, Zaigui ; Zhang, Kai ; Li, Dagang ; Wang, Wenmin ; Zhang, Jinglin ; Liu, Cong

Date: 2025

Persistent ID: http://hdl.handle.net/10362/189704

Origin: Repositório Institucional da UNL

Subject(s): Dynamic routing; Industrial defect; Industrial large models; Large language models; Mixture-of-experts; Software


Description

Chen, Q., Zhang, Z., Zhang, Z., Zhang, K., Li, D., Wang, W., Zhang, J., & Liu, C. (2025). Distilled large language model-driven dynamic sparse expert activation mechanism. Applied Soft Computing, 185, Part B, Article 114037. https://doi.org/10.1016/j.asoc.2025.114037 --- This work was supported in part by National Key Research and Development Program of China under Grant 2022YFB4500602, Key R&D Program of Shandong Province of China under Grant 2023CXGC010112, the joint funds of the National Natural Science Foundation of China under Grant U24A20221, Distinguished Young Scholar of Shandong Province under Grant ZR2023JQ025, Taishan Scholars Program under Grant tsqn202211290, and Major Basic Research Projects of Shandong Province under Grant ZR2022ZD32.

High inter-class similarity, extreme scale variation, and limited computational budgets hinder reliable visual recognition across diverse real-world data. Existing vision-centric and cross-modal approaches often rely on rigid fusion mechanisms and heavy annotation pipelines, leading to sub-optimal generalization. We propose the Distilled Large Language Model (LLM)-Driven Sparse Mixture-of-Experts (DS-MoE) framework, which integrates text-guided dynamic routing and lightweight multi-scale comprehension. The DS-MoE framework dynamically aligns textual semantics with defect-specific visual patterns through a sparse MoE architecture, where task-relevant experts are adaptively activated based on semantic relevance, resolving inter-class ambiguity. A lightweight MobileSAM encoder enables real-time inference while preserving multi-scale defect details. Extensive experiments on PCB, aluminum foil, and mold defect datasets demonstrate that our framework achieves superior performance compared to existing pure vision models. DS-MoE surpasses YOLOv8/YOLOX with gains of +13.9, +1.4, and +2.0 pp mAP@0.5:0.95 on BBMP, aluminum, and PCB, respectively, while also improving precision and recall.
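The abstract describes adaptively activating task-relevant experts by semantic relevance in a sparse MoE. As a rough illustration only (not the authors' implementation; all function and variable names here are hypothetical), a text-conditioned top-k gate can be sketched as follows: gate logits are computed from the concatenated visual and text features, only the k highest-scoring experts are run, and their outputs are combined with renormalized softmax weights.

```python
import numpy as np

def topk_sparse_moe(x, text_emb, expert_weights, gate_W, k=2):
    """Illustrative top-k sparse MoE forward pass (a sketch, not DS-MoE itself).

    x              : (d,) visual feature vector
    text_emb       : (d,) text embedding guiding the routing
    expert_weights : list of (d, d) matrices, one linear expert each
    gate_W         : (2*d, n_experts) gating projection
    """
    n = len(expert_weights)
    # Gate scores conditioned jointly on visual and textual features
    gate_in = np.concatenate([x, text_emb])
    logits = gate_in @ gate_W
    # Keep only the top-k experts; the rest are masked out (never executed)
    topk = np.argsort(logits)[-k:]
    masked = np.full(n, -np.inf)
    masked[topk] = logits[topk]
    # Softmax over the surviving logits (masked entries contribute zero)
    probs = np.exp(masked - masked[topk].max())
    probs /= probs.sum()
    # Sparse mixture: only the k selected experts are evaluated
    out = np.zeros_like(x)
    for i in topk:
        out += probs[i] * (expert_weights[i] @ x)
    return out, topk
```

Because non-selected experts are never evaluated, compute scales with k rather than with the total expert count, which is the usual efficiency argument for sparse MoE routing.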

Document Type: Journal article
Language: English
Contributor(s): NOVA Information Management School (NOVA IMS); Information Management Research Center (MagIC) - NOVA Information Management School; RUN
