Document details

Precise detection of structural variations in wheat genomes based on deep learning

Author(s): Li, Yanling ; Dong, Zijing ; Li, Yuhong ; Bação, Fernando ; Zhao, Yuyang ; Si, Haiping

Date: 2025

Persistent ID: http://hdl.handle.net/10362/187354

Origin: Repositório Institucional da UNL

Subject(s): Deep learning; wheat genome; structural variation detection; generative adversarial network; ResNet; attention mechanism; Architecture; Building and Construction; Mechanical Engineering; Marketing


Description

Li, Y., Dong, Z., Li, Y., Bação, F., Zhao, Y., & Si, H. (2025). Precise detection of structural variations in wheat genomes based on deep learning. International Journal for Housing Science and Its Applications, 46(4), 5768-5780. https://doi.org/10.70517/ijhsa464415
---
1. Research and Application of Key Technologies for Agricultural Knowledge Intelligent Services Integrating Large-scale Models and Knowledge Graphs; Project Type: Henan Provincial Key R&D Program (251111211300).
2. Research on Maize Leaf Disease Recognition Technology Based on Improved DCGAN Data Augmentation; Project Type: Henan Provincial Science and Technology Research Project (252102520037).
3. Research on Common Key Technologies for Novel Germplasm Resource Creation Based on Artificial Intelligence; Project Type: Henan Provincial Key R&D Program (231111110100).

Wheat is an important global food crop, and structural variation in its genome directly affects yield and quality. This study constructs a complete framework for detecting structural variations in the wheat genome, comprising four core modules: data preprocessing, image generation, data augmentation, and deep learning prediction. First, valid structural variation information is extracted from VCF files and written to BED files; gene sequence data are then converted into RGB images using gene visualization methods, and the different types of structural variation are handled by purpose-designed breakpoint and compression strategies. To address data imbalance, an improved generative adversarial network is proposed for data augmentation, reaching an F1 value of 67.46% at a 1:1 ratio of positive to negative samples. The DLSVPre deep learning prediction model is then built on a ResNet backbone with a spatial attention mechanism, using Kaiming initialization and the ReLU activation function to optimize model performance. Experimental results show that DLSVPre achieves a prediction accuracy of 98.45%, a recall of 97.26%, and an F1-score of 97.85% on the HG001 dataset. On the PacBio dataset, its F1-score improves by 60.58% over the traditional GATK method. The study demonstrates that the method provides an effective technical tool for high-precision detection of structural variants in the wheat genome.
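
The preprocessing step described in the abstract (extracting structural variation records from a VCF file into BED intervals) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `vcf_to_bed`, the reliance on the `SVTYPE` and `END` INFO keys, and the choice to emit the interval covering positions POS through END are all assumptions made for illustration.

```python
def vcf_to_bed(vcf_lines):
    """Yield BED-style tuples (chrom, start, end, svtype) from VCF text lines.

    Hypothetical helper: only records whose INFO column carries both
    SVTYPE and a numeric END are kept; everything else is skipped.
    """
    for line in vcf_lines:
        line = line.rstrip("\n")
        # Skip blank lines, meta lines (##...) and the column header (#CHROM...).
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 8:
            continue  # not a complete VCF record
        chrom, pos, info = fields[0], int(fields[1]), fields[7]

        # Parse INFO into a dict; flag entries without '=' map to "".
        info_map = {}
        for item in info.split(";"):
            key, _, value = item.partition("=")
            info_map[key] = value

        svtype = info_map.get("SVTYPE", "")
        end = info_map.get("END", "")
        if not svtype or not end.isdigit():
            continue  # not a structural variant with a usable END

        # VCF positions are 1-based; BED starts are 0-based with an
        # exclusive end, so the span POS..END becomes [POS - 1, END).
        yield (chrom, pos - 1, int(end), svtype)


def to_bed_text(rows):
    """Format the tuples as tab-separated BED lines."""
    return "\n".join("\t".join(str(v) for v in row) for row in rows)


if __name__ == "__main__":
    example = [
        "##fileformat=VCFv4.2",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
        "chr1A\t1000\tsv1\tN\t<DEL>\t.\tPASS\tSVTYPE=DEL;END=1500;SVLEN=-500",
        "chr1A\t2000\tsnp1\tA\tG\t.\tPASS\tDP=30",
    ]
    print(to_bed_text(vcf_to_bed(example)))
```

In this sketch the SNP record is dropped because it has no `SVTYPE`, and only the deletion is written out as a BED interval. A real pipeline would also need to handle variant types that have no `END` field, such as breakends.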

Document Type Journal article
Language English
Contributor(s) Information Management Research Center (MagIC) - NOVA Information Management School; NOVA Information Management School (NOVA IMS); RUN
