Author(s):
Silva, Catarina ; Ferrão, José ; Marques, Barbara ; Pedro, Sónia ; Correia, Hildeberto ; Rodrigues, António Sebastião ; Vieira, Luís
Date: 2023
Persistent Identifier: http://hdl.handle.net/10400.18/9152
Source: Repositório Científico do Instituto Nacional de Saúde
Subject(s): Long-Read Sequencing; Whole Human Genome Sequencing; Structural Variation; Bioinformatics; Nanopore Sequencing; DNA Analysis Technologies
Description
Introduction: Copy number variations (CNVs) represent ~13% of the human genome and can harbour important genes and regulatory elements. High-resolution whole genome microarray (MA) analysis is the gold standard tool for detection of CNVs associated with genetic disorders. While short-read sequencing (SRS) can address structural variant (SV) detection, long-read sequencing has proven to overcome SRS mapping inaccuracy in highly repetitive DNA regions and to improve genome contiguity. We applied whole genome nanopore sequencing (NS) to call CNVs and compared the results with those obtained by microarray.
Methodology: Genomic DNA from 2 cell lines (EOL-1 and 697) was processed using the CytoScan HD Array (Affymetrix) and ChAS software (ThermoFisher). A minimum CNV calling size threshold of 35 Kb was used. DNA was also sequenced on the MinION device (Oxford Nanopore Technologies) following a rapid library preparation method. Sequencing data were basecalled using Guppy and mapped with LRA, and SVs were called using both CuteSV and Sniffles2. Sanger sequencing was performed to confirm breakpoint positions for 3 CNVs. R packages were used to perform comparisons between MA and NS data.
Results: A total of 49 CNVs were confirmed after curated MA analysis in both cell lines, ranging in size from 35 Kb to 79 Mb. Of those, 43 CNVs (87.7%) were called in the nanopore data by either one (4 CNVs) or both (39 CNVs) callers, at a mean whole genome coverage of ~12X. Six of the 43 CNVs were called as inversions instead. In 3 CNVs the size of the variant was found to be smaller (ranging from ~5 to 22 Kb) than the threshold of MA analysis. The correlation between CNV sizes obtained with MA and NS was 0.71 with Sniffles2 and 0.74 with CuteSV, whereas the correlation between callers was 0.99. The breakpoint precision obtained for NS was much higher (ranging from 2 to 42 bp for CuteSV and from 0 to 87 bp for Sniffles2) than the one obtained for MA (ranging from 774 to 7618 bp).
Conclusions: NS technology proved to be technically effective in the detection of CNVs of different types and sizes and thus presents itself as an alternative to MA for the detection of pathogenic SVs associated with genetic diseases. However, NS data analysis requires fine-tuning of the analysis conditions, as well as the use of different methods, to achieve greater reliability of results in a clinical context.
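
Illustrative note: the comparison between MA and NS calls described above can be sketched in a few lines. The abstract states only that R packages were used, so the Python below is not the authors' method. It is a minimal sketch that assumes calls are represented as (chromosome, start, end) tuples, that an MA call counts as detected when an NS call shares at least 50% reciprocal overlap with it (the threshold and the helper names `reciprocal_overlap`, `match_calls` and `pearson` are assumptions), and that the size agreement is summarised with a Pearson correlation. All coordinates are invented toy values, not data from the study.

```python
from math import sqrt


def reciprocal_overlap(a, b):
    """Reciprocal overlap fraction of two (chrom, start, end) intervals."""
    if a[0] != b[0]:
        return 0.0
    shared = min(a[2], b[2]) - max(a[1], b[1])
    if shared <= 0:
        return 0.0
    # The smaller of the two fractions, so both calls must be mostly covered.
    return min(shared / (a[2] - a[1]), shared / (b[2] - b[1]))


def match_calls(ma_calls, ns_calls, min_overlap=0.5):
    """Pair each MA call with its best-overlapping NS call, or with None.

    min_overlap is an assumed threshold, not one taken from the abstract.
    """
    pairs = []
    for ma in ma_calls:
        best = max(ns_calls, key=lambda ns: reciprocal_overlap(ma, ns), default=None)
        if best is not None and reciprocal_overlap(ma, best) >= min_overlap:
            pairs.append((ma, best))
        else:
            pairs.append((ma, None))
    return pairs


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Toy calls (hypothetical coordinates).
ma_calls = [
    ("chr1", 1_000_000, 1_050_000),
    ("chr2", 5_000_000, 5_400_000),
    ("chr3", 200_000, 260_000),
    ("chr4", 10_000_000, 10_120_000),
]
ns_calls = [
    ("chr1", 1_000_800, 1_049_500),
    ("chr2", 5_001_000, 5_398_000),
    ("chr4", 10_003_000, 10_118_000),
]

pairs = match_calls(ma_calls, ns_calls)
matched = [(ma, ns) for ma, ns in pairs if ns is not None]

# Fraction of MA calls recovered in the NS call set.
detection_rate = len(matched) / len(ma_calls)

# Agreement between CNV sizes reported by the two platforms.
size_correlation = pearson(
    [ma[2] - ma[1] for ma, _ in matched],
    [ns[2] - ns[1] for _, ns in matched],
)
```

In practice the NS intervals would be read from the VCF output of the SV callers, and a size-aware or breakpoint-distance criterion might suit very large CNVs better than a fixed overlap fraction.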