Two GenomeWide SNP 6.0 (Affymetrix) samples. Left: a mosaic uniparental disomy (UPD) event on chromosome 13. Right: a mosaic deletion on chromosome 20. The plots show the Log R Ratio (LRR, black) and the B Allele Frequency (BAF, red).

Inversions and mosaicism detection in large-scale population studies using Affymetrix data


Genome-Wide Association Studies (GWAS) hold great promise for discovering genes underlying complex, heritable disorders for which earlier, powerful study designs have failed. The data obtained from these studies have been mined by analyzing single nucleotide polymorphisms (SNPs). However, SNPs cannot account for much of the heritability of disease. To overcome this limitation, other structural variants (SVs), such as inversions, copy-number variants (CNVs) and mosaicisms, are being analyzed. Because these SVs can be predicted from genotype data, existing GWAS data can be re-analyzed to assess whether any of them are associated with complex diseases.
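Predicting SVs from genotype data relies on two per-SNP signals derived from the allele A and allele B intensities: the LRR and the BAF plotted in the figure. The sketch below uses the widely used PennCNV-style definitions; it is an illustration under stated assumptions, not the implementation used by affy2sv, MAD or inveRsion. The function names are hypothetical, and the expected intensity and the genotype cluster centres (AA, AB, BB) are assumed to be known already. In practice they are estimated from reference samples. In a mosaic deletion, the LRR drops below 0 and heterozygous BAF values split away from 0.5. In a copy-neutral mosaic UPD, the LRR stays near 0 while the BAF still splits.

```python
import math


def theta_and_r(a: float, b: float) -> tuple[float, float]:
    """Convert allele A/B intensities to polar-like coordinates.

    theta lies in [0, 1] (0 = pure A signal, 1 = pure B signal) and
    R = A + B is the total intensity.
    """
    theta = (2.0 / math.pi) * math.atan2(b, a)
    return theta, a + b


def lrr(r_observed: float, r_expected: float) -> float:
    """Log R Ratio: log2 of observed vs. expected total intensity.

    r_expected is assumed to come from reference samples (hypothetical input).
    """
    return math.log2(r_observed / r_expected)


def baf(theta: float, t_aa: float, t_ab: float, t_bb: float) -> float:
    """B Allele Frequency by linear interpolation between cluster centres.

    t_aa < t_ab < t_bb are the assumed theta centres of the AA, AB and BB
    genotype clusters for this SNP.
    """
    if theta <= t_aa:
        return 0.0
    if theta >= t_bb:
        return 1.0
    if theta <= t_ab:
        return 0.5 * (theta - t_aa) / (t_ab - t_aa)
    return 0.5 + 0.5 * (theta - t_ab) / (t_bb - t_ab)
```

For example, a heterozygous SNP with equal A and B intensities gives theta close to 0.5 and therefore a BAF close to 0.5. An observed total intensity at half the expected value, as for a hemizygous deletion in all cells, gives `lrr(1000, 2000) == -1.0`. Mosaic events produce intermediate values between these extremes.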

inveRsion, GADA and MAD are R packages developed at the Barcelona Institute for Global Health (ISGlobal) that can be used to perform these association studies. These packages, like other existing tools for GWAS and CNV association studies such as PLINK or PennCNV, require a specific input data format. In addition, for the newer Affymetrix SNP arrays (GenomeWide SNP, Axiom and CytoScan HD), the process of obtaining SNP data from .CEL files is not user-friendly and, more importantly, is not implemented in R. An R implementation would make it easier to use other R packages for downstream analyses.

This master's thesis was designed to address these problems. A new R package for calling SNPs from the newest Affymetrix .CEL files (GenomeWide SNP, Axiom and CytoScan HD) has been developed. Pipelines for downstream analyses focusing on mosaicisms (MAD) and inversions (inveRsion) have also been introduced. The tools have been applied to real data from a general Spanish cohort and to HapMap data, illustrating how they can be used in SV association studies.


This project resulted in a publication in BMC Bioinformatics:

affy2sv: an R package to pre-process Affymetrix CytoScan HD and 750K arrays for SNP, CNV, inversion and mosaicism calling Carles Hernandez-Ferrer, Ines Quintela Garcia, Katharina Danielski, Ángel Carracedo, Luis A. Pérez-Jurado and Juan R. González BMC Bioinformatics 2015, 16:167 | doi:10.1186/s12859-015-0608-y | (Open Access)

The project report and the slides used in the academic defense follow: