Mapping sequence reads & Calling variants

Relevanta dokument
Resultat av den utökade första planeringsövningen inför RRC september 2005

12.6 Heat equation, Wave equation

The present situation on the application of ICT in precision agriculture in Sweden

1. Unpack content of zip-file to temporary folder and double click Setup

Bridging the gap - state-of-the-art testing research, Explanea, and why you should care

Health café. Self help groups. Learning café. Focus on support to people with chronic diseases and their families

Adding active and blended learning to an introductory mechanics course

Preschool Kindergarten

SGUs arbete med havsplanering

Alla Tiders Kalmar län, Create the good society in Kalmar county Contributions from the Heritage Sector and the Time Travel method

Om oss DET PERFEKTA KOMPLEMENTET THE PERFECT COMPLETION 04 EN BINZ ÄR PRECIS SÅ BRA SOM DU FÖRVÄNTAR DIG A BINZ IS JUST AS GOOD AS YOU THINK 05

Sri Lanka Association for Artificial Intelligence

Grafisk teknik IMCDP IMCDP IMCDP. IMCDP(filter) Sasan Gooran (HT 2006) Assumptions:

State Examinations Commission

Rapport avseende DNA-analys av spillningsprover från järv

Measuring child participation in immunization registries: two national surveys, 2001

Karl Holm Ekologi och genetik, EBC, UU. ebc.uu.se. Nick Brandt. Populationsgenetik

2.1 Installation of driver using Internet Installation of driver from disk... 3

Beijer Electronics AB 2000, MA00336A,

RUP är en omfattande process, ett processramverk. RUP bör införas stegvis. RUP måste anpassas. till organisationen till projektet

Vässa kraven och förbättra samarbetet med hjälp av Behaviour Driven Development Anna Fallqvist Eriksson

Local event detection with neural networks (application to Webnet data)

Goals for third cycle studies according to the Higher Education Ordinance of Sweden (Sw. "Högskoleförordningen")

Grafisk teknik IMCDP. Sasan Gooran (HT 2006) Assumptions:


Klicka här för att ändra format

Datasäkerhet och integritet

Alternativet är iwindows registret som ni hittar under regedit och Windows XP 32 bit.

INSTALLATION INSTRUCTIONS

Status på (EU) regelfronten

Swedish adaptation of ISO TC 211 Quality principles. Erik Stenborg

Bilaga 5 till rapport 1 (5)

EASA Standardiseringsrapport 2014

Rastercell. Digital Rastrering. AM & FM Raster. Rastercell. AM & FM Raster. Sasan Gooran (VT 2007) Rastrering. Rastercell. Konventionellt, AM

Läkemedelsverkets Farmakovigilansdag

Molecular marker application in breeding of self- and cross- compatible sweet cherry ( (P. P. avium L.) varieties. Silvija Ruisa, Irita Kota

Vägytans tillstånd, historik och framtid. Johan Lang

Make a speech. How to make the perfect speech. söndag 6 oktober 13

Urban Runoff in Denser Environments. Tom Richman, ASLA, AICP

Kristina Säfsten. Kristina Säfsten JTH

SRS Project. the use of Big Data in the Swedish sick leave process. EUMASS Scientific program

1. Varje bevissteg ska motiveras formellt (informella bevis ger 0 poang)

Normalfördelning. Modeller Vi har alla stött på modeller i olika sammanhang. Ex:

Boiler with heatpump / Värmepumpsberedare

The Municipality of Ystad

Genetik II. Jessica Abbott

Grafisk teknik. Sasan Gooran (HT 2006)

3rd September 2014 Sonali Raut, CA, CISA DGM-Internal Audit, Voltas Ltd.

Styrteknik: Binära tal, talsystem och koder D3:1

Klyvklingor / Ripping Blades.

Configuration Management

Taking Flight! Migrating to SAS 9.2!

SICS Introducing Internet of Things in Product Business. Christer Norström, CEO SICS. In collaboration with Lars Cederblad at Level21 AB

LÖNEN ETT EFFEKTIVT SÄTT FÖR ÖNSKAD PRESTATION - ENDA FÖRUTSÄTTNINGEN FÖR KONKURRENSKRAFT I EN GLOBAL VÄRLD!

District Application for Partnership

Provlektion Just Stuff B Textbook Just Stuff B Workbook

Isometries of the plane

GPS GPS. Classical navigation. A. Einstein. Global Positioning System Started in 1978 Operational in ETI Föreläsning 1

What Is Hyper-Threading and How Does It Improve Performance

Exempelsamling Assemblerprogrammering

ISO STATUS. Prof. dr Vidosav D. MAJSTOROVIĆ 1/14. Mašinski fakultet u Beogradu - PM. Tuesday, December 09,

Botnia-Atlantica Information Meeting

Molecular Biology Primer

Master Thesis. Study on a second-order bandpass Σ -modulator for flexible AD-conversion Hanna Svensson. LiTH - ISY - EX -- 08/ SE

Uttagning för D21E och H21E

Not everything that counts can be counted, and not everything that can be counted counts. William Bruce Cameron

Workplan Food. Spring term 2016 Year 7. Name:

Isolda Purchase - EDI

Förändrade förväntningar

Authentication Context QC Statement. Stefan Santesson, 3xA Security AB


SVENSK STANDARD SS-EN ISO 19108:2005/AC:2015

Product configurations Produire configuration Produkt konfigurationen Producto configuraciones Produkt konfigurationerna

System arbetssystem informationssystem

*****************************************************************

Protected areas in Sweden - a Barents perspective

FIX LED-LYSRÖRSARMATUR MED AKRYLKÅPA IP44

Image quality Technical/physical aspects

Dokumentnamn Order and safety regulations for Hässleholms Kretsloppscenter. Godkänd/ansvarig Gunilla Holmberg. Kretsloppscenter

University of Nottingham ett internationellt campus med många inriktningar

Vanliga frågor om Duocom (för installatör eller reparatör) GB Frequently asked questions about Duocom (for installer or repairman)

Managing addresses in the City of Kokkola Underhåll av adresser i Karleby stad

SVENSK STANDARD SS-EN 13612/AC:2016

Från ebola i smutsiga tält till NästGenerationsSekvensering för $miljoner$ it s bioinformatics! FMS vårmöte

A QUEST FOR MISSING PULSARS

Resa Att ta sig runt. Att ta sig runt - Platser. Du vet inte var du är. Be om att bli visad en viss plats på en karta. Fråga om en viss servicepunkt

FÖRBERED UNDERLAG FÖR BEDÖMNING SÅ HÄR

PORTSECURITY IN SÖLVESBORG

EXTERNAL ASSESSMENT SAMPLE TASKS SWEDISH BREAKTHROUGH LSPSWEB/0Y09

PRESS FÄLLKONSTRUKTION FOLDING INSTRUCTIONS

Resa Att ta sig runt. Att ta sig runt - Platser. I am lost. Du vet inte var du är

Swedish framework for qualification

Samtidig utvärdering av form- & lägekrav

vetenskap - beslut - osäkerhet

ENTERPRISE WITHOUT BORDERS Stockholmsmässan, 17 maj 2016

Stiftelsen Allmänna Barnhuset KARLSTADS UNIVERSITET

Kvalitetsarbete I Landstinget i Kalmar län. 24 oktober 2007 Eva Arvidsson

Det här med levels.?

Support Manual HoistLocatel Electronic Locks

InstalationGuide. English. MODEL:150NHighGain/30NMiniUSBAdapter

Transkript:

Universitair Medisch Centrum Utrecht Mapping sequence reads & Calling variants Laurent Francioli 2014-10-28 l.francioli@umcutrecht.nl

Next Generation Sequencing

Data processing pipeline

Mapping to reference Human Reference Sequence GTGCCAGGACCAGATCGTGCCAACGGACAGGTGGTAAGGAAGGAG Sequence reads GTGCCAAGGA CAACGGACAG AGGTGGTAAG

Mapping to reference Human Reference Sequence GTGCCAGGACCAGATCGTGCCAACGGACAGGTGGTAAGGAAGGAG GTGCCAAGGA GTGCCAAGGA

Mapping to reference Human Reference Sequence GTGCCA-GGACCAGATCGTGCCAACGGACAGGTGGTAAGGAAGGAG GTGCCA-AGGA GTGCCAAGGA GTGCCAAGGA GTGCCAA-GGA

Mapping to reference Human Reference Sequence GTGCCAGGACCAGATCGTGCCAACGGACAGGTGGTAAGGAAGGA Millions of sequence reads GTGCCAAGGA CAACGGACAG AGGTGGTAAG

Coverage Human Reference Sequence GTGCCAGGACCAGATCGTGCCAACGGACAGGTGGTAAGGAAGGAG GTGCCAGGAC A C CCAGGAT G A C GAT CAGATCG ACCAGATCGT GTGCCAA-GGA CAACGGACAG GGAAGGAG

Real NGS aligned data in IGV

Marking Duplicates

Indel realignment

Indel realignment - example

Indel realignment - example

Base qualities ACTGCCAGGTNTCAGTACA

Base quality recalibration Downstream models use base qualities as input Base qualities have systematic machine-specific biases

Analysis-ready reads Reads are correctly placed in the genome Mapping qualities model the uncertainty of mapping Reads are independent Clonal (duplicate) reads are marked Base qualities are calibrated Base qualities model the uncertainty of the base call

Data processing pipeline

SNPs and indels Single Nucleotide Variants (SNVs) ATATTATTGCCCAGACCAATGTCTTGGAGTTATTCCCCTGT ATATTATTGCCCAGACCAATGGAGTTATTCCCCTGT

NGS data is noisy

Diploid is difficult Homozygous All reads should show the same allele Heterozygous ~50% of the reads should show each allele

SNP Calling and Genotyping Human Reference Sequence GTGCCAGGACCAGATCG GTGCCAG GTGCCAGG G G GTGCCAT T GA GTGCCAGGACC GCCAGGACCAGA GCCAGGACCAGAT G G CCAGGACCAGA CCAGGACCAGAT AGGACCAGATCG GGACCAGATCG G

SNP Calling and Genotyping Human Reference Sequence GTGCCAGGACCAGATCG GTGCCAG GTGCCAGG G G GTGCCAT T GA GTGCCAGGACC GCCAT GACCAGA GCCAGGACCAGAT G G CCAT GACCAGA CCAGGACCAGAT AGGACCAGATCG T GACCAGATCG

Genotyping Formalism (GATK) Bi-Allelic Model A: Reference allele B: Alternate allele Homozygous Heterozygous P(b AA) = 1 e, if b = A b e b / 3, if b A P(b AB) = (1 e )/2+ e b b / 6, if b {A, B} e b / 3, if b {A, B}

Interpreting NGS genotypes

Interpreting NGS genotypes

Interpreting NGS genotypes

Interpreting NGS genotypes

Two approaches for calling SNPs and indels Unified Genotyper Haplotype Caller Calls SNPs and INDELS separately Evaluates each base independently Calls SNPs and INDELs simultaneously Uses all bases connected through local de novo assembly Fast! No need for indel realignment More accurate, especially for indels Supports N+1 genotyping

Data processing pipeline

Variant Annotations Reads Strand bias Quality / Depth Mapping quality ranksum Biology / Population Hardy-Weinberg Equilibrium Haplotype Score

Example: strand bias

Example: strand bias

Two ways of filtering Thresholds Pros Work for small datasets No need for external resources Cons Does not adapt to particular data Manual tuning Human bias Machine-learning Pros Multi-dimensional and complex model can fit the data better Ranking of variants using a single metric Cons Need training data Need enough data

Machine learning filtering

Filtering Evaluation Transition / Transversion (Ti/Tv) Proportion of novel variants Heterozygosity Allele frequency spectrum of variants Genotype concordance against chip (if available)

Ti/Tv-based filtering threshold Sensitivity (proportion of sites in training set) 100 99.5 99 95 0.98 1.00 1.02 1.04 1.06 1.08 Specificity (known TiTv/novel TiTv)

Genome of the Netherlands (GoNL) FR 62 GR 57 ZH 168 NH 91 UT 48 FL 0 GD 57 DR 56 OV 58 ZL 46 NB 68 LI 58

GoNL SNPs

Variation captured by GoNL

Resources GATK homepage http://www.broadinstitute.org/gatk/index.php GATK video workshop http://www.broadinstitute.org/gatk/guide/ events?id=2038#materials GoNL homepage http://www.nlgenome.nl