GATK4.0 study

docker run -v ~/gatk_bundle:/gatk/my_data -it broadinstitute/gatk:4.0.2.0

gatk HaplotypeCaller \
-R ref/ref.fasta \
-I bams/mother.bam \
-O sandbox/motherHC.vcf \
-L 20:10,000,000-10,200,000

gatk HaplotypeCaller \
-R ref/ref.fasta \
-I bams/mother.bam \
-O sandbox/motherHCdebug.vcf \
-bamout sandbox/motherHCdebug.bam \
-L 20:10,002,371-10,002,546 -ip 100
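-ip is the short form of --interval-padding: HaplotypeCaller works on the -L region widened by that many bases on each side, so the debug run above looks at 100 extra bases on either side of 20:10,002,371-10,002,546. pad_interval is a hypothetical bash helper (not part of GATK) that prints the padded region. It clamps the start at position 1 but does not check contig lengths.

```shell
#!/usr/bin/env bash
# pad_interval CONTIG:START-END PADDING
# Hypothetical helper: prints the region covered once -ip/--interval-padding
# is applied to an -L interval. Commas in coordinates are accepted.
pad_interval() {
  local interval=${1//,/}        # 20:10,002,371-10,002,546 -> 20:10002371-10002546
  local pad=$2
  local contig=${interval%%:*}   # text before the first ':'
  local range=${interval#*:}     # text after the first ':'
  local start=${range%-*}        # text before the last '-'
  local end=${range#*-}          # text after the first '-'
  start=$(( start - pad ))
  if (( start < 1 )); then
    start=1                      # 1-based coordinates: never pad past the contig start
  fi
  end=$(( end + pad ))           # contig length is NOT checked here
  echo "${contig}:${start}-${end}"
}

pad_interval 20:10,002,371-10,002,546 100   # prints 20:10002271-10002646
```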

gatk HaplotypeCaller \
-R ref/ref.fasta \
-I bams/mother.bam \
-O sandbox/mother.g.vcf \
-ERC GVCF \
-L 20:10,000,000-10,200,000
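With -ERC GVCF, HaplotypeCaller accounts for every position in the interval rather than only variant sites: a site with an alternate allele lists it together with the symbolic <NON_REF> allele, while runs of reference-like positions are collapsed into blocks whose ALT is only <NON_REF>, with INFO END= marking the block's last position. Counting the two kinds of records is a quick way to see this. gvcf_block_summary is a hypothetical awk wrapper and assumes an uncompressed file such as sandbox/mother.g.vcf.

```shell
#!/usr/bin/env bash
# gvcf_block_summary FILE.g.vcf
# Hypothetical helper: counts reference-confidence blocks (ALT is only
# <NON_REF>) versus sites that carry at least one real alternate allele.
gvcf_block_summary() {
  awk -F'\t' '
    /^#/ { next }                         # skip meta-information and header lines
    $5 == "<NON_REF>" { blocks++; next }  # reference block; INFO END= gives its last position
    { variants++ }                        # e.g. ALT "T,<NON_REF>"
    END { printf "ref_blocks=%d variant_sites=%d\n", blocks, variants }' "$1"
}

# Usage: gvcf_block_summary sandbox/mother.g.vcf
```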

gatk GenomicsDBImport \
-V gvcfs/mother.g.vcf \
-V gvcfs/father.g.vcf \
-V gvcfs/son.g.vcf \
--genomicsdb-workspace-path sandbox/trio \
--intervals 20:10,000,000-10,200,000

gatk GenotypeGVCFs \
-R ref/ref.fasta \
-V gendb://sandbox/trio \
-O sandbox/trioGGVCF.vcf \
-L 20:10,000,000-10,200,000

#_________________
gatk SelectVariants \
-R ref/ref.fasta \
-V input_vcfs/trio.vcf.gz \
-sn NA12878 \
-select-type SNP \
--exclude-non-variants \
-O sandbox/motherSNP.vcf.gz
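A quick sanity check on the SelectVariants output is to count its records and confirm they all look like SNPs (a single-base REF and only single-base ALT alleles). vcf_snp_summary is a hypothetical helper. Because bgzip output is gzip-compatible, gzip -dc can read sandbox/motherSNP.vcf.gz directly.

```shell
#!/usr/bin/env bash
# vcf_snp_summary FILE.vcf.gz
# Hypothetical helper: counts records and how many are SNP-like, i.e. a
# single-base REF and only single-base ALT alleles (multi-allelic SNPs count).
vcf_snp_summary() {
  gzip -dc "$1" | awk -F'\t' '
    /^#/ { next }                       # skip meta-information and header lines
    {
      n++
      snp = (length($4) == 1)           # REF is one base
      k = split($5, alts, ",")          # ALT may hold several alleles
      for (i = 1; i <= k; i++)
        if (length(alts[i]) != 1 || alts[i] == "*" || alts[i] == ".")
          snp = 0                       # indel, spanning deletion or missing ALT
      if (snp) s++
    }
    END { printf "records=%d snp_like=%d\n", n, s }'
}

# Usage: vcf_snp_summary sandbox/motherSNP.vcf.gz
```

If SelectVariants did its job, the two numbers should be equal.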

Somatic CNV

Step  Latest GATK tool                 Old tool                                 Description
1     PreprocessIntervals              PadTargets                               Pad or bin intervals for coverage collection
2     CollectFragmentCounts            CalculateTargetCoverage                  Collect fragment counts at specified intervals
3     CreateReadCountPanelOfNormals    CreatePanelOfNormals                     Create the PoN from fragment counts
4     DenoiseReadCounts                NormalizeSomaticReadCounts               Denoise case sample counts against the PoN
5     ModelSegments                    PerformSegmentation, AllelicCNV          Group and model contiguous copy ratios and allele fractions
6     CallCopyRatioSegments            CallSegments                             Call copy-neutral (0), loss (-), and gain (+) segments
7     PlotDenoisedCopyRatios &         PlotSegmentedCopyRatio, PlotACNVResults  Plot copy ratios and allele fractions to visualize denoising and segmentation
      PlotModeledSegments

gatk PreprocessIntervals \
-L intervals/targets_C.interval_list \
--sequence-dictionary ref/Homo_sapiens_assembly38.dict \
--reference ref/Homo_sapiens_assembly38.fasta \
--padding 250 \
--bin-length 0 \
--interval-merging-rule OVERLAPPING_ONLY \
--output sandbox/targets_C.preprocessed.interval_list
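Here --padding 250 widens each target by 250 bases on each side, and --bin-length 0 disables binning, which suits exome targets; nonzero bin lengths are meant for whole-genome data. The awk sketch illustrates only the padding arithmetic on a Picard-style interval list (@ header lines, then tab-separated contig, 1-based start, end, strand and name). pad_interval_list is a hypothetical helper, not a replacement for PreprocessIntervals: it ignores contig lengths and does nothing about neighboring targets that overlap once padded.

```shell
#!/usr/bin/env bash
# pad_interval_list PADDING < in.interval_list
# Hypothetical helper illustrating --padding only: every target is widened by
# PADDING bases on each side (start clamped at 1). Overlaps created by the
# padding are left as they are.
pad_interval_list() {
  awk -v pad="$1" 'BEGIN { FS = OFS = "\t"; pad += 0 }
    /^@/ { print; next }                      # keep the SAM-style header unchanged
    {
      $2 = ($2 - pad < 1) ? 1 : $2 - pad      # column 2: 1-based start
      $3 = $3 + pad                           # column 3: inclusive end
      print
    }'
}

# Usage: pad_interval_list 250 < intervals/targets_C.interval_list
```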

Example Tumor
gatk --java-options "-Xmx6g" CollectFragmentCounts \
-I bams/tumor.bam \
-L sandbox/targets_C.preprocessed.interval_list \
--reference ref/Homo_sapiens_assembly38.fasta \
--format TSV \
--interval-merging-rule OVERLAPPING_ONLY \
--output sandbox/tumor_clean.counts.tsv

Example Normal
gatk --java-options "-Xmx6g" CollectFragmentCounts \
-I bams/normal.bam \
-L sandbox/targets_C.preprocessed.interval_list \
--reference ref/Homo_sapiens_assembly38.fasta \
--format TSV \
--interval-merging-rule OVERLAPPING_ONLY \
--output sandbox/normal_clean.counts.tsv
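Before building a panel of normals it helps to glance at the counts files. The sketch assumes the --format TSV layout is a SAM-style header (lines starting with @), then a row of column names CONTIG, START, END and COUNT, then one row per interval. count_summary is a hypothetical helper that reports the number of intervals, the total count and how many intervals have zero fragments.

```shell
#!/usr/bin/env bash
# count_summary FILE.counts.tsv
# Hypothetical helper. Assumes '@' header lines, then a column-name row
# starting with CONTIG, then one row per interval with a COUNT column.
count_summary() {
  awk -F'\t' '
    /^@/ { next }                                                   # SAM-style header
    $1 == "CONTIG" { for (i = 1; i <= NF; i++) col[$i] = i; next }  # map names to columns
    {
      c = $(col["COUNT"])
      n++; total += c
      if (c == 0) zero++
    }
    END { printf "intervals=%d total=%d zero=%d\n", n, total, zero }' "$1"
}

# Usage: count_summary sandbox/normal_clean.counts.tsv
```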

gatk --java-options "-Xmx6500m" CreateReadCountPanelOfNormals \
--input file1_clean.counts.tsv \
... (one --input line per normal sample) ...
--input file40_clean.counts.tsv \
--minimum-interval-median-percentile 5.0 \
--output cnvponM.pon.hdf5
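The command above elides the inputs between file1 and file40; CreateReadCountPanelOfNormals takes one --input per normal sample. Rather than typing 40 lines, the arguments can be generated from a directory. pon_input_args and counts_dir are hypothetical names, and the *_clean.counts.tsv pattern is borrowed from the file names above. The glob sorts file10 before file2; the order of the normals should not matter for the panel, but sort the list yourself if you want numeric order.

```shell
#!/usr/bin/env bash
# pon_input_args DIR
# Hypothetical helper: emit "--input <file>" (one token per line) for every
# *_clean.counts.tsv in DIR. Fails if nothing matches.
pon_input_args() {
  local f found=0
  for f in "$1"/*_clean.counts.tsv; do
    [[ -e "$f" ]] || continue        # an unmatched glob stays literal; skip it
    printf -- '--input\n%s\n' "$f"
    found=1
  done
  (( found )) || { echo "no *_clean.counts.tsv files in $1" >&2; return 1; }
}

# Usage (bash 4+ for mapfile; paths must not contain newlines):
# mapfile -t pon_args < <(pon_input_args counts_dir)
# gatk --java-options "-Xmx6500m" CreateReadCountPanelOfNormals \
#   "${pon_args[@]}" \
#   --minimum-interval-median-percentile 5.0 \
#   --output cnvponM.pon.hdf5
```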

gatk --java-options "-Xmx7g" DenoiseReadCounts \
-I cnv_inputs/hcc1143_T_clean.counts.hdf5 \
--count-panel-of-normals cnv_inputs/cnvponC.pon.hdf5 \
--standardized-copy-ratios sandbox/hcc1143_T_clean.standardizedCR.tsv \
--denoised-copy-ratios sandbox/hcc1143_T_clean.denoisedCR.tsv
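DenoiseReadCounts reports copy ratios on a log2 scale, so 0 is the baseline level, +1 a doubling and -1 a halving of coverage. The sketch converts the denoised values back to linear copy ratios and, purely as a rough guide, to a copy number that assumes a diploid baseline with no correction for tumor purity or ploidy. log2cr_to_linear is a hypothetical helper; it assumes the .denoisedCR.tsv file has @ header lines, a column-name row starting with CONTIG, and CONTIG, START, END and LOG2_COPY_RATIO columns.

```shell
#!/usr/bin/env bash
# log2cr_to_linear FILE.denoisedCR.tsv
# Hypothetical helper. Assumes '@' header lines, then a column-name row with
# CONTIG, START, END and LOG2_COPY_RATIO.
log2cr_to_linear() {
  awk -F'\t' '
    /^@/ { next }
    $1 == "CONTIG" {
      for (i = 1; i <= NF; i++) col[$i] = i
      printf "CONTIG\tSTART\tEND\tCOPY_RATIO\tCN_IF_DIPLOID\n"
      next
    }
    {
      r = 2 ^ $(col["LOG2_COPY_RATIO"])     # undo the log2 transform
      # Rough copy number: diploid baseline assumed, purity and ploidy ignored.
      printf "%s\t%s\t%s\t%.3f\t%.2f\n", $(col["CONTIG"]), $(col["START"]), $(col["END"]), r, 2 * r
    }' "$1"
}

# Usage: log2cr_to_linear sandbox/hcc1143_T_clean.denoisedCR.tsv
```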

gatk --java-options "-Xmx7500m" ModelSegments \
--denoised-copy-ratios sandbox/hcc1143_T_clean.denoisedCR.tsv \
--output sandbox \
--output-prefix hcc1143_T_clean

gatk --java-options "-Xmx6000m" CallCopyRatioSegments \
-I sandbox/hcc1143_T_clean.cr.seg \
-O sandbox/hcc1143_T_clean.called.seg
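CallCopyRatioSegments labels each segment with the symbols from step 6 of the table: + for gain, - for loss and 0 for copy-neutral. A small tally makes the result easier to scan. call_summary is a hypothetical helper; it assumes the .called.seg file has @ header lines and a column-name row containing CONTIG, START, END and CALL.

```shell
#!/usr/bin/env bash
# call_summary FILE.called.seg
# Hypothetical helper: number of segments and total bases per call (+, -, 0).
call_summary() {
  awk -F'\t' '
    /^@/ { next }                                                   # SAM-style header
    $1 == "CONTIG" { for (i = 1; i <= NF; i++) col[$i] = i; next }  # map names to columns
    {
      c = $(col["CALL"])
      segs[c]++
      bp[c] += $(col["END"]) - $(col["START"]) + 1                  # inclusive 1-based coordinates
    }
    END { for (c in segs) printf "%s\tsegments=%d\tbp=%d\n", c, segs[c], bp[c] }' "$1" |
    LC_ALL=C sort                                                   # awk "for in" order is unspecified
}

# Usage: call_summary sandbox/hcc1143_T_clean.called.seg
```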

gatk --java-options "-Xmx6000m" PlotDenoisedCopyRatios \
--standardized-copy-ratios sandbox/hcc1143_T_clean.standardizedCR.tsv \
--denoised-copy-ratios sandbox/hcc1143_T_clean.denoisedCR.tsv \
--sequence-dictionary ref/Homo_sapiens_assembly38.dict \
--minimum-contig-length 46709983 \
--output cnv_plots \
--output-prefix hcc1143_T_clean
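The value 46709983 for --minimum-contig-length is the GRCh38 length of chr21, the shortest autosome, so only contigs at least that long (chr1 to chr22, chrX and chrY) are plotted, and the many short alt, unplaced and decoy contigs in Homo_sapiens_assembly38.dict are skipped. contigs_to_plot is a hypothetical helper that reads the @SQ lines (SN: name, LN: length) of a sequence dictionary and lists the contigs that pass a given threshold.

```shell
#!/usr/bin/env bash
# contigs_to_plot MIN_LENGTH FILE.dict
# Hypothetical helper: prints names of @SQ contigs whose LN is >= MIN_LENGTH.
contigs_to_plot() {
  awk -F'\t' -v min="$1" '
    BEGIN { min += 0 }                      # force a numeric comparison
    $1 != "@SQ" { next }                    # only sequence records carry SN/LN
    {
      name = ""; len = 0
      for (i = 2; i <= NF; i++) {
        if ($i ~ /^SN:/) name = substr($i, 4)
        else if ($i ~ /^LN:/) len = substr($i, 4) + 0
      }
      if (len >= min) print name
    }' "$2"
}

# Usage: contigs_to_plot 46709983 ref/Homo_sapiens_assembly38.dict
```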

gatk --java-options "-Xmx6000m" PlotModeledSegments \
--denoised-copy-ratios sandbox/hcc1143_T_clean.denoisedCR.tsv \
--segments sandbox/hcc1143_T_clean.modelFinal.seg \
--sequence-dictionary ref/Homo_sapiens_assembly38.dict \
--minimum-contig-length 46709983 \
--output cnv_plots \
--output-prefix hcc1143_T_clean

1. Run CollectAllelicCounts to collect reference and alternate allele counts for the tumor
and normal.
2. Provide the outputs from step 1 as inputs to ModelSegments, along with the denoised
copy ratios from the tumor.
3. Make plots with PlotModeledSegments. We skip PlotDenoisedCopyRatios here because its
inputs and outputs would be the same as in section 5.

gatk --java-options "-Xmx7500m" CollectAllelicCounts \
-L cnv_inputs/theta_biallelicsnps_agilentintervals.interval_list \
-I bams/normal.bam \
--reference ref/Homo_sapiens_assembly38.fasta \
--output sandbox/hcc1143_N_clean.allelicCounts.tsv
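Step 1 collects allelic counts for both the tumor and the normal, but the command above shows only the normal. The sketch wraps it in a function and calls it once per sample. collect_allelic_counts and the DRY_RUN switch are hypothetical conventions invented here (DRY_RUN is not a GATK option): by default the commands are only printed, and DRY_RUN=0 executes them. The T/N tags follow the hcc1143_T/hcc1143_N file names used in these notes.

```shell
#!/usr/bin/env bash
# Hypothetical dry-run switch: DRY_RUN=1 (default) prints, DRY_RUN=0 executes.
run() {
  if [[ "${DRY_RUN:-1}" == 1 ]]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# collect_allelic_counts TAG BAM
# TAG (e.g. T or N) is only used to name the output file.
collect_allelic_counts() {
  run gatk --java-options "-Xmx7500m" CollectAllelicCounts \
    -L cnv_inputs/theta_biallelicsnps_agilentintervals.interval_list \
    -I "$2" \
    --reference ref/Homo_sapiens_assembly38.fasta \
    --output "sandbox/hcc1143_${1}_clean.allelicCounts.tsv"
}

collect_allelic_counts T bams/tumor.bam
collect_allelic_counts N bams/normal.bam
```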

gatk --java-options "-Xmx7500m" ModelSegments \
--denoised-copy-ratios sandbox/hcc1143_T_clean.denoisedCR.tsv \
--allelic-counts cnv_inputs/hcc1143_T_clean.allelicCounts.tsv \
--normal-allelic-counts cnv_inputs/hcc1143_N_clean.allelicCounts.tsv \
--output sandbox \
--output-prefix hcc1143_TN_clean

gatk --java-options "-Xmx6000m" PlotModeledSegments \
--denoised-copy-ratios sandbox/hcc1143_T_clean.denoisedCR.tsv \
--allelic-counts sandbox/hcc1143_TN_clean.hets.tsv \
--segments sandbox/hcc1143_TN_clean.modelFinal.seg \
--sequence-dictionary ref/Homo_sapiens_assembly38.dict \
--minimum-contig-length 46709983 \
--output cnv_plots \
--output-prefix hcc1143_TN_clean