Workflow to add various transcript and site level annotations to a DNA germline VCF. The workflow has the following framework:
- Old annotations are stripped with bcftools
- VCF is annotated with bcftools
- VCF is normalized
- VCF is annotated with VEP
- VCF is annotated with Echtvar
- Slivar Compound Hets is run
- Exomiser is run
Any of the tools can be enabled/disabled using the following disable_(insert_step) parameters. The default workflow runs in the following configuration:
- disable_bcftools_strip = false
- disable_bcftools_norm = false
- disable_vep = false
- disable_gnomad_anno = false
- disable_compound_hets = true
- disable_exomiser = true
While any step can be disabled, it might have adverse affects on downstream tools. For example, if you disable VEP but turn on Slivar compound hets, the program can fail if your input VCF does not have any ANN or CSQ INFO fields. Validation safeguards only exist for each step because we want to allow for scenarios where partially annotated VCFs can enter at any stage. This allows for faster reannotation in the future.
As mentioned above requirements depend on what is enabled. At the minimum, three inputs are required.
vcf: KF VCF to processvcf_index= Index ofvcfoutput_basename: String prefix of file outputs. Technically optional, but strongly recommended. Default isTEST
If you enable BCFTOOLS STRIP, at least one of the following must be provided:
rm_fields_csvannotate_vcfbcftools_strip_extra_args
If you enable BCFTOOLS NORM, all of the following must be provided:
fasta
If you enable VEP, all of the following must be provided:
assemblyfastavep_cachevep_cache_versionvep_species
If you enable ECHTVAR, all of the following must be provided:
echtvar_zips
If you enable SLIVAR, all of the following must be provided:
ped
As mentioned above, Slivar does require some annotations. For more information see our notes.
analysis_file: YAML with analysis options. See https://exomiser.readthedocs.io/en/latest/advanced_analysis.html#analysis. Recommenddefault_exomiser_WGS_analysis.ymldatadir_file: TAR GZ with referenece files. Example contents:data/ data/remm/ data/remm/ReMM.v0.4.hg38.tsv.gz.tbi data/remm/ReMM.v0.4.hg38.tsv.gz data/2406_hg38.sha256 data/2406_hg38/ data/2406_hg38/2406_hg38_transcripts_refseq.ser data/2406_hg38/2406_hg38_transcripts_ensembl.ser data/2406_hg38/2406_hg38_genome.mv.db data/2406_hg38/2406_hg38_variants.mv.db data/2406_hg38/2406_hg38_transcripts_ucsc.ser data/2406_hg38/2406_hg38_clinvar.mv.db data/cadd/ data/cadd/1.7/ data/cadd/1.7/whole_genome_SNVs.tsv.gz data/cadd/1.7/whole_genome_SNVs.tsv.gz.tbi data/cadd/1.7/gnomad.genomes.r4.0.indel.tsv.gz data/cadd/1.7/gnomad.genomes.r4.0.indel.tsv.gz.tbi data/2406_phenotype/ data/2406_phenotype/2406_phenotype.mv.db data/2406_phenotype/rw_string_10.mv data/2406_phenotype/phenix/ data/2406_phenotype/phenix/out/ data/2406_phenotype/phenix/out/10.out data/2406_phenotype/phenix/out/9_symmetric.out data/2406_phenotype/phenix/out/14.out data/2406_phenotype/phenix/out/13_symmetric.out data/2406_phenotype/phenix/out/20.out data/2406_phenotype/phenix/out/19.out data/2406_phenotype/phenix/out/16_symmetric.out data/2406_phenotype/phenix/out/7.out data/2406_phenotype/phenix/out/3_symmetric.out data/2406_phenotype/phenix/out/8.out data/2406_phenotype/phenix/out/17_symmetric.out data/2406_phenotype/phenix/out/15_symmetric.out data/2406_phenotype/phenix/out/5_symmetric.out data/2406_phenotype/phenix/out/11.out data/2406_phenotype/phenix/out/14_symmetric.out data/2406_phenotype/phenix/out/8_symmetric.out data/2406_phenotype/phenix/out/10_symmetric.out data/2406_phenotype/phenix/out/1.out data/2406_phenotype/phenix/out/20_symmetric.out data/2406_phenotype/phenix/out/11_symmetric.out data/2406_phenotype/phenix/out/18_symmetric.out data/2406_phenotype/phenix/out/4.out data/2406_phenotype/phenix/out/19_symmetric.out data/2406_phenotype/phenix/out/2.out data/2406_phenotype/phenix/out/12.out data/2406_phenotype/phenix/out/15.out data/2406_phenotype/phenix/out/2_symmetric.out data/2406_phenotype/phenix/out/9.out data/2406_phenotype/phenix/out/17.out data/2406_phenotype/phenix/out/16.out data/2406_phenotype/phenix/out/6_symmetric.out data/2406_phenotype/phenix/out/13.out data/2406_phenotype/phenix/out/18.out data/2406_phenotype/phenix/out/3.out data/2406_phenotype/phenix/out/4_symmetric.out data/2406_phenotype/phenix/out/1_symmetric.out data/2406_phenotype/phenix/out/7_symmetric.out data/2406_phenotype/phenix/out/6.out data/2406_phenotype/phenix/out/5.out data/2406_phenotype/phenix/out/12_symmetric.out data/2406_phenotype/phenix/hp.obo data/2406_phenotype/phenix/ALL_SOURCES_ALL_FREQUENCIES_genes_to_phenotype.txt data/2406_phenotype/hp.obopheno_file: YAML file with sample information- Individual example:
--- id: FM_ZCF17CPC proband: subject: id: BS_4JPDCXVR sex: MALE phenotypicFeatures: - type: id: HP:0009733 label: Glioma pedigree: persons: - individualId: BS_4JPDCXVR sex: MALE affectedStatus: AFFECTED metaData: resources: - id: hp name: human phenotype ontology url: http://purl.obolibrary.org/obo/hp.owl version: hp/releases/2019-11-08 namespacePrefix: HP iriPrefix: 'http://purl.obolibrary.org/obo/HP_' phenopacketSchemaVersion: 2.0
- Family example:
--- id: FM_ASMHHE2N proband: subject: id: BS_KPAR4N94 sex: MALE phenotypicFeatures: - type: id: HP:0009733 pedigree: persons: - individualId: BS_KPAR4N94 sex: MALE paternalId: BS_264QQD38 maternalId: BS_EQ98CJJT affectedStatus: AFFECTED - individualId: BS_264QQD38 sex: MALE affectedStatus: UNAFFECTED - individualId: BS_EQ98CJJT sex: FEMALE affectedStatus: UNAFFECTED metaData: resources: - id: hp name: human phenotype ontology url: http://purl.obolibrary.org/obo/hp.owl version: hp/releases/2019-11-08 namespacePrefix: HP iriPrefix: 'http://purl.obolibrary.org/obo/HP_' phenopacketSchemaVersion: 2.0
- Individual example:
All of the defaults can be found in params block of nextflow.config.
Kids First specific run configurations can be found in conf/kids_first.config.
Here's a snapshot of the run configuration for Kids First Germline data:
bcftools stripuses theannotatemodule to:- remove existing VEP (
CSQ) and gnomad 3.1.1 (gnomad_3_1_1*) annotations - add annotations from
annotate_vcf
- remove existing VEP (
bcftools norm: Usesnormto split (or join) mulit-allelics and normalize indelsVEPuses cache 105 and meets the following specific criteria:--allele_number --allow_non_variant --assembly GRCh38 --cache --cache_version 105 --canonical --ccds --check_existing --dir_cache ./ --domains --dont_skip --failed 1 --format vcf --gene_phenotype --hgvs --hgvsg --mane --mane_select --merged --no_escape --no_stats --numbers --offline --pick_order rank,biotype,mane,canonical,appris,tsl,ccds,length,ensembl,refseq --polyphen b --protein --pubmed --regulatory --shift_hgvs 1 --sift b --species homo_sapiens --symbol --total_length --tsl --uniprot --variant_class --vcf --xref_refseqechtvar: Used for fast gnomAD annotation from theechtvar_zipsinput (gnomad.v3.1.1.custom.echtvar.zip)slivar: Used for rare variant discovery, primarily in joint family VCFs using theexprwith theslivar_zipsinput (topmed.hg38.dbsnp.151.zip) andcompount-hetsmodulesexomiser: "The Exomiser is a Java program that finds potential disease-causing variants from whole-exome or whole-genome sequencing data." See https://github.com/exomiser/Exomiser for more details