VCF Prepare
1. The reference genome for variant-calling must be IWGSC RefSeq V1.0/V1.1 at this time.
2. The VCF can be produced with the GATK Best Practices workflow or the Samtools/Bcftools variant-calling pipeline, from either DNA or RNA data.
3. Every VCF needs to be compressed as .vcf.gz by bcftools or GATK.
4. The VCF should contain “GT,AD” among its FORMAT tags.
5. A VCF from the GATK pipeline with default parameters already has "GT,AD" in the FORMAT column as the first and second tags.
6. If you call variants with samtools/bcftools, you need to request the AD tag explicitly in the output with “FORMAT/AD”.
7. Joint calling of the whole cohort is recommended when producing the VCF.
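A quick way to confirm the FORMAT requirement before going further is to look at the FORMAT column of the first variant record. The sketch below is only a minimal check, not part of either pipeline: has_gt_ad is a hypothetical helper name, and it assumes that the first data line is representative of the whole file.

```shell
# Hypothetical helper: succeed only if the first variant record lists both
# GT and AD in its FORMAT column (column 9 of a VCF data line).
has_gt_ad() {
    vcf="$1"
    case "$vcf" in
        *.gz) gzip -cd "$vcf" ;;   # bgzip output can be read by gzip
        *)    cat "$vcf" ;;
    esac | awk -F '\t' '
        /^#/ { next }                            # skip header lines
        {
            n = split($9, tags, ":")             # FORMAT tags are colon-separated
            for (i = 1; i <= n; i++) seen[tags[i]] = 1
            ok = ("GT" in seen) && ("AD" in seen)
            found = 1
            exit                                 # only the first record is inspected
        }
        END { exit !(found && ok) }              # fail if no record or a tag is missing
    '
}
```

For example, `has_gt_ad bsa.vcf.gz && echo "FORMAT looks fine"` prints the message only when both tags are present.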
Here are two demos of cohort variant calling:
GATK
# $refer: path to the reference genome FASTA (IWGSC RefSeq V1.0/V1.1)
# mutant gvcf
gatk HaplotypeCaller \
--input mutant.bam \
--output mutant.g.vcf.gz \
--reference $refer \
--emit-ref-confidence GVCF
# wild gvcf
gatk HaplotypeCaller \
--input wild.bam \
--output wild.g.vcf.gz \
--reference $refer \
--emit-ref-confidence GVCF
# merge gvcf
gatk CombineGVCFs \
--output bsa.g.vcf.gz \
--reference $refer \
--variant mutant.g.vcf.gz \
--variant wild.g.vcf.gz
# call vcf
gatk GenotypeGVCFs \
--reference $refer \
--variant bsa.g.vcf.gz \
--output bsa.vcf.gz
Samtools/Bcftools
bcftools mpileup -Ou --annotate "FORMAT/AD" \
-f ref.fa aln.mutant_bulk.bam aln.wild_bulk.bam | \
bcftools call -Ou -mv | \
bcftools filter -Oz -s LowQual -e '%QUAL<20 || DP>100' > bsa.vcf.gz
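Note that -s LowQual is a soft filter: records matching the expression are labelled in the FILTER column rather than removed. To see how many records were flagged, a small tally of the FILTER column can help. This is only a sketch under stated assumptions: tally_filter is a hypothetical helper name, and it relies on plain gzip and awk instead of bcftools.

```shell
# Hypothetical helper: count records per FILTER value (column 7) in a .vcf.gz.
tally_filter() {
    gzip -cd "$1" | awk -F '\t' '
        /^#/ { next }                                   # skip header lines
        { count[$7]++ }                                 # one counter per FILTER value
        END { for (f in count) print f "\t" count[f] }
    ' | sort                                            # stable, readable ordering
}
```

Running `tally_filter bsa.vcf.gz` prints one line per FILTER value with its record count, which shows at a glance how strict the QUAL and DP thresholds are for your data.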