Creating a consensus sequence from a bam file requires 3 steps:
Step 1: Run samtools mpileup to create the pileup file
Step 2: Run bcftools call to call SNPs and INDELs
Step 3: run vcfutils.pl to create the consensus file
You can run the commands separately or pipe them together in one command
The examples below include the options I use when creating the consensus sequence. You can change them as necessary.
NOTE: The commands listed below might wrap to additional lines, but they should be entered on one line
Step 1: Run samtools mpileup to create the pileup file
Parameters
--min-BQ
Minimum base quality for a base to be considered.
--VCF
Compute genotype likelihoods and output them in the variant call format (VCF).
--uncompressed
Generate uncompressed VCF/BCF output.
--output-tags DP,DV,DPR,DP4,SP
Comma-separated list of tags to output
DP --> Number of high-quality bases
DV --> Number of high-quality non-reference bases
DPR --> Number of high-quality bases for each observed allele
DP4 --> Number of high-quality ref-forward, ref-reverse, alt-forward and alt-reverse bases
SP --> Phred-scaled strand bias P-value
--fasta-ref
The faidx-indexed reference file in the FASTA format.
--output
Write pileup or VCF/BCF output to a file, rather than the default of standard output.
Command
samtools mpileup --min-BQ 20 --VCF --uncompressed --output-tags DP,DV,DPR,DP4,SP --fasta-ref REF-FILE BAM-FILE --output VCF-FILE
where:
REF-FILE is the name of the reference file
BAM-FILE is the name of the bam file
VCF-FILE is the name of the output file in VCF format
Step 2: Run bcftools call to call SNPs and INDELs
Parameters
--consensus-caller
use the original samtools/bcftools calling method
--output-type v
output uncompressed VCF
--output
Write output to a file rather than the default of standard output.
Command
bcftools call --consensus-caller --output-type v PILEUP-FILE --output SNP-FILE
where:
PILEUP-FILE is the name of the VCF file created in Step 1
SNP-FILE is the name of the output file containing the SNPs and INDELs in VCF format
Step 3: run vcfutils.pl to create the consensus file
Parameters
vcf2fa
Convert the VCF file to FASTA format
The greater than sign redirects the output to CONSENSUS-FILE instead of writing to STDOUT
Command
vcfutils.pl vcf2fa SNP-FILE > CONSENSUS-FILE
where
SNP-FILE is the VCF file created in Step 2
CONSENSUS-FILE is the name of the ouput consensus file in FASTA format
No comments:
Post a Comment