Today I needed to calculate minor allele frequencies (MAFs) for sequence variants called in a .vcf file. I couldn't find any programs that would do this for me, so I wrote a quick script to do it in Python.
This can be run in Python from the command prompt by typing:
where project.vcf is the .vcf file you want to calculate MAFs for. It will return a project.txt file that contains the calculated MAF values. This script only works for SNPs; it does not handle insertions and deletions.
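For readers who want to see the idea behind such a calculation, here is a minimal sketch. It is not the original script: the function name `snp_mafs` is made up for illustration, and it assumes a tab-delimited VCF with a GT entry in the FORMAT column. It counts reference and alternate alleles across samples at each biallelic SNP and reports the smaller share.

```python
BASES = set("ACGT")

def snp_mafs(vcf_lines):
    """Yield (chrom, pos, maf) for each biallelic SNP in an iterable of VCF lines.

    Hypothetical helper for illustration; assumes tab-delimited VCF records
    with genotype (GT) data. Indels and multi-allelic sites are skipped.
    """
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip meta-information, header, and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 10:
            continue  # no sample columns, so no genotypes to count
        chrom, pos, ref, alt, fmt = fields[0], fields[1], fields[3], fields[4], fields[8]
        if ref.upper() not in BASES or alt.upper() not in BASES:
            continue  # keep only single-base REF and single-base ALT
        keys = fmt.split(":")
        if "GT" not in keys:
            continue
        gt_idx = keys.index("GT")
        ref_count = alt_count = 0
        for sample in fields[9:]:
            parts = sample.split(":")
            if gt_idx >= len(parts):
                continue
            # genotypes look like 0/1 (unphased) or 0|1 (phased); "." is missing
            for allele in parts[gt_idx].replace("|", "/").split("/"):
                if allele == "0":
                    ref_count += 1
                elif allele == "1":
                    alt_count += 1
        total = ref_count + alt_count
        if total == 0:
            continue  # every genotype was missing at this site
        yield chrom, pos, min(ref_count, alt_count) / total
```

To use it on a file, something like `with open("project.vcf") as fh: results = list(snp_mafs(fh))` should work, after which the tuples can be written out in whatever format you prefer.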
Alternatively, if Python scares you, there is a slightly roundabout way to do this too. First, use VCFtools to convert your .vcf file into PLINK-compatible .ped and .map files.
Then, run PLINK with the --freq option on the newly created .ped and .map files.
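The two steps can be sketched as a pair of command lines. The snippet below only builds the argument lists and runs them on request; it assumes `vcftools` and `plink` are both installed and on your PATH, and the helper names `plink_freq_commands` and `run_all` are hypothetical. With these options, PLINK should write the frequencies to a file ending in .frq that includes a MAF column.

```python
import subprocess

def plink_freq_commands(vcf_path, prefix):
    """Build the VCFtools and PLINK command lines for the two-step route.

    Hypothetical helper; assumes both programs are on the PATH.
    """
    # Step 1: convert the .vcf into <prefix>.ped and <prefix>.map
    convert = ["vcftools", "--vcf", vcf_path, "--plink", "--out", prefix]
    # Step 2: compute allele frequencies from the .ped/.map pair
    freq = ["plink", "--file", prefix, "--freq", "--out", prefix]
    return [convert, freq]

def run_all(commands):
    """Run each command in order, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)

if __name__ == "__main__":
    for cmd in plink_freq_commands("project.vcf", "project"):
        print(" ".join(cmd))
```

Calling `run_all(plink_freq_commands("project.vcf", "project"))` would execute both steps; printing the commands first, as above, is a cheap way to check them before running anything.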
**UPDATE**
Today I found a more direct way: VCFtools can calculate the allele frequencies for you. It just takes the simple --freq option, for example: vcftools --vcf project.vcf --freq --out project
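One thing to keep in mind is that the .frq file written by --freq lists a frequency for each allele rather than a single MAF column, so the minor allele frequency is the smaller of the two values at a biallelic site. Here is a small sketch for pulling that out. The function name `maf_from_frq` is made up for illustration, and it assumes the usual whitespace-delimited layout of CHROM, POS, N_ALLELES, N_CHR, then ALLELE:FREQ pairs.

```python
import math

def maf_from_frq(lines):
    """Return [(chrom, pos, maf)] for biallelic sites in VCFtools .frq lines.

    Hypothetical helper; assumes columns CHROM, POS, N_ALLELES, N_CHR
    followed by one ALLELE:FREQ entry per allele.
    """
    rows = []
    for line in lines:
        fields = line.split()
        if not fields or fields[0] == "CHROM":
            continue  # skip blank lines and the header row
        chrom, pos = fields[0], fields[1]
        # each remaining entry looks like "A:0.9"; keep the part after the last colon
        freqs = [float(entry.rsplit(":", 1)[1]) for entry in fields[4:]]
        if len(freqs) != 2 or any(math.isnan(f) for f in freqs):
            continue  # only biallelic sites with usable frequencies
        rows.append((chrom, pos, min(freqs)))
    return rows
```

For example, `with open("project.frq") as fh: mafs = maf_from_frq(fh)` would give one tuple per biallelic site.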
I would like to ask whether this works with a VarScan2 output VCF file, which is in VCF 4.1 format. I tried the script, but it does not seem to work; it reports an error at the first variant line, apparently because of the format. Do you have any other way I can calculate the MAF from my VCF file?
Helps. Thank you.
Does it calculate the MAF from the DP4 fields of a VCF file? I tried vcftools --vcf file.vcf --freq.