
Tuesday, July 9, 2013

Calculate Minor Allele Frequencies from VCF File Variants

Today I needed to calculate minor allele frequencies (MAFs) for sequence variants called in a .vcf file.  I couldn't find any programs that would do this for me, so I wrote a quick script to do it in Python.


The script can be run with Python from the command prompt by typing:


where project.vcf is the VCF file you want to calculate MAFs for. The script writes a project.txt file containing the calculated MAF values. It only works for SNPs, not for insertions or deletions.
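The original script is not reproduced here. As a rough illustration only, the following is a minimal sketch of the same idea, not the author's script. It assumes a standard VCF with a GT key in the FORMAT column and genotypes written like 0/1, 1|1 or ./.; at each biallelic SNP it counts reference (0) and alternate (1) alleles across all samples and reports the MAF as the smaller of the two allele frequencies. The names maf_from_vcf_lines, write_maf_table and maf_sketch.py, and the CHROM/POS/MAF layout of the output, are hypothetical choices, not the format of the original project.txt.

```python
import sys


def maf_from_vcf_lines(lines):
    """Yield (chrom, pos, maf) for each biallelic SNP in an iterable of VCF lines.

    Sketch only: assumes a GT key in the FORMAT column (column 9) and
    genotypes written as e.g. 0/1, 1|1 or ./.
    """
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip meta-information, header and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 10:
            continue  # no sample columns, so nothing to count
        chrom, pos, ref, alt = fields[0], fields[1], fields[3], fields[4]
        # Keep single-base REF with a single single-base ALT (biallelic SNPs);
        # indels, multiallelic and monomorphic (ALT ".") sites are skipped.
        if len(ref) != 1 or len(alt) != 1 or alt == ".":
            continue
        fmt = fields[8].split(":")
        if "GT" not in fmt:
            continue
        gt_index = fmt.index("GT")
        ref_count = alt_count = 0
        for sample in fields[9:]:
            values = sample.split(":")
            if gt_index >= len(values):
                continue  # trailing fields dropped for this sample
            # Treat phased (|) and unphased (/) separators the same way.
            for allele in values[gt_index].replace("|", "/").split("/"):
                if allele == "0":
                    ref_count += 1
                elif allele == "1":
                    alt_count += 1
                # missing alleles (".") are not counted
        total = ref_count + alt_count
        if total == 0:
            continue  # no called alleles at this site
        alt_freq = alt_count / total
        yield chrom, pos, min(alt_freq, 1 - alt_freq)


def write_maf_table(vcf_path):
    """Write CHROM, POS and MAF columns to <name>.txt next to <name>.vcf."""
    out_path = vcf_path.rsplit(".", 1)[0] + ".txt"  # project.vcf -> project.txt
    with open(vcf_path) as vcf, open(out_path, "w") as out:
        out.write("CHROM\tPOS\tMAF\n")
        for chrom, pos, maf in maf_from_vcf_lines(vcf):
            out.write(f"{chrom}\t{pos}\t{maf:.4f}\n")


if __name__ == "__main__":
    if len(sys.argv) == 2:
        write_maf_table(sys.argv[1])
    else:
        print("usage: python maf_sketch.py project.vcf", file=sys.stderr)
```

Running python maf_sketch.py project.vcf would write project.txt. Like the original, this sketch ignores insertions and deletions; it also skips multiallelic sites, which a fuller version would need to handle separately.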

Alternatively, if Python scares you, there is a somewhat roundabout way to do this too. First, use vcftools to convert your .vcf file into Plink-compatible .ped and .map files.


Then run Plink with the --freq option on the newly created .ped and .map files.

**UPDATE**
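Today I found an updated way to use vcftools to calculate the allele frequencies for you directly; at a biallelic site the MAF is simply the smaller of the two reported frequencies. It just takes the --freq option. Here is some example code:

vcftools --vcf project.vcf --freq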
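Note that vcftools --freq reports one ALLELE:FREQ column per allele rather than a single MAF column, and writes it to out.frq unless --out sets a different prefix. A minimal sketch for reducing that file to one MAF per site, assuming the tab-separated layout CHROM, POS, N_ALLELES, N_CHR followed by the ALLELE:FREQ columns, might look like this. The function name maf_from_frq_lines is hypothetical, and multiallelic sites and sites with no called alleles are simply skipped.

```python
import sys


def maf_from_frq_lines(lines):
    """Yield (chrom, pos, maf) from the lines of a vcftools .frq file.

    Assumes tab-separated columns CHROM, POS, N_ALLELES, N_CHR and then one
    ALLELE:FREQ column per allele. The MAF is taken as the smaller frequency,
    so only biallelic sites are kept.
    """
    for line in lines:
        if line.startswith("CHROM") or not line.strip():
            continue  # skip the header and blank lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos = fields[0], fields[1]
        n_alleles, n_chr = int(fields[2]), int(fields[3])
        if n_alleles != 2 or n_chr == 0:
            continue  # multiallelic site, or no called alleles
        # "A:0.75" -> 0.75; rsplit guards against ':' inside the allele text
        freqs = [float(entry.rsplit(":", 1)[1]) for entry in fields[4:]]
        yield chrom, pos, min(freqs)


if __name__ == "__main__":
    if len(sys.argv) == 2:
        with open(sys.argv[1]) as frq:  # e.g. out.frq
            for chrom, pos, maf in maf_from_frq_lines(frq):
                print(f"{chrom}\t{pos}\t{maf:.4f}")
```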

3 comments:

  1. I would like to ask whether this works with the VCF file output by VarScan2, which is in VCF 4.1 format. I tried the script, but it does not seem to work; it reports an error at the first variant line, which should be for the format. Do you have any other way I can calculate the MAF from my VCF file?

  2. Does it calculate the MAF from the DP4 fields of a VCF file? I tried vcftools --vcf file.vcf --freq.
