
Wednesday, September 10, 2014

How to Index a VCF File

There are two simple ways to create an index for a VCF file of sequence variants. The first is a command-line approach using Tabix. For directions on installing Tabix, see this post. Tabix can only index a VCF file that has been compressed with bgzip (a plain .vcf file, or one compressed with ordinary gzip, will not work), so the first step is to compress your .vcf file into a .vcf.gz file with bgzip, which is distributed alongside Tabix. Next, run tabix on the .vcf.gz file to create a new .tbi index file in the same directory. The -f option overwrites an old index file that may be outdated or corrupted. The -p option tells tabix which file format preset to use, in this case "vcf".
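The Tabix steps described above can be sketched roughly as follows. The file name variants.vcf is a hypothetical placeholder, and the tiny two-record VCF written by printf is made up purely so the commands can be tried end to end; substitute your own file. The sketch assumes bgzip and tabix are already installed and on your PATH.

```shell
# Hypothetical file name; replace with your own VCF.
vcf="variants.vcf"

# Write a tiny made-up VCF (two records on chr1) so the steps can be tested.
printf '##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\nchr1\t100\t.\tA\tG\t.\tPASS\t.\nchr1\t200\t.\tC\tT\t.\tPASS\t.\n' > "$vcf"

# Step 1: compress with bgzip (not plain gzip).
# -c writes to standard output, so the original .vcf is left in place.
bgzip -c "$vcf" > "$vcf.gz"

# Step 2: build the index.
# -f overwrites any existing index; -p vcf selects the VCF format preset.
tabix -f -p vcf "$vcf.gz"

# The index is written next to the compressed file as variants.vcf.gz.tbi.
ls -l "$vcf.gz" "$vcf.gz.tbi"
```

Once the index exists, a region query such as `tabix variants.vcf.gz chr1:100-150` prints only the records overlapping that interval, which is a quick way to confirm the index works.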


The second way to index a VCF file is a point-and-click approach using the Broad Institute's Integrative Genomics Viewer (IGV), a Java-based program that runs on a variety of operating systems. To index a VCF file, open IGV, click on the Tools menu, and select Run igvtools... A dialog box will pop up. In the Command drop-down menu select Index, and then click Browse to select your desired .vcf file. Click Run and a new index file will be created in the same folder. Note that for an uncompressed .vcf file, igvtools writes an index with a .idx extension rather than the .tbi extension produced by Tabix.

There are probably other ways to index a VCF file, but these are the ones I am aware of. If you are aware of another method, please share in the comments.

2 comments:

  1. There is a point of confusion in file extensions. tabix generates an index with a .tbi file extension and IGVtools (even with a file.vcf.gz input) produces an index with a .idx file extension. UCSC track hubs do not appear to recognize .idx indices. Can someone explain the difference between .tbi and .idx?
