Welcome to the Genome Toolbox! I am glad you navigated to the blog and hope you find the contents useful and insightful for your genomic needs. If you find any of the entries particularly helpful, be sure to click the +1 button on the bottom of the post and share with your colleagues. Your input is encouraged, so if you have comments or are aware of more efficient tools not included in a post, I would love to hear from you. Enjoy your time browsing through the Toolbox.

Monday, June 17, 2013

Create 2bit Database for Faster BLAT Searching

Included with the BLAT source code is a useful utility that converts FASTA references to 2bit format.  The utility, faToTwoBit, takes the original reference .fa file as input and writes a .2bit file as output.  The general form of the command is: faToTwoBit in.fa out.2bit
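As a slightly fuller sketch of the conversion, the snippet below derives the output name from the input name and only runs the utility if it is installed and the input exists.  The file name hg19.fa is a placeholder I chose for illustration, not something the tool requires.

```shell
# Placeholder input name; substitute your own reference FASTA.
ref=hg19.fa
# Swap the .fa suffix for .2bit (hg19.fa -> hg19.2bit).
out="${ref%.fa}.2bit"

# General form: faToTwoBit in.fa [in2.fa ...] out.2bit
# Run only when the utility is on the PATH and the input file is present.
if command -v faToTwoBit >/dev/null 2>&1 && [ -f "$ref" ]; then
    faToTwoBit "$ref" "$out"
fi
```

The resulting .2bit file can then be passed to BLAT as the database argument in place of the FASTA file.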


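A few simple options are also included:
-noMask: ignore lower-case masking in the FASTA file
-stripVersion: strip the version number after the "." in GenBank accessions
-ignoreDups: convert only the first sequence when sequence names are duplicated

For more details on these options, type faToTwoBit with no arguments at the command line.  Hope this 2bit reference database speeds up your BLAT searches!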

Index FASTA File

Some programs require indexed FASTA files (*.fa.fai files).  Indexing a FASTA file is easy to do with Samtools.  Just run samtools faidx on your FASTA file, and the index is written next to the input with a .fai suffix added.
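A minimal sketch of the indexing step, assuming samtools is installed and using hg19.fa as a placeholder file name:

```shell
# Placeholder input name; substitute your own FASTA file.
ref=hg19.fa
# samtools faidx writes the index beside the input as <input>.fai.
fai="$ref.fai"

# Run only when samtools is on the PATH and the input file is present.
if command -v samtools >/dev/null 2>&1 && [ -f "$ref" ]; then
    samtools faidx "$ref"
    # Peek at the first few index records (one line per sequence).
    head -n 3 "$fai"
fi
```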

Friday, May 24, 2013

Download Nucleotide Sequence for Genomic Region

Sometimes I need the nucleotide sequence for a specific region of the genome to investigate sequence similarity, simple repeats, or recurring motifs.  I know entire chromosomal .fasta files can be downloaded from the UCSC ftp site, but then I would have to go through the entire file and extract the correct sequence myself.  Today I came across a very easy way to download a nucleotide sequence for a genomic region using the UCSC DAS server.  Simply modify the below web link to include the appropriate genome build and genomic coordinates and you will get a customized XML page generated with the nucleotide sequence for your query.  One word of caution: the DAS server uses 1-based coordinates, so the first base of a chromosome is position 1.  Pretty cool and very simple to do.

http://genome.ucsc.edu/cgi-bin/das/hg19/dna?segment=chr1:100000,200000
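If you query several regions, the link can be assembled from its parts.  The helper below is only a sketch; das_url is a name I made up for illustration, and it simply fills the build, chromosome, start, and end into the URL pattern shown above.

```shell
# Hypothetical helper: build a UCSC DAS URL for a genomic region.
# $1 = genome build, $2 = chromosome, $3 = start, $4 = end (1-based)
das_url() {
    printf 'http://genome.ucsc.edu/cgi-bin/das/%s/dna?segment=%s:%s,%s\n' \
        "$1" "$2" "$3" "$4"
}

# Prints the same link as the example above.
das_url hg19 chr1 100000 200000

# To save the XML, hand the URL to a downloader of your choice, e.g.:
#   curl -s "$(das_url hg19 chr1 100000 200000)" > region.xml
```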

Thursday, May 9, 2013

Create Reference FASTA File

A lot of sequencing programs and analyses require a reference genome FASTA file to run.  Since I didn't have a reference genome in my possession, I looked into making one of my own.  I built an indexed reference sequence file from the per-chromosome files on the UCSC ftp site so that it would be compatible with GATK.  When concatenating the chromosomes together, make sure they appear in the same order, and have the same lengths, as the chromosomes listed in the header of the .bam file you want to use it with.
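A minimal sketch of one way to do this, not necessarily the exact commands I ran.  It assumes wget, tar, and samtools are installed, that the hg19 per-chromosome archive is at the UCSC download URL shown, and that the archive unpacks to chr*.fa files in the current directory.  The chromosome order in chroms and the function name build_reference are also my assumptions; edit the order to match your .bam header.

```shell
# Assumed chromosome order; edit to match the @SQ lines in your BAM header
# (inspect them with: samtools view -H your.bam).
chroms="chr1 chr2 chr3 chr4 chr5 chr6 chr7 chr8 chr9 chr10 chr11 chr12 chr13 chr14 chr15 chr16 chr17 chr18 chr19 chr20 chr21 chr22 chrX chrY chrM"

# Hypothetical helper: download, concatenate, and index the reference.
build_reference() {
    # Assumed location of the hg19 per-chromosome FASTA archive.
    url="http://hgdownload.soe.ucsc.edu/goldenPath/hg19/bigZips/chromFa.tar.gz"
    wget "$url" || return 1
    tar -xzf chromFa.tar.gz || return 1

    # Concatenate the chromosomes in the chosen order.
    : > hg19.fa
    for c in $chroms; do
        cat "$c.fa" >> hg19.fa || return 1
    done

    # Create hg19.fa.fai alongside the reference.
    samtools faidx hg19.fa
}

# Call build_reference when you are ready to start the download.
```

Keeping the order in a single variable makes it easy to compare against the .bam header before committing to a large download.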



After running GATK for the first time, an hg19.dict file is also created.