A lot of sequencing programs and analyses require a reference genome FASTA file to run. Since I didn't have a reference genome on hand, I looked into making my own. Here is the code I used to create an indexed reference sequence file, compatible with GATK, from the per-chromosome files on the UCSC FTP site. When concatenating the chromosomes together, make sure they are in the same order and have the same names and lengths as the contigs listed in the header of the .bam file you want to use the reference with.
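To illustrate the concatenation step, here is a minimal Python sketch. It assumes each chromosome was downloaded from UCSC as its own gzip-compressed, single-record FASTA (e.g. chr1.fa.gz), so it reads the compressed files directly instead of unzipping them first. You pass the files in the order your .bam header lists them. It also writes a samtools-style .fai index itself, although `samtools faidx` is the standard tool for that. The function names, the 60-base line width, and the header check are my own illustrative choices, not the code used for this post.

```python
import gzip
from typing import List, Tuple

LINE_WIDTH = 60  # bases per output line; an arbitrary choice for this sketch


def read_fasta_gz(path: str) -> Tuple[str, str]:
    """Read one gzip-compressed, single-record FASTA file; return (name, sequence)."""
    name, chunks = None, []
    with gzip.open(path, "rt") as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if name is not None:
                    raise ValueError(f"{path}: expected one record per file")
                name = line[1:].split()[0]
            elif line:
                chunks.append(line)
    if name is None:
        raise ValueError(f"{path}: no FASTA header found")
    return name, "".join(chunks)


def build_reference(chrom_files: List[str], out_fasta: str) -> List[Tuple[str, int]]:
    """Concatenate chromosomes in the given order; write the FASTA plus a .fai index."""
    contigs, index_rows = [], []
    with open(out_fasta, "wb") as out:
        for path in chrom_files:
            name, seq = read_fasta_gz(path)
            out.write(f">{name}\n".encode("ascii"))
            offset = out.tell()  # byte offset of this contig's first base
            for i in range(0, len(seq), LINE_WIDTH):
                out.write(seq[i:i + LINE_WIDTH].encode("ascii") + b"\n")
            line_bases = min(LINE_WIDTH, len(seq))
            # .fai columns: NAME, LENGTH, OFFSET, LINEBASES, LINEWIDTH (bytes incl. newline)
            index_rows.append(f"{name}\t{len(seq)}\t{offset}\t{line_bases}\t{line_bases + 1}\n")
            contigs.append((name, len(seq)))
    with open(out_fasta + ".fai", "w") as fai:
        fai.writelines(index_rows)
    return contigs


def matches_bam_header(contigs: List[Tuple[str, int]], header_text: str) -> bool:
    """True if the @SQ lines of a SAM header list the same contigs, in the same order."""
    bam_contigs = []
    for line in header_text.splitlines():
        if line.startswith("@SQ"):
            # @SQ fields look like SN:chr1 and LN:249250621, separated by tabs
            fields = dict(f.split(":", 1) for f in line.split("\t")[1:])
            bam_contigs.append((fields["SN"], int(fields["LN"])))
    return bam_contigs == contigs
```

For example, `contigs = build_reference(["chr1.fa.gz", "chr2.fa.gz"], "hg19.fa")` builds the reference from two hypothetical files. `matches_bam_header(contigs, header_text)`, with `header_text` taken from the output of `samtools view -H your.bam`, then reports whether the order and lengths line up before you hand the reference to GATK.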
After running GATK for the first time, a sequence dictionary file, hg.19.dict, is also created.
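If you want to see what goes into that dictionary, or build it without GATK, here is a minimal Python sketch. It writes one @SQ line per contig with the SN (name), LN (length), and M5 (MD5 checksum) tags defined in the SAM specification, plus a UR tag pointing at the FASTA. By default it names the output by swapping the FASTA extension for .dict, which is where GATK looks for it. The function names, the @HD version number, and the exact set of tags are my assumptions. Picard's CreateSequenceDictionary is the usual tool, and its output may differ in detail.

```python
import hashlib
import os
from typing import Iterator, List, Optional, Tuple


def iter_fasta(path: str) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) for each record of an uncompressed FASTA file."""
    name, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if name is not None:
                    yield name, "".join(chunks)
                name, chunks = line[1:].split()[0], []
            elif line:
                chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def write_sequence_dict(fasta_path: str, dict_path: Optional[str] = None) -> List[str]:
    """Write SAM-style @HD/@SQ header lines describing every contig in fasta_path."""
    if dict_path is None:
        # hg19.fa -> hg19.dict: the extension is replaced, not appended to
        dict_path = os.path.splitext(fasta_path)[0] + ".dict"
    lines = ["@HD\tVN:1.6\tSO:unsorted"]  # header version is an assumption
    uri = "file:" + os.path.abspath(fasta_path)
    for name, seq in iter_fasta(fasta_path):
        # SAM spec: M5 is the MD5 of the upper-cased sequence with whitespace removed
        md5 = hashlib.md5(seq.upper().encode("ascii")).hexdigest()
        lines.append(f"@SQ\tSN:{name}\tLN:{len(seq)}\tM5:{md5}\tUR:{uri}")
    with open(dict_path, "w") as out:
        out.write("\n".join(lines) + "\n")
    return lines
```

Running `write_sequence_dict("hg19.fa")` on a hypothetical hg19.fa would produce hg19.dict in the same directory. Its @SQ lines should list the same names and lengths as the .fai index.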
FYI, you need to unzip the chromosomes...