As mentioned previously, 1000 Genomes has made Complete Genomics whole genome sequencing data publicly available for download. The project provides an index file that lists some of the individual high coverage .bam files available for download, but it seems to be missing a lot of the newer data released this April. I wanted a way to search the 1000 Genomes FTP site efficiently and build a better index of the available Complete Genomics data. I am primarily interested in CEU samples and wanted the high coverage evidence support files.

I chose to search for and download only the .bai files, since they are quick to download and together make a useful index for fetching the corresponding .bam files later. Here is the code I used to search through the FTP site.
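A minimal sketch of this kind of search, not necessarily the script used for this post. It walks a directory tree over FTP with Python's standard `ftplib` and collects every path ending in `.bai`. The function names (`find_index_files`, `bam_path_for`, `list_indexes`) are hypothetical, and two things are assumptions: that the server supports the `MLSD` listing command, and that each index sits next to its .bam file as `<name>.bam.bai`. The host and starting directory are left as parameters for you to fill in.

```python
import posixpath
from ftplib import FTP, error_perm


def find_index_files(ftp, root, suffix=".bai"):
    """Yield every remote path under `root` whose name ends with `suffix`.

    `ftp` is an open connection. Assumption: the server answers MLSD,
    which reports whether each entry is a file or a directory.
    """
    pending = [root]
    while pending:
        directory = pending.pop()
        try:
            entries = list(ftp.mlsd(directory))
        except error_perm:
            continue  # unreadable directory: skip it and keep walking
        for name, facts in entries:
            kind = facts.get("type")
            path = posixpath.join(directory, name)
            if kind == "dir":
                pending.append(path)  # descend later ("cdir"/"pdir" are ignored)
            elif kind == "file" and name.endswith(suffix):
                yield path


def bam_path_for(index_path, suffix=".bai"):
    """Return the .bam path for an index path.

    Assumption: the index is named `<bam file name>.bai`.
    """
    if not index_path.endswith(suffix):
        raise ValueError("not an index file: " + index_path)
    return index_path[: -len(suffix)]


def list_indexes(host, root):
    """Hypothetical driver: log in anonymously and return sorted index paths."""
    with FTP(host) as ftp:
        ftp.login()  # no arguments means an anonymous login
        return sorted(find_index_files(ftp, root))
```

Calling `list_indexes(host, root)` with the FTP host name and a starting directory returns the sorted .bai paths. Filtering that list for a population label such as `CEU` and passing each hit through `bam_path_for` gives the .bam files to download. Walking from a narrow starting directory keeps the number of listing requests small.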