Thursday, May 2, 2013

Thousand Genomes Complete Genomics Index

As mentioned previously, 1000 Genomes has now made Complete Genomics whole-genome sequencing data publicly available for download. They provide an index file that lists some of the individual high-coverage .bam files available for download, but it appears to be missing much of the newer data released this April. I wanted an efficient way to search through the 1000 Genomes FTP site and build a more complete index of the available Complete Genomics data. I am primarily interested in CEU samples and wanted the high-coverage evidence support files. I chose to search for and download only the .bai files, since they are small and quick to download, and together they form a useful index for downloading the corresponding .bam files. Here is the code I used to search the FTP site.
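As a rough illustration of the general approach (a simplified sketch, not the exact script used here), a recursive walk of an FTP directory tree can be written with Python's standard `ftplib`. Several things are assumptions: that the server supports the `MLSD` listing command (which reports whether each entry is a file or a directory), that sample IDs appear as directory names in the path, and that each index follows the usual `name.bam.bai` (or `name.bai`) convention next to its BAM file. The function names `walk_for_suffix`, `ftp_lister`, `filter_by_sample` and `bam_for_index` are hypothetical helpers for this sketch.

```python
import ftplib
import posixpath
from typing import Callable, Iterable, Iterator, List, Set, Tuple

# (entry name, is_directory)
Entry = Tuple[str, bool]


def walk_for_suffix(list_dir: Callable[[str], Iterable[Entry]],
                    root: str, suffix: str = ".bai") -> Iterator[str]:
    """Depth-first walk from `root`, yielding full paths of files ending in `suffix`."""
    stack = [root]
    while stack:
        path = stack.pop()
        for name, is_dir in list_dir(path):
            full = posixpath.join(path, name)
            if is_dir:
                stack.append(full)  # descend into subdirectories later
            elif name.endswith(suffix):
                yield full


def ftp_lister(ftp: ftplib.FTP) -> Callable[[str], List[Entry]]:
    """Adapt an ftplib connection to the (name, is_directory) listing interface.

    Assumes the server supports MLSD; entries of other types (e.g. symlinks)
    are treated as plain files.
    """
    def list_dir(path: str) -> List[Entry]:
        entries = []
        for name, facts in ftp.mlsd(path, facts=["type"]):
            kind = facts.get("type")
            if kind in ("cdir", "pdir"):
                continue  # skip the "." and ".." entries
            entries.append((name, kind == "dir"))
        return entries
    return list_dir


def filter_by_sample(paths: Iterable[str], sample_ids: Set[str]) -> List[str]:
    """Keep paths with a directory component matching one of `sample_ids`
    (assumes samples such as NA12878 have their own directories)."""
    return [p for p in paths if sample_ids & set(p.split("/"))]


def bam_for_index(bai_path: str) -> str:
    """Map an index path to the BAM it presumably indexes."""
    if bai_path.endswith(".bam.bai"):
        return bai_path[: -len(".bai")]
    if bai_path.endswith(".bai"):
        return bai_path[: -len(".bai")] + ".bam"
    raise ValueError(f"not a .bai path: {bai_path}")
```

With a live connection this would be used roughly as `ftp = ftplib.FTP("ftp.1000genomes.ebi.ac.uk")` and `ftp.login()`, followed by iterating over `walk_for_suffix(ftp_lister(ftp), root)`, where `root` is whichever directory holds the Complete Genomics data. Passing in a listing function rather than the connection itself keeps the walk testable without network access.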

