Often times publicly available sequencing data can serve as a useful reference for a sequencing project. The 1000 Genomes project is a great source, especially with their newly released high coverage Complete Genomics data. Here is an example UNIX script that shows how BAM files with genomic regions of interest can be created from a whole-genome BAM file that is hosted on an FTP server, without having to download the entire BAM file first. The bai_file_list.txt file is a file that contains unique identifiers for each BAM file extracted from a previously selected list of BAI files of interest. Here I am just extracting the BRCA1 and BRCA2 regions of the genome. The extracted reads are sorted and then saved as a BAM file and an associated BAI index file is also created. The final step removes excess BAI files that are downloaded and used by Samtools to extract the region of interest from the BAM files on the FTP server.
A repository of programs, scripts, and tips essential to
genetic epidemiology, statistical genetics, and bioinformatics
Welcome to the Genome Toolbox! I am glad you navigated to the blog and hope you find the contents useful and insightful for your genomic needs. If you find any of the entries particularly helpful, be sure to click the +1 button on the bottom of the post and share with your colleagues. Your input is encouraged, so if you have comments or are aware of more efficient tools not included in a post, I would love to hear from you. Enjoy your time browsing through the Toolbox.
Subscribe to:
Post Comments (Atom)
No comments:
Post a Comment