Did you know you can download tracks you visualize in the UCSC Genome Browser for personal use and analysis? Here's how to do it.
(1) Click on the grey bar on the far left side of the UCSC data track. This will bring you to the track settings page.
(2) Click on the link called View table schema. This will bring up a new page with track information and a description of data fields.
(3) Look for the field called Primary Table and copy the name.
(4) Go to the UCSC FTP site (link) and find the genome build you want. Usually this will be hg19 or hg18.
(5) Click on the database link and then search for the table name you copied from the Primary Table field in step (3).
(6) Most tracks have both a .sql file and a .txt.gz file. You want the .txt.gz file. You can click on it to download it in your web browser, or right-click the link, copy the web address, and download it with the wget command. The NHGRI GWAS catalog track, for example, can be downloaded this way.
(7) Extract the compressed .txt.gz file with gunzip (for example, gunzip filename.txt.gz), where filename is the name of the file you downloaded.
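Steps (4) through (7) can be sketched as a short shell script. Treat it as a rough sketch rather than an official UCSC recipe: it assumes the download server is hgdownload.soe.ucsc.edu, that table dumps live under goldenPath/<build>/database/, and that the Primary Table name for the NHGRI GWAS catalog is gwasCatalog. The function names ucsc_table_url and download_track are made up for illustration.

```shell
# Build the download address for a table dump.
# $1 = genome build (e.g. hg19), $2 = Primary Table name from step (3).
# Assumption: host and directory layout as described above.
ucsc_table_url() {
  printf 'https://hgdownload.soe.ucsc.edu/goldenPath/%s/database/%s.txt.gz\n' "$1" "$2"
}

# Download the .txt.gz file into the current directory, then extract it.
# gunzip replaces <table>.txt.gz with <table>.txt.
download_track() {
  wget "$(ucsc_table_url "$1" "$2")" && gunzip "$2.txt.gz"
}

# Example (needs network access); gwasCatalog is the assumed table name:
#   download_track hg19 gwasCatalog
```

If you only want to check the address before downloading, run ucsc_table_url hg19 gwasCatalog and paste the result into your browser.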
This method should work for downloading the majority of the UCSC data tracks. Sometimes it takes a bit of digging around the UCSC FTP site to find the dataset you are looking for, but in most cases I have been able to find it there.
One final note: If you are interested in downloading only a small portion of the track (for example, just a region on chromosome 8), you can download this region using the UCSC Table Browser. Here's how to do this:
(1) Follow steps (1) and (2) above.
(2) Once on the Table Schema page for the track of interest, go to the link bar at the top of the page and select Tools > Table Browser. This will take you to the UCSC Table Browser, where all the fields will already be filled in for the track you are interested in.
(3) To download your region of interest, click on the radio button next to position and type in your desired coordinates (e.g. chr8:128362121-129551294).
(4) Make sure all the filters are cleared and give your output a filename. Select get output and your file will be downloaded. There is no need to unzip unless you chose the gzip compressed option.
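If you have already downloaded and extracted the full table with the first method, a similar subset can be cut out locally. The sketch below is not part of the Table Browser workflow. It assumes a tab-separated table whose columns 2, 3 and 4 are chrom, chromStart and chromEnd (the layout of tables that begin with a bin column), with 0-based start coordinates, so check the table schema page before relying on it. The function name filter_region is made up for illustration.

```shell
# Keep rows that overlap a region given in browser-style 1-based coordinates.
# $1 = chromosome, $2 = first base, $3 = last base; table rows are read from stdin.
# Assumption: column 2 = chrom, column 3 = chromStart (0-based), column 4 = chromEnd.
filter_region() {
  awk -F '\t' -v chrom="$1" -v first="$2" -v last="$3" \
    '$2 == chrom && $3 < last && $4 >= first'
}

# Example, using the region from step (3):
#   filter_region chr8 128362121 129551294 < gwasCatalog.txt > gwasCatalog_chr8_region.txt
```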
Best wishes and good luck analyzing UCSC data tracks!