A repository of programs, scripts, and tips essential to genetic epidemiology, statistical genetics, and bioinformatics
Welcome to the Genome Toolbox! I am glad you navigated to the blog and hope you find the contents useful and insightful for your genomic needs. If you find any of the entries particularly helpful, be sure to click the +1 button on the bottom of the post and share with your colleagues. Your input is encouraged, so if you have comments or are aware of more efficient tools not included in a post, I would love to hear from you. Enjoy your time browsing through the Toolbox.
Sunday, June 1, 2014
Visualize CTCF Binding Sites in UCSC Genome Browser
The Encyclopedia of DNA Elements (ENCODE) Consortium has recently added some new and exciting information on CTCF binding sites to the University of California Santa Cruz (UCSC) genome browser. CTCF (CCCTC-binding factor) is a zinc-finger protein, encoded by the CTCF gene, that can act as a transcriptional repressor and helps shape the three-dimensional (3D) structure of chromatin. CTCF-mediated changes in chromatin structure can lead to insulator activity, transcriptional regulation, and changes in RNA splicing. CTCF binding sites typically contain a CCCTC core motif, and bound CTCF proteins can interact with one another to bring distant stretches of DNA together into loops. Changes in gene expression can result when bound CTCF sites insulate a gene's promoter from the activity of an enhancer.
The UCSC genome browser has two tracks that help identify CTCF binding sites in the genome: the Transcription Factor ChIP-seq track and the Genome Segmentations from ENCODE track. Both tracks are built from ChIP-seq data and can be filtered to localize CTCF binding sites. The Transcription Factor ChIP-seq track shows CTCF binding sites from a large collection of ChIP-seq experiments. The Genome Segmentations track goes a step further, dividing the genome into functionally relevant segments (e.g., insulators, enhancers, promoters) using two bioinformatic algorithms, ChromHMM and Segway.
A helpful link to visualize CTCF insulator sites in the UCSC genome browser is here. It opens a session with both of the ENCODE tracks described above selected in the browser window, and the display should look something like the picture above the post. If you are aware of other resources or UCSC data tracks that localize genomic insulators such as CTCF binding sites, please share them in the comments below.
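If you would rather filter a downloaded copy of a track than the browser display (for example, a table fetched with the UCSC download steps further down this page), the filtering step can be sketched as below. This is a minimal sketch, not an ENCODE or UCSC tool: `factor_rows` is a hypothetical helper, and it assumes a tab-delimited file with the factor name in a single column (index 3 for a BED-like chrom, start, end, name layout). Check the table schema for the real column order; UCSC table dumps that start with a `bin` column would need `name_col=4`.

```python
from typing import Iterable, Iterator, List


def factor_rows(lines: Iterable[str], factor: str = "CTCF", name_col: int = 3) -> Iterator[List[str]]:
    """Yield tab-delimited rows whose name column equals `factor`.

    name_col is 0-based. The default of 3 is an assumption (BED-like layout:
    chrom, start, end, name, ...); adjust it to match the table schema.
    """
    for line in lines:
        row = line.rstrip("\n").split("\t")
        # Skip short or malformed lines instead of raising IndexError.
        if len(row) > name_col and row[name_col] == factor:
            yield row


if __name__ == "__main__":
    # Hypothetical rows for illustration only, not real ENCODE peaks.
    demo = [
        "chr8\t128400000\t128400300\tCTCF\t1000\n",
        "chr8\t128500000\t128500250\tPOLR2A\t800\n",
    ]
    for row in factor_rows(demo):
        print(row[:4])
```

Because it only needs an iterable of lines, the same function works on a compressed download opened with `gzip.open(path, "rt")`.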
Wednesday, December 18, 2013
Copy and Transfer Files To or From SFTP Site
Transferring files to and from an SFTP site is relatively simple once you have generated a public/private key pair and set up your account.
To log in to the SFTP server, type sftp username@server at the UNIX command prompt. Once logged in, you can change directories on the SFTP site just as you would in UNIX, with the cd command. You can also change directories on your local account with the lcd command. Before transferring files, move to the correct directory on both sides (cd on the SFTP site, lcd locally), because transfers go to and from the current directory on each side.
To copy a file from the SFTP server to your local host, use the get command. For example, to download the file ids.txt, you would type get ids.txt at the sftp prompt. Conversely, to transfer a file from your local host to the SFTP site, use the put command. sftp also accepts wildcards (*), so, for example, to transfer all .jpg files to the SFTP server, you would type put *.jpg at the sftp prompt. Below are a few other useful commands.
sftp command | Description |
---|---|
cd dir | Change directory on the SFTP server to dir. |
lcd dir | Change directory on your machine to dir. |
ls | List files in the current directory on the SFTP server. |
lls | List files in the current directory on your machine. |
pwd | Print the current directory on the SFTP server. |
lpwd | Print the current directory on your machine. |
get file | Download file from the SFTP server to the current local directory. |
put file | Upload file from your machine to the current directory on the SFTP server. |
exit | Exit the sftp program. |
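The commands in the table can also be run non-interactively: OpenSSH's sftp accepts a batch file with `-b`, one command per line. Batch mode cannot prompt for a password, so it relies on the key-based login described above, and by default sftp stops when a command such as get or put fails. The Python sketch below builds such a batch file; `batch_text` and `run_batch` are hypothetical helper names, and the directories shown are placeholders.

```python
import subprocess
from typing import List


def batch_text(local_dir: str, remote_dir: str, commands: List[str]) -> str:
    """Build sftp batch-file contents: set both directories, then run commands.

    Paths are written unquoted, so directories containing spaces would need quoting.
    """
    lines = [f"lcd {local_dir}", f"cd {remote_dir}", *commands]
    return "\n".join(lines) + "\n"


def run_batch(batch_path: str, target: str) -> int:
    """Run a saved batch file with `sftp -b` and return sftp's exit status.

    Assumes key-based authentication, since batch mode cannot prompt.
    """
    return subprocess.run(["sftp", "-b", batch_path, target]).returncode


if __name__ == "__main__":
    # Hypothetical directories; replace with your own.
    print(batch_text("/home/me/results", "/incoming", ["get ids.txt", "put *.jpg"]), end="")
```

To use it, save the printed text to a file such as transfer.sftp and call `run_batch("transfer.sftp", "username@server")`. Setting both directories first mirrors the cd/lcd advice above, so get and put land where you expect.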
Also, it is always a good idea to compare the checksums of the files you downloaded against those listed on the SFTP site, to confirm the files transferred completely and intact.
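A checksum comparison can be sketched in Python as below, assuming the SFTP site publishes MD5 checksums (the hex format printed by the UNIX `md5sum` command). If the site lists SHA-256 values instead, swap `hashlib.md5` for `hashlib.sha256`. The function names are illustrative, not part of any standard tool.

```python
import hashlib


def file_md5(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the MD5 hex digest of a file, read in 1 MiB chunks so large
    files never have to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_matches(path: str, expected_hex: str) -> bool:
    """Compare a local file with a checksum copied from the server
    (surrounding whitespace and letter case are ignored)."""
    return file_md5(path) == expected_hex.strip().lower()
```

For example, `checksum_matches("ids.txt", value_from_site)` returns False for a truncated or corrupted download, in which case the file should be transferred again.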
Wednesday, August 14, 2013
Download Data Track from UCSC Genome Browser
Did you know you can download tracks you visualize in the UCSC Genome Browser for personal use and analysis? Here's how to do it.
(1) Click on the grey bar on the far left side of the UCSC data track. This will bring you to the track settings page.
(2) Click on the link called View table schema. This will bring up a new page with track information and a description of data fields.
(3) Look for the field called Primary Table and copy the name.
(4) Go to the UCSC FTP site (link) and find the correct genome build you are after. Usually you will want to select hg19 or hg18.
(5) Click on the database link and then search for the table name you copied from the Primary Table field in step 3.
(6) Most tracks have both a .sql file and a .txt.gz file; you want the .txt.gz file. You can click on it to download it with your web browser, or right-click on the link, copy the web address, and download it with the wget command. Here's an example script to download the NHGRI GWAS catalog using the wget command:
(7) Extract the compressed .txt.gz file with the following command, where filename is the name of the file you downloaded:
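Steps (6) and (7) can also be scripted. The Python sketch below is one possible version, not the author's original script. It assumes UCSC's download server keeps table dumps at `goldenPath/<build>/database/<table>.txt.gz` under the base URL shown (check it against the FTP link in step 4), and that the table name is the Primary Table name from step 3. The helper names are hypothetical.

```python
import gzip
import shutil
import urllib.request
from pathlib import Path

# Assumption: base URL of UCSC's download server; verify before use.
UCSC_BASE = "https://hgdownload.soe.ucsc.edu/goldenPath"


def table_url(build: str, table: str) -> str:
    """URL of a table dump, assuming the goldenPath/<build>/database/ layout."""
    return f"{UCSC_BASE}/{build}/database/{table}.txt.gz"


def download(url: str, dest: Path) -> Path:
    """Stream a remote file to disk (the equivalent of step 6's wget)."""
    with urllib.request.urlopen(url) as resp, open(dest, "wb") as out:
        shutil.copyfileobj(resp, out)
    return dest


def gunzip(src: Path) -> Path:
    """Decompress name.txt.gz to name.txt beside it, keeping the .gz file."""
    dest = src.with_suffix("")  # drops only the final ".gz"
    with gzip.open(src, "rb") as fin, open(dest, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return dest
```

For example, `gunzip(download(table_url("hg19", "gwasCatalog"), Path("gwasCatalog.txt.gz")))` would fetch and extract the hg19 GWAS catalog table, assuming that table name matches the Primary Table field. The extracted file has no header line; the column names are listed on the table schema page from step 2.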
This method should work for downloading most UCSC data tracks. Sometimes it takes a bit of digging around the UCSC FTP site to find the dataset you are looking for, but in most cases I have found it there.
One final note: if you are interested in downloading only a small portion of a track (for example, just a region on chromosome 8), you can download that region using the UCSC Table Browser. Here's how to do this:
(1) Follow steps (1) and (2) above.
(2) Once on the Table Schema page for the track of interest, go to the link bar at the top of the page and select Tools > Table Browser. This will take you to the UCSC Table Browser, where all the fields will already be filled in with the track you are interested in.
(3) To download your region of interest, click on the radio button next to position and type in your desired coordinates (e.g., chr8:128362121-129551294).
(4) Make sure all the filters are cleared and give your output a filename. Select get output and your file will be downloaded. There is no need to unzip unless you chose the gzip compressed option.
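If you have already downloaded the whole track with the FTP method, you can pull out the same region locally instead of using the Table Browser. One detail matters here: positions typed into the browser are 1-based and inclusive, while UCSC database tables store 0-based starts with half-open intervals. The sketch below converts the position string accordingly and keeps overlapping rows. It assumes chrom, start, and end sit in columns 0, 1, and 2; shift each index by one for tables that begin with a `bin` column. Function names are illustrative.

```python
from typing import Iterable, Iterator, List, Tuple


def parse_position(pos: str) -> Tuple[str, int, int]:
    """Turn a browser position such as 'chr8:128,362,121-129,551,294'
    into (chrom, start, end) in 0-based, half-open coordinates."""
    chrom, span = pos.replace(",", "").split(":")
    start_text, end_text = span.split("-")
    return chrom, int(start_text) - 1, int(end_text)


def rows_in_region(lines: Iterable[str], pos: str,
                   chrom_col: int = 0, start_col: int = 1, end_col: int = 2) -> Iterator[List[str]]:
    """Yield tab-delimited rows that overlap the region (column layout assumed)."""
    chrom, start, end = parse_position(pos)
    for line in lines:
        row = line.rstrip("\n").split("\t")
        # Half-open intervals overlap when each starts before the other ends.
        if row[chrom_col] == chrom and int(row[start_col]) < end and int(row[end_col]) > start:
            yield row
```

For example, `rows_in_region(open("track.txt"), "chr8:128362121-129551294")` yields the rows that the Table Browser would return for that region with no other filters applied, assuming the column layout above.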
Best wishes and good luck analyzing UCSC data tracks!