Welcome to the Genome Toolbox! I am glad you navigated to the blog and hope you find the contents useful and insightful for your genomic needs. If you find any of the entries particularly helpful, be sure to click the +1 button on the bottom of the post and share with your colleagues. Your input is encouraged, so if you have comments or are aware of more efficient tools not included in a post, I would love to hear from you. Enjoy your time browsing through the Toolbox.

Wednesday, December 18, 2013

Copy and Transfer Files To or From SFTP Site

Transferring files to and from an SFTP site is relatively simple once you have generated a public/private key pair and have your account set up.

To log in to the SFTP server, type sftp username@server at the UNIX command prompt.  Once logged in, you can change directories on the SFTP site the same way you would in UNIX: with the cd command.  You can also change directories on your local account using the lcd command.  Before transferring files, make sure you are in the correct local and SFTP directories for the transfer.

To copy a file from the SFTP server to your local host, use the get command.  For example, if you wanted to get the file ids.txt, you would type get ids.txt at the sftp prompt.  Conversely, to transfer a file to the SFTP site from your local host, use the put command.  SFTP also supports wildcards (*), so if, for example, you wanted to transfer all .jpg files to the SFTP server, you would type put *.jpg at the sftp prompt.  Below are a few other useful commands.

Sftp Command    Description
cd dir          Change directory on the SFTP server to dir.
lcd dir         Change directory on your machine to dir.
ls              List files in the current directory on the SFTP server.
lls             List files in the current directory on your machine.
pwd             Print the current directory on the SFTP server.
lpwd            Print the current directory on your machine.
get file        Download the file from the SFTP server to the current directory on your machine.
put file        Upload the file from your machine to the SFTP server.
exit            Exit from the sftp program.
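
For repeated transfers, the same commands can be collected into a batch file and handed to sftp with its -b option.  Below is a minimal Python sketch of that idea, not a definitive implementation.  The function names, directory names, and the username@server target are placeholders chosen for illustration, and batch mode assumes key-based login is already working since sftp cannot prompt for a password.

```python
import subprocess
import tempfile


def build_batch(local_dir, remote_dir, downloads=(), uploads=()):
    """Return the text of an sftp batch file, one command per line.

    Names containing spaces would need quoting; this sketch assumes none.
    """
    lines = [f"lcd {local_dir}", f"cd {remote_dir}"]
    lines += [f"get {name}" for name in downloads]
    lines += [f"put {name}" for name in uploads]
    lines.append("exit")
    return "\n".join(lines) + "\n"


def run_batch(batch_text, target):
    """Write the batch text to a temporary file and pass it to `sftp -b`."""
    with tempfile.NamedTemporaryFile("w", suffix=".sftp", delete=False) as handle:
        handle.write(batch_text)
    # check=True raises CalledProcessError if sftp reports a failure.
    return subprocess.run(["sftp", "-b", handle.name, target], check=True)


batch = build_batch("data", "incoming", downloads=["ids.txt"], uploads=["*.jpg"])
print(batch, end="")
# To actually run it (hypothetical account): run_batch(batch, "username@server")
```

Printing the batch text before running it is a cheap way to confirm that the local and remote directories are the ones you intended.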

Also, remember that it is always a good idea to compare the checksums of the files you have downloaded against those published on the SFTP site to ensure the files were downloaded in their entirety.
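
As a rough illustration of the checksum comparison, here is a small Python sketch using the standard hashlib module.  It assumes the SFTP site publishes a hexadecimal checksum for each file; the function names and the choice of SHA-256 as the default are my own, so switch the algorithm to whatever the site actually provides (for example "md5").

```python
import hashlib


def file_checksum(path, algorithm="sha256", chunk_size=1 << 20):
    """Return the hex digest of a file, reading it in chunks to limit memory use."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path, published_hex, algorithm="sha256"):
    """Compare a local file against a checksum string copied from the server."""
    return file_checksum(path, algorithm) == published_hex.strip().lower()
```

A call such as matches_published("ids.txt", checksum_from_site) returns True only when the local copy is byte-for-byte identical to the file the checksum was computed from.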

Generate Public/Private RSA Key Pair for SFTP

To access an SFTP server, an SSH public key must be created and shared with the SFTP host.  The ssh-keygen utility in UNIX can be used to generate a public/private key pair.  Follow these steps to create one:

(1) At the UNIX prompt, type ssh-keygen.  A message will appear indicating the application is generating the public/private rsa key pair.

(2) You will next be prompted to enter the location where you want to save the key.  You can specify a filename or just leave the space blank and press Enter to accept the default location shown in the prompt.

(3) Afterwards you will be prompted to enter a passphrase to use as a password.  Enter your passphrase here, or simply hit Enter to proceed without one.

(4) Finally, you will be prompted to re-enter your passphrase.

That is all there is to creating a public and private key pair.  Before you can access the remote SFTP server, the administrator will need a copy of your public key to set up your account.  The public key will be a file with one line that includes "ssh-rsa", a long key code, followed by your account name and host name (e.g., user@host.school.edu).  The file extension will be .pub.
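
Before sending the .pub file to the administrator, it can be worth confirming that it really has the one-line layout described above.  The Python sketch below is only a sanity check under that assumption; the function name is made up for illustration.  It splits the line into its three parts and verifies that the base64 key code begins with a length-prefixed copy of the key type, which is how OpenSSH public keys are encoded.

```python
import base64
import struct


def parse_public_key(line):
    """Return (key type, comment) for a one-line OpenSSH public key, or raise ValueError."""
    fields = line.strip().split(None, 2)
    if len(fields) < 2:
        raise ValueError("expected a key type followed by a key code")
    key_type, body = fields[0], fields[1]
    comment = fields[2] if len(fields) == 3 else ""

    blob = base64.b64decode(body, validate=True)
    if len(blob) < 4:
        raise ValueError("key code is too short")
    # The decoded key starts with a 4-byte big-endian length, then the key type.
    (name_length,) = struct.unpack(">I", blob[:4])
    embedded_type = blob[4:4 + name_length].decode("ascii", errors="replace")
    if embedded_type != key_type:
        raise ValueError("key type does not match the encoded key")
    return key_type, comment
```

The check does not validate the key itself; it only catches common copy-and-paste damage such as a truncated line or a private key file sent by mistake.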

Thursday, December 12, 2013

Generate Coverage Plot in R from Depth Data

Here is a simple way to take coverage data (coverage.depth) and make a plot in R to visualize it.  All you need is a column with base pair coordinates (V1) and a column with the respective depth (V2), and you are all set to plug them into the R code below.

Sum Overlapping Base Pairs of Features from Chromosomal BED File

I had a .bed file of genomic features on a chromosome and wanted to figure out how much the features overlapped, both to investigate commonly covered genes and to find positions where features were likely to form.  I wanted to generate a plot similar to a coverage depth plot from next-generation sequencing reads.  I am sure more efficient methods exist, but here is some Python code that takes in a .bed file of features (features.bed) and creates an output file (features.depth) with the feature overlap "depth" every 5,000 base pairs across the areas which contain features in your chromosomal .bed file.
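
A minimal sketch of this approach is shown here; it is an illustration of the idea rather than the exact script used for the post.  It assumes that features.bed holds features from a single chromosome, that the start and end coordinates are in the second and third columns, and that BED intervals are half-open (the start is included, the end is not).  The function names are my own.

```python
def read_bed(path):
    """Read (start, end) pairs from a BED file, skipping header and blank lines."""
    features = []
    with open(path) as handle:
        for line in handle:
            fields = line.split()
            if len(fields) < 3 or line.startswith(("#", "track", "browser")):
                continue
            features.append((int(fields[1]), int(fields[2])))
    return features


def depth_at_intervals(features, step=5000):
    """Count overlapping features at every `step` base pairs across the covered region."""
    if not features:
        return []
    first = min(start for start, _ in features)
    last = max(end for _, end in features)
    depths = []
    for position in range(first, last, step):
        # A feature covers the position when start <= position < end.
        depth = sum(1 for start, end in features if start <= position < end)
        depths.append((position, depth))
    return depths


def write_depth(depths, path):
    """Write tab-separated position and depth columns, one sampled position per line."""
    with open(path, "w") as handle:
        for position, depth in depths:
            handle.write(f"{position}\t{depth}\n")


# Example use with the file names from the text:
# write_depth(depth_at_intervals(read_bed("features.bed")), "features.depth")
```

Each sampled position is checked against every feature, which is simple but slow for very large files.  The two-column output has the same position and depth layout as the coverage data described in the coverage plot post.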



Create Triangle Plot from Inferred Genetic Ancestry

I previously posted on how to infer ancestry for a group of study participants using SNP genotypes.  Today, I want to visually plot some of the output in R.  Two informative plots that can be generated from the output are a standard plot of the two eigenvectors with percent ancestry overlaid and a triangle (or ternary) plot with each axis representing percentage of one of the three ancestral populations (ex: European, African, and Asian).

Here is some simple code to plot this in R.  Base R has no function for drawing a triangle plot, so the plotrix package must be installed first.  The ancestry.txt file is the output file from SNPWEIGHTS, but other output could be formatted to work as well.


The output should look similar to the plots below.



Tuesday, December 3, 2013

List Contents of a .tar.gz File

Here is a quick one-liner to see the contents of a .tar.gz file:

This will list all the files in the .tar.gz tarball.  The v option can be left out to list file names only, without the additional file information.
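
If you would rather inspect a tarball from a script, Python's standard tarfile module can produce a similar listing.  The sketch below is one way to do it; the function name and the verbose flag are my own, with verbose loosely playing the role of the v option by adding the file size to each line.

```python
import tarfile


def list_tarball(path, verbose=True):
    """Return one line per archive member without extracting anything."""
    lines = []
    with tarfile.open(path, "r:gz") as archive:
        for member in archive:
            if verbose:
                # Show the size in bytes alongside the member name.
                lines.append(f"{member.size:>12}  {member.name}")
            else:
                lines.append(member.name)
    return lines


# Example (hypothetical file name):
# print("\n".join(list_tarball("big_file.tar.gz")))
```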

Extract Only Desired Files from Compressed File

Lately I have been working with some large tarballs (tar.gz files).  To access the compressed contents, I would usually extract every file in the archive.  Decompressing everything was time consuming, filtering through the results for the files I wanted was a pain, and deleting everything I didn't want took even more time.  I knew there had to be a better way to get just the files I wanted.  Sure enough, the basic tar command has options that allow you to extract a single file of interest or even a set of files using wildcards.  The command looks like this:

where,
big_file.tar.gz is the tarball you are extracting from
path is the path to your file in the tarball
your_file.txt is the file you want to extract

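This can also be done using wildcards.  For example, if I wanted all text files in the above path I could substitute *.txt for your_file.txt.  With GNU tar, this generally requires adding the --wildcards option and quoting the pattern so the shell does not expand it first.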
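
The same selective extraction can be scripted in Python with the standard tarfile and fnmatch modules.  This is a sketch under a few assumptions: the function name and destination argument are mine, and the pattern is matched against the full member name, so in fnmatch a * also matches across subdirectories.

```python
import fnmatch
import tarfile


def extract_matching(tarball, pattern, destination="."):
    """Extract only the regular files whose archive path matches a wildcard pattern."""
    with tarfile.open(tarball, "r:gz") as archive:
        wanted = [
            member
            for member in archive.getmembers()
            if member.isfile() and fnmatch.fnmatch(member.name, pattern)
        ]
        # Use the safer "data" extraction filter when this Python version has it.
        options = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
        archive.extractall(path=destination, members=wanted, **options)
    return [member.name for member in wanted]


# Example with the names from the text:
# extract_matching("big_file.tar.gz", "path/*.txt")
```

The function returns the names it extracted, which makes it easy to confirm that the pattern picked up the files you expected and nothing else.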