There are two simple ways to create an index for a VCF file of sequence variants. The first is a command-line approach using Tabix. For directions on installing Tabix, see this post. You can start from either a .vcf or a .vcf.gz file. First, make sure the VCF file is compressed as a .vcf.gz file. Tabix expects bgzip compression (the bgzip program is distributed alongside Tabix), so an uncompressed .vcf file needs to be run through bgzip before indexing. Next, run tabix to create a new .tbi index file in the same directory as your .vcf.gz file. The -f option will overwrite an old index file that may be outdated or corrupted. The -p option tells tabix which file format preset to use, in this case "vcf".
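As a rough sketch of the two steps described above (not the original listing), the Python snippet below builds the bgzip and tabix command lines and only runs them when asked. The function names and the file name variants.vcf are made up for illustration, and it assumes bgzip and tabix are already installed and on your PATH.

```python
import shlex
import shutil
import subprocess
from pathlib import Path


def tabix_commands(vcf_path):
    """Return the command lines needed to compress (if needed) and index a VCF."""
    path = Path(vcf_path)
    commands = []
    if path.suffix != ".gz":
        # tabix needs bgzip (BGZF) compression; bgzip replaces file.vcf with file.vcf.gz.
        commands.append(["bgzip", str(path)])
        path = path.with_name(path.name + ".gz")
    # -f overwrites an existing index; -p vcf selects the VCF preset.
    commands.append(["tabix", "-f", "-p", "vcf", str(path)])
    return commands


def index_vcf(vcf_path, dry_run=True):
    """Print each command and, unless dry_run is True, execute it."""
    for argv in tabix_commands(vcf_path):
        print(shlex.join(argv))
        if not dry_run:
            if shutil.which(argv[0]) is None:
                raise FileNotFoundError(f"{argv[0]} was not found on the PATH")
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Hypothetical file name; the default dry run only prints the commands.
    index_vcf("variants.vcf")
```

Calling index_vcf("variants.vcf", dry_run=False) would actually run the commands. If you prefer to type them yourself, the printed lines can be pasted straight into a terminal.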
The second way to index a VCF file is a point-and-click approach using the Broad Institute's Integrative Genomics Viewer (IGV), a Java-based program that will run on a variety of operating systems. To index a VCF file, open IGV, click on the Tools menu and select Run igvtools... A dialogue box will pop up. In the Command drop-down menu select Index and then click on Browse to select your desired .vcf file. Click Run and a new .tbi index file will be created in the same folder.
There are probably other ways to index a VCF file, but these are the ones I am aware of. If you are aware of another method, please share in the comments.
A repository of programs, scripts, and tips essential to
genetic epidemiology, statistical genetics, and bioinformatics
Welcome to the Genome Toolbox! I am glad you navigated to the blog and hope you find the contents useful and insightful for your genomic needs. If you find any of the entries particularly helpful, be sure to click the +1 button on the bottom of the post and share with your colleagues. Your input is encouraged, so if you have comments or are aware of more efficient tools not included in a post, I would love to hear from you. Enjoy your time browsing through the Toolbox.
Wednesday, September 10, 2014
Friday, July 5, 2013
Getting Set Up on a UNIX Cluster
Okay, you have been granted access to a UNIX cluster to work on a project, but have no idea how to get started with things. No problem. This post is designed to clue you in on the essentials so you can get up and running in no time.
SSH Client
This is the first thing you will need. An SSH client is a program that lets your computer connect remotely to the UNIX cluster. SSH stands for Secure SHell, a network protocol for securely communicating data from one computer to another.
For Windows users, probably the most widely used SSH client is PuTTY. It is free and easy to set up. Download putty.exe and move it to a spot where you can easily access it. Double clicking on it may bring up a security warning. Select Run and PuTTY will open. For the Host Name, insert the host name or IP address of the cluster (ex: computer.university.edu). In general, this is all you really need to do before pressing the Open button. You can choose to save this as a session for easy access in the future, but just accept all the defaults for now. PuTTY will then connect to the computer you specified.
If you are a Mac user, this is much easier to do. Just open the pre-loaded Terminal application and type "ssh username@computer.university.edu", where username is your user name on the cluster and computer.university.edu is the host name or IP address of the computer you want to connect to.
The first time you connect, you will get a warning that the remote computer's host key is not yet recognized. Just choose the option to proceed. Next, you will usually be asked by the remote computer for a username and password. Input what was given to you by the cluster administrator. You will not see the password as you type it in. After your credentials have been accepted, you will have access to the remote computer. Congratulations, you are now connected!
For Windows users, there are also other flavors of PuTTY that can be used. I like PuTTYTray for the added options built into it, as well as MTPuTTY, which allows multiple tabs to be open simultaneously.
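For anyone who ends up scripting their logins, here is a small sketch of how the terminal command above is put together. The helper name ssh_command is made up for illustration, and the username and host are the same placeholders used in the text. It only builds and prints the command, it does not connect to anything.

```python
import shlex


def ssh_command(username, host, x11=False):
    """Build the argument list for an OpenSSH login to the cluster."""
    argv = ["ssh"]
    if x11:
        # -X asks ssh to forward X11 windows back to your computer.
        argv.append("-X")
    argv.append(f"{username}@{host}")
    return argv


if __name__ == "__main__":
    # Placeholder credentials; substitute the ones from your administrator.
    print(shlex.join(ssh_command("username", "computer.university.edu")))
```

Keeping the command as a list rather than one long string makes it easy to hand to subprocess.run later without worrying about quoting.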
File Transfer Application
Next, you will need a way to transfer files and scripts from your computer to the remote computer (cluster) and vice versa. WinSCP is an excellent free program for Windows. Unfortunately, WinSCP is Windows-only and I have not found an equivalent for Macs (if you have any suggestions, let me know). Once WinSCP is downloaded and set up, open the program and select New. Fill in the Host name and User name. I usually leave the password blank (in which case I will be asked to provide it manually later), but this is up to your discretion. From there you can log in. After your session has been authenticated, you will be greeted with a window with two major panes. The left side is your computer and the right side is the remote computer. Just drag files from one pane to the other to transfer them between computers. You now have a way to transfer files to and from the remote computer!
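If you do not have WinSCP (Mac users, for instance), the scp command that comes with OpenSSH can make the same transfers from a terminal. The sketch below shows how an scp command is laid out for uploads and downloads. The helper name scp_command and all file paths are hypothetical, and the code only builds and prints the commands.

```python
import shlex


def scp_command(local_path, username, host, remote_path, download=False):
    """Build an scp argument list: upload by default, download on request."""
    remote = f"{username}@{host}:{remote_path}"
    if download:
        # Copy from the cluster to your computer.
        return ["scp", remote, local_path]
    # Copy from your computer to the cluster.
    return ["scp", local_path, remote]


if __name__ == "__main__":
    # Hypothetical script and results files, placeholder account details.
    upload = scp_command("analysis.R", "username", "computer.university.edu", "scripts/analysis.R")
    fetch = scp_command("results.txt", "username", "computer.university.edu", "results.txt", download=True)
    print(shlex.join(upload))
    print(shlex.join(fetch))
```

The source always comes first and the destination second, so swapping the two arguments is all it takes to reverse the direction of the copy.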
X Server
Some programs need an X server running on your computer to display graphical output from the cluster over X11. This essentially allows you to open an interactive window on your computer in which you can interact with the remote computer. Some programs such as R, Python, and Java have modules that can interact in this fashion. Should you find out you need an X server, I would highly recommend Xming for Windows users. It is free and just needs to be open in the background while you have the terminal open. Macs have traditionally had this functionality built in (on newer versions of OS X you may need to install XQuartz). X11 forwarding must be enabled before you log into the cluster. On Macs, simply add the -X option when connecting through the terminal (ex: ssh -X username@computer.university.edu). For Windows, open PuTTY and in the left menu select Connection > SSH > X11 and click on the box enabling X11 forwarding. Then go back to the top of the menu to Session and log in as before. To make sure this is working, type the command "xeyes". Do you see two eyeballs looking at you? If so, it works!
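If xeyes does not show up, a quick check you can run on the cluster is whether the DISPLAY environment variable is set, since ssh normally sets it (to something like localhost:10.0) when X11 forwarding is active. The sketch below is one way to do that from Python. The function name x11_status is made up, and a set DISPLAY only suggests, rather than proves, that forwarding is working.

```python
import os
import shutil


def x11_status(environ=None):
    """Summarize whether this session looks ready for X11 programs."""
    env = os.environ if environ is None else environ
    display = env.get("DISPLAY", "")
    return {
        "display": display,
        # An empty DISPLAY usually means forwarding was not enabled at login.
        "forwarding_likely": bool(display),
        # xeyes is not installed everywhere, so check before relying on it.
        "xeyes_available": shutil.which("xeyes") is not None,
    }


if __name__ == "__main__":
    for key, value in x11_status().items():
        print(f"{key}: {value}")
```

If forwarding_likely comes back False, log out, double check that Xming (or your X server) is running and that X11 forwarding is enabled, then log in again.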
Text Editor
If you like Vi or Emacs, this section is not for you. For everyone else, there is a wide variety of text editors that are helpful for writing code in a range of programming languages. I prefer Notepad++ for Windows. For Macs, however, I am still searching for a worthy Notepad++ alternative.
Well, I hope this was helpful in getting you up and running in a UNIX computing environment. From personal experience, I know the learning curve can be steep. Best wishes as you learn to navigate your way around a remote cluster. If you have any helpful suggestions, I highly encourage you to post a comment below.
Best Notepad++ Alternatives for Mac OS
I ❤ Notepad++. It's a powerful, fully-loaded, and free text editing application that has been an invaluable tool for writing code in a variety of programming languages. The only caveat: it's only available for Windows operating systems. With the acquisition of my shiny, new MacBook Pro, I was incredibly disappointed to find out Notepad++ could not be installed on Macs; so much so that I almost returned the MacBook. Since I couldn't find a better laptop to meet my needs (and aesthetic desires), the quest has begun to find a comparable and preferably free text editor that runs on a Mac operating system. I was surprised to find the list of candidates quite long. Here are the options I found. Unfortunately, not all of them are free:
BBEdit ($50)
Coda ($75)
Crossover (Windows emulator, $60) + Notepad++
Espresso ($75)
jEdit
Komodo Edit
TextEdit (the basic text editor pre-loaded on your Mac)
TextMate (€39 or about $53)
TextWrangler (free lite version of BBEdit)
Smultron ($5)
SubEthaEdit (€29, or about $43)
Sublime ($70)
Tincta (free, Pro version for $16)
WINE (Windows emulator) + Notepad++
Apparently the market is saturated with Notepad++ "replacement" text editors for Macs. The predominant text editors most recommended online are highlighted in bold. While looking into the options, it became apparent there really is no one best Notepad++ replacement. It all depends on what the user is using Notepad++ for and the features they need (plus a bit of personal preference in user interface). I have tried a few of the above options and am still not completely satisfied. I am secretly hoping the folks responsible for Notepad++ are cooking up a way to install it on Macs. The emulator approach to installing Notepad++ on a Mac also seems interesting. I will have to try it when I have some free time. In the meantime, I am curious what has been working best for you other Notepad++ lovers who have made the switch to a Mac. Also, if you are aware of other text editors not mentioned here, please share!