Self-reported ancestry is a poor metric to use when attempting to statistically adjust for the effects of ancestry, since some individuals misreport their ancestry or are simply unaware of their true ancestry. Worse yet, sometimes you don't even have information collected on an individual's ancestry. As many of you know, if you have SNP genotyping data you can estimate an individual's ancestry rather precisely. Classically, to do this one needed to combine genotypes from the study sample with genotypes from a reference panel (e.g., HapMap or 1000 Genomes), find the intersection of SNPs in each dataset, and then run a clustering program to see which samples clustered with the reference ancestral populations. Not a ton of work, but an annoyance all the same. Luckily, a relatively new program was just released that, in essence, does a lot of this groundwork for you. It is called SNPWEIGHTS and can be downloaded here. Essentially, the program takes SNP genotypes as input, finds the intersection of the sample genotypes with the reference genotypes, weights them based on pre-configured parameters to construct the first couple of principal components (aka eigenvectors), and then calculates an individual's percentage ancestry for each ancestral population. Here is how to run the program.
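Before the walkthrough, the intersect-and-weight idea can be sketched in a few lines of Python. This is only a toy illustration: the function name, the data layout, and the simple centering on reference allele frequency are my own assumptions rather than SNPWEIGHTS' actual code, and the final conversion of principal component scores into ancestry percentages is left out.

```python
# Toy sketch only: hypothetical names and a simplified centering step,
# not the normalization SNPWEIGHTS itself applies.

def project_sample(genotypes, ref_weights):
    """Project one sample onto reference PCs using per-SNP weights.

    genotypes:   {snp_id: allele count 0/1/2, or None if missing}
    ref_weights: {snp_id: (ref_allele_freq, [weight_pc1, weight_pc2, ...])}
    Returns (list of PC scores, number of SNPs used).
    """
    # Step 1: keep SNPs that are genotyped in the sample AND in the reference.
    shared = [s for s, g in genotypes.items()
              if g is not None and s in ref_weights]
    if not shared:
        raise ValueError("no overlapping SNPs between sample and reference")

    n_pcs = len(ref_weights[shared[0]][1])
    scores = [0.0] * n_pcs
    for snp in shared:
        freq, weights = ref_weights[snp]
        # Step 2: center the allele count on its expectation (2 * frequency).
        centered = genotypes[snp] - 2.0 * freq
        # Step 3: add this SNP's weighted contribution to each PC.
        for k in range(n_pcs):
            scores[k] += weights[k] * centered
    return scores, len(shared)


sample = {"rs1": 2, "rs2": 0, "rs3": None, "rs9": 1}
reference = {
    "rs1": (0.5, [0.25, -0.5]),
    "rs2": (0.25, [1.0, 0.5]),
    "rs3": (0.1, [0.3, 0.3]),
}
scores, n_used = project_sample(sample, reference)
print(scores, n_used)
```

Only rs1 and rs2 contribute here: rs3 is missing in the sample and rs9 is absent from the reference, which is the same intersection step that otherwise has to be done by hand.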
First, make sure you have Python installed on your system and that your genotyping data is in EIGENSTRAT format. A brief tutorial on converting to EIGENSTRAT format using the convertf tool is here.
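If your data start out as PLINK PED/MAP files, a convertf parameter file can be generated with a small helper like the one below. The key names follow the convertf parameter format as I understand it from the EIGENSOFT documentation; the file names (study.ped, study.map) and the helper itself are hypothetical, so adjust them to your own data.

```python
def convertf_par(ped_path, map_path, out_prefix):
    """Return the text of a convertf parameter file for PED -> EIGENSTRAT."""
    lines = [
        f"genotypename:    {ped_path}",   # PED file holds the genotypes
        f"snpname:         {map_path}",   # MAP file describes the SNPs
        f"indivname:       {ped_path}",   # PED file also lists the individuals
        "outputformat:    EIGENSTRAT",
        f"genotypeoutname: {out_prefix}.geno",
        f"snpoutname:      {out_prefix}.snp",
        f"indivoutname:    {out_prefix}.ind",
    ]
    return "\n".join(lines) + "\n"


# Hypothetical input names, for illustration only.
print(convertf_par("study.ped", "study.map", "study"))
```

Saving the printed text as, say, par.PED.EIGENSTRAT and running convertf -p par.PED.EIGENSTRAT should then produce the three EIGENSTRAT files.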
Next, download the SNPWEIGHTS package here and a reference panel. I usually use the European, West African and East Asian ancestral populations, but there are other options on the SNPWEIGHTS webpage as well.
Then, create a parameter file listing the paths to your input files, your input population (designated "AA", "CO", or "EA"), and an output file. An example of one is below:
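A parameter file along these lines can be produced with a short Python helper. The key names (geno, snp, ind, snpwt, predpcoutput) are written from my recollection of the SNPWEIGHTS README, and every path is hypothetical, so treat all of it as an assumption and check it against the README that ships with your download.

```python
def snpweights_par(data_prefix, snpwt_path, output_path):
    """Return the text of a SNPWEIGHTS parameter file.

    Key names are assumed from the README; verify them before use.
    """
    lines = [
        f"geno: {data_prefix}.geno",        # EIGENSTRAT genotype file
        f"snp: {data_prefix}.snp",          # EIGENSTRAT SNP file
        f"ind: {data_prefix}.ind",          # EIGENSTRAT individual file
        f"snpwt: {snpwt_path}",             # reference panel SNP weights
        f"predpcoutput: {output_path}",     # where the results are written
    ]
    return "\n".join(lines) + "\n"


# Hypothetical paths, for illustration only.
print(snpweights_par("/path/to/study", "/path/to/snpwt.co", "/path/to/study.predpc"))
```

Saving the printed text as par.SNPWEIGHTS gives a file that can be passed to the --par option.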
Finally, run the program using the command inferancestry.py --par par.SNPWEIGHTS. For the program to run correctly, make sure the inferancestry.info and snpwt.co files are in the same directory as your inferancestry.py file.
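Since a missing support file is an easy mistake to make, a small wrapper can check for the files before launching the run. This is a sketch rather than part of SNPWEIGHTS: both function names are made up, and it assumes the Python interpreter running the wrapper is one that inferancestry.py itself works with.

```python
import os
import subprocess
import sys


def missing_support_files(script_dir, needed=("inferancestry.info", "snpwt.co")):
    """Return the names in `needed` that are not present in script_dir."""
    return [name for name in needed
            if not os.path.isfile(os.path.join(script_dir, name))]


def run_inferancestry(script_dir, par_file):
    """Run inferancestry.py once its support files are confirmed present."""
    missing = missing_support_files(script_dir)
    if missing:
        raise FileNotFoundError(
            "missing next to inferancestry.py: " + ", ".join(missing))
    script = os.path.join(script_dir, "inferancestry.py")
    # Same command as above, launched with the current interpreter.
    subprocess.run([sys.executable, script, "--par", par_file], check=True)


# Example call (hypothetical directory):
# run_inferancestry("/path/to/SNPweights", "par.SNPWEIGHTS")
```

If you use a different reference panel, pass its weight file name through the needed argument of missing_support_files instead of the default.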
For more details, see the SNPWEIGHTS paper or the README file included in the SNPWEIGHTS zip folder. For code to generate eigenvector plots with overlaid ancestry percentages and triangle plots by percent ancestry, see this post.