Here is some simple code to plot this in R. Base R has no function for drawing a triangle plot, so the plotrix package must be installed first. The ancestry.txt file is the output file from SNPWEIGHTS, but other output could be formatted to work as well.
# EV Plot with Percent Ancestry Overlay
data <- read.table("ancestry.txt", as.is=TRUE, header=FALSE)
names(data) <- c("ID", "Case", "SNPs", "EV1", "EV2", "EUR", "AFR", "ASN")
plot(data$EV1, data$EV2, pch=20, col="gray", xlab="EV1", ylab="EV2")
# Label each point with percent EUR (above), AFR (left), and ASN (below)
text(data$EV1, data$EV2, labels=round(data$EUR,2)*100, cex=0.4, offset=0.1, pos=3)
text(data$EV1, data$EV2, labels=round(data$AFR,2)*100, cex=0.4, offset=0.1, pos=2)
text(data$EV1, data$EV2, labels=round(data$ASN,2)*100, cex=0.4, offset=0.1, pos=1)
# Triangle Plot
# Rescale the proportions so each row sums to 1. This accounts for slight
# rounding in the ancestry estimation file, which triax.plot needs to work.
data$total <- data$EUR+data$AFR+data$ASN
data$European <- data$EUR/data$total
data$African <- data$AFR/data$total
data$Asian <- data$ASN/data$total
data_p <- data[c("European","Asian","African")]
library(plotrix)
triax.plot(data_p, pch=20, cc.axes=TRUE, show.grid=TRUE)
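To see why the rescaling step matters, here is a minimal sketch using made-up numbers rather than real SNPWEIGHTS output. The `toy` data frame and its values are hypothetical; they only mimic rows whose three proportions sum to something slightly off from 1.

```r
# Hypothetical ancestry estimates for three samples; the values are made up
# to mimic rows whose proportions do not sum to exactly 1 after rounding.
toy <- data.frame(
  EUR = c(0.912, 0.051, 0.334),
  AFR = c(0.047, 0.903, 0.333),
  ASN = c(0.040, 0.047, 0.334)
)

# Row totals before rescaling (close to, but not exactly, 1)
toy$total <- toy$EUR + toy$AFR + toy$ASN

# Divide each proportion by its row total so every row sums to 1
toy_p <- data.frame(
  European = toy$EUR / toy$total,
  Asian    = toy$ASN / toy$total,
  African  = toy$AFR / toy$total
)

# Confirm that each rescaled row now sums to 1 within floating-point tolerance
row_sums <- rowSums(toy_p)
all_ok <- isTRUE(all.equal(unname(row_sums), rep(1, nrow(toy_p))))
print(all_ok)
```

Running the same check on your own `data_p` before calling `triax.plot` is a quick way to catch rows with missing or zero totals.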
The output should look similar to the plots below.
How to filter or select samples based on ancestry.txt?
R can filter the dataset (ancestry.txt) on any of its variables. For example, to create a new dataset containing only individuals with more than 90 percent estimated European ancestry, you can use the following one-liner: data2 <- data[data$EUR>0.9,]. Hope this helps.
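As a rough sketch of how that might look end to end, the example below uses a small made-up data frame in place of the real ancestry.txt. The sample IDs, the second filter, and the output file name are all assumptions for illustration.

```r
# Hypothetical stand-in for the data frame read from ancestry.txt
data <- data.frame(
  ID  = c("S1", "S2", "S3", "S4"),
  EUR = c(0.97, 0.42, 0.91, 0.10),
  AFR = c(0.02, 0.50, 0.05, 0.85),
  ASN = c(0.01, 0.08, 0.04, 0.05),
  stringsAsFactors = FALSE
)

# Keep individuals with more than 90 percent estimated European ancestry
data2 <- data[data$EUR > 0.9, ]

# The same idea with subset(), combining two conditions
data3 <- subset(data, AFR > 0.8 & EUR < 0.2)

# Write the selected sample IDs to a file, one per line
# (the file name and location are assumptions)
out_file <- file.path(tempdir(), "eur_ids.txt")
writeLines(data2$ID, out_file)
```

A plain list of IDs like this is often the easiest thing to hand to a downstream tool, but the exact format you need will depend on that tool.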
Thanks for the reply. It looks like the column order in the ancestry file is incorrect. It has to be "ID", "Case", "SNPs", "EV1", "EV2", "AFR", "EUR", "ASN". At least my data supports that. What do you say?
I just double-checked. The column order in the blog script above is correct: "EUR", "AFR", and "ASN". It looks like SNPWEIGHTS has been updated since I last used it, which may account for the difference. The columns may also be assigned in order of the most common ancestry in your input data, which could account for the difference as well.
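If you have a few samples whose ancestry you already know, one way to check which column is which is to average each unlabeled column within each known group. This is only a sketch: the `anc` values, the `known` labels, and the default column names V6 to V8 are all made up for illustration.

```r
# Hypothetical data: three unlabeled ancestry columns (V6..V8), as they would
# be named after reading with header=FALSE, plus made-up known labels for a
# few reference samples.
anc <- data.frame(
  V6 = c(0.95, 0.03, 0.02, 0.90),
  V7 = c(0.03, 0.94, 0.04, 0.06),
  V8 = c(0.02, 0.03, 0.94, 0.04)
)
known <- c("AFR", "EUR", "ASN", "AFR")

# Mean of each column within each known group; the column with the highest
# mean for a group is the likely match for that ancestry.
means <- aggregate(anc, by = list(group = known), FUN = mean)
best <- names(anc)[apply(means[, names(anc)], 1, which.max)]
mapping <- setNames(best, as.character(means$group))
print(mapping)
```

Once the mapping is clear, adjust the vector passed to `names(data)` so the labels match your version of the output.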