| Identification of Population Clusters Using Hierarchical Clustering Based on Y-STR Distribution Profiles |
||||||||||
| I clustered population-specific distribution profiles using hierarchical clustering, with Pearson correlation as the similarity measure and average linkage (UPGMA) to join clusters. In this analysis I chose to exclude Native American, East Asian and African populations, as well as various other American populations. Most of these populations cluster very strongly within their own group but show very little linkage to other regions. I also looked only at the following Y-STR loci: DYS19, DYS390, DYS391, DYS392 and DYS393. All of the data came from the YSTR database. Figure 1 shows the profiles of two populations that do not correlate highly. The vertical axis gives the fraction of men with a particular number of repeats at each locus (for example, more than 80% of men in S. Ireland have 14 repeats at DYS19). Figure 2 shows the result of the clustering and the inferred clusters. Instead of a plot as in Figure 1, the profiles are shown as a heatmap, with red indicating a high fraction and green indicating zero. What does this all mean? First of all, the analysis is based on population distributions, not individual Y-STR profiles, so I am not able to add a particular person's profile to see where it clusters. I will, however, be able to analyze the data from a Y-STR project that is based on geography or ethnicity and cluster it with all of the other Y-STR data. This is simply one way to show how haplotypes, and even individual repeat frequencies, help cluster modern populations. For questions, corrections and comments please email. |
||||||||||
| ______________________________________________________________________________________________________________ |
||||||||||
| Fig. 1 |
||||||||||
| Fig. 2 |
||||||||||
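
As a rough illustration of the clustering step described above, here is a minimal Python sketch of average-linkage (UPGMA) clustering with a Pearson-correlation distance (1 minus the correlation). It is not the code used for this analysis. The population names (`pop_A` to `pop_D`), the fractions, and the helper names `pearson` and `upgma` are all made up for the example, and each toy profile covers only two hypothetical loci with three repeat values each.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def upgma(profiles):
    """Average-linkage clustering on the distance 1 - Pearson correlation.

    profiles maps a population name to its list of allele fractions.
    Returns the merges in order as (cluster_a, cluster_b, distance).
    """
    names = list(profiles)
    # Pairwise distances between single populations.
    pair_dist = {
        frozenset((a, b)): 1.0 - pearson(profiles[a], profiles[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }

    def cluster_dist(c1, c2):
        # UPGMA: mean of all between-cluster pairwise distances.
        total = sum(pair_dist[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    clusters = [(name,) for name in names]
    merges = []
    while len(clusters) > 1:
        candidates = [
            (cluster_dist(c1, c2), c1, c2)
            for i, c1 in enumerate(clusters)
            for c2 in clusters[i + 1:]
        ]
        dist, c1, c2 = min(candidates, key=lambda item: item[0])
        clusters.remove(c1)
        clusters.remove(c2)
        clusters.append(c1 + c2)
        merges.append((c1, c2, dist))
    return merges


# Made-up profiles: fractions for three repeat values at each of two loci.
profiles = {
    "pop_A": [0.1, 0.8, 0.1, 0.2, 0.7, 0.1],
    "pop_B": [0.1, 0.7, 0.2, 0.3, 0.6, 0.1],
    "pop_C": [0.6, 0.3, 0.1, 0.1, 0.2, 0.7],
    "pop_D": [0.7, 0.2, 0.1, 0.1, 0.3, 0.6],
}

merges = upgma(profiles)
for left, right, dist in merges:
    print(left, "+", right, "at distance", round(dist, 3))
```

With these toy numbers the two similar pairs join first and the two resulting groups join last, at a much larger distance. That is the same pattern a dendrogram next to a heatmap like Figure 2 would show.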