This file describes the exact data used in the Pemberton et al. (2009) BMC Genomics article "Sequence determinants of human microsatellite variability."

The 627 loci in the new data are a subset of the 783 microsatellite loci in the Ramachandran et al. (2005) PNAS article "Support from the relationship of genetic and geographic distance in human populations for a serial founder effect originating in Africa" and in the Rosenberg et al. (2005) PLoS Genetics article "Clines, clusters, and the effect of study design on the inference of human population structure."

The new data set contains genotypes expressed as numbers of repeats rather than as the PCR fragment sizes used in Ramachandran et al. (2005) and Rosenberg et al. (2005). For a description of how PCR fragment sizes were converted into numbers of repeats and how these loci were selected, please see Pemberton et al. (2009).

If you have questions about these files, please contact me.

Trevor Pemberton
November 29, 2009

-------------------------------------------------------------------

1. combinedmicrosats_627loci_1048indivs_numRpts.stru

This file includes the exact data used by Pemberton et al. (2009). The format is that used by the structure program.

The first line gives the list of microsatellites.

The second line gives the size of microsatellite repeat units in nucleotides (2 = di, 3 = tri, 4 = tetra).

The third line gives the number of separate short tandem repeat (STR) regions in microsatellite sequences (1, 2, or 3).

The fourth line states whether the alleles in the HGDP-CEPH individuals are regular (all alleles are separated by exact multiples of the repeat unit size) or irregular (one or more alleles are not separated by exact multiples of the repeat unit size).

After the first four lines, each individual is listed on two consecutive lines. The first five columns include the following information:

(1) Individual code number assigned by CEPH.
(2) Population code number, as was used in Rosenberg et al. (2002).
(3) Population name.
(4) Geographic information about the population.
(5) Pre-defined region, as was used in Rosenberg et al. (2002).

-------------------------------------------------------------------

2. Pemberton_AdditionalFile1_11242009.txt

This file is Supplemental Table S1 (Additional File 1) from Pemberton et al. (2009) in a tab-delimited text file format.

-------------------------------------------------------------------

3. Pemberton_AdditionalFile2_11292009.txt

This file is Supplemental Table S2 (Additional File 2) from Pemberton et al. (2009) in a tab-delimited text file format.

-------------------------------------------------------------------
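
Example: reading the .stru file (file 1)

The layout of file 1 (four header lines, then two lines per individual with five leading columns) can be read with a short Python script. What follows is a minimal sketch, not a tool distributed with the data. It rests on several assumptions that this README does not state and that should be checked against the file itself: fields are separated by whitespace, none of the five leading columns contains a space, each header line has exactly one entry per locus, the columns after the fifth hold one allele per locus as an integer, and missing genotypes use the structure-style code -9. The function name parse_stru and the dictionary keys are made up for this illustration.

```python
from typing import Dict, Iterable, List, Tuple

# Assumption: structure-style missing-data code; verify against the file.
MISSING = -9

META_FIELDS = ("individual_code", "population_code", "population_name",
               "geography", "region")


def parse_stru(lines: Iterable[str]) -> Tuple[Dict[str, list], List[dict]]:
    """Parse the four header lines and the two-line-per-individual body."""
    rows = [ln.split() for ln in lines if ln.strip()]
    if len(rows) < 4:
        raise ValueError("expected four header lines")

    header = {
        "loci": rows[0],                                  # line 1: locus names
        "repeat_unit_size": [int(x) for x in rows[1]],    # line 2: 2, 3, or 4
        "n_str_regions": [int(x) for x in rows[2]],       # line 3: 1, 2, or 3
        "regularity": rows[3],                            # line 4: kept as text
    }
    n_loci = len(header["loci"])
    for key, values in header.items():
        if len(values) != n_loci:
            raise ValueError(f"header line '{key}' has {len(values)} entries, "
                             f"expected {n_loci}")

    body = rows[4:]
    if len(body) % 2 != 0:
        raise ValueError("expected two lines per individual")

    individuals = []
    for first, second in zip(body[0::2], body[1::2]):
        # The last n_loci tokens are alleles; everything before is metadata.
        meta_a, meta_b = first[:-n_loci], second[:-n_loci]
        if len(meta_a) != len(META_FIELDS) or meta_a != meta_b:
            raise ValueError(f"unexpected leading columns: {meta_a} / {meta_b}")
        record = dict(zip(META_FIELDS, meta_a))
        record["alleles"] = ([int(x) for x in first[-n_loci:]],
                             [int(x) for x in second[-n_loci:]])
        individuals.append(record)
    return header, individuals
```

To use it on the real file, pass an open file handle, for example: with open("combinedmicrosats_627loci_1048indivs_numRpts.stru") as fh: header, individuals = parse_stru(fh). The sketch raises an error rather than guessing when a line does not match the assumed layout, so a failure most likely means one of the assumptions above does not hold.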