HGDP-CEPH human genome diversity cell line panel
The diversity panel is a large and widely used collection of DNA
samples from individuals distributed around the world. Several of our
papers have utilized genotypes from the diversity panel. Here we
provide microsatellite, indel, and SNP data exactly as used in these
papers.
Note that slightly different versions of our microsatellite and
indel data sets are located at the website of the Marshfield Clinic Research Foundation. In cases where it is of
interest to compare new results on the diversity panel to what has
been seen in our previous work, we recommend using the files
downloadable from this site, rather than those available in Microsoft
Excel from Marshfield.
Further information about the microsatellite markers, such as PCR
primers and map positions, is available from Marshfield.
Repeat numbers and sequence properties of microsatellites
(Posted January 21, 2010) HGDP microsatellite properties are now
available online for
TJ Pemberton, CI Sandefur, M Jakobsson, NA Rosenberg (2009) Sequence
determinants of human microsatellite variability. BMC Genomics 10:
612. [Abstract] [Full text at journal website] [PDF]
[Supplementary table 1 (XLS)] [Supplementary table 2 (XLS)]
[Supplementary tables 3-6 (PDF)]
HGDP+India SNP data
(Posted June 27, 2008) HGDP+India SNP data are now available online
for
TJ Pemberton*, M Jakobsson*, DF Conrad, G Coop, JD Wall, JK
Pritchard, PI Patel, NA Rosenberg (2008) Using population mixtures to
optimize the utility of genomic databases: linkage disequilibrium and
association study design in India. Annals of Human Genetics 72:
535-546. [Abstract]
HGDP high-resolution genome-wide SNP data
(Posted February 26, 2008) HGDP SNP data are now available online for
M Jakobsson*, SW Scholz*, P Scheet*, JR Gibbs, JM VanLiere, H-C Fung,
ZA Szpiech, JH Degnan, K Wang, R Guerreiro, JM Bras, JC Schymick, DG
Hernandez, BJ Traynor, J Simon-Sanchez, M Matarin, A Britton, J van
de Leemput, I Rafferty, M Bucan, HM Cann, JA Hardy, NA Rosenberg, AB
Singleton (2008) Genotype, haplotype and copy-number variation in
worldwide human populations. Nature 451: 998-1003. [Abstract] [PDF]
HGDP SNP data
(Posted May 23, 2007) HGDP SNP data are now available online for
DF Conrad*, M Jakobsson*, G Coop*, X Wen, JD Wall, NA Rosenberg, JK
Pritchard (2006) A worldwide survey of haplotype variation and linkage
disequilibrium in the human genome. Nature Genetics 38: 1251-1260.
[Abstract] [PDF]
Relatives
(Posted October 17, 2006) It is recommended
that anyone working with the diversity panel read the following paper,
which reports a variety of anomalies in the diversity panel
individuals and recommends standard subsets for future use.
NA Rosenberg (2006) Standardized subsets of the HGDP-CEPH
Human Genome Diversity Cell Line Panel, accounting for atypical and
duplicated samples and pairs of close relatives. Annals of Human
Genetics 70: 841-847. [Abstract] [PDF] [Supplement] [Spreadsheet with recommended subsets (txt format)] [Spreadsheet with
recommended subsets (xls format)]
Data sets
If you are using any of the data files on this site and wish to be
contacted in case of updates or modifications, please send an email to
Noah Rosenberg.
377 autosomal microsatellites in 1056 individuals from 52 populations
The following data files, all in plain text format, are used in
each of the papers listed below. The markers are drawn from
Marshfield screening set 10. A description of how these data files
differ from those on the Marshfield site is in the online supplement to our 2002 paper.
List of papers that use the above files:
- NA Rosenberg, JK Pritchard, JL Weber, HM Cann, KK Kidd, LA
Zhivotovsky, MW Feldman (2002) Genetic structure of human
populations. Science 298: 2381-2385. [Abstract] [Full
Text at Science website] [PDF] [Supplement] [Software for drawing figures] [Español]
- LA Zhivotovsky, NA Rosenberg, MW Feldman (2003)
Features of evolution and expansion of modern humans, inferred from
genomewide microsatellite markers. American Journal of Human
Genetics 72: 1171-1186. [Abstract] [PDF]
- NA Rosenberg, JK Pritchard, JL Weber, HM Cann, KK Kidd, LA
Zhivotovsky, MW Feldman (2003) Response to comment on "Genetic
structure of human populations." Science 300: 1877. [Abstract] [PDF]
- NA Rosenberg, LM Li, R Ward, JK Pritchard (2003)
Informativeness of genetic markers for inference of
ancestry. American Journal of Human Genetics 73: 1402-1422.
[Abstract] [PDF] [Supplement] [SNP data] [SNP data readme] [Solution to Problem 11039 required in
appendix of paper (American Mathematical Monthly 112:
572-573, 2005)]
- S Ramachandran, NA Rosenberg, LA Zhivotovsky, MW Feldman
(2004) Robustness of the inference of human population structure: a
comparison of X-chromosomal and autosomal microsatellites. Human
Genomics 1: 87-97. [Abstract] [PDF]
- NA Rosenberg (2005) Algorithms for selecting
informative marker panels for population assignment. Journal of
Computational Biology 12: 1183-1201. [Abstract] [PDF]
List of papers that use slightly altered versions of the above
files (the alterations are described in the papers):
- NA Rosenberg, PP Calabrese (2004) Polyploid and multilocus
extensions of the Wahlund inequality. Theoretical Population
Biology 66: 381-391. [Abstract] [PDF]
- NA Rosenberg, MGB Blum (2007) Sampling properties
of homozygosity-based statistics for linkage
disequilibrium. Mathematical Biosciences 208: 33-47. [Abstract] [PDF]
783 autosomal microsatellite loci and 210 insertion/deletion
polymorphisms in 1048 individuals from 53 populations
The following data files, all in plain text format, are used in
each of the papers listed below. The microsatellite markers are drawn
from Marshfield screening sets 10, 13, and 52, and the indels are
drawn from Marshfield screening set 100. A description of how these
data files differ from those on the Marshfield site is in the
Ramachandran et al. (2005) and Rosenberg et al. (2005) papers.
In choosing data files for analysis, note that there are slight
differences between the data used by Ramachandran et al. (2005) and
those used by Rosenberg et al. (2005).
- Individual genotype data file used by Rosenberg et al. (2005), in
structure format, 783 microsatellites and 210 indels (7.5 Mb).
- Individual genotype data file used by Rosenberg et al. (2005), in
structure format, 783 microsatellites only (6.6 Mb).
- Individual genotype data file used by Rosenberg et al. (2005), in
structure format, 210 indels only (1.0 Mb).
- Individual genotype data file used by Rosenberg et al. (2005), in
NEXUS format, 783 microsatellites only (6.5 Mb).
- Individual genotype data file used by Rosenberg et al. (2005), in
NEXUS format, 210 indels only (0.9 Mb).
- Allele frequencies for data used by Rosenberg et al. (2005), 783
microsatellites only (11.2 Mb).
- Allele frequencies for data used by Rosenberg et al. (2005), 210
indels only (0.5 Mb).
- Latitudes, longitudes, and spherical coordinates used by Rosenberg
et al. (2005).
- Population codes in files associated with Rosenberg et al. (2005).
- Individual genotype data file used by Ramachandran et al. (2005), in
structure format, 783 microsatellites only (6.4 Mb).
- Individual genotype data file used by Ramachandran et al. (2005), in
NEXUS format, 783 microsatellites only (6.3 Mb).
- Latitudes and longitudes used by Ramachandran et al. (2005).
- Population codes in files associated with Ramachandran et al. (2005).
- Readme: further description of the 13 previous files.
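For readers loading the structure-format genotype files listed above, the following is a minimal sketch of a reader, not a description of the exact layout of these files. It assumes a common structure convention: a header line of locus names, two rows per diploid individual, a fixed number of leading metadata columns (the hypothetical parameter `n_meta`), and `-9` as the missing-data code. The Readme is the authority on the actual column layout, so adjust `n_meta` and the missing code accordingly. The toy input is invented for illustration.

```python
from collections import defaultdict

MISSING = "-9"  # assumed missing-data code; confirm against the Readme


def parse_structure(text, n_meta=2):
    """Parse structure-style text into (locus names, {individual: [allele rows]}).

    Assumes the first non-empty line lists locus names and every later line
    holds n_meta metadata columns followed by one allele per locus.
    """
    lines = [ln.split() for ln in text.strip().splitlines() if ln.strip()]
    loci = lines[0]
    genotypes = defaultdict(list)
    for fields in lines[1:]:
        alleles = fields[n_meta:]
        if len(alleles) != len(loci):
            raise ValueError(f"expected {len(loci)} alleles, got {len(alleles)}")
        genotypes[fields[0]].append(alleles)  # first column taken as the individual ID
    return loci, dict(genotypes)


def missing_rate(genotypes):
    """Fraction of allele entries equal to the missing-data code."""
    total = missing = 0
    for rows in genotypes.values():
        for row in rows:
            total += len(row)
            missing += sum(a == MISSING for a in row)
    return missing / total if total else 0.0


# Invented toy data: two individuals, three loci, two metadata columns.
toy = """
L1 L2 L3
ind1 popA 120 -9 204
ind1 popA 124 133 204
ind2 popB 120 131 -9
ind2 popB 120 131 208
"""

loci, geno = parse_structure(toy)
print(loci, len(geno), missing_rate(geno))
```

Keeping alleles as strings avoids silently treating the missing code as a real fragment length; convert to integers only after filtering.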
List of papers that use the above files:
- S Ramachandran, O Deshpande, CC Roseman, NA Rosenberg, MW
Feldman, LL Cavalli-Sforza (2005) Support from the relationship of
genetic and geographic distance in human populations for a serial
founder effect originating in Africa. Proceedings of the National
Academy of Sciences USA 102: 15942-15947. [Abstract] [PDF] [Supplementary Figure 6]
[Supplementary Table
2] [Supplementary
text]
- NA Rosenberg, S Mahajan, S Ramachandran, C Zhao, JK
Pritchard, MW Feldman (2005) Clines, clusters, and the effect of study
design on the inference of human population structure. PLoS
Genetics 1: 660-671. [Abstract] [Full-text at journal website] [PDF]
- NA Rosenberg (2006) Standardized subsets of the HGDP-CEPH
Human Genome Diversity Cell Line Panel, accounting for atypical and
duplicated samples and pairs of close relatives. Annals of Human
Genetics 70: 841-847. [Abstract] [PDF] [Supplement]
[Spreadsheet with recommended subsets (txt format)] [Spreadsheet with
recommended subsets (xls format)]
- ZA Szpiech, M Jakobsson, NA Rosenberg (2008)
ADZE: a rarefaction approach for counting alleles private to
combinations of populations. Bioinformatics 24:
2498-2504. [Abstract] [
Full text at journal website] [PDF] [Software]
- NA Rosenberg, M Jakobsson (2008) The relationship
between homozygosity and the frequency of the most frequent
allele. Genetics 179: 2027-2036.
[Abstract]
- M DeGiorgio, NA Rosenberg (2009) An unbiased estimator of
gene diversity in samples containing related individuals.
Molecular Biology and Evolution 26: 501-512.
[Abstract]
2834 single-nucleotide polymorphisms in 927 individuals
from 52 populations
Download SNP data (you will be directed first to a registration page,
and we would very much appreciate it if you register)
List of papers that use the SNP data:
- DF Conrad*, M Jakobsson*, G Coop*, X Wen, JD Wall, NA
Rosenberg, JK Pritchard (2006) A worldwide survey of haplotype
variation and linkage disequilibrium in the human genome. Nature
Genetics 38: 1251-1260. [Abstract] [PDF]
2810 single-nucleotide polymorphisms in 957 individuals
from 55 populations
These data update the data of Conrad et al. (2006) described above.
Download SNP data (you will be directed first to a registration page,
and we would very much appreciate it if you register)
List of papers that use the SNP data:
- TJ Pemberton*, M Jakobsson*, DF Conrad, G Coop, JD
Wall, JK Pritchard, PI Patel, NA Rosenberg (2008) Using
population mixtures to optimize the utility of genomic databases:
linkage disequilibrium and association study design in
India. Annals of Human Genetics 72: 535-546. [Abstract] [PDF]
- L Huang, Y Li, AB Singleton, JA Hardy, G Abecasis, NA
Rosenberg, P Scheet (2009) Genotype imputation accuracy
across worldwide human populations. American Journal of Human
Genetics 84: 235-250. [Abstract]
525,910 single-nucleotide polymorphisms and 1428 copy-number variable
loci in 485 individuals from 29 populations
Download SNP data
List of papers that use the SNP and copy number data:
- M Jakobsson*, SW Scholz*, P Scheet*, JR Gibbs, JM
VanLiere, H-C Fung, ZA Szpiech, JH Degnan, K Wang, R
Guerreiro, JM Bras, JC Schymick, DG Hernandez, BJ Traynor, J
Simon-Sanchez, M Matarin, A Britton, J van de Leemput, I Rafferty, M
Bucan, HM Cann, JA Hardy, NA Rosenberg, AB Singleton (2008) Genotype,
haplotype and copy-number variation in worldwide human populations.
Nature 451: 998-1003. [Abstract] [PDF]
- L Huang, Y Li, AB Singleton, JA Hardy, G Abecasis, NA
Rosenberg, P Scheet (2009) Genotype imputation accuracy
across worldwide human populations. American Journal of Human
Genetics 84: 235-250. [Abstract]
- JT Mosher, TJ Pemberton, K Harter, C Wang,
EO Buzbas, P Dvorak, C Simon, SJ Morrison, NA Rosenberg
(2010) Lack of population diversity in commonly used human embryonic
stem-cell lines. New England Journal of Medicine 362:
183-185.
627 autosomal microsatellite loci in 1048 individuals,
with repeat numbers and sequence properties
For 627 HGDP microsatellites, these files provide sequence
properties, such as the structure of the repeat motif and the GC
content of the flanking region. They also convert the PCR fragment
lengths in nucleotides to numbers of repeats, by calibration with the
human genome reference sequence.
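The conversion from fragment length to repeat number can be sketched as simple arithmetic. This is an illustrative assumption about the calibration, not the procedure documented in the files: it supposes that, for each locus, the reference sequence supplies an expected PCR fragment length and a repeat count, and that each motif-length difference in fragment size corresponds to one repeat. The function and parameter names are hypothetical.

```python
def fragment_length_to_repeats(fragment_length, ref_fragment_length,
                               ref_repeats, motif_length):
    """Estimate the repeat count of an allele from its PCR fragment length.

    Assumes a linear calibration: the reference allele has ref_repeats
    repeats and a fragment of ref_fragment_length nucleotides, and every
    motif_length nucleotides of difference adds or removes one repeat.
    """
    if motif_length <= 0:
        raise ValueError("motif_length must be positive")
    offset = fragment_length - ref_fragment_length
    return ref_repeats + offset / motif_length


# Invented example: a tetranucleotide locus whose reference allele has
# 12 repeats and a 200-nucleotide fragment.
print(fragment_length_to_repeats(208, 200, 12, 4))  # 14.0
print(fragment_length_to_repeats(198, 200, 12, 4))  # 11.5
```

Under this assumption, a non-integer result such as 11.5 indicates a fragment length that is not a whole number of motifs away from the reference, for example because of a partial repeat or an indel in the flanking sequence.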
Download microsatellite data
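List of papers that use the microsatellite repeat data:
- TJ Pemberton, CI Sandefur, M Jakobsson, NA Rosenberg (2009)
Sequence determinants of human microsatellite variability. BMC
Genomics 10: 612.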
History
Created with 377 microsatellites, 22 November 2002
Addition of NEXUS file for 377 microsatellites, 28 December 2002
Minor modifications to site, 30 April 2004
Addition of data on 783 microsatellites and 210 indels, 1 November 2005
Addition of standardized subsets of individuals, 17 November 2006
Addition of SNP data from Conrad et al., 23 May 2007
Addition of genome-wide SNP and copy-number data, 26 February 2008
Addition of SNP data from Pemberton et al., 27 June 2008
Addition of sequence properties of microsatellites, 21 January 2010