
On the study of genetic structure in human populations and the effects of demographic history, consanguinity, and study design on detection, with investigations of human evolutionary models, archaic introgression, and natural selection

FERRUCCI, Ronald Robert
2011

Abstract

Studies of genetic structure are important in disease-gene association research, conservation genetics, and anthropological research. Observations of genetic structure mirror the results of genealogical studies and are affected by the demographic history of the populations studied and by the components of study design. Comparing measures of population structure from genealogical and genetic data, the former from pedigrees and the latter from microsatellite markers, we find that the two measures are closely correlated. We demonstrate this by showing a Pearson correlation between membership in a cluster and average kinship with other members of that cluster, and by showing that kinship within clusters and FST between clusters both increase as the criterion for cluster membership becomes more stringent. Although we show that model-based clustering algorithms are able to detect structure in weakly differentiated, closely related populations, we also find that much of that structure is due to groups of closely related individuals within the populations. We further find that relatedness down to the level of second cousins contributes to the observation of family-based structure. We then move from highly consanguineous populations, where the presence of family members contributes largely to the observation of structure, to a general model showing the effect of demographic history and study design on the detection of structure in weakly differentiated isolated populations. Using simulations of weakly differentiated populations, we show that longer periods of drift and lower effective population sizes increase the likelihood of observing structure when it is present, as do larger sample sizes and larger numbers of markers. We also find interaction and compensatory effects among these demographic and design factors. For example, increasing divergence decreases the number of markers and the sample size needed to observe structure.
Examining pairs of populations from a publicly available, globally distributed dataset, we find that the number of markers needed to observe differentiation is negatively correlated with the divergence between the populations in a pair (i.e., with FST), and that the upper limit on the number of markers needed decreases as divergence increases. We also find, for populations with FST greater than 0.01, no distinct lower limit to the number of markers needed to observe differentiation. We conclude that studies based on low numbers of markers need to be approached with caution: structure may be missed because too few markers were used, and any observed clustering may not fully represent the structure actually present in the sampled populations. For populations with FST < 0.01, 100 markers or more may be needed, although using more informative markers may decrease the number required. Moving on from model-based clustering algorithms in isolated and worldwide populations, we explore different models of human evolution. First, we examine evidence for natural selection on a geographically restricted allele of the ALDH2 gene, which is involved in ethanol metabolism, and find that the allele's distribution, limited to East Asia, is not consistent with neutral expectations given its high frequency. Second, we explore various models of human evolution to determine whether they are consistent with observed patterns of genetic variation. We find human genetic variation to be consistent with a serial founder effects model with long-range gene flow between Eurasian populations. Further, we show that deviations from the predictions of the founder effects model are likely the result of introgression of archaic human genes into modern humans as our ancestors left Africa. Finally, we developed a new version of AIDA with increased capacity for sample size and sequence length, as well as for microsatellite data.
BARBUJANI, Guido
File: 422.pdf (open access, Adobe PDF, 3.94 MB)
Type: Doctoral thesis
License: Not specified

Documents in SFERA are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11392/2388805