The genetic structure and adaptation of Andean highlanders and Amazonian dwellers is influenced by the interplay between geography and culture

Western South America was one of the worldwide cradles of civilization. The well-known Inca Empire was the tip of the iceberg of a cultural and biological evolutionary process that started 14-11 thousand years ago. Genetic data from 18 Peruvian populations reveal that: (1) The between-population homogenization of the central-southern Andes and its differentiation with respect to Amazonian populations of similar latitudes do not extend northward. Instead, longitudinal gene flow between the northern coast of Peru, the Andes and Amazonia accompanied cultural and socioeconomic interactions revealed by archeological studies. This pattern recapitulates the environmental and cultural differentiation between the fertile north, where altitudes are lower, and the arid south, where the Andes are higher and act as a genetic barrier between the sharply different environments of the Andes and Amazonia. (2) The genetic homogenization between the populations of the arid Andes is not only due to migration during the Inca Empire or the subsequent colonial period. It started at least during the earlier expansion of the pre-Inca Wari Empire (1400-1000 YBP). (3) This demographic history allowed for cases of positive natural selection in the high and arid Andes vs. the low Amazon tropical forest: in the Andes, the HAND2-AS1 (heart and neural crest derivatives expressed 2 antisense RNA 1, related to cardiovascular function) and DUOX2 (dual oxidase 2, related to thyroid function and innate immunity) genes; in the Amazon, the gene encoding the CD45 protein, essential for antigen recognition by T/B lymphocytes in viral-host interaction, consistent with the host-virus arms race hypothesis.


Main text
Western South America was one of the cradles of civilization in the Americas and the world 1. When the Spanish conqueror Francisco Pizarro arrived in 1532, the pan-Andean Inca Empire ruled the Andean region and had achieved levels of socioeconomic development and population density unmatched in other parts of South America. However, the Inca Empire, which lasted for around 200 years, with its emblematic architecture such as Machu Picchu and the city of Cuzco, was just the tip of the iceberg of a millennia-long cultural and biological evolutionary process 2,3. This process started with the peopling of the region (hereafter called western South America), which occurred 14-11 thousand years ago 4-6 and involved the entire Andean region and its adjacent, narrow Pacific Coast.
Tarazona-Santos et al. 7 proposed that cultural exchanges and gene flow over time have led to relative genetic, cultural, and linguistic homogeneity between the populations of western South America compared with those of eastern South America (a term that hereafter refers to the region adjacent to the eastern slope of the Andes and eastward, including Amazonia), where populations remained more isolated from each other. For instance, only two languages (Quechua and Aymara) of the Quechumaran linguistic stock predominate in the entire Andean region, whereas in eastern South America natives speak a different and broader spectrum of languages classified into at least four linguistic families 3,7,8. This spatial pattern of genetic diversity and its correlation with geographic, environmental, linguistic and cultural diversity was confirmed, enriched and rediscussed by us and others 2,3,7-13.
There are pending issues. The first is whether the dichotomous organization of genetic variation, characterized by the between-population homogeneous Southern Andes vs. the between-population heterogeneous Central Amazon, extends northward. This is important because scholars from different disciplines emphasize that western South America is not latitudinally homogeneous, differentiating a northern, generally lower and wetter Fertile Andes from a southern, higher and more arid Andes 14. These environmental and latitudinal differences are correlated with demography and culture, including different spectra of domesticated plants and animals. Indeed, the development of agriculture and of the first urban centers such as Caral 1, with its associated demographic growth, occurred earlier in the northern Fertile Andes (around 5 ky ago) than in the southern arid Andes (and their associated Coast), with products such as cotton, beans, and corn domesticated in the fertile north and the potato and South American camelids in the arid south 14. In human population genetics studies, the region where the between-population homogeneity was ascertained by Tarazona-Santos et al. 7 was the arid Andes. Consequently, here we test (i) whether the between-population homogenization of Western South America and the Arid Andes/Amazonia dichotomy extend northward to the Fertile Andes and its associated regions. To address this and the questions below, we used data from Harris et al. 3 for 74 indigenous individuals and an additional 289 unpublished individuals from 18 Peruvian populations, genotyped for ~2.5 million SNPs (Figure 1 and Table S1). We created three datasets with different SNP densities and populations, including data from refs. 15-18 (Figure S1, Tables S2 and S3 and Supporting Information, SI). The Institutional Review Boards of the participating institutions approved this research.
Despite some controversy about definitions and chronology, archeologists identify a unique cultural process in Western South America, which includes three temporal Horizons (Early, Middle, and Late) that correspond to periods of cultural dispersion involving a wide geographic area 26 (Figure 2). In particular, the Middle and Late Horizons are associated with the expansions of the Wari (~1400 to 1000 YBP) and Inca (~524 to 466 YBP) states, respectively 27-29. Isbell 28 has suggested that the Wari expansion was associated with the spread of the Quechua language in the Central Andes and that the Wari were pioneers in developing a road system in the Andes, called Wari ñam, which was later used as a base by [...]
We analyzed the distribution of IBD-segment lengths between individuals of different arid Andean populations, which is informative about the dynamics of past gene flow 3,30, and observed a signature of gene flow in the interval 1400 to 1000 YBP, that is, within the Wari expansion in the Middle Horizon (Figure 2). Thus, the homogenization of the Central Arid Andes is not only due to migrations during the Inca Empire or later during the Spanish Viceroyalty of Peru, when migrations (often forced) occurred 31. The Wari expansion (1400 to 1000 YBP) was also accompanied by intensive gene flow whose signature is still present in the between-population genetic homogeneity of the arid central Andes region. Because IBD analysis on present-day individuals does not allow for inferences of gene flow that occurred more than 75 generations ago 30, ancient DNA analysis at the population level will be necessary to infer whether the between-population homogenization of the Andes started even earlier.
Native Americans had to adapt to different and contrasting environments and stresses. The high and arid Andes are characterized by high UV radiation, cold, dryness, and hypoxia (a stress that does not allow for cultural adaptations and requires biological changes) 32,33.
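As a rough illustration of why IBD-segment lengths carry timing information for the gene flow discussed above: under a standard approximation (not stated in the text), a segment shared through a common ancestor who lived g generations ago has been exposed to 2g meioses, so its expected length is about 100/(2g) cM. The sketch below applies this to the Wari window and to the 75-generation limit. The 28-year generation time and all function names are assumptions made for illustration, not values or code from the study.

```python
# Hypothetical helper names; the generation time is an assumed value.
ASSUMED_YEARS_PER_GENERATION = 28.0


def years_to_generations(years_bp, years_per_generation=ASSUMED_YEARS_PER_GENERATION):
    """Convert years before present (YBP) into a number of generations."""
    return years_bp / years_per_generation


def expected_ibd_length_cm(generations):
    """Mean IBD segment length in centimorgans for a common ancestor
    `generations` ago: 2*g meioses separate the two carriers, and segment
    length is approximately exponential with mean 100 / (2*g) cM."""
    if generations <= 0:
        raise ValueError("generations must be positive")
    return 100.0 / (2.0 * generations)


if __name__ == "__main__":
    # Bounds of the Wari expansion as given in the text (YBP).
    for ybp in (1400, 1000):
        g = years_to_generations(ybp)
        print(f"{ybp} YBP ~ {g:.1f} generations, "
              f"mean IBD length ~ {expected_ibd_length_cm(g):.2f} cM")

    # Resolution limit quoted in the text: 75 generations.
    print(f"75 generations -> mean IBD length ~ {expected_ibd_length_cm(75):.2f} cM")
```

With these assumptions, the Wari window corresponds to roughly 36 to 50 generations and mean segment lengths of about 1.0 to 1.4 cM, whereas 75 generations corresponds to about 0.67 cM, which gives a sense of how short the segments become near the limit of the method.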
The Amazon has a low incidence of light, a warm and humid climate typical of the rainforest, and high biodiversity, including human pathogens 34. Populations from the high and arid Andes and from the Amazon (Figure 1) settled in these contrasting environments more than 5000 years ago 35.
HAND2-AS1 is located in the antisense 5' region of HAND2 and contains two enhancers for this gene. A genome-wide scan for natural selection 36 identified three genes related to the cardiovascular system in Andeans, including TBX5, which works together with HAND2 in reprogramming fibroblasts to cardiac-like myocytes 40,41. This information suggests (but does not demonstrate) that the HAND2-AS1 signature of natural selection is related to cardiovascular adaptations. Andeans have cardiovascular adaptations to high altitude that differ from those of lowlanders exposed to hypoxia and from those of other highlanders, showing a higher pulmonary vasoconstrictor response to hypoxia and lower resting middle cerebral flow velocity than Tibetans, and higher uterine artery blood flow than Europeans raised at high altitude and than lowlanders 42.
DUOX2 (dual oxidase 2, chromosome 15) is the gene with the highest signal of adaptation to the Andean environment by PBSn analysis (Figure 3). It was previously reported as a target of natural selection in the Andes 43,44. (Table S5).
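The text does not spell out how PBSn is computed. A minimal sketch follows, assuming the commonly used definition: the Population Branch Statistic (PBS) is built from pairwise FST values transformed to branch lengths, T = -log(1 - FST), and PBSn divides the focal branch by one plus the sum of the three branches. The function names and the labels A (focal population), B and C (two comparison populations) are hypothetical, and the exact normalization used in the study may differ.

```python
import math


def branch_length(fst):
    """Transform a pairwise FST (must be < 1) into an additive divergence time."""
    return -math.log(1.0 - fst)


def pbs(fst_ab, fst_ac, fst_bc):
    """PBS for focal population A: length of the branch leading to A in the
    three-population tree implied by the pairwise FST values."""
    t_ab = branch_length(fst_ab)
    t_ac = branch_length(fst_ac)
    t_bc = branch_length(fst_bc)
    return (t_ab + t_ac - t_bc) / 2.0


def pbsn(fst_ab, fst_ac, fst_bc):
    """Normalized PBS for A (assumed form): PBS_A / (1 + PBS_A + PBS_B + PBS_C)."""
    pbs_a = pbs(fst_ab, fst_ac, fst_bc)
    pbs_b = pbs(fst_ab, fst_bc, fst_ac)  # B is focal: T_BA + T_BC - T_AC
    pbs_c = pbs(fst_ac, fst_bc, fst_ab)  # C is focal: T_CA + T_CB - T_AB
    return pbs_a / (1.0 + pbs_a + pbs_b + pbs_c)


if __name__ == "__main__":
    # Made-up FST values for a single SNP, for illustration only.
    print(pbs(0.5, 0.5, 0.0))   # A strongly differentiated from both B and C
    print(pbsn(0.5, 0.5, 0.0))  # same SNP after normalization
```

The normalization dampens sites where all three branches are long, so a high PBSn points to differentiation that is specific to the focal population rather than to a generally fast-evolving region.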
The second highest signal (which also shows a significant long-range haplotype signal) comes from the region around the gene PTPRC, which encodes the protein CD45, essential in antigen recognition by T and B lymphocytes, particularly in pathogen-host interactions involving viruses, for example in infection by human adenovirus type 19 52, HIV-1-induced cell apoptosis 53,54, and susceptibility to hepatitis C 55,56 and herpes simplex virus 1 (HSV-1) 57. Interestingly, HSV-1 has a high incidence in isolated Amerindians from the Peruvian and Brazilian Amazon 58-60, with an elevated diversity of the virus and an endemic subtype that suggest an ancient endemic infection 61. This result is consistent with the hypothesis of CD45 evolution driven by the host-virus arms race model 62.

Acknowledgments
We thank the Peruvian populations for their participation. We thank the members of the