Aim: Comparative phylogeography across a large number of species allows investigating community-level processes at regional and continental scales. An effective approach to such studies would involve automatic retrieval of georeferenced sequence data from nucleotide databases (a first step towards an 'automated phylogeography'). It remains unclear if, despite repeated calls, georeferencing of nucleotide databases has increased in frequency, and if accumulated data allow for broad applications based on automated retrieval of sequence data and associated geographical information. Here, we investigated geographical information available in NCBI GenBank accessions for tetrapods, exploring temporal and geographical patterns in georeferencing, and quantifying data available for automated phylogeography.
Location: Global.
Methods: We developed Python and R scripts to (1) download metadata from GenBank (1,125,514 accessions, > 20,000 species); (2) geocode accessions from associated metadata; (3) map originally georeferenced and geocoded accessions and plot their frequency against time; (4) assess the size of intraspecific sets of homologous sequences and compare their geographical extent with species ranges, thus evaluating their potential for phylogeographical analyses.
Results: Only 6.2% of surveyed tetrapod GenBank submissions reported geographical coordinates, without increase in recent years. Our geocoding raised georeferenced accessions to 15.1%. The geographical distribution of georeferenced accessions is patchy, and especially sparse in economically underdeveloped areas. Automatically retrievable informative data sets covering most of the range are available for very few species of wide-ranging tetrapods.
Main conclusions: Although geocoding offers a partial solution to the scarcity of direct georeferencing, the amount of data potentially useful for automated phylogeography is still limited. Strong underrepresentation of hard-to-access areas suggests that sampling logistics represent a main hindrance to global data availability. We propose that, besides enhancing georeferencing of genetic data, future research agendas should focus on collaborative efforts to sample genetic diversity in biodiversity-rich tropical areas.
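As an illustration of the kind of measurement reported in the Results (the share of accessions carrying coordinates), the following is a minimal sketch and not the authors' scripts. It assumes accession metadata has already been downloaded into dictionaries with a hypothetical `lat_lon` key holding the GenBank `lat_lon` source qualifier, which is conventionally written in decimal degrees such as `41.90 N 12.50 E`. The function names and the toy records are invented for the example.

```python
import re

# GenBank /lat_lon qualifier in decimal degrees, e.g. "41.90 N 12.50 E".
LAT_LON_RE = re.compile(
    r"^\s*(\d+(?:\.\d+)?)\s*([NS])\s+(\d+(?:\.\d+)?)\s*([EW])\s*$"
)


def parse_lat_lon(value):
    """Return (lat, lon) as signed decimal degrees, or None if unusable."""
    if not value:
        return None
    match = LAT_LON_RE.match(value)
    if match is None:
        return None
    lat = float(match.group(1)) * (1 if match.group(2) == "N" else -1)
    lon = float(match.group(3)) * (1 if match.group(4) == "E" else -1)
    # Reject values outside the valid coordinate range.
    if abs(lat) > 90 or abs(lon) > 180:
        return None
    return lat, lon


def georeferenced_fraction(records):
    """Fraction of records whose (hypothetical) 'lat_lon' field parses."""
    if not records:
        return 0.0
    hits = sum(1 for rec in records if parse_lat_lon(rec.get("lat_lon")))
    return hits / len(records)


# Toy records for illustration only; accession names are placeholders.
toy_records = [
    {"accession": "EXAMPLE_1", "lat_lon": "41.90 N 12.50 E"},
    {"accession": "EXAMPLE_2", "lat_lon": None},
    {"accession": "EXAMPLE_3", "lat_lon": "unknown"},
    {"accession": "EXAMPLE_4"},
]
toy_fraction = georeferenced_fraction(toy_records)
```

Accessions that fail this check are the ones a geocoding step would try to place from other metadata, such as a locality or country description; that step depends on an external gazetteer and is not sketched here.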
|Title:||A world of sequences: can we use georeferenced nucleotide databases for a robust automated phylogeography?|
|Publication date:||2017|
|Appears in categories:||03.1 Journal article|