A world of sequences: can we use georeferenced nucleotide databases for a robust automated phylogeography?

Trucchi, Emiliano (penultimate author; Writing – Original Draft Preparation)
2017

Abstract

Aim: Comparative phylogeography across a large number of species allows investigating community-level processes at regional and continental scales. An effective approach to such studies would involve automatic retrieval of georeferenced sequence data from nucleotide databases (a first step towards an ‘automated phylogeography’). It remains unclear if, despite repeated calls, georeferencing of nucleotide databases has increased in frequency, and if accumulated data allow for broad applications based on automated retrieval of sequence data and associated geographical information. Here, we investigated geographical information available in NCBI GenBank accessions for tetrapods, exploring temporal and geographical patterns in georeferencing, and quantifying data available for automated phylogeography.

Location: Global.

Methods: We developed Python and R scripts to (1) download metadata from GenBank (1,125,514 accessions, > 20,000 species); (2) geocode accessions from associated metadata; (3) map originally georeferenced and geocoded accessions and plot their frequency against time; (4) assess the size of intraspecific sets of homologous sequences and compare their geographical extent with species ranges, thus evaluating their potential for phylogeographical analyses.

Results: Only 6.2% of surveyed tetrapod GenBank submissions reported geographical coordinates, without increase in recent years. Our geocoding raised georeferenced accessions to 15.1%. The geographical distribution of georeferenced accessions is patchy, and especially sparse in economically underdeveloped areas. Automatically retrievable informative data sets covering most of the range are available for very few species of wide-ranging tetrapods.

Main conclusions: Although geocoding offers a partial solution to the scarcity of direct georeferencing, the amount of data potentially useful for automated phylogeography is still limited. Strong underrepresentation of hard-to-access areas suggests that sampling logistics represent a main hindrance to global data availability. We propose that, besides enhancing georeferencing of genetic data, future research agendas should focus on collaborative efforts to sample genetic diversity in biodiversity-rich tropical areas.
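As an illustration of the kind of check summarised in the Results (the share of accessions that report coordinates), the following minimal Python sketch parses the GenBank `lat_lon` source qualifier, which has the form "<latitude> N|S <longitude> E|W", into signed decimal degrees and computes the fraction of georeferenced records. It is not the authors' script: the function names, the dictionary-based record layout and the accession labels are hypothetical, and the download step is assumed to have already produced the metadata.

```python
import re
from typing import Dict, List, Optional, Tuple

# Pattern for the lat_lon qualifier, e.g. "46.52 N 11.35 E".
LAT_LON_RE = re.compile(
    r"^\s*(\d+(?:\.\d+)?)\s*([NS])\s+(\d+(?:\.\d+)?)\s*([EW])\s*$"
)


def parse_lat_lon(value: str) -> Optional[Tuple[float, float]]:
    """Return (latitude, longitude) in signed decimal degrees, or None."""
    match = LAT_LON_RE.match(value)
    if match is None:
        return None
    lat = float(match.group(1)) * (1 if match.group(2) == "N" else -1)
    lon = float(match.group(3)) * (1 if match.group(4) == "E" else -1)
    # Reject values outside the valid coordinate ranges.
    if abs(lat) > 90 or abs(lon) > 180:
        return None
    return lat, lon


def georeferenced_fraction(records: List[Dict[str, str]]) -> float:
    """Fraction of records whose 'lat_lon' field parses to valid coordinates."""
    if not records:
        return 0.0
    hits = sum(
        1 for rec in records
        if parse_lat_lon(rec.get("lat_lon") or "") is not None
    )
    return hits / len(records)


# Hypothetical metadata records (accession labels are made up for illustration).
example_records = [
    {"accession": "HYPO0001", "lat_lon": "46.52 N 11.35 E"},
    {"accession": "HYPO0002", "country": "Brazil: Manaus"},
    {"accession": "HYPO0003", "lat_lon": "not recorded"},
    {"accession": "HYPO0004"},
]

fraction = georeferenced_fraction(example_records)
```

Records without parsable coordinates but with a locality string (such as the hypothetical `country` field above) would be the candidates for the geocoding step described in the Methods.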
Gratton, Paolo; Marta, Silvio; Bocksberger, Gaëlle; Winter, Marten; Trucchi, Emiliano; Kühl, Hjalmar
Files in this item:
File: Journal of Biogeography - 2016 - Gratton - A world of sequences can we use georeferenced nucleotide databases for a robust.pdf
Access: repository administrators only
Description: Publisher's full text
Type: Full text (publisher's version)
License: NOT PUBLIC - Private/restricted access
Size: 866.61 kB
Format: Adobe PDF

Documents in SFERA are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11392/2382816
Citations
  • PMC: not available
  • Scopus: 39
  • ISI: 37