The frequency of oligonucleotides in mammalian genic regions

VOLINIA, Stefano; BERNARDI, Francesco
1989

Abstract

The large body of nucleic acid sequence data now available offers a unique opportunity to characterize individual oligonucleotides that may be specific to functional sequence domains. We developed algorithms to study the frequency distribution of all oligonucleotides of length 2-6 in DNA sequences. We applied them to 634 mammalian DNA sequences spanning 1.782 Mb and obtained the distribution of the ratio between the observed frequency of each oligonucleotide and its expected frequency based on independent nucleotide probabilities. We then studied the distribution of oligonucleotides (or k-tuples) of each length in a subset of 129 complete mammalian genes spanning 0.607 Mb, considering eight distinct genomic regions: 5'-non-transcribed, first exon, first intron, intermediate exons, intermediate introns, last intron, last exon and 3'-non-transcribed. Some oligonucleotides showed a statistical behaviour and a regional distribution similar to those of known signal sequences. Moreover, the frequency distribution of oligonucleotides of length 5 and 6 tends to become bimodal, indicating the existence of a population of very frequent oligonucleotides.
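The observed/expected ratio described in the abstract can be illustrated with a minimal Python sketch. It assumes overlapping k-tuple windows on a single strand, skips windows that contain non-ACGT symbols, and takes the expected count of a k-tuple as the number of valid windows times the product of the single-base frequencies of its letters. These choices, and the function names `kmer_counts` and `observed_expected`, are illustrative assumptions, not the authors' published algorithm. Pooling counts over many sequences, or running the function separately on each of the eight genic regions, would follow the same pattern.

```python
from collections import Counter
from itertools import product
from math import prod


def kmer_counts(seq: str, k: int) -> Counter:
    """Count overlapping k-tuples, skipping windows with non-ACGT symbols (assumed policy)."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if set(word) <= set("ACGT"):
            counts[word] += 1
    return counts


def observed_expected(seq: str, k: int) -> dict:
    """Observed/expected ratio for every ACGT k-tuple (hypothetical helper).

    Expected count assumes independent nucleotides:
    E(w) = (number of valid k-windows) * product of base frequencies of w's letters.
    """
    base_counts = kmer_counts(seq, 1)
    total_bases = sum(base_counts.values())
    if total_bases == 0:
        raise ValueError("sequence contains no A, C, G or T")
    freqs = {b: base_counts[b] / total_bases for b in "ACGT"}

    observed = kmer_counts(seq, k)
    n_windows = sum(observed.values())

    ratios = {}
    for letters in product("ACGT", repeat=k):
        word = "".join(letters)
        expected = n_windows * prod(freqs[b] for b in word)
        # A ratio is undefined when a base never occurs in the sequence.
        ratios[word] = observed[word] / expected if expected > 0 else float("nan")
    return ratios
```

For example, `observed_expected("ACGT" * 5, 2)` returns 16 ratios: "AA" gets 0, while "AC", "CG", "GT" and "TA" get ratios well above 1, because the periodic sequence contains only those four dinucleotides.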
Volinia, Stefano; Gambari, R.; Bernardi, Francesco; Barrai, I.

Documents in SFERA are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11392/1690769
Warning: the data displayed here have not been validated by the university.
