INSU Newsletter - Scientific results
Major sequencing projects are fundamental to our understanding of living organisms in fields such as health, agronomy and ecology. Technological advances have made it possible to obtain a considerable amount of raw sequencing data (sequence reads). The European Nucleotide Archive currently contains almost 50 petabytes of raw public data.
A team of researchers from CNRS Terre & Univers (MIO-OSU Pythéas), in collaboration with several research laboratories, has used k-mers (subsequences of length k) as the equivalent of words in raw sequencing data, so that reads can be indexed and searched much like text. This indexing solution has made it possible to query several tens of terabytes of sequence data from the Tara Oceans project. The Ocean Read Atlas (ORA) public web server, developed for this purpose, can be used to directly query several datasets from the Tara Oceans consortium, sampled across all the world's oceans.
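To make the idea of k-mers as words concrete, here is a minimal Python sketch of a k-mer index. It is an illustration only, not the method used by the ORA server: the function names (`kmers`, `build_index`, `query`), the toy sample names and the choice of k = 4 are all assumptions made for this example, and an index over terabytes of reads would need far more compact data structures than an in-memory dictionary.

```python
from collections import defaultdict


def kmers(seq, k):
    """Yield every overlapping word of length k in seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_index(samples, k):
    """Map each k-mer to the set of sample names whose reads contain it.

    `samples` is a hypothetical dict: sample name -> list of read strings.
    """
    index = defaultdict(set)
    for name, reads in samples.items():
        for read in reads:
            for word in kmers(read, k):
                index[word].add(name)
    return index


def query(index, seq, k):
    """Return, per sample, the fraction of the query's distinct k-mers found."""
    words = set(kmers(seq, k))
    if not words:  # query shorter than k: nothing to look up
        return {}
    hits = defaultdict(int)
    for word in words:
        for name in index.get(word, ()):
            hits[name] += 1
    return {name: count / len(words) for name, count in hits.items()}


# Toy data standing in for reads from two samples (invented for illustration).
samples = {"sample_A": ["ACGTACGT"], "sample_B": ["TTTTGGGG"]}
index = build_index(samples, k=4)
print(query(index, "ACGTAC", k=4))
```

In this sketch, a query sequence is scored against each sample by the share of its k-mers present in that sample's reads, which is one simple way to turn word lookups into a sequence search.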