IHSM La Mayora Seminars - Jean-Michel Hily, Institut Français de la Vigne et du Vin (IFV)

Datamining, a powerful tool to magnify an infinite source of information hitherto put aside.

With the dawn of high-throughput sequencing (HTS), the deposit and accumulation of genetic information in digital form within dedicated databases (metadata) is massive and ever growing. Datamining, i.e. the process of collecting, searching, extracting and discovering usable information within such large amounts of data, is therefore becoming a very important and powerful tool for identifying possible new pathogens, including new viruses or new variants of known viruses, for example from the now well-known Coronaviridae family (https://virological.org/t/serratus-the-ultra-deep-search-to-discover-novel-coronaviruses/516).

Grapevine Pinot gris virus (GPGV) is a recently described virus (Giampetruzzi et al. 2012) that infects grapevine and has now been detected in most, if not all, grape-growing countries where it has been sought. While its presence is sometimes associated with severe mottling and deformation symptoms, the virus is generally detected in asymptomatic vines. Prior to this work, knowledge of the genetic diversity of GPGV was mostly limited to partial, biased genomic sequences obtained by PCR.

By performing a systematic datamining effort over 500 samples, using publicly available SRA (Sequence Read Archive) files as well as in-house datasets, in association with specific bioinformatic tools, we uncovered invaluable information regarding GPGV. The knowledge gained from this work is relevant at several levels: (1) the varieties and countries in which the virus was detected, (2) precise epidemiological data linked to specific locations around the world, (3) the recovery of a large number of unbiased, complete GPGV genomic sequences, (4) the report of a previously undescribed genetic diversity, which ultimately allowed (5) the unraveling of the worldwide evolutionary history of the virus (Hily et al. 2020, 2021a, 2021b). Drawing on these 'proof of concept' studies, some advantages and pitfalls of datamining will be discussed.

Keywords: HTS, datamining, grapevine, virology, epidemiology

References

Giampetruzzi, A., Roumi, V., Roberto, R., Malossini, U., Yoshikawa, N., La Notte, P., Terlizzi, F., Credi, R., and Saldarelli, P. 2012. A new grapevine virus discovered by deep sequencing of virus- and viroid-derived small RNAs in cv Pinot gris. Virus Research 163:262-268.

Hily, J.-M., Poulicard, N., Candresse, T., Vigne, E., Beuve, M., Renault, L., Velt, A., Spilmont, A.-S., and Lemaire, O. 2020. Datamining, genetic diversity analyses and phylogeographic reconstructions redefine the worldwide evolutionary history of grapevine Pinot gris virus and grapevine berry inner necrosis virus. Phytobiomes Journal 4:165-177.

Hily, J.-M., Komar, V., Poulicard, N., Vigne, E., Jacquet, O., Protet, N., Spilmont, A.-S., and Lemaire, O. 2021a. Biological Evidence and Molecular Modeling of a Grapevine Pinot gris Virus Outbreak in a Vineyard. Phytobiomes Journal 0:PBIOMES-11-20-0079-R.

Hily, J.-M., Komar, V., Poulicard, N., Velt, A., Renault, L., Mustin, P., Vigne, E., Spilmont, A.-S., and Lemaire, O. 2021b. Evidence of differential spreading events of grapevine pinot Gris virus in Italy using datamining as a tool. European Journal of Plant Pathology.