Project

Sharing reliable protocols to transform datasets into gold standards: Application to Neuro-Vascular Pathologies

Coordination

Coordinating partner: Sarah Cohen-Boulakia

Coordinating institution: Paris-Saclay University

Keywords

FAIR, workflows, standards, provenance, sharing and reuse of protocols, automatic annotation of datasets

Summary

Access to a wide variety of complementary, multi-scale and massive data collections offers unprecedented opportunities for healthcare research. Many analyses can be performed on these datasets, opening the way to scientific advances and discoveries. The national ‘Digital Health’ Acceleration Strategy aims to boost digital health innovation, which includes designing innovative approaches to health data analysis.

Importantly, such data analyses are complex: they rely on various computational tools that must be parameterized and chained together. There is now compelling evidence that many scientific discoveries will not stand the test of time; increasing the reproducibility of computed results is therefore of paramount importance, especially in the healthcare domain.

Sharing health data is often hampered by personal data protection requirements and runs up against technical constraints (security, volume). These constraints can, however, be mitigated when the protocols, and the workflows implementing the analyses, are reusable enough to reproduce the analyses in situ.

Additionally, when designed to be reusable, protocols and their implementations (workflows) provide provenance traces for the analyzed data. These traces describe how results were obtained and thus increase scientists’ confidence in them.

This calls for innovative solutions for annotating biomedical and clinical datasets and extracting provenance. Protocols, and their implementations as workflows that use and generate datasets, should be elevated to first-class objects, and the inherent dual relationship between datasets and protocols/workflows should be better exploited.

The challenges thus include standardizing and annotating datasets and protocols, extracting protocols and workflows from text and other datasets, and synthesizing them into interoperable yet shareable protocols.

The originality of ShareFAIR lies in addressing the reliability of both datasets and analysis protocols, and in harnessing the dual relationship between them. Specifically, ShareFAIR will provide:

  • standards to uniformly represent datasets, ontologies/common vocabularies to annotate datasets and protocols/workflows, and provenance to trace the origin of datasets,
  • an interoperable framework for the design, annotation and reuse of reliable and shareable protocols,
  • approaches to extract protocols from textual data, enriching the set of protocols and workflows and better documenting the provenance of datasets, as well as approaches to learn protocols from biomedical and clinical datasets.

The proofs of concept and breakthroughs achieved in ShareFAIR will be applied to real-life use cases related to neuro-vascular pathologies, involving multi-scale (genomic, neuro-vascular imaging and clinical) datasets and complex analysis protocols and workflows.

ShareFAIR will facilitate the re-analysis of biomedical datasets throughout the lifecycle of scientific projects and will actively contribute to large-scale efforts towards more reproducible and cumulative science. At the data science level, ShareFAIR will provide a unique framework for FAIR-related interoperability research. The objectives and methodology of ShareFAIR align with prominent European research infrastructures such as ELIXIR and EOSC-Life.

Partners
Laboratory or department (team): supervising institutions

  • LISN – UMR 9015, Institut Convergence DATAIA: CNRS, Paris-Saclay University, Inria; CentraleSupélec (partner)
  • ITX – U1087 – UMR 6291: Inserm, CNRS, Nantes University; CHU Nantes (partner)
  • Hub Bioinformatique – USR 3756: Institut Pasteur, CNRS, Paris University
  • LIRIS – UMR 5205: CNRS, INSA Lyon, Claude Bernard Lyon 1 University, Lumière Lyon 2 University, École Centrale de Lyon
  • LAMSADE – UMR 7243: CNRS, Paris-Dauphine-PSL University
  • IRISA – UMR 6074 (Dyliss team): CNRS, Inria, Rennes University
  • EMPENN – U1228: Inria, Inserm, CNRS, Rennes University
  • CRC – U1138 (HEKA team): Inria, Inserm, Sorbonne University, Paris Cité University
  • CEA LIST – LASTI lab: CEA, Paris-Saclay University