Sharing reliable protocols to transform datasets into gold standards: Application to Neuro-Vascular Pathologies
Coordinating Partner: Sarah Cohen-Boulakia
Coordinating institution: Paris-Saclay University
Keywords: FAIR, workflows, standards, provenance, sharing and reuse of protocols, automatic annotation of datasets
Access to a wide variety of complementary, multi-scale and massive data collections offers unprecedented opportunities for healthcare research. A large number of analyses can be performed on these datasets, enabling scientific advances and discoveries to emerge. The national ‘Digital Health’ Acceleration Strategy aims to boost digital health innovation, which includes designing innovative approaches to health data analysis.
Importantly, such data analyses are complex: they rely on various computational tools that have to be parameterized and chained together. There is now compelling evidence that many scientific discoveries will not stand the test of time: increasing the reproducibility of computed results is of paramount importance, especially in the healthcare domain.
Sharing of health data is often hampered by personal data protection requirements and comes up against technical constraints (security, volume). These constraints can, however, be mitigated when the protocols and the workflows implementing analyses are sufficiently reusable to reproduce analyses in situ.
Additionally, when designed to be reusable, protocols and their implementations – workflows – provide provenance traces for the analyzed data, describing how results have been obtained and thus increasing scientists’ confidence in those results.
This calls for innovative solutions for the annotation of biomedical and clinical datasets and extraction of provenance. Protocols and their implementation as workflows using and generating datasets should be elevated to first-class objects and the inherent dual relationship between datasets and protocols/workflows should be better exploited.
Challenges thus include standardization and annotation of datasets and protocols, extracting protocols and workflows from text and other datasets, and synthesizing them into interoperable and shareable protocols.
The originality of ShareFAIR lies both in tackling the reliability of datasets and analysis protocols and in harnessing the dual relationship between datasets and protocols. Specifically, ShareFAIR will provide:
- standards to uniformly represent datasets, ontologies/common vocabularies to annotate datasets and protocols/workflows, and provenance to trace the origin of datasets,
- an interoperable framework for the design, annotation and reuse of reliable and shareable protocols,
- approaches to extract protocols from textual data to enrich the set of protocols and workflows and better document the provenance of datasets, and approaches to learn protocols from biomedical and clinical datasets.
The proofs of concept and breakthroughs reached through ShareFAIR will be applied to real-life use cases related to neuro-vascular pathologies with multi-scale (genomic, neuro-vascular imaging and clinical) datasets and complex analysis protocols and workflows.
ShareFAIR will facilitate the re-analysis of biomedical datasets throughout scientific project lifecycles, and proactively participate in large-scale efforts towards more reproducible and cumulative science. At the data science level, ShareFAIR will provide a unique framework for FAIR-related interoperability research. The objective and methodology adopted in ShareFAIR align with prominent European research infrastructures such as ELIXIR and EOSC-Life.
Laboratory or department, team | Supervising institutions |
LISN – UMR 9015, Institut Convergence DATAIA | CNRS, Paris-Saclay University, Inria, Centrale Supelec (partner) |
ITX – U1087 – UMR 6291 | Inserm, CNRS, Nantes University, CHU Nantes (partner) |
Hub Bioinformatique – USR 3756 | Institut Pasteur, CNRS, Paris University |
LIRIS – UMR 5205 | CNRS, INSA Lyon, Claude Bernard Lyon 1 University, Lumière Lyon 2 University, Ecole Centrale Lyon |
LAMSADE – UMR 7243 | CNRS, Paris-Dauphine-PSL University |
IRISA – UMR 6074, Dyliss team | CNRS, Inria, Rennes University |
EMPENN – U 1228 | Inria, Inserm, CNRS, Rennes University |
CRC – U 1138, HEKA team | Inria, Inserm, Sorbonne University, Paris Cité University |
CEA LIST – LASTI lab | CEA, Paris-Saclay University |