Pending final acceptance by the European Commission

This document is the first in a series of deliverables on data-driven analytical tools to support policy makers in developing healthcare policies. The focus here is on population-level risk stratification: using machine learning tools to stratify segments of the population into different levels of risk (low, medium, high).

One of the primary aims of this work is to inform policy makers of the population-level health risks, which may influence priorities in existing policies or identify needs for new policies. A further aim is to determine which segments of the population are at greater risk, so as to help target policies and optimise the management of care. Given the data-driven approach, the risk stratification process may also identify new risk factors, which could be adopted in policies as well as contribute to the research community.

In this deliverable, we propose a risk stratification process that goes beyond the state of the art, addressing key challenges with medical data and risk stratification, such as the temporal (changing) nature of risk and the handling of data that is not missing at random. For these and other challenges, we review, discuss and propose solutions, covering both data processing techniques and machine learning classifiers.

This risk stratification process will be applied by technical partners in the CrowdHEALTH project to address specific risk scenarios grounded in the project use cases. We include a high-level analysis of such opportunities in this deliverable, but note that this is preliminary work, as data was not available at the time of writing.

deliverable type: