“ML on FHIR”
This repo demonstrates an end-to-end workflow to build an ML model for disease risk prediction starting from synthetic patient data in FHIR format, roughly based upon this paper: Chen, A., Chen, D.O. Simulation of a machine learning enabled learning health system for risk prediction using synthetic patient data. Sci Rep 12, 17917 (2022). https://doi.org/10.1038/s41598-022-23011-4. That paper started from an export of Synthea synthetic data in CSV format, and used an undisclosed data processing pipeline to flatten and collate the SNOMED and LOINC codes from multiple tables into a single “ML-ready” table with one row per patient. The ML-ready table is ideal for using AutoML tabular machine learning techniques to predict risk of a disease or complication, and the authors developed machine learning models using Python and scikit-learn and other libraries.
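The "flatten and collate" step described above can be illustrated with a minimal sketch. This is not the paper's undisclosed pipeline, nor this repository's dbt models; it only shows the general idea of pivoting long-format coded events into one row per patient. The function name `flatten_to_patient_rows`, the `(patient_id, code)` record layout, and the sample events are assumptions made for illustration.

```python
from collections import defaultdict


def flatten_to_patient_rows(records):
    """Pivot long-format (patient_id, code) events into one row per patient.

    Each output row maps every code to the number of times it was recorded
    for that patient; codes a patient never had are filled with 0.
    """
    records = list(records)  # materialize so the data can be scanned twice

    counts = defaultdict(lambda: defaultdict(int))
    for patient_id, code in records:
        counts[patient_id][code] += 1

    # Use one sorted column list so every row has the same shape.
    all_codes = sorted({code for _, code in records})
    return [
        {"patient_id": pid, **{code: counts[pid].get(code, 0) for code in all_codes}}
        for pid in sorted(counts)
    ]


# Hypothetical events collated from condition and observation tables.
events = [
    ("p1", "SNOMED:44054006"),  # type 2 diabetes diagnosis
    ("p1", "LOINC:4548-4"),     # HbA1c measurement
    ("p1", "LOINC:4548-4"),
    ("p2", "LOINC:4548-4"),
]
rows = flatten_to_patient_rows(events)
for row in rows:
    print(row)
```

Counting occurrences is only one possible encoding; a real pipeline might instead keep the latest lab value or a presence flag per code.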
In contrast, this repository demonstrates a similar workflow with the following modifications:
Therefore, we have provided an end-to-end demonstration of using InterSystems’ unique capabilities in data management and machine learning, combined with the open source projects Synthea and dbt, to process and analyze data and to develop machine learning models that are then easily accessed from SQL for instant viewing in dashboards like Superset.
This work was presented at InterSystems Global Summit 2023; see the presentation recording.
docker-compose up -d --build
Startup may take a while, because the FHIR server has to be installed and activated first.
./run_synthea.sh -p 100 -a 30-100 -s 1234 -cs 1234
./load-data.py
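The contents of load-data.py are not shown here, so the following is only a sketch of one common way to load Synthea output into a FHIR server: each generated patient file is a FHIR Bundle that can be POSTed to the server's base URL. The base URL and the helper name `build_bundle_request` are assumptions; the actual endpoint depends on how the FHIR server in this repository is configured.

```python
import json
import urllib.request

# Assumed base URL; adjust to the FHIR endpoint your server actually exposes.
FHIR_BASE_URL = "http://localhost:52773/fhir/r4"


def build_bundle_request(bundle, base_url=FHIR_BASE_URL):
    """Build (but do not send) a POST request that submits a FHIR bundle."""
    body = json.dumps(bundle).encode("utf-8")
    return urllib.request.Request(
        base_url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/fhir+json",
            "Accept": "application/fhir+json",
        },
    )


# A minimal empty transaction bundle standing in for a Synthea patient file.
bundle = {"resourceType": "Bundle", "type": "transaction", "entry": []}
request = build_bundle_request(bundle)
print(request.get_method(), request.full_url)

# To actually send it against a running server:
# with urllib.request.urlopen(request) as response:
#     print(response.status)
```

Building the request separately from sending it keeps the sketch runnable without a live server.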
The FHIR SQL Builder is configured to expose the FHIR data through the SQL schema fhir.
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
cd dbt
dbt deps
dbt debug
dbt build
After all these steps, IRIS will contain the table mlonfhir.summary, ready for ML.