Health Dataset

This application is not supported by InterSystems Corporation. Please note that you use it at your own risk.
10 curated health datasets (cancer, heart, diabetes, kidney)

What's new in this version

Fixed ZPM install references

About the Health Dataset Application

This application provides health data samples for AutoML and other types of applications.

According to the WHO, the top global causes of death, in order of total number of lives lost, are associated with three broad topics (source: https://www.who.int/news-room/fact-sheets/detail/the-top-10-causes-of-death):

  1. Cardiovascular (ischaemic heart disease, stroke),
  2. Respiratory (chronic obstructive pulmonary disease, lower respiratory infections) and
  3. Neonatal conditions – which include birth asphyxia and birth trauma, neonatal sepsis and infections, and preterm birth complications.

This application provides real data, with no personal information, for some of the top 10 causes of death identified by the WHO. The datasets for this application are:

10 real health datasets

  • Diabetes dataset: data to predict diabetes diagnosis
  • Heart Disease dataset: data to predict heart disease
  • Kidney Disease dataset: data to predict kidney disease
  • Breast Cancer dataset: data to predict breast cancer
  • Maternal Risk dataset: data to predict maternal risk level
  • Hospital Mortality dataset: data to predict hospital mortality
  • World Life Expectancy dataset: data to predict life expectancy based on a country's social and health indicators
  • Pollution Deaths from fossil fuels dataset: data to predict deaths caused by fossil fuel pollution
  • Dementia dataset: data to predict dementia
  • Hepatitis dataset: data to predict the risk of death from the progression of hepatitis symptoms
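
The datasets are meant as AutoML samples, so the sketch below shows one way to pair each table with a prediction target and turn it into IntegratedML statements. It assumes IRIS IntegratedML as the AutoML engine. The table names come from the queries on this page. The label column picked for each table, the `Model` suffix, and the names `LABELS` and `integratedml_sql` are hypothetical choices made only for illustration.

```python
from typing import List

# Hypothetical mapping: SQL table -> column assumed to hold the prediction target.
# Table names come from the queries on this page; the label choices are assumptions.
LABELS = {
    "dc_data_health.HeartDisease": "heartDisease",
    "dc_data_health.KidneyDisease": "classification",
    "dc_data_health.Diabetes": "Outcome",
    "dc_data_health.BreastCancer": "diagnosis",
    "dc_data_health.MaternalHealthRisk": "RiskLevel",
    "dc_data_health.HospitalMortality": "outcome",
    "dc_data_health.LifeExpectancy": "LifeExpectancy",
    "dc_data_health.PollutionDeaths": "ExcessMortality",
    "dc_data_health.Dementia": "Outcome",
    "dc_data_health.Hepatitis": "outcome",
}


def integratedml_sql(table: str) -> List[str]:
    """Build IntegratedML CREATE/TRAIN/PREDICT statements for one table (sketch)."""
    label = LABELS[table]  # raises KeyError for tables not listed above
    model = table.split(".")[-1] + "Model"  # e.g. HeartDiseaseModel (hypothetical name)
    return [
        f"CREATE MODEL {model} PREDICTING ({label}) FROM {table}",
        f"TRAIN MODEL {model}",
        f"SELECT PREDICT({model}) AS Predicted, {label} AS Actual FROM {table}",
    ]


if __name__ == "__main__":
    for statement in integratedml_sql("dc_data_health.HeartDisease"):
        print(statement)
```

When run, the script prints three statements for the Heart Disease table. You could then execute them in any IRIS SQL client. Tables with numeric targets, such as LifeExpectancy, would presumably be trained as regression models rather than classifiers.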


To install with Docker

  1. Clone the repository (or git pull it) into any local directory:
$ git clone https://github.com/yurimarx/automl-heart.git
  2. Open a Docker terminal in this directory and run:
$ docker-compose build
  3. Run the IRIS container:
$ docker-compose up -d
  4. Query the Heart Disease dataset:
SELECT age, bp, chestPainType, cholesterol, ekgResults, exerciseAngina, fbsOver120, heartDisease, maxHr, numberOfVesselsFluro, sex, slopeOfSt, stDepression, thallium
FROM dc_data_health.HeartDisease
  5. Query the Kidney Disease dataset:
SELECT age, al, ane, appet, ba, bgr, bp, bu, cad, classification, dm, hemo, htn, pc, pcc, pcv, pe, pot, rbc, rc, sc, sg, sod, su, wc
FROM dc_data_health.KidneyDisease
  6. Query the Diabetes dataset:
SELECT Outcome, age, bloodpressure, bmi, diabetespedigree, glucose, insulin, pregnancies, skinthickness
FROM dc_data_health.Diabetes
  7. Query the Breast Cancer dataset:
SELECT areamean, arease, areaworst, compactnessmean, compactnessse, compactnessworst, concavepointsmean, concavepointsse, concavepointsworst, concavitymean, concavityse, concavityworst, diagnosis, fractaldimensionmean, fractaldimensionse, fractaldimensionworst, perimetermean, perimeterse, perimeterworst, radiusmean, radiusse, radiusworst, smoothnessmean, smoothnessse, smoothnessworst, symmetrymean, symmetryse, symmetryworst, texturemean, texturese, textureworst
FROM dc_data_health.BreastCancer
  8. Query the Maternal Health Risk dataset:
SELECT BS, BodyTemp, DiastolicBP, HeartRate, RiskLevel, SystolicBP, age
FROM dc_data_health.MaternalHealthRisk
  9. Query the Hospital Mortality dataset:
SELECT age, aniongap, atrialfibrillation, basophils, bicarbote, bloodcalcium, bloodpotassium, bloodsodium, bmi, chdwithnomi, chloride, copd, creatinekise, creatinine, deficiencyanemias, depression, diabetes, diastolicbloodpressure, ef, gendera, glucose, "group", heartrate, hematocrit, hyperlipemia, hypertensive, inr, lacticaacid, leucocyte, lymphocyte, magnesiumion, mch, mchc, mcv, neutrophils, ntprobnp, outcome, pco2, ph, platelets, pt, rbc, rdw, relfailure, respiratoryrate, spo2, systolicbloodpressure, temperature, ureanitrogen, urineoutput
FROM dc_data_health.HospitalMortality
  10. Query the Life Expectancy dataset:
SELECT AdultMortality, Alcohol, BMI, Country, Diphtheria, GDP, HIVAIDS, HepatitisB, IncomeCompositionOfResources, InfantDeaths, LifeExpectancy, Measles, PercentageExpenditure, Polio, Population, Schooling, Status, Thinness1To19Years, Thinness5To9Years, TotalExpenditure, UnderFiveDeaths, Year
FROM dc_data_health.LifeExpectancy
  11. Query the Pollution Deaths dataset:
SELECT Country, CountryCode, DeathYear, ExcessMortality
FROM dc_data_health.PollutionDeaths
  12. Query the Dementia dataset:
SELECT ASF, Age, CDR, EDUC, Genre, Hand, MMSE, MRDelay, Outcome, SES, Visit, eTIV, nWBV
FROM dc_data_health.Dementia
  13. Query the Hepatitis Death Risk dataset:
SELECT age, albumin, alkphosphate, anorexia, antivirals, ascites, bilirubin, fatigue, histology, liverbig, liverfirm, malaise, outcome, protime, sex, sgot, spiders, spleenpalpable, steroid, varices
FROM dc_data_health.Hepatitis
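
The SELECT statements in the steps above can also be run from Python through any DB-API 2.0 connection. This is useful, for example, to check how balanced a label column is before training. What follows is a minimal sketch and is not part of the application. `fetch_table` and `label_counts` are hypothetical helpers. The connection details in the usage comment are assumptions that depend on your docker-compose configuration: the `intersystems-irispython` driver's `iris.connect`, the host, port 1972, the namespace and the credentials.

```python
from collections import Counter
from typing import Any, List, Sequence, Tuple


def fetch_table(conn: Any, table: str, columns: Sequence[str]) -> Tuple[List[str], List[tuple]]:
    """Run a SELECT like the ones above through a DB-API 2.0 connection."""
    # Identifiers cannot be bound as DB-API parameters, so they are interpolated:
    # pass only trusted table/column names (quote reserved words, e.g. '"group"').
    sql = f"SELECT {', '.join(columns)} FROM {table}"
    cur = conn.cursor()
    try:
        cur.execute(sql)
        names = [d[0] for d in cur.description]  # column names as reported by the driver
        rows = [tuple(r) for r in cur.fetchall()]
    finally:
        cur.close()
    return names, rows


def label_counts(names: List[str], rows: List[tuple], label: str) -> Counter:
    """Count each value of the (assumed) label column, matching its name case-insensitively."""
    idx = [n.lower() for n in names].index(label.lower())
    return Counter(row[idx] for row in rows)


# Usage sketch; every connection detail below is an assumption, adjust to your setup:
#   import iris  # from the intersystems-irispython package
#   conn = iris.connect("localhost", 1972, "USER", "_SYSTEM", "SYS")
#   names, rows = fetch_table(conn, "dc_data_health.Diabetes", ["age", "glucose", "Outcome"])
#   print(label_counts(names, rows, "Outcome"))
```

The label lookup ignores case because drivers may report column names in a different case from the one used in the query.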

To install with ZPM

It is packaged with ZPM, so it can be installed with:

zpm "install dataset-health"

Dataset Licenses and sources/credits

  1. MIT License for this Application
  2. CC BY-NC-SA 4.0 License for the Breast Cancer Dataset
  3. CC0: Public Domain for Diabetes Dataset
  4. CC0: Public Domain for Heart Disease
  5. CC0: Public Domain for Maternal Health Risk
  6. CC0: Public Domain for World Life Expectancy
    • Original Source: https://www.kaggle.com/kumarajarshi/life-expectancy-who - The data was collected from the WHO and United Nations websites with the help of Deeksha Russell and Duan Wang.
    • File in the app: /opt/irisapp/data/life_expectancy.csv
    • Persistent Class: dc.data.health.LifeExpectancy
  7. CC0 1.0 Universal (CC0 1.0) Public Domain Dedication for Hospital Mortality
  8. CC0 1.0 Universal (CC0 1.0) Public Domain for Pollution Deaths dataset
  9. Attribution-NonCommercial-ShareAlike 3.0 IGO (CC BY-NC-SA 3.0 IGO) for Dementia dataset
  10. CC0 1.0 Universal (CC0 1.0) Public Domain for Hepatitis Death Risk dataset
  11. CC0: Public Domain for Kidney Disease
    • Original Source: Dua, Dheeru and Graff, Casey (2017). UCI Machine Learning Repository, http://archive.ics.uci.edu/ml. University of California, Irvine, School of Information and Computer Sciences.
    • File in the app: /opt/irisapp/data/kidney_disease.csv
    • Persistent Class: dc.data.health.KidneyDisease
Install: zpm install dataset-health
1.2.213 Jan, 2022
Technology Example
Works with: InterSystems IRIS, InterSystems IRIS for Health
First published: 19 Dec, 2021
Last checked by moderator: 27 Jun, 2023 (Works)