
Let's say IRIS is contributing to the workload for a hospital system, marshalling DICOM, HL7, FHIR, or CCDA. Natively, IRIS persists these objects at various stages of the pipeline as a consequence of your business processes and anything else you included along the way.
Let's send that up to Google BigQuery to augment and complement the rest of our data warehouse, and ETL (Extract, Transform, Load) or ELT (Extract, Load, Transform) to our heart's desire.
This iris2bq utility has got your back.
Exactly how again?

- It connects to IRIS over JDBC and dumps the tables as .avro files, keeping the schema along with the data: this avoids having to specify or create the BigQuery table schemas beforehand.
- It then loads the .avro files into the respective BigQuery tables you specify.

Note: internally, it uses the Spark framework for the sake of simplicity, but no Hadoop cluster is needed. It is configured as a "local" cluster by default, meaning the application runs standalone.
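Why .avro helps: an Avro container file embeds its own schema, so a loader can derive the target table from the file alone. Here is a rough, stdlib-only Python sketch of that idea (a toy illustration using JSON as a stand-in, not the actual iris2bq code or the Avro binary format):

```python
import json

# Toy schema for the "people" table used later in this article.
SCHEMA = {
    "name": "people",
    "fields": [{"name": "id", "type": "int"}, {"name": "name", "type": "string"}],
}

def dump(rows):
    """Bundle schema + rows into one payload, in the spirit of an Avro file."""
    return json.dumps({"schema": SCHEMA, "rows": rows})

def infer_table_ddl(payload):
    """A loader (think BigQuery) can derive the table from the payload alone."""
    schema = json.loads(payload)["schema"]
    cols = ", ".join(f'{f["name"]} {f["type"].upper()}' for f in schema["fields"])
    return f'CREATE TABLE {schema["name"]} ({cols})'

payload = dump([{"id": 1, "name": "Ada"}])
print(infer_table_ddl(payload))  # CREATE TABLE people (id INT, name STRING)
```

Because the schema travels inside the payload, nothing has to be pre-declared on the BigQuery side; that is the property the real Avro format provides.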
Run iris2bq on a schedule to keep those two places above ^^ in sync. OpenJDK 1.8 worked for us. Supply a configuration.conf so iris2bq knows where to grab and put the data:

jdbc {
  url = "jdbc:IRIS://127.0.0.1:51773/USER"
  user = "_SYSTEM"
  password = "flounder"
  tables = [ "people" ]
}
gcloud {
  project = "iris2bq-demo"
  service-account-key-path = "service.key.json"
  bq.dataset = "iris2bqdemods"
  gcs.tmp-bucket = "iris2bqdemobucket"
}
Up there in the jdbc connection block is where the magic happens for us. Just specify a list of applicable tables in IRIS for the namespace you are connecting to (USER) and watch them appear in BigQuery in your dataset.
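For reference, the `url` line follows the pattern `jdbc:IRIS://<host>:<port>/<namespace>`. A tiny hypothetical helper (for illustration only; iris2bq simply reads the finished string from configuration.conf) makes the pieces explicit:

```python
def iris_jdbc_url(host, port, namespace):
    """Build an InterSystems IRIS JDBC URL: jdbc:IRIS://host:port/NAMESPACE.
    Hypothetical helper for illustration; not part of iris2bq itself."""
    return f"jdbc:IRIS://{host}:{port}/{namespace}"

# The values from the configuration.conf above:
print(iris_jdbc_url("127.0.0.1", 51773, "USER"))  # jdbc:IRIS://127.0.0.1:51773/USER
```

Swap in your own host, superserver port, and namespace to target a different IRIS instance.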
GOOGLE_CLOUD_PROJECT=iris2bq-demo GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account-key.json ./bin/iris2bq -Dconfig.file=configuration.conf
Add this to a scheduler (Airflow, crontab, etc.) every 10 minutes and enjoy your JOINs in BigQuery on your IRIS data.
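If you'd rather not reach for Airflow, a minimal stand-in scheduler is just a loop around the command above. This is a hedged Python sketch (the paths and environment variables mirror the command line in this article; `build_command` and `run_every` are illustrative names, not part of iris2bq):

```python
import subprocess
import time

def build_command(config_file="configuration.conf"):
    """Assemble the iris2bq invocation shown above."""
    return ["./bin/iris2bq", f"-Dconfig.file={config_file}"]

def run_every(minutes, env):
    """Invoke iris2bq on a fixed interval. A crontab entry or an Airflow
    BashOperator does the same job more robustly in production."""
    while True:
        subprocess.run(build_command(), env=env, check=True)
        time.sleep(minutes * 60)

# Environment mirroring the command line in the article:
env = {
    "GOOGLE_CLOUD_PROJECT": "iris2bq-demo",
    "GOOGLE_APPLICATION_CREDENTIALS": "/path/to/service-account-key.json",
}
# run_every(10, env)  # commented out: this would loop forever
```

A crontab equivalent would run the same `build_command` string every 10 minutes; the loop above is only to show the moving parts in one place.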
If you want to give this a shot on your own, here is a quick way to get up and running with InterSystems IRIS for Health. You will need Docker installed on your system.
Build Container:
$ cd demo
$ bash iris-docker.sh
Create a GCP project with the BigQuery and Google Cloud Storage APIs enabled:
gcloud projects create iris2bq-demo --enable-cloud-apis
-or- try your hand as a Terraformer
$ cd demo
$ terraform init
$ terraform plan
$ terraform apply
Got a better idea? See a horrific bug? Grab sbt for your system.
Build:
$ git clone https://github.com/basenube/iris2bq.git
$ cd iris2bq
// Develop, Develop, Develop
$ sbt stage
Thanks to our valued partners InterSystems and Google Cloud Platform for the feedback and support.
We don't pretend to be very good at Scala. Most of the original guts of this thing come from the great work of the hackers at Powerspace. We have used a similar version, which they developed for Postgres, to lift transactional data to a data warehouse for a customer without issue for over a year now.