Iris-Parquet
A tool to generate Parquet files from InterSystems IRIS data and to load Parquet data into IRIS.
Description
Iris-Parquet allows you to:
- Generate Parquet files from an IRIS SQL statement
- Generate JSON from a Parquet file so you can save it in IRIS SQL tables or JSON documents
Prerequisites
- The HADOOP_HOME environment variable set to the Hadoop installation folder
Installation with Docker
Clone or git pull the repo into any local directory:
$ git clone https://github.com/yurimarx/iris-parquet.git
Open a terminal in this directory and run the commands below to build and run InterSystems IRIS in a container.
Note: Users running containers from a Linux CLI should use "docker compose" instead of "docker-compose".
See Install the Compose plugin.
$ docker-compose build
$ docker-compose up -d
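Because the Compose command name differs between setups, a small check can pick whichever one is installed. This is a minimal sketch; the function name `detect_compose` is hypothetical and not part of this repo.

```shell
# Hypothetical helper: print the Compose command available on this machine.
detect_compose() {
  if docker compose version >/dev/null 2>&1; then
    # Compose plugin ("docker compose")
    echo "docker compose"
  elif command -v docker-compose >/dev/null 2>&1; then
    # Standalone binary ("docker-compose")
    echo "docker-compose"
  else
    echo "Docker Compose not found" >&2
    return 1
  fi
}
```

For example, `COMPOSE=$(detect_compose) && $COMPOSE build && $COMPOSE up -d` runs the two commands above with whichever variant is present.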
Installation with ZPM
USER> zpm install iris-parquet
Install the Hadoop files and set the HADOOP_HOME environment variable:
wget https://dlcdn.apache.org/hadoop/common/hadoop-3.3.6/hadoop-3.3.6.tar.gz && \
tar -xzf hadoop-3.3.6.tar.gz && \
export HADOOP_HOME="/<unzipped folder>/hadoop-3.3.6"
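To confirm the prerequisite before calling the API, the variable can be checked from the same shell. This is a minimal sketch; `check_hadoop_home` is a hypothetical helper name, and it only verifies that the variable is set and points to a directory, not that the Hadoop files are complete.

```shell
# Hypothetical helper: verify that HADOOP_HOME is set and is a directory.
check_hadoop_home() {
  if [ -z "${HADOOP_HOME:-}" ]; then
    echo "HADOOP_HOME is not set" >&2
    return 1
  fi
  if [ ! -d "$HADOOP_HOME" ]; then
    echo "HADOOP_HOME does not point to a directory: $HADOOP_HOME" >&2
    return 1
  fi
  echo "HADOOP_HOME=$HADOOP_HOME"
}
```

Calling `check_hadoop_home` prints the configured path on success and returns a non-zero status otherwise. Note that `export` only affects the current shell session; add the line to your shell profile to make it persistent.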
Testing using Swagger-ui
- Go to http://<server>:<port>/swagger-ui/index.html
- In the Explore field, enter http://<server>:<port>/parquet-api/_spec
- For online sample it is:
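Before opening Swagger-ui, the spec URL can be fetched from the command line to confirm the API is reachable. This is a minimal sketch: the function names are hypothetical, and the server and port are whatever your IRIS web server uses.

```shell
# Hypothetical helper: build the spec URL from the README for a given server and port.
parquet_spec_url() {
  # $1 = IRIS web server host, $2 = IRIS web server port
  printf 'http://%s:%s/parquet-api/_spec\n' "$1" "$2"
}

# Hypothetical helper: download the spec; fails on HTTP errors.
fetch_parquet_spec() {
  curl --fail --silent --show-error "$(parquet_spec_url "$1" "$2")"
}
```

Running `fetch_parquet_spec <server> <port>` should print the API specification if the web application is up.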
Testing using Postman
- Open the file IRISParquet.postman_collection.json in Postman (or download it from iris parquet postman)
- Set the variables server (IRIS web server host) and port (IRIS web server port) on the Variables tab of the collection
- Run the /generate-persons method one or more times to generate fake sample person data
- Run the /sql2parquet method with this query in the body: select * from dc_irisparquet.SamplePerson
- Download the Parquet file using the Download file link
- Run the /parquet2json method on the Parquet file generated in the previous step and check the results
- You can also open the Parquet file in VSCode (install the parquet-viewer extension to view Parquet content: https://marketplace.visualstudio.com/items?itemName=dvirtz.parquet-viewer)
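The /sql2parquet step above could also be scripted with curl instead of Postman. The sketch below rests on assumptions this README does not confirm: that the methods live under the same /parquet-api base path as the _spec URL, that /sql2parquet accepts a POST with the SQL text as the body, and that the response is the Parquet file itself. Check the spec or the Postman collection for the actual verb, content type and response. The function names are hypothetical.

```shell
# Hypothetical helper: build the URL of an API method.
# Assumes the methods share the /parquet-api base path of the _spec URL.
parquet_api_url() {
  # $1 = server, $2 = port, $3 = method path (for example /sql2parquet)
  printf 'http://%s:%s/parquet-api%s\n' "$1" "$2" "$3"
}

# Hypothetical helper: send a SQL query to /sql2parquet and save the response.
# --data makes curl issue a POST with the query as the request body
# (sent as application/x-www-form-urlencoded unless a header overrides it).
sql2parquet() {
  # $1 = server, $2 = port, $3 = SQL query, $4 = output file
  curl --fail --silent --show-error \
    --data "$3" \
    --output "$4" \
    "$(parquet_api_url "$1" "$2" /sql2parquet)"
}
```

A call would look like `sql2parquet <server> <port> "select * from dc_irisparquet.SamplePerson" persons.parquet`, where persons.parquet is an arbitrary output file name.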