Initial Release
This repository contains four synthetic demo datasets for InterSystems IRIS: Financial Services, Supply Chain, Retail, and Theme Park.
Each domain includes a Python data generator, IRIS class definitions, tests, and a main_iris.py entrypoint for inserting data directly into IRIS.
The root ZPM package installs a shared ObjectScript entrypoint, SyntheticDataGen.DataLoader, and copies all domain assets into the IRIS install tree for lazy compilation and loading.
Each dataset is synthetically generated and inserted into IRIS.
Information on each individual dataset can be found in ./docs/Datasets.
Disclaimer: AI was used extensively for this project. The datasets were designed with advice from an LLM, and much of the code is AI-generated. Whilst there has been human oversight, the code has not been carefully reviewed, and the datasets may not be realistic or perfect. The datasets are scalable and designed for demos where having linked tables that look realistic is more important than the quality or realism of the data.
Feedback and contributions are welcome.
Install with the InterSystems Package Manager (ZPM):
zpm "install iris-synthetic-data-gen"
Or clone the repo to build a local container:
git clone https://github.com/gabriel-ing/iris-synthetic-data-gen.git
cd iris-synthetic-data-gen
docker-compose up --build
The synthetic data can be generated directly into IRIS tables with a single command:
do ##class(SyntheticDataGen.DataLoader).Load("FinancialServices")
do ##class(SyntheticDataGen.DataLoader).Load("SupplyChain")
do ##class(SyntheticDataGen.DataLoader).Load("ThemePark")
do ##class(SyntheticDataGen.DataLoader).Load("Retail")
Load() also accepts optional positional parameters, in this order:
- Scale factor: a multiplier on dataset size. The default is 1.
- Config file path. The datasets are configurable; the default config is ./src/<dataset>/python/config/sample_config.yaml.
- Replace existing (boolean): pass 1 to clear and replace an existing dataset. The default is 0.
For example, to overwrite an existing Retail dataset with a new one that is 5 times bigger, run:
do ##class(SyntheticDataGen.DataLoader).Load("Retail", 5, "", 1)
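The same positional order applies if you drive the loader from embedded Python. The sketch below wraps Load() in a hypothetical load_dataset helper that checks the dataset name first. The helper name and its validation are illustrative and are not part of the package. Reaching the class through embedded Python's iris.cls() is an assumption you should verify in your environment.

```python
from typing import Any

# Dataset names accepted by SyntheticDataGen.DataLoader, as listed in this README.
VALID_DATASETS = ("FinancialServices", "SupplyChain", "Retail", "ThemePark")


def load_dataset(loader: Any, dataset: str, scale_factor: Any = 1,
                 config_path: str = "", replace_existing: bool = False) -> Any:
    """Hypothetical wrapper: call loader.Load() with arguments in positional order.

    Order: dataset, scale factor, config path ("" = default config), replace flag (0/1).
    """
    if dataset not in VALID_DATASETS:
        raise ValueError(f"unknown dataset {dataset!r}; expected one of {VALID_DATASETS}")
    # ObjectScript booleans are 0/1, so convert the Python truth value explicitly.
    return loader.Load(dataset, scale_factor, config_path, 1 if replace_existing else 0)


# Possible usage inside IRIS embedded Python (assumes the iris module is available there):
#   import iris
#   load_dataset(iris.cls("SyntheticDataGen.DataLoader"), "Retail", 5, "", True)
```

Passing the loader in as an argument keeps the wrapper testable outside IRIS with a stand-in object.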
To remove a dataset, use the same class:
do ##class(SyntheticDataGen.DataLoader).DeleteDataset("Retail")
Repository layout:
- module.xml: ZPM package definition
- src/SyntheticDataGen/DataLoader.cls: shared IRIS loader and cleanup entrypoint
- src/FinancialServices/: financial-services generator, classes, tests, and CSV loader script
- src/SupplyChain/: supply-chain generator, classes, and tests
- src/Retail/: retail generator, classes, and tests
- src/ThemePark/: theme-park generator, classes, and tests
- docs/: repo-level architecture, troubleshooting, and integration notes
- tests/test_zpm_packaging.py: end-to-end ZPM install and lazy-compile coverage
Financial Services:
IRIS package: Finance
Generated outputs:
- accounts.csv
- customers.csv
- cards.csv
- merchants.csv
- transactions.csv
- disputes.csv
Supply Chain:
IRIS package: SupplyChain
Generated outputs:
- sales_orders.csv
- purchase_orders.csv
- dim_date.csv
- dim_product.csv
- dim_location.csv
- dim_supplier.csv
- dim_customer.csv
- product_supplier.csv
- sales_order_line.csv
- purchase_order_line.csv
- shipment_line.csv
- inventory_movement.csv
- inventory_snapshot_daily.csv
- stock_count_event.csv
Retail:
IRIS package: Retail
Generated outputs:
- customers.csv
- calendar.csv
- roles.csv
- users.csv
- user_store_access.csv
- stores.csv
- products.csv
- supplier_product.csv
- promotions.csv
- purchase_orders.csv
- stock_transfers.csv
- sales_transactions.csv
- inventory_snapshot.csv
Theme Park:
IRIS package: ThemePark
Generated outputs:
- queue_snapshot.csv
- parks.csv
- zones.csv
- rides.csv
- ride_maintenance.csv
- employees.csv
- shifts.csv
- guests.csv
- tickets.csv
- incidents.csv
- feedback.csv
Create a Python 3.10+ environment and install the shared requirements:
pip install -r requirements.txt
Root dependencies currently are:
- numpy
- pandas
- pyyaml
- pytest
- faker
Each domain exposes python -m DataGen.main --config ... and supports an optional --scale-factor.
Financial Services:
cd src/FinancialServices/python
python -m DataGen.main --config config/sample_config.yaml
python -m DataGen.main --config config/sample_config.yaml --scale-factor 2
Supply Chain:
cd src/SupplyChain/python
python -m DataGen.main --config config/sample_config.yaml
python -m DataGen.main --config config/sample_config.yaml --scale-factor 2
Retail:
cd src/Retail/python
python -m DataGen.main --config config/sample_config.yaml
python -m DataGen.main --config config/sample_config.yaml --scale-factor 2
Theme Park:
cd src/ThemePark/python
python -m DataGen.main --config config/sample_config.yaml
python -m DataGen.main --config config/sample_config.yaml --scale-factor 2
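Each generator takes --config plus an optional --scale-factor. The sketch below shows one plausible way a CLI like this could parse those two flags and apply the scale factor to per-table row counts. It is not the code in DataGen.main. The BASE_COUNTS table and the scale_counts helper are hypothetical and do not reflect the real config schema.

```python
import argparse
import math


def parse_args(argv=None):
    """Parse the two flags documented for python -m DataGen.main (sketch only)."""
    parser = argparse.ArgumentParser(description="Generate synthetic CSVs (sketch)")
    parser.add_argument("--config", required=True, help="path to a domain YAML config")
    parser.add_argument("--scale-factor", type=float, default=1.0,
                        help="multiplier applied to base row counts")
    return parser.parse_args(argv)


def scale_counts(base_counts, scale_factor):
    """Multiply each table's base row count, rounding up and keeping at least one row."""
    if scale_factor <= 0:
        raise ValueError("scale factor must be positive")
    return {table: max(1, math.ceil(n * scale_factor)) for table, n in base_counts.items()}


# Hypothetical base counts; the real sizing lives in each domain's sample_config.yaml.
BASE_COUNTS = {"customers": 1000, "accounts": 1500, "transactions": 50000}
```

Scaling every table by the same factor keeps the ratios between linked tables roughly constant, which matters more for demos than absolute sizes.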
The output location is set in each domain's config file.
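These datasets are valuable because the tables link together. A quick sanity check after generating CSVs is therefore to look for child rows whose foreign key has no matching parent. This sketch uses only the standard library. The customer_id column name in the usage comment is an assumption, so check the dataset guides in docs/Datasets for the real key columns.

```python
import csv
from pathlib import Path
from typing import Iterable, Mapping


def read_rows(path):
    """Read a generated CSV into a list of dicts keyed by header name."""
    with Path(path).open(newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def orphan_keys(child_rows: Iterable[Mapping[str, str]],
                parent_rows: Iterable[Mapping[str, str]],
                fk: str, pk: str) -> set:
    """Return non-empty foreign-key values in child_rows with no matching parent key."""
    parent_keys = {row[pk] for row in parent_rows}
    # Empty values are skipped so that nullable foreign keys are not reported.
    return {row[fk] for row in child_rows if row[fk] and row[fk] not in parent_keys}


# Example usage (hypothetical column names; run from the configured output directory):
#   missing = orphan_keys(read_rows("accounts.csv"), read_rows("customers.csv"),
#                         fk="customer_id", pk="customer_id")
#   print(f"{len(missing)} accounts reference unknown customers")
```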
Each domain also exposes main_iris.py, which inserts data directly into IRIS without writing CSVs first.
Financial Services:
cd src/FinancialServices/python
python -m DataGen.main_iris --config config/sample_config.yaml --package Finance --clear-existing
Supply Chain:
cd src/SupplyChain/python
python -m DataGen.main_iris --config config/sample_config.yaml --package SupplyChain --clear-existing
Retail:
cd src/Retail/python
python -m DataGen.main_iris --config config/sample_config.yaml --package Retail --clear-existing
Theme Park:
cd src/ThemePark/python
python -m DataGen.main_iris --config config/sample_config.yaml --package ThemePark --clear-existing
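To load every domain in one pass, the four commands above can be replayed from a small Python driver. This is a sketch, and it makes two assumptions. First, repo_root points at a checkout of this repository. Second, the interpreter running the script (sys.executable) is the one main_iris.py normally runs under and can reach IRIS. It uses only the documented commands and package names.

```python
import subprocess
import sys
from pathlib import Path

# Domain directory -> IRIS package name, as used in the commands above.
DOMAIN_PACKAGES = {
    "FinancialServices": "Finance",
    "SupplyChain": "SupplyChain",
    "Retail": "Retail",
    "ThemePark": "ThemePark",
}


def build_commands(repo_root):
    """Return (working directory, argv) pairs for each domain's main_iris load."""
    commands = []
    for domain, package in DOMAIN_PACKAGES.items():
        cwd = Path(repo_root) / "src" / domain / "python"
        argv = [sys.executable, "-m", "DataGen.main_iris",
                "--config", "config/sample_config.yaml",
                "--package", package, "--clear-existing"]
        commands.append((cwd, argv))
    return commands


def run_all(repo_root):
    """Run each load in turn; check=True raises CalledProcessError on the first failure."""
    for cwd, argv in build_commands(repo_root):
        subprocess.run(argv, cwd=cwd, check=True)
```

Separating command construction from execution makes it easy to print or inspect the commands before running anything against a live instance.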
Current main_iris.py entrypoints support:
- --config
- --package
- --clear-existing
- --commit-every
- --scale-factor
Financial Services also includes a CSV-to-IRIS utility, src/FinancialServices/python/scripts/load_csv_to_iris.py, for printing DDL and loading from CSV files.
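The --commit-every option implies that rows are committed in batches rather than in one large transaction. The sketch below shows that general pattern. Python's built-in sqlite3 module stands in for an IRIS DB-API connection. The parks table and the insert_in_batches helper are hypothetical, and the real main_iris.py may batch differently.

```python
import sqlite3
from itertools import islice
from typing import Iterable, Sequence


def insert_in_batches(conn, sql: str, rows: Iterable[Sequence], commit_every: int = 1000) -> int:
    """Insert rows with executemany, committing after every `commit_every` rows.

    Works with any DB-API 2.0 connection; returns the number of commits issued.
    """
    if commit_every < 1:
        raise ValueError("commit_every must be >= 1")
    it = iter(rows)
    commits = 0
    while True:
        batch = list(islice(it, commit_every))
        if not batch:
            break
        conn.cursor().executemany(sql, batch)
        conn.commit()  # bounds the size of each open transaction
        commits += 1
    return commits


# Usage with an in-memory SQLite database standing in for IRIS:
#   conn = sqlite3.connect(":memory:")
#   conn.execute("CREATE TABLE parks (park_id INTEGER PRIMARY KEY, name TEXT)")
#   insert_in_batches(conn, "INSERT INTO parks VALUES (?, ?)", rows, commit_every=500)
```

Smaller batches mean more commits but less work lost if a load fails partway. Larger batches are usually faster.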
The root module currently behaves as follows:
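- Installs the SyntheticDataGen module defined in module.xml
- Copies requirements.txt and all domain asset trees into ${libdir}SyntheticDataGen/
- Compiles only SyntheticDataGen.DataLoader during install; domain classes are compiled lazily when a dataset is loaded or ensured
- Persists the install root in ^SyntheticDataGen("InstallRoot")
Current DataLoader methods: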
- PersistInstallRoot()
- DefaultInstallRoot()
- SetInstallRoot(path)
- GetInstallRoot()
- EnsureDatasetClasses(dataset)
- DeleteDataset(dataset, deleteClasses=1)
- LoadData(dataset, scaleFactor="", configPath="", clearExisting=0)
Valid dataset names:
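- FinancialServices
- SupplyChain
- Retail
- ThemePark
Important: LoadData() arguments are positional, in the order dataset, scaleFactor, configPath, clearExisting. To set a later argument while keeping an earlier default, pass "" for the skipped argument.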
From an IRIS session:
zpm "install SyntheticDataGen"
do ##class(SyntheticDataGen.DataLoader).LoadData("FinancialServices")
do ##class(SyntheticDataGen.DataLoader).LoadData("SupplyChain",2)
do ##class(SyntheticDataGen.DataLoader).LoadData("Retail",2,"",1)
do ##class(SyntheticDataGen.DataLoader).LoadData("ThemePark",2)
do ##class(SyntheticDataGen.DataLoader).LoadData("FinancialServices","","/usr/irissys/lib/SyntheticDataGen/FinancialServices/python/config/sample_config.yaml",1)
do ##class(SyntheticDataGen.DataLoader).DeleteDataset("Retail")
do ##class(SyntheticDataGen.DataLoader).DeleteDataset("SupplyChain",0)
do ##class(SyntheticDataGen.DataLoader).DeleteDataset("ThemePark")
DeleteDataset(dataset) clears rows in child-to-parent order and, by default, removes the compiled dataset package. Pass 0 as the second argument if you want to keep the classes compiled.
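Clearing rows in child-to-parent order means a table is emptied only after every table that references it has been emptied. One way to derive that order is a topological sort over foreign-key dependencies. This sketch uses the standard library's graphlib. The Finance dependency map is an illustrative assumption, not read from the actual class definitions.

```python
from graphlib import TopologicalSorter

# Hypothetical foreign-key map: table -> tables it references (its parents).
FINANCE_PARENTS = {
    "Customers": set(),
    "Merchants": set(),
    "Accounts": {"Customers"},
    "Cards": {"Accounts"},
    "Transactions": {"Cards", "Merchants"},
    "Disputes": {"Transactions"},
}


def delete_order(parents):
    """Return tables ordered children-first, so each table is cleared before its parents."""
    # static_order() yields every node after its predecessors (parents first);
    # reversing it gives a safe deletion order.
    return list(reversed(list(TopologicalSorter(parents).static_order())))
```

Inserting uses the forward order (parents first), and deleting uses the reverse. TopologicalSorter also raises CycleError if the map ever contains a circular reference.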
Useful checks from an IRIS session:
write $get(^SyntheticDataGen("InstallRoot"))
write $classmethod("%Dictionary.ClassDefinition","%ExistsId","Finance.Customers")
write $classmethod("%Dictionary.ClassDefinition","%ExistsId","SupplyChain.DimCustomer")
write $classmethod("%Dictionary.ClassDefinition","%ExistsId","Retail.Stores")
write $classmethod("%Dictionary.ClassDefinition","%ExistsId","ThemePark.Parks")
On a fresh install, the domain classes should not exist until the corresponding domain is loaded or explicitly ensured.
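The same checks can be scripted. This sketch maps each dataset to the class used in the checks above and reports which datasets are compiled. The exists callable is passed in so that the logic runs anywhere. Wiring it to IRIS through embedded Python, where % methods are exposed with a leading underscore, is an assumption to verify in your environment.

```python
from typing import Callable, Dict

# Dataset -> a class that exists only once that dataset has been compiled
# (the same classes used in the IRIS session checks above).
SENTINEL_CLASSES = {
    "FinancialServices": "Finance.Customers",
    "SupplyChain": "SupplyChain.DimCustomer",
    "Retail": "Retail.Stores",
    "ThemePark": "ThemePark.Parks",
}


def compiled_datasets(exists: Callable[[str], object]) -> Dict[str, bool]:
    """Report, per dataset, whether its sentinel class definition exists."""
    # bool() normalises ObjectScript-style 0/1 return values.
    return {dataset: bool(exists(cls)) for dataset, cls in SENTINEL_CLASSES.items()}


# Possible embedded Python wiring (assumption, not tested here):
#   import iris
#   defs = iris.cls("%Dictionary.ClassDefinition")
#   print(compiled_datasets(lambda name: defs._ExistsId(name)))
```

On a fresh install, every value should be False until the corresponding dataset is loaded or ensured.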
Run all Python tests from the repo root:
pytest
Run the ZPM packaging and lazy-compile test:
pytest tests/test_zpm_packaging.py -q
By default that test rebuilds the Docker Compose IRIS container for a clean install check. Set SYNTHETICDATAGEN_REBUILD_DOCKER=0 to reuse the existing container for faster reruns.
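A test can honour a flag like this by reading the environment variable before it decides whether to rebuild. The sketch below is a hypothetical illustration of that check, not the code in tests/test_zpm_packaging.py. It rebuilds unless the variable is explicitly set to 0.

```python
import os
from typing import Mapping, Optional


def should_rebuild(env: Optional[Mapping[str, str]] = None) -> bool:
    """Rebuild unless SYNTHETICDATAGEN_REBUILD_DOCKER is explicitly "0"."""
    env = os.environ if env is None else env
    return env.get("SYNTHETICDATAGEN_REBUILD_DOCKER", "1").strip() != "0"
```

From a POSIX shell, SYNTHETICDATAGEN_REBUILD_DOCKER=0 pytest tests/test_zpm_packaging.py -q sets the variable for that single run only.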
Domain-specific tests can also be run from each domain directory with python -m pytest.
Additional documentation:
- docs/Datasets/FINANCIAL_SERVICES_DATASET_GUIDE.md: detailed financial-services dataset semantics, values, and demo ideas
- docs/Datasets/SUPPLY_CHAIN_DATASET_GUIDE.md: detailed supply-chain dataset semantics, values, and demo ideas
- docs/Datasets/RETAIL_DATASET_GUIDE.md: detailed retail dataset semantics, values, and demo ideas
- docs/Datasets/THEME_PARK_DATASET_GUIDE.md: detailed theme-park dataset semantics, values, and demo ideas
- docs/SYNTHETIC_DATA_GEN_ARCHITECTURE.md: current repo architecture and load flows
- docs/IRIS_PYTHON_SQLERROR_246_TROUBLESHOOTING.md: embedded Python and IRIS SQL troubleshooting
- docs/09.-Python-ObjectScript-Integration.md: current shared Python/ObjectScript install pattern
- SUPPLY_CHAIN_SPECIFICATION.MD: current implemented supply-chain domain summary
- src/SupplyChain/python/README.md: supply-chain domain quick start
- src/Retail/python/README.md: retail domain quick start
- src/ThemePark/python/README.md: theme-park domain quick start
Implementation notes:
- Every domain uses the same top-level Python package name, DataGen, so DataLoader clears cached DataGen modules before importing a different domain in embedded Python.
- EnsureDatasetClasses() currently uses $system.OBJ.LoadDir(...), which works but reports a deprecation warning during validation.
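Every domain ships a top-level package called DataGen. A long-lived embedded Python process would therefore keep returning the first domain's modules from sys.modules unless they are dropped first. Here is a minimal sketch of that cache-clearing step. The import_domain_datagen name and the sys.path handling are assumptions, not DataLoader's actual code.

```python
import importlib
import sys


def import_domain_datagen(domain_python_dir: str, module: str = "DataGen.main"):
    """Import `module` from one domain's python directory, discarding any cached DataGen.

    Removes "DataGen" and every "DataGen.*" entry from sys.modules so that the next
    import resolves against domain_python_dir instead of a previously loaded domain.
    """
    for name in list(sys.modules):
        if name == "DataGen" or name.startswith("DataGen."):
            del sys.modules[name]
    # Put this domain first on the search path, dropping an older copy of the entry.
    if domain_python_dir in sys.path:
        sys.path.remove(domain_python_dir)
    sys.path.insert(0, domain_python_dir)
    importlib.invalidate_caches()
    return importlib.import_module(module)
```

Clearing the cache by name prefix is simple, but it also discards any state held in those modules. That is acceptable here because each load starts fresh.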