Updates on the performance stats.
The capability to ingest thousands or millions of records per second while allowing for simultaneous queries in real time is required by many use cases across multiple industries, e.g. equity trade processing, fraud detection, IoT applications including anomaly detection and real time OEE, etc. Gartner calls this capability "HTAP" (Hybrid Transactional Analytical Processing). Others such as Forrester call it Translytics. InterSystems IRIS is a powerful, scalable, high performance and resource efficient transactional-analytic data platform that provides the performance of in-memory databases with the consistency, availability, reliability and lower costs of traditional databases.
This demo shows how InterSystems IRIS can ingest thousands of records per second while allowing for simultaneous queries on the data on the same cluster with very high performance for both ingestion and querying, and with low resource utilization. The demo works on a single InterSystems IRIS instance or on an InterSystems IRIS cluster on the cloud.
The same demo can be run on SAP HANA, MySQL, SqlServer and Amazon Aurora to compare performance and resource utilization in “apples-to-apples” comparisons.
You can run the tests on AWS! Here are some results:
You can run the tests on your own PC using Dockers (3 CPUs and 7GB of RAM)! Here are some results:
Follow this link to see instructions on how to run this Speed Test on AWS comparing InterSystems IRIS with other databases such as SAP HANA and AWS Aurora.
The pre-requisites for running the speed test on your PC are:
You can currently run this demo on your PC with InterSystems IRIS, MySQL, SqlServer and SAP HANA.
To run the demo on your PC, make sure you have Docker installed on your machine. You can quickly get it up and running with the following commands on your Mac or Linux PC:
wget https://raw.githubusercontent.com/intersystems-community/irisdemo-demo-htap/master/docker-compose.yml docker-compose up
If you are runing on Windows, download the docker-compose.yml file to a folder. Open a command prompt and change to that folder. Then run docker-compose up.
You can also clone this repository to your local machine to get the entire source code. You will need git installed and you would need to be on your git folder:
git clone https://github.com/intersystems-community/irisdemo-demo-htap cd irisdemo-demo-htap docker-compose up
Both techniques should work and should trigger the download of the images that compose this demo and it will soon start all the containers.
When starting, you will see lots of messages from all the containers that are staring. That is fine. Don't worry!
When it is done, it will just hang there, without returning control to you. That is fine too. Just leave this window open. If you CTRL+C on this window, docker compose will stop all the containers and stop the demo.
After all the containers have started, open a browser at http://localhost:10000 to see the demo UI.
Just click on the Run Test button to run the HTAP Demo! It will run for a maximum time of 300 seconds or until you manually stop it.
If you want to change the maximum time to run the test, click the Settings button at the top right of the UI. Change the maximum time to run the speed test to whatever you want.
After clicking on Run Test, it should immediately change to Starting.... If you are testing IRIS or SQL Server, it may stay on this status for a long time since we are pre-expanding the database to its full capacity before starting the test (something that we would normally do on any production system). IRIS is a hybrid database (In Memory performance with all the benefits of traditional databases). So IRIS still needs to have its disk database properly expanded. Just wait for it. For some databases, we could not find a way of doing this right from start (Aurora and MySQL) so what we did was to run the Speed Test once to "warm it up". Then we run it again (which causes the table to be truncated) with the database warmed up.
Warning: IRIS Database expansion can take some time. Fortunately, when running on your PC, we will pre-expand the database only to up to 9Gb since IRIS Community has a limit on the database size.
When the test finishes running, a green button will appear, allowing you to download the test results statistics as a CSV file.
When you are done testing, go back to that terminal and enter CTRL+C. You may also want to enter with the following commands to stop containers that may still be running and remove them:
docker-compose stop docker-compose rm
This is important, specially if you are going back and forth between running the speed test on one database (say InterSystems IRIS) and some other (say MySQL).
To run this demo against MySQL:
wget https://raw.githubusercontent.com/intersystems-community/irisdemo-demo-htap/master/docker-compose-mysql.yml docker-compose -f ./docker-compose-mysql.yml up
Now, we are downloading a different docker-compose yml file; one that has the mysql suffix on it. And we must use -f option with the docker-compose command to use this file. As before, leave this terminal window open and open a browser at http://localhost:10000.
When you are done running the demo, go back to this terminal and enter CTRL+C. You may also want to enter with the following commands to stop containers that may still be running and remove them:
docker-compose -f ./docker-compose-mysql.yml stop docker-compose -f ./docker-compose-mysql.yml rm
This is important, specially if you are going back and forth between running the speed test on one database (say InterSystems IRIS) and some other.
In our tests, we found InterSystems IRIS to ingest data 25X faster than MySQL and Amazon Aurora.
To run this demo against SQL Server:
wget https://raw.githubusercontent.com/intersystems-community/irisdemo-demo-htap/master/docker-compose-sqlserver.yml docker-compose -f ./docker-compose-sqlserver.yml up
As before, leave this terminal window open and open a browser at http://localhost:10000.
In our tests running on a local PC, we found InterSystems IRIS to ingest data 2.5X faster than SQL Server while query rates were 400X faster! We will test it against AWS RDS SQL Server and report.
To run the speed test with SAP HANA on your PC you will need:
To run this demo against SAP HANA:
git clone https://github.com/intersystems-community/irisdemo-demo-htap cd ./irisdemo-demo-htap ./run.sh hana
Wait for the images to download and for the containers to start. You will know when everything is up once docker-compose stops writing to the screen. But be patient - SAP HANA takes about 6 minutes to start! So, your screen will freeze for a minute or so and then you will see SAP HANA writing more text. It will repeat this for about 6 minutes.. Once you see the text "Startup finished!" you should be good to go. If it crashes with an error, it is probably because you need to give it more memory.
As you can see, it is not just a matter of running docker-compose up as it is with InterSystems IRIS and MySQL. SAP HANA requires some configurations to the Linux Kernel. The run.sh will do these configurations for you.
In our tests running the Speed Test on a VM, we found InterSystems IRIS to be 1.3X faster than SAP HANA for ingesting data, and 20X faster for querying data, and uses a fraction of the memory.
A video about this demo is in the works! In the meantime, here is an interesting article that talks about InterSystems IRIS architecture and what makes it faster.
The open source sysbench tool can certainly be extended but as of now it can only be used to test MySQL, PostgreSQL and other databases that are based on MySQL (ex.: AWS Aurora) or implement MySQL wire protocol. We could certainly modify it to test other databases but we wanted to use JDBC (not the C based driver) and we needed the tool to be less dependent of the backend database for metrics collection. Our tests are also simpler in the sense that we only have one single table. sysbench allows you to run INSERTS and SELECTS in parallel in multiple copies of the same table which is not fair for our use case. On financial services applications, data is coming in fast into a single table and how the data base deals with memory pressure and lock contention to avoid deadlocks is also important. Allowing the test to be run on multiple tables mitigates the lock contention problem and masks a serious problem.
The open-source Yahoo Cloud Serving Benchmark (YCSB) project aims to develop a framework and common set of workloads for evaluating the performance of different “key-value” and “cloud” serving stores.
Although there are workloads on YCSB that could be described as HTAP, YCSB doesn't necessarily rely on SQL to do it. This benchmark does.
TPC-H is focused on decision support systems (DSS) and that is not the use case we are exploring.
This benchmark is about ingestion rate versus query response time. We have a single table with many columns of different data types. We want to measure how fast a database can ingest the records while, at the same time, allowing for responsive queries.
This is not a simple problem. Many industries such as Financial Services and IoT have to ingest thousands of records per second. At very high ingestion rates, memory is consumed very quickly. Traditional Databases need to write to disk to keep ingesting while In Memory Databases will also be forced to constantly write to disk as well (change logs/journals and in some cases even part of the data that is in memory as in traditional databases). The question is: How InterSystems IRIS can be faster than an In Memory Database if InterSystems IRIS is writing to disk not only to its transaction log (like In Memory Databases) but also asynchronously keeping the database current?
It is all about efficiency. The ingestion workload will keep the database very busy. CPU and Memory will be working hard. Some In Memory databases will try to compress data in memory. Others will persist data to disk when the memory fills up. All this is happening while we are still trying to query the database in real time.
We want to show that In Memory Databases will not perform as well as InterSystems IRIS on certain workloads such as Equity Trading, High Ingestion throughput (IoT), etc. That is why we designed this test. It is meant to be much simpler than the general purpose tests out there:
InterSystems IRIS is a hybrid database. As with traditional databases, it will also try to keep data in memory. But as thousands of records por second are coming in fast due to the ingestion work, the memory is purged very fast. This test allows you to see how InterSystems IRIS is smart about its cache when compared to other traditional databases and In Memory databases. You will see that:
Here is the the statement we send to all databases we support:
CREATE TABLE SpeedTest.Account ( account_id VARCHAR(36) PRIMARY KEY, brokerageaccountnum VARCHAR(16), org VARCHAR(50), status VARCHAR(10), tradingflag VARCHAR(10), entityaccountnum VARCHAR(16), clientaccountnum VARCHAR(16), active_date DATETIME, topaccountnum VARCHAR(10), repteamno VARCHAR(8), repteamname VARCHAR(50), office_name VARCHAR(50), region VARCHAR(50), basecurr VARCHAR(50), createdby VARCHAR(50), createdts DATETIME, group_id VARCHAR(50), load_version_no BIGINT )
The Ingestion Worker will send as many INSERTs as possible as measure the number of records/sec inserted as well as the number of Megabytes/sec.
The Query Worker will SELECT from this table by account_id and try to select as many records as possible measuring it as records/sec selected as well as Megabytes/sec select to test the end-to-end performance and to provide proof of work.
End-to-end performance has to do with the fact that some JDBC drivers have optmizations. If you just execute the query, the JDBC driver may not fetch the record from the server until you actually request for a value of a column.
To proove that we are actually reading the columns we are SELECTing, we sum up the bytes of all the filds reeturned as proof of work.
To achieve maximum throughput, each ingestion worker will start multiple threads that each will:
The default number of ingestion worker threads is 15. But it can be changed during the test by clicking at the Settings button.
The query workers, on the other hand, also start multiple threads to query as many records as possible. But as we explained above, we are also providing proof of work. We are reading the columns returned and summing up the number of bytes read to make sure the data is actually traveling from the database, through the wire and into the query worker. That is to avoid optimizations implemented by some JDBC drivers that will only bring the data over the wire if it is actually used. We are actually consuming the data returned and providing a sum of MB read/s and total number of MB read as proof of it.
I filled up a 70Gb DATA file system after ingesting 171,421,000 records. That would mean that each records would take an avergage of 439 bytes (rounding up).
I also filled 100% of my first journal directory and about 59% of the second. Both filesystems had 100Gb which means that 171,421,000 would take about 159Gb of journal space or that each records would take an average of 996 bytes.
The architecture of the HTAP demo is shown below:
This demo uses docker compose to start five services:
When running the demo on our PCs, we use Docker and Docker Compose. Docker Compose expects a docker-compose.yml that describes these services and the docker images they use. This demo actually provides many docker-compose.yml files and more will be added soon:
Yes! The easiest way to get this done is to clone this repo on each server where you are planning on running the master and the UI (they run on the same server) and on each worker type (ingestion and query workers). You may have as many ingestion workers and query workers as you want!
Then, for InterSystems IRIS, look at the files on folder ./standalone_scripts/iris-jdbc. There is a script for every server:
What about InterSystems IRIS? You have two choices:
Just make sure you change your start_master.sh script to configure the environment variables with the correct InterSystems IRIS end points, usernames and passwords.
Look at the docker-compose.yml file and you will notice environment variables that will allow you to configure everything. The provided docker-compose yml files are just good starting points. You can copy them and change your copies to have more workers (it won't make a lot of difference if you are running on your PC), higher number of threads per worker type, change the ingestion batch size, wait time in milliseconds between queries on the consumter, etc.
Yes, but you will have to:
Changing the table structure should be simple.
After forking, you need to change the files on folder /image-master/projects/master/src/main/resources.
If you change the TABLE structure, make sure you use the same data types I am using on the existing table. Those are the data types supported. You can also change the name of the table.
Then, change the other *.sql scripts to match your changes. The INSERT script, the SELECT script, etc.
Finally, just run the build.sh to rebuild the demo and you should be ready to go!
After running a test, the UI will allow you to download the test results as a CSV file. Here is what the columns on the Results CSV file mean:
There are other InterSystems IRIS demo applications that touch different subjects such as NLP, ML, Integration with AWS services, Twitter services, performance benchmarks etc. Here are some of them:
Please, report any issues on the Issues section.
All the changes to this project are logged here.
Updates on the performance stats.
Updated documentation for AWS Aurora deployment.