For this benchmark we will use sales data from the fictitious company "Contoso". We can download the database from GitHub. At this address we will find compressed CSV files with the data. In total there are 8 files of 500 MB and one smaller file of 250 MB ( 4250 MB in total ).
When we unzip these files, we will find 8 CSV files inside. This is a sales cube with tables that show sales and orders per product, customer, date and store.
The three big tables are "Orders" ( 88M ), "Sales" ( 211M ) and "OrderRows" ( 211M ). The "Customer" dimension table has 2M rows, and all the other dimension tables are small.
Not all of the columns are shown in the image.
In the CSV file "date.csv" I will rename the columns month => month2 and year => year2, because MonetDB will not accept the original names: "month" and "year" are reserved words.
Machine Hardware
For this benchmark we will use a CPU with 8 cores and 64 GB of RAM.
Our operating system is Zorin 18.
Creating Tables and Loading CSV Files
OrderRows Table
CREATE TABLE orderrows (
    OrderKey   BIGINT   NOT NULL,
    LineNumber SMALLINT NOT NULL,
    ProductKey SMALLINT NOT NULL,
    Quantity   SMALLINT NOT NULL,
    UnitPrice  REAL     NOT NULL,  --DECIMAL(8,4)
    NetPrice   REAL     NOT NULL,  --DECIMAL(10,6)
    UnitCost   REAL     NOT NULL   --DECIMAL(8,4)
);
First, I will create a table for OrderRows in MonetDB. I will not create primary and foreign key constraints yet.
I will fill this table from my CSV file with the "COPY INTO" statement.
COPY OFFSET 2 INTO orderrows FROM '/home/fff/Desktop/CSVs/orderrows.csv' USING DELIMITERS ',', E'\n', '"';
Importing this 211M-row table takes only about a minute ( 56.241 sec ). Amazing.
Other Tables
I will leave a file for download at the end of this article. That file will have SQL for creation and import of all of the Contoso tables.
Below we can see the import times for the two other large files. The smaller dimension tables are imported almost instantly.
The Sales (211M) table needs almost 5 minutes to import.
The Orders (88M) CSV table is imported in 62 seconds.
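For reference, both imports follow the same pattern as the one shown for "orderrows": a rough sketch, assuming the CSV files are named "sales.csv" and "orders.csv" and sit in the same folder.

COPY OFFSET 2 INTO sales FROM '/home/fff/Desktop/CSVs/sales.csv' USING DELIMITERS ',', E'\n', '"';
COPY OFFSET 2 INTO orders FROM '/home/fff/Desktop/CSVs/orders.csv' USING DELIMITERS ',', E'\n', '"';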
Query Benchmarking
Cold Start
I will now restart my computer because I want to run a cold query. The simple query below will run for 10 seconds. A MonetDB database becomes faster as more queries are executed. This ability of MonetDB is called "cracking": MonetDB will automatically sort, group and index columns during SELECT queries, which makes subsequent queries faster.
SELECT * FROM sales LIMIT 1;
If we run this query again, it will be executed in just 1.3 milliseconds. This is not the result of caching. If we make a query that returns 2 or 3 rows, we will again see these exceptional speeds.
SELECT * FROM sales LIMIT 2; SELECT * FROM sales LIMIT 3;
Aggregated Queries in MonetDB
If we aggregate two columns in the "sales" table, that query will touch 211M rows. It will be fast ( 2.268 sec. ), and if we repeat it, it takes only 94.915 ms.
SELECT SUM( quantity ), AVG( unitprice ) FROM sales;
I will run the same query, but this time with a filter.
SELECT SUM( quantity ), AVG( unitprice ) FROM sales WHERE orderdate <= '2020-05-25';
We will get the result even faster. This proves that the result is not cached. MonetDB ran the query again, but this time "cracking" made it faster.
It's hard to make a benchmark when execution times are constantly changing. So, from now on I will focus on the fastest times.
How Database Reports Execution Time
SELECT * FROM sales LIMIT 1000000; --13 ms
SELECT * FROM sales LIMIT 2000000; --24 ms
MonetDB is reporting that the second query is slower than the first one. That is what we expect. The problem is that, according to my computer clock, the first query finished after 7 seconds and the second one after 15 seconds.
Databases only report the time spent producing the results in memory. That does not include the time needed to print the result in the shell or any other client. That is why MonetDB reports 13 ms, but I can see the result only after 7 seconds. MonetDB has a command to suppress printing of the result in the shell. I will use that command ( command explained here ) next, to test reading whole tables.
This is how long it takes MonetDB to read a large number of rows. Columnar databases are better suited for aggregate queries, but we can see that MonetDB is capable of performing OLTP types of queries quite well.
I will then disable that command, so we can again see the results of our queries.
Joins
SELECT productkey, SUM( quantity ), AVG( netprice ) FROM sales GROUP BY productkey;
This query executes in 340 ms. If we want to see brands, we have to join the "product" and "sales" tables.
SELECT brand, SUM( quantity ), AVG( netprice ) FROM sales INNER JOIN product ON sales.productkey = product.productkey GROUP BY brand;
The query with the join takes more than 1 second. We can speed it up by creating a foreign key constraint, as sketched below.
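A sketch of what that could look like, assuming "product" does not yet have a primary key on productkey ( the primary key constraint name is my own; the same foreign key is created again later in the article as FKfromProduct ):

-- the referenced column must be a primary key (or unique) first
ALTER TABLE product ADD CONSTRAINT product_pk PRIMARY KEY ( productkey );
-- then the foreign key on the fact table
ALTER TABLE sales ADD CONSTRAINT FKfromProduct FOREIGN KEY ( productkey ) REFERENCES product ( productkey );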
With the foreign key constraint in place, the query from before becomes about 200 ms faster. That is roughly 20% faster.
The query from before is a traditional analytical query. If we look at the system monitor during its execution, we will see the load equally distributed between CPU cores. MonetDB is capable of significantly parallelizing query execution, which means that individual queries can be sped up with a CPU that has even more cores.
DISTINCT, LIKE, ROLLUP
From the table "customer" we can list distinct continents and genders in 3.841 ms.
SELECT DISTINCT continent, gender FROM customer;
We can get distinct combinations of storekey and currencycode from the "sales" table in half a second.
SELECT DISTINCT storekey, currencycode FROM sales;
If we want to count unique combinations of orderkey and currencycode, then our query will be slow; it takes almost 11 seconds.
SELECT COUNT( * ) FROM ( SELECT DISTINCT orderkey, currencycode FROM sales );
A query like this will also take 11 seconds.
SELECT COUNT( * ) FROM ( SELECT orderkey, currencycode FROM sales GROUP BY orderkey, currencycode );
The LIKE operator allows the use of wildcards. The "_" sign matches exactly one character.
SELECT currencycode, SUM( quantity ), MAX( unitprice ) FROM sales WHERE currencycode LIKE '_U_' GROUP BY currencycode;
If we use the "%" sign, which matches any number of characters, the execution time almost doubles.
SELECT currencycode, SUM( quantity ), MAX( unitprice ) FROM sales WHERE currencycode LIKE '%U_' GROUP BY currencycode;
Before we test ROLLUP, I will create a foreign key constraint between the "customer" and "sales" tables, as sketched below.
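A hedged sketch of that constraint, assuming "customer" still needs a primary key on customerkey ( the constraint names are mine; a constraint called fkfromcustomer is dropped later in the article, so this mirrors the same idea ):

ALTER TABLE customer ADD CONSTRAINT customer_pk PRIMARY KEY ( customerkey );
ALTER TABLE sales ADD CONSTRAINT FKfromCustomer FOREIGN KEY ( customerkey ) REFERENCES customer ( customerkey );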
This time we have an unusually slow query. It takes a full 10 seconds.

SELECT continent, title, SUM( quantity )
FROM customer INNER JOIN sales ON customer.customerkey = sales.customerkey
GROUP BY ROLLUP( continent, title );

A query with UNION is a much better choice for this. It takes only 1.837 seconds.

SELECT continent, title, SUM( quantity ) FROM sales INNER JOIN customer ON sales.customerkey = customer.customerkey GROUP BY continent, title
UNION
SELECT continent, null, SUM( quantity ) FROM sales INNER JOIN customer ON sales.customerkey = customer.customerkey GROUP BY continent
UNION
SELECT null, null, SUM( quantity ) FROM sales;
This shows us that there is still room for improvement in the MonetDB optimizer.
Window Functions
I will first add a foreign key constraint between the "sales" and "date" tables.

ALTER TABLE date ADD CONSTRAINT date_pk PRIMARY KEY ( date );
ALTER TABLE sales ADD CONSTRAINT FKfromDate FOREIGN KEY ( orderdate ) REFERENCES date ( date );

We can use the LAG function to get sales for the current and the previous date. This query executes in 700 ms.
SELECT date, SUM( Quantity ), LAG( SUM( Quantity ), -1 ) OVER ( ORDER BY date ) AS Yesterday FROM date INNER JOIN sales ON date.date = sales.orderdate GROUP BY date;
We can calculate sales for the current date, and the average for the previous seven days. This query is executed in 700 ms.

SELECT date, SUM( Quantity ),
       AVG( SUM( Quantity ) ) OVER ( ORDER BY date ROWS BETWEEN 7 PRECEDING AND 1 PRECEDING ) AS SevenDaysAVG
FROM date INNER JOIN sales ON date.date = sales.orderdate
GROUP BY date
ORDER BY date;
Updates
MonetDB should be slow for updates, but it managed to update 44 million rows in just 4.19 seconds.

UPDATE sales SET currencycode = 'EU' WHERE currencycode = 'EUR';
We can confirm that all of the values 'EUR' are updated to 'EU'.
SELECT DISTINCT currencycode FROM sales;
Problematic Queries
Double Grouping
Double grouping is when we first group our data, and then we group that result again. For example, we will total sales quantity per customerkey, and then we will count customers per total quantity, i.e. how many customers have the same total quantity.
This query is problematic because, while the first grouping can be fast, the second one could take much longer. The result of the first grouping will have 2M rows, because we have that many customers. In the second stage, we have to group these 2M rows, and that is where I expect the performance to suffer.
SELECT customer.customerkey, SUM( quantity ) AS TotQty FROM customer INNER JOIN sales ON customer.customerkey = sales.customerkey GROUP BY customer.customerkey;
In the first phase I will measure how much time is needed to group by customer. It is 5 seconds because there are 2 million customers.
Second phase:

SELECT TotQty, COUNT( customerkey )
FROM ( SELECT customer.customerkey, SUM( quantity ) AS TotQty
       FROM customer INNER JOIN sales ON customer.customerkey = sales.customerkey
       GROUP BY customer.customerkey ) AS FirstPhase
GROUP BY TotQty;

In the image we can see that we have 2,663 customers with a total quantity of 333 items, and only two with 1,130 items. The execution time is again 5 seconds. This is something I didn't expect; I am pleasantly surprised. I can tell you that these kinds of queries are problematic for the Power BI database ( SSAS ).
Aggregated Query from Two Fact Tables ( Stitch Query )
This time I will create a foreign key constraint on the "OrderRows" (211M) table, because I want to aggregate sales and orderrows per product brand.

ALTER TABLE orderrows ADD CONSTRAINT FK_Product FOREIGN KEY ( productkey ) REFERENCES product ( productkey );

A "stitch" query is when we aggregate two fact tables by the same dimension and get two data sets as a result. Then we join those two data sets into the final result. This is how we aggregate values from two fact tables.

The query below takes 7.5 seconds. This is longer than I expected. If we ran the subqueries separately, each would take just 900 ms. Because we only have 15 brands, it is surprising that it takes 5 seconds just to join two small result sets.

SELECT S.brand, Sq, Oq
FROM ( SELECT Brand, SUM( quantity ) Sq FROM Product INNER JOIN Sales ON Product.Productkey = Sales.ProductKey GROUP BY Brand ) S
INNER JOIN
     ( SELECT Brand, SUM( quantity ) Oq FROM Product INNER JOIN Orderrows ON Product.Productkey = OrderRows.ProductKey GROUP BY Brand ) O
ON S.Brand = O.Brand;

If we "UNION ALL" our subqueries, the execution also takes 7.5 seconds.

SELECT Brand, SUM( quantity ) Sq FROM Product INNER JOIN Sales ON Product.Productkey = Sales.ProductKey GROUP BY Brand
UNION ALL
SELECT Brand, SUM( quantity ) Oq FROM Product INNER JOIN Orderrows ON Product.Productkey = OrderRows.ProductKey GROUP BY Brand;

I have tried reading the two small subquery results into Python and joining them with pandas. Python reported an execution time of just 1.2 seconds. It is strange that we can get the final result faster by combining MonetDB and pandas than by using MonetDB alone.
WITH S  AS ( SELECT productkey, SUM( quantity ) AS Sq FROM Sales GROUP BY productkey ),
     O  AS ( SELECT productkey, SUM( quantity ) AS Oq FROM Orderrows GROUP BY productkey ),
     PS AS ( SELECT brand, SUM( Sq ) AS SQty FROM Product INNER JOIN S ON Product.productkey = S.productkey GROUP BY brand ),
     PO AS ( SELECT brand, SUM( Oq ) AS OQty FROM Product INNER JOIN O ON Product.productkey = O.productkey GROUP BY brand )
SELECT PS.brand, SQty, OQty
FROM PS INNER JOIN PO ON PS.brand = PO.brand;
We can reduce our fact tables by grouping them by productkey and then following the same logic. This approach would speed up our query to 5.5 seconds.
INSERT INTO SELECT
I will use "INSERT INTO SELECT" to make "Sales" table bigger. Before doing that, I will remove FK constraints. I will also change optimizer. ALTER TABLE sales DROP CONSTRAINT fkfromproduct; ALTER TABLE sales DROP CONSTRAINT fkfromcustomer; ALTER TABLE sales DROP CONSTRAINT fkfromdate; SET sys.optimizer = 'minimal_pipe';
I will now run this statement to make my sales table twice bigger. INSERT INTO sales SELECT * FROM sales; MonetDB needed 5 minutes to do this.
I will do this 1 more time. That will double the number of rows to 844M. That was done in 9:48 minutes. Then, I will again read from the CSV file into this table. That will add another 211M rows, so in total "sales" table will now have one billion rows.
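Put together, the growth step looks roughly like this ( the CSV path is an assumption, reusing the pattern from the orderrows import ):

INSERT INTO sales SELECT * FROM sales; -- 211M -> 422M rows
INSERT INTO sales SELECT * FROM sales; -- 422M -> 844M rows
COPY OFFSET 2 INTO sales FROM '/home/fff/Desktop/CSVs/sales.csv' USING DELIMITERS ',', E'\n', '"'; -- +211M rows, about 1B in total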
I will recreate the foreign key constraint toward the "product" table.

ALTER TABLE sales ADD CONSTRAINT FKfromProduct FOREIGN KEY ( productkey ) REFERENCES product ( productkey );

I will now run this query twice. The first time it finishes after 28 seconds, and the second time after 5.5 seconds.

SELECT brand, SUM( quantity ), AVG( netprice ) FROM sales INNER JOIN product ON sales.productkey = product.productkey GROUP BY brand;
SELECT color, SUM( quantity ), AVG( netprice ) FROM product INNER JOIN sales ON sales.productkey = product.productkey GROUP BY color;
Immediately after, I ran the same query, but grouped by color. The time was again around 5 seconds.
We can see that performance is good, even with 1B rows.
Conclusions
We can conclude some things:
– MonetDB is usually very fast.
– Initially, until the database warms up, queries can be slow.
– We should always set foreign key constraints to get a speed boost.
– Some kinds of queries are better optimized than others.
– MonetDB does not use much memory. During the import of the "sales" table, the RAM usage increased from 4 to 13 GB. It was the same during the last, 1B-row query. At other times, the usage was much lower.
Stores like to sell products as a bundle. For example, a store will sell a box with several decorations as one product. The store benefits from this because:
– Stacking and scanning products is easier when they are bundled ( effort- ).
– The buyer will spend less time in the store choosing decorations ( speed+ ).
– We can sell more decorations by bundling less popular decorations with the more popular ones ( sale+ ).
But bundles make production planning harder.
– We have to unbundle data about products to find out how much of each individual shape/color was sold ( effort+ ).
– We usually look at sales over a longer period of time. Because of the quantity of data, unbundling will make our database really slow ( speed- ).
– We have to read data about all the shapes/colors, although we just want to analyze the sales of one of them ( analysis- ).
Data in Bundle or in Bulk
We can treat a transaction as one bundled product. The product code, quantity and price are one bundle. We want to be able to quickly save this data bundle.
We can achieve this by saving our transactions as separate rows in a database. That is how we can quickly save or retrieve an individual transaction as a bundle.
For production planning we will organize our data differently. We will organize it into columns. If we want to find out how many products we have sold that have a price lower than 3 EUR, we just have to scan the column "Price", and then read the matching positions in the column "Qty".
Each column can be sorted and compressed for even faster retrieval.
We don't even need to read the column "Product".
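In SQL, the production-planning question from above looks roughly like this ( the table and column names are made up for the illustration ):

-- a columnar engine scans only the "price" column, then picks the matching positions in "qty"
SELECT SUM( qty ) FROM transactions WHERE price < 3;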
When we organize data in rows, we use a row-oriented database. When we organize data in columns, we use a columnar database. Data is typically collected in a row-oriented database (Postgres, Oracle, MySQL) and then transferred to a columnar database for analytics ( ClickHouse, DuckDB, Vertica ).
Row Oriented Databases vs Columnar Databases
Row-oriented databases:
1) They are great for small writes/reads. Perfect for "insert one order", "update this user", "delete this invoice line" etc.
2) Application code often works with whole objects/rows (User, Order, Invoice), which matches the row layout.
3) Great for transaction isolation, replication, ACID compliance, constraints, triggers, foreign keys.
4) We have many users who send queries that touch only one row of a table and expect an immediate response from the database.
They are best for cash registers, ERP, CRM, banking systems, ticketing.
Columnar databases:
1) We only read the columns we need, and columns can be heavily compressed. That improves IO.
2) The CPU can process several values of the same type at once. This is called vectorized execution. Queries over a lot of data will run faster.
3) Column scans and aggregations are easy to parallelize across cores and nodes.
4) We have a smaller number of queries, but they are much heavier: they touch a lot of data, with a lot of joins, yet users expect results in 3-4 seconds.
They are best for sales reports, dashboards, time-series, data mining.
Row-oriented databases capture the current state and provide transactional correctness. Columnar databases are for analysis of historical data. We can also say that row-oriented databases are better at writing data and providing data consistency and integrity, while columnar databases are better at reading huge quantities of data.
OLTP vs OLAP
If we are preparing our database for everyday operations and transactions, then we are making an OLTP database. For analytics and BI, we would prepare an OLAP database. OLTP pairs naturally with row-oriented databases, and OLAP with columnar databases.
In OLTP databases, data is organized in a lot of small tables. There is no data duplication and every value is written only once.
Many small tables make queries more complex ( more joins ), but because there is no duplication these databases usually don't grow significantly in size.
In OLAP databases, we merge small tables to reduce their number. Columnar technology has no problem with wide tables that have a lot of columns.
By reducing the number of joins, we can query our data with simpler (faster) queries. That is why we organize our tables in a star schema.
OLTP tables are filled with data by users and applications in small transactions. From time to time ( usually nightly ), data is transformed and transported into OLAP databases in batches. This makes data management different between OLTP and OLAP databases: OLTP databases receive data continuously, while OLAP databases receive data periodically and only after the data has been properly transformed.
OLTP is an acronym for "Online Transaction Processing" and OLAP for "Online Analytical Processing". We saw that the difference is made by:
– Different purpose and usage.
– Different database technology.
– Different data organization inside the database.
– Different data management.
Data Immutability
Columnar ( OLAP ) databases are filled in batches. That doesn't mean that it is not possible to UPDATE or DELETE rows in tables. There are four strategies that OLAP databases use:
1) Data is immutable. We can only insert data in batches, and afterward we cannot change individual rows. If data is immutable and presorted, we can achieve the highest levels of compression. An example of this technology is the database used by Power BI.
2) MonetDB is an example of an OLAP database that is suited for light updates and deletes. When we delete or update a value, MonetDB will label the old record as "dead". That record becomes invisible to queries. After some time, garbage collection will remove the unneeded records.
3) We can divide our columns into small segments. Whenever we want to change some value, we rewrite the whole segment. This is a good solution when we have a lot of hardware power, like cloud providers do. Snowflake is an example of such a database.
4) We can have two databases in one. One database is row-oriented, and the other one is columnar. Data is written into the row-oriented database, but when we want to read data we read from both databases together. Historical data is read from the columnar database, and today's data is read from the row-oriented database. SAP HANA uses such an approach.
This last strategy creates something called "Hybrid Transaction Analytical Processing", or HTAP. HTAP is when a single database can do fast real-time transactions and fast analytics on the same data at the same time.
In recent years, MonetDB researchers have been gradually adding or experimenting with features that move it slightly toward the HTAP category:
– Better handling of frequent updates and small transactions – Improved concurrency and snapshot isolation – Faster incremental data loads – Optimizations for mixed workloads
MonetDB
If you have one server and install MonetDB on it, you will be able to:
– Enjoy blazing-fast query performance, even if you have a lot of data and your queries are more complex.
– Avoid thinking about indexing, compression, statistics tuning, and partitioning decisions. MonetDB will do it all automatically.
– Apply the full power of SQL to all your tables, whether they are carefully modeled or just collected in one place.
– Spend your money on something else, because MonetDB is an open-source database under the Mozilla Public License.
MonetDB is a fast, SQL-complete, easy-to-maintain, open-source analytical database server that can process huge amounts of data on a single server machine.
Hardware and Speed Considerations for MonetDB
MonetDB is not required to keep the whole database in RAM. Parts of the data that have already been processed are dropped from memory, so MonetDB can deal with data that is much bigger than the amount of memory on your server machine. Only the data currently being processed has to fit into the in-memory execution pipelines.
MonetDB will benefit from a fast NVMe disk with sustained performance under write load. Such a disk should be accompanied by a lot of RAM, so that disk usage is minimal.
MonetDB is designed to take full advantage of your CPU.
– When executing a query or loading data, MonetDB will spread the load across all processor cores, increasing CPU utilization to over 90%. There is a setting that can limit the number of cores used by MonetDB.
– Vectorized query execution means that the processor can process thousands of values with a single instruction. The values must be of the same data type, which is perfect for columnar databases.
– Data is processed in small chunks using simple instructions. This reduces RAM visits and uses the processor's fast L1/L2 caches.
– Late materialization means that we avoid reading data that does not contribute to our query. We will only read the data needed for the final dataset, and skip everything else.
– Partial query results are reused instead of being recomputed.
Zero-Tuning Philosophy
MonetDB will not burden you with indexing strategies, partitioning decisions and vacuuming. MonetDB tries to eliminate complexity. Most workloads run at full speed without manual optimization. Indexes are automatic. Compression is automatic. Memory management is automatic.
Creating indexes manually is problematic because you need to know in advance where to create them. Indexes can slow down updates and deletes because, when data is modified, the database must also update the corresponding indexes to reflect these changes. This is something we especially want to avoid in columnar databases. If users change their behavior, that can make our indexes ineffective.
The technique that MonetDB uses to automatically tune indexes is called database cracking. Cracking means that MonetDB will sort and group data, and create indexes, during SELECT queries. If most of the statements on an OLAP database are SELECT queries, and such queries touch a lot of rows, then it is best to create indexes and tune the data during SELECT query execution. This is optimal because:
– The parts of a database that are heavily used will be tuned the best. We will not spend effort on data that no one reads.
– Instead of having one huge indexing job, we will have many tiny, incremental reorganizations.
– After some time, our data will be optimally indexed and sorted, but MonetDB can change strategy if users change their behavior.
– Cracking is especially powerful in columnar databases like MonetDB, where each column is stored separately.
Cracking is powerful but not ideal for everything:
– Heavy transactional workloads (many updates) disrupt adaptive patterns.
– Distributed cracking across many nodes is still complex.
– Highly unpredictable workloads (all queries unique) limit its benefits.
Comparison of MonetDB and Power BI
The Power BI database ( SSAS Tabular ) is an in-memory database that uses compressed columnar tables and is optimized for BI models ( dimensions, measures, hierarchies ). Power BI needs the whole model loaded into memory, but columns are heavily compressed, so Power BI doesn't have a huge memory footprint. For many calculations Power BI doesn't even need to decompress columns, so it can do its work while maintaining really fast IO.
Power BI likes table relationships and measures defined in advance, and is not optimized for arbitrary queries. It is not intended for complex joins outside the model. If we spend time creating an optimal model, and we wait for the model to process/refresh during the load, we will get a database that is optimized for:
– measure evaluation – filters – slice and dice
On the other hand, MonetDB is optimized for raw analytical SQL on large tables, without modeling overhead.
MonetDB Imperfections
As a columnar server, MonetDB suffers from all of the shortcomings that columnar servers have. MonetDB is not good at transactional traffic or at lots of deletes and updates. MonetDB is not perfect for reports that return huge tables with a lot of columns. MonetDB is best for short, aggregated analytical queries.
If you have massive amounts of data, you will need a computer cluster and cloud scalability. This is something where MonetDB doesn't shine. MonetDB is best if you have one powerful server machine and you want to run complex analytics on it. Distributed databases are better at scaling, but because their setup is more complex and they are spread over many computers, they usually have less rich SQL capabilities and other limitations.
MonetDB does not support replication, but it does support distributed operation across multiple machines via sharding. Sharding will speed up some queries, but not all. If we want to set up a cluster with MonetDB, partitioning and sharding must be done carefully and manually. If possible, it is better to have a single powerful server than to resort to sharding and expensive network equipment.
MonetDB Market Position
You should use MonetDB if:
– You want to self-host an analytical database on your own server.
– You run complex ad-hoc queries, window functions, joins, and aggregations.
– You don't want to design cubes or write DAX; you just want fast SQL.
– You want to avoid vendor lock-in by using fully open-source software.
MonetDB is best for:
– Data engineers
– BI developers
– Researchers
– Small/medium businesses
– Python users
MonetDB supports SQL, ODBC, JDBC and many programming languages ( Python, Java, R, Ruby, PHP… ). It can be easily integrated with different software, but the support from 3rd-party software, clients and ORMs is limited. This is a consequence of MonetDB's origin, which was research and science oriented. MonetDB was developed at the CWI institute in the Netherlands. Development was driven by innovation and curiosity, not by commercialization and marketing. This is why you could say that MonetDB is the fastest database you've never heard of. But that development has lasted for 30 years, and today MonetDB is a complete and powerful database system.
MonetDB Features
Here, I will list the features of the MonetDB server and direct you toward blog posts about each specific feature.
1) Blog posts 0010, 0020 and 0030 will teach you how to install MonetDB, how to create a database, and how to fill the database with sample data. Installation of MonetDB through Docker is explained in the last blog post, 0600.
2) MonetDB can be augmented by using the Python language. There are different ways to achieve that. Blog post 0040 explains how to connect to MonetDB from Python.
Blog post 0360 will teach us how to use Python to fetch data from different sources into MonetDB.
Besides using Python to fetch data, we can use Python to create custom functions in MonetDB. That is explained in blog post 0430.
We can create inline or aggregated Python UDFs.
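As an illustration, a minimal Python UDF could look roughly like this ( the function name and body are mine, the embedded Python support has to be installed, and the exact syntax may differ slightly between MonetDB versions ):

CREATE FUNCTION times_two( i INTEGER ) RETURNS INTEGER LANGUAGE PYTHON {
    return i * 2   # operates on a whole column (NumPy array) at once
};
SELECT times_two( 21 );   -- returns 42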
3) Blog post 0050 is about identifiers and constants. MonetDB supports different data types.
4) MonetDB also supports some special kinds of tables.
– Temporary tables are tables with a limited lifetime. They are described in 0370.
– If we have several small tables with the same structure, we can connect them into one big virtual table ( partitioning ). This is covered in 0380.
– Unlogged tables are special tables that can be used for fast writes, updates and deletes. We can read about them in 0390.
– Views are explained in 0300.
– MERGE and REMOTE tables are special tables that are used for sharding in MonetDB. Sharding is a way to distribute MonetDB work over a cluster of computers. We can read about them in 0480.
5) SELECT, WHERE, HAVING, LIMIT/OFFSET, GROUP BY and ORDER BY are described in 0120, 0150.
6) Article 0170 will teach us about ANALYZE, SAMPLE and PREPARE.
– ANALYZE is used to update MonetDB statistics. Those statistics are used by MonetDB to optimize queries.
– SAMPLE is used to create a sample of rows from a table.
– PREPARE is a way to prefabricate a statement so that it runs faster when we execute it.
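A few hedged one-liners to illustrate these three ( the table name is just an example; check the linked posts for the exact options ):

ANALYZE sys.sales;                                  -- refresh the statistics for a table
SELECT * FROM sales SAMPLE 100;                     -- return a random sample of 100 rows
PREPARE SELECT * FROM sales WHERE productkey = ?;   -- precompile a parameterized statement
-- the prepared statement is afterwards run with EXEC and the id that PREPARE reports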
7) The MERGE statement, CTEs and GROUPING SETS can be considered composite SQL statements that do several things at once.
MERGE is used to partially synchronize two tables.
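A rough sketch of a MERGE, with made-up table names, just to show the shape of the statement:

MERGE INTO product_copy AS t
USING product AS s ON t.productkey = s.productkey
WHEN MATCHED THEN UPDATE SET brand = s.brand
WHEN NOT MATCHED THEN INSERT ( productkey, brand ) VALUES ( s.productkey, s.brand );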
8) In SQL, running totals and moving averages are calculated by using window functions. The theory and syntax of window functions are presented in 0200, 0220. Window functions can be divided into Aggregate 0230, Ranking 0240 and Offset 0250 window functions.
9) MonetDB is rich with built-in functions. They can be divided into Aggregate 0210, Mathematical 0260, String 0270 and Comparison 0280 functions. Time functions are explained in 0070.
10) Transactions are explained in 0290, Indexes in 0300, Schemas in 0310.
Transactions are a way to protect consistency and integrity of our data.
Indexes are usually created by MonetDB automatically. Still, we can create some special types of indexes ourselves.
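For illustration, a hedged sketch of creating such indexes by hand ( the index types and names here are examples; treat them as optional hints that the optimizer may or may not use ):

CREATE ORDERED INDEX sales_orderdate_idx ON sales ( orderdate );
CREATE IMPRINTS INDEX sales_netprice_idx ON sales ( netprice );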
Schemas are logical parts of our database.
11) Blog posts 0330, 0340, 0350 are about importing and exporting data to/from MonetDB. We can import/export data to/from CSV files or a binary format. It is also possible to load data into MonetDB from different programming languages ( we have covered Python and SQL ).
12) Procedural SQL is a way to introduce the procedural paradigm into SQL. The procedural paradigm is when we tell the database, step by step, how to do something, instead of just describing what we want with an SQL query. With procedural SQL we can make:
– Custom functions ( 0400, 0410, 0420 ).
– Procedures ( 0440 ). Procedures are sets of statements that are executed together.
– Triggers ( 0450 ). Triggers are procedures that run automatically after some event.
A small sketch of a custom function follows the keyword list below.
CREATE FUNCTION
CREATE PROCEDURE
CREATE TRIGGER
RETURN
DECLARE TABLE
DECLARE VARIABLE
CASE
WHILE
IF
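A hedged sketch of a tiny procedural SQL function ( the name and logic are mine, just to show the CREATE FUNCTION / IF / RETURN shape ):

CREATE FUNCTION price_band( p REAL ) RETURNS VARCHAR(16)
BEGIN
    IF p < 10
        THEN RETURN 'cheap';
        ELSE RETURN 'expensive';
    END IF;
END;

SELECT unitprice, price_band( unitprice ) FROM sales LIMIT 5;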
13) Creating users and setting their rights is explained in blog posts 0460, 0470.
CREATE USER
ALTER USER
GRANT
REVOKE
CREATE ROLE
SET ROLE
14) Articles 0500, 0510 will teach us how to create ODBC and JDBC connections to MonetDB. We will also learn about the file_loader and proto_loader functions. With the file_loader and proto_loader functions, we can treat CSV files and tables in other ODBC databases as if they were local MonetDB tables. Article 0530 will show us how to encrypt the connection to MonetDB using TLS.
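As a hedged illustration of the idea ( availability depends on your MonetDB version, and the path is made up ):

SELECT * FROM file_loader( '/home/fff/Desktop/CSVs/product.csv' );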
15) Every database needs a backup. That is explained in 0540 and 0550.
16) For administration we have to learn three console applications:
– Monetdbd ( 0560 ) is a Linux daemon. This is the main application used to start and control MonetDB.
– Monetdb ( 0570 ) is a console application used to work with a specific database.
– Mclient ( 0580 ) is a client application that we can use to send queries to MonetDB.
17) Different system and session procedures and commands will help us to monitor and control user sessions and queries. They are explained in 0590.
Notice: In this blog post, we will create a VOC database. This database will be used as the main database for most subsequent blog posts.
Getting the Codename of our Ubuntu Version
First, we need to know the codename of our Ubuntu version. We can find it in the file "os-release", by reading only the line that contains the word "VERSION_CODENAME".
cat /etc/os-release | grep VERSION_CODENAME
Our Ubuntu codename is "focal". It is also possible to use command:
lsb_release -cs
We can see from the command line above that our user account is "fffovde". "FffOvdeKomp" is the name of our computer.
Adding a Repository Where MonetDB is Stored
Next, in folder "/etc/apt/sources.list.d" we will create a file with the name "monetdb.list".
#jump to that folder
cd /etc/apt/sources.list.d
#create new file, you will be asked to provide password
sudo touch monetdb.list
Inside of this file we must place this text. These are the addresses of the MonetDB repository.
deb https://dev.monetdb.org/downloads/deb/ focal monetdb
deb-src https://dev.monetdb.org/downloads/deb/ focal monetdb
We can add this text by running these two lines in our terminal:
sudo sh -c 'echo "deb https://dev.monetdb.org/downloads/deb/ focal monetdb" >> monetdb.list'
sudo sh -c 'echo "deb-src https://dev.monetdb.org/downloads/deb/ focal monetdb" >> monetdb.list'
Now our file looks like this:
Installing GPG key
Then, we will execute this command. The command reads the GPG key file from the internet and places it at the location "/etc/apt/trusted.gpg.d/monetdb.gpg". A GPG key is a file that is used to verify MonetDB packages before installing them.
This monetdb.gpg file is a binary file. We can read its content with the command:
sudo apt-key finger
This command will show the fingerprints of all the GPG keys in our Ubuntu. One of those keys will be for MonetDB.
If the MonetDB fingerprint reported by this command is equal to "DBCE 5625 94D7 1959 7B54 CE85 3F1A D47F 5521 A603", then we have installed the correct key.
There is also a key "8289 A5F5 75C4 9F50 22F8 EE20 F654 63E2 DF0E 54F3", but that key is for versions of MonetDB older than 11.49.
MonetDB Installation
Now we can install MonetDB. First, we will update our list of available software with the command:
sudo apt update
Then we can install MonetDB server and client:
sudo apt install monetdb5-sql monetdb-client
Creating a DBfarm
In MonetDB, databases are located inside a folder usually called "DBfarm". In addition to the databases, in this folder we will find configuration files with settings for the DBfarm.
Monetdbd, a Linux daemon, is used to initialize the DBfarm by placing configuration files inside it. We use this daemon for changing DBfarm settings and for communication with databases. Monetdbd manages the DBfarm and its databases.
monetdbd create /home/fffovde/DBfarm1
We will use "monetdbd" to create a DBfarm on our disk.
Inside of this folder, a new file ".merovingian_properties" will appear.
ls -A /home/fffovde/DBfarm1
The Merovingian dynasty was the ruling family of the Franks from the mid-5th century until 751. This dynasty ruled the Netherlands, the country from which MonetDB originates. MonetDB is using this term for some of its internal files and commands.
If we look inside of this file, we will find only one property. All other properties use default values. We can read those default values with the command:
monetdbd get all /home/fffovde/DBfarm1
property              value
hostname              FffOvdeKomp
dbfarm                /home/fffovde/DBfarm1
status                no monetdbd is serving this dbfarm
mserver               unknown (monetdbd not running)
logfile               /home/fffovde/DBfarm1/merovingian.log
pidfile               /home/fffovde/DBfarm1/merovingian.pid
loglevel              information
sockdir               /tmp
listenaddr            localhost
port                  50000
exittimeout           60
forward               proxy
discovery             true
discoveryttl          600
control               no
passphrase            <unset>
snapshotdir           <unset>
snapshotcompression   .tar.lz4
mapisock              /tmp/.s.monetdb.50000
controlsock           /tmp/.s.merovingian.50000
Now that the DBfarm is created, I will start the daemon. We can use the daemon to control the DBfarm.
monetdbd start /home/fffovde/DBfarm1
Inside of DBfarm1 we now have 4 files:
ls -A /home/fffovde/DBfarm1
The file ".merovingian_lock" is empty. This file probably just signals that this is a DBfarm.
The file "merovingian.pid" contains the number 1863. This is the process ID of the monetdbd process. If we use the command "sudo ss -tlnp" to show all listening ports, we will see the name monetdbd beside process 1863, and this process listens on the default port 50000.
We can also read content of the log file.
cat /home/fffovde/DBfarm1/merovingian.log
Creation of a Database
While monetdbd is used to manage the DBfarm, the "monetdb" console application is used to manage individual databases. In the background, "monetdb" sends our commands to monetdbd, and monetdbd is the one exercising direct control over the databases. So, we control databases from "monetdb", but through the power of monetdbd.
This is how we create a database with the name "voc":
monetdb create voc
A folder with the name "voc" will appear inside of the DBfarm1 directory.
When a database is created, it will only have one default user. That user is the administrator "monetdb", with the default password "monetdb". The database will be created in maintenance mode. That means that only the administrator will be able to start the database, and only on the local computer.
The administrator should create the database with this command, then log into the database and change the default password to some other secret and complex password. We will not do that this time; we will continue using the default password "monetdb".
I will change the mode of the database and take it out of maintenance mode. We do this to make the database available to all users ( although we currently don't have other users ).
monetdb release voc
We can now start our database. After the database starts, it will be ready for users to log in.
monetdb start voc
We can check the status of the database with the command:
monetdb status
The process running our database server is called "mserver5". This process will run while the database is open:
pgrep mserver5
This command will return process ID of our database server process.
Logging into Database
Now that our database is working and is listening on port 50000, we can try to use it. We will now use another application, with the name "mclient". Let's first recapitulate the three console applications used by MonetDB:
"monetdbd" is managing the DBfarm.
"monetdb" is managing an individual database, with the help of "monetdbd".
The database runs as an "mserver5" process. We'll use the "mclient" application to send queries to this process.
We will log in to the "voc" database as the "monetdb" user. The password is the default password "monetdb".
mclient -u monetdb -d voc
This will be our welcome screen. We will get the "sql>" prompt, where we can type our queries.
I can run this query in my database:
SELECT 'columnValues' as columnName;
We can exit the "mclient" program by typing the word "quit".
How to Stop our Server?
This is how we can stop the "mserver5" process, the process of the MonetDB database.
monetdb stop voc
The log file will tell us what happened.
tail -1 /home/fffovde/DBfarm1/merovingian.log
We can put our database into maintenance mode at any time. It doesn't matter if the database is open or closed. We use the "lock" command.
monetdb lock voc
Next time, only the administrator, on the local computer, can start the database with the "monetdb start voc" command. He can start the database in exclusive mode so that he can run maintenance operations on it ( he can do a backup, or make changes in the schema ). We saw previously that a database can be taken out of maintenance mode with the "monetdb release voc" command. After that, any user can log in to the database.
We can also stop the "monetdbd" daemon.
monetdbd stop /home/fffovde/DBfarm1
The log file will show us that the daemon has stopped.
Install MonetDB in Alternative Way
This time I will jump to a newer version of Ubuntu: "noble".
cat /etc/os-release | grep VERSION_CODENAME
Then I will go to the web page https://www.monetdb.org/downloads/deb/repo/. On this web page we have a list of the newest versions of Ubuntu and Debian. One of the versions is "noble". Enter that folder and click on "monetdb-repo.deb". Firefox will download this file.
We will install this "monetdb-repo.deb" package.
sudo apt install /home/fff/Downloads/monetdb-repo.deb
This package will add the file "monetdb.sources" inside of the "/etc/apt/sources.list.d". This file has the same content as the file "monetdb.list" that we have created by hand during the original installation.
This "monetdb-repo.deb" package will also provide GPG keys that can be found on this location "/usr/share/keyrings/monetdb-1.4.gpg". We now understand that we can prepare our computer for monetdb installation by using this package, instead of doing everything by hand.
I will uninstall MonetDB on the "noble" server. We will first create and start a DBfarm, because I want to show you the whole process.
For uninstallation process, if monetdbd is running, we must stop it.
monetdbd stop /home/fff/DBfarm1
We will now list all the packages that have "monetdb" in their name.
dpkg -l | grep monetdb
We will remove all those packages. When we delete "monetdb-repo", the files with repositories and GPG keys will also be deleted.
sudo apt purge libmonetdb-client28 libmonetdb-mutils libmonetdb-stream28 libmonetdb30 monetdb-client monetdb-repo monetdb-server monetdb-sql monetdb5-sql
We will clean up all possible leftovers:
sudo apt autoremove
sudo apt autoclean
We still have the "monetdb" group and user.
cat /etc/passwd | grep monetdb
getent group monetdb
We will delete the group and the user. Actually, when we delete the user, the group will also be deleted, so we don't have to delete it separately.
sudo deluser monetdb
sudo delgroup monetdb
This will not delete the DBfarms, only the MonetDB application.
A systemd unit file is a file with settings that systemd uses when starting and controlling a daemon. From version 11.53.13, MonetDB ships a new systemd unit file that allows us to change the default directory ( DBfarm ) for databases that are controlled by systemd.
Old Systemd Unit File
systemctl status monetdbd
This command will show us where the systemd unit file for the monetdbd daemon is located:
/lib/systemd/system/monetdbd.service
cat /lib/systemd/system/monetdbd.service
The old systemd unit file was hard-coded to always use the directory "/var/monetdb5/dbfarm". Only databases located within this directory could be controlled by systemd, and only such databases could be started automatically after the computer boots.
[Unit]
Description=MonetDB database server daemon
Documentation=man:monetdbd https://www.monetdb.org/documentation/admin-guide/manpages/monetdbd/
After=network.target
From version 11.53.13, we have a new systemd unit file that allows us to change the DBfarm folder controlled by systemd.
systemctl status monetdbd
This command will show us where the new systemd unit file for the monetdbd daemon is located:
/usr/lib/systemd/system/monetdbd.service
cat /usr/lib/systemd/system/monetdbd.service
This is the new look of the systemd unit file. We can notice that we now have an environment variable DBFARM that points to the directory "/var/monetdb5/dbfarm" by default ( "DBFARM=/var/monetdb5/dbfarm" ). This file is now more complex because it allows us to use some other directory for the DBfarm.
[Unit]
Description=MonetDB database server daemon
Documentation=man:monetdbd https://www.monetdb.org/documentation/admin-guide/manpages/monetdbd/
After=network.target
We will not change the original systemd unit file. Instead, we will create a "drop-in". That means we will create another file that acts as an "amendment" to the original file. A "drop-in" file augments or overrides parts of a unit file without touching the original unit file.
We create this "drop-in" file by running the command:
sudo systemctl edit monetdbd
An instance of the Nano text editor will open. Inside of it we will see the whole original systemd unit file, but all the lines will be commented out ( every line will start with # ).
This location will override the original "/var/monetdb5/dbfarm" location.
I am using the Linux distribution "KDE Neon". This distribution doesn't use SELinux. SELinux is a security feature of Red Hat-based distributions. The original systemd unit file provided by the MonetDB developer team uses the command "chcon", which is only useful for distributions that use SELinux. I will exclude this code from my systemd unit file.
As an exception, because I am using a distro without SELinux, I will also add these corrected lines to my "drop-in" systemd unit file.
# First, we will delete all of the "ExecStartPre=".
ExecStartPre=
# This line will add the first part of the "ExecStartPre=". The line "ExecStartPre" is split into three parts,
# just to make it more readable. These three lines can be considered as one script.
ExecStartPre=/bin/bash -c 'test -d ${DBFARM} || (mkdir -m 770 ${DBFARM})'
# This line will add the second part of the "ExecStartPre=".
ExecStartPre=/bin/bash -c 'test -f ${DBFARM}/.merovingian_properties || (umask 0007; /usr/bin/monetdbd-11.53.13 create ${DBFARM}; /usr/bin/monetdbd-11.53.13 set pidfile=/run/monetdb/merovingian.pid ${DBFARM}; touch ${DBFARM}/.merovingian_lock)'
# Third line is unchanged.
ExecStartPre=/usr/bin/grep -q pidfile=/run/monetdb/merovingian.pid ${DBFARM}/.merovingian_properties
This config snippet will be an "amendment" to the original systemd unit file. The snippet must be added above the comment "Edits below this comment will be discarded".
We will save this change with "Ctrl+O" and ENTER, and then exit with "Ctrl+X". After that we must reload all the systemd unit files.
sudo systemctl daemon-reload
The Changes We Made
This command will show us the new value of the DBFARM environment variable:
systemctl show monetdbd | grep ^Environment=
In the image we use these commands to check override.conf:
cd /etc/systemd/system/monetdbd.service.d
ls -alh
cat override.conf
systemctl cat monetdbd
This command is also useful because it will show us the original file, its overrides, and their locations, all together.
Testing The Changes
Because we are going to use systemd, we should add our user to the "monetdb" group.
sudo usermod -aG monetdb "$USER"
After that, we should log out and then log in.
newgrp monetdb
Sometimes logging out and in will not be enough. In that case, try opening a new session ( a tab ) in the terminal, or run this command in the existing session. This happens because the OS will try to recycle the old session, so the changes are not applied.
We will now create one database and after that we will restart our computer to see if it will be started automatically.
-- Start the monetdbd daemon controlled by systemd.
-- Create a database.
-- Make the database available.
-- Make monetdbd start automatically after a computer reboot.
-- Restart our computer.
After the reboot, I will open the terminal, and I will try to log in:
mclient -u monetdb -d DBdesktop
It will work. We don't have to start the daemon manually.
The last step is to check if DBdesktop is really inside of the dbfarm2 directory.
cd /var/monetdb5/dbfarm2
ls -alh
What is Schema
A database is made of tables, views, indices and so on. Inside a database, each of these objects belongs to some schema. A database is organizationally divided into schemas, and there is no overlap between schemas. Some special objects, like roles, live outside of any schema.
Usage and modification of schema elements is strictly reserved for the user or the role that owns that schema. During creation of a schema, we should decide who the owner will be, because later it will not be possible to change ownership. If we want several people to maintain one schema, then we should set a role as the owner of that schema.
Only the 'monetdb' user and the 'sysadmin' role are allowed to create new schemas.
Schema as a File
We can create a file that will contain all the instructions the server needs to create database objects inside of the schema. This file would tell the database which tables, relations, indexes, views to create. In this way, we can create everything that makes up one schema. That is why we call such a file a "schema file". Although we have not learned all the SQL commands needed to create such a file, we can use the "schema file" that the MonetDB development team has prepared for us.
First, we will download a sample database from this location:
This will give us a ZIP file. Inside of it, there is an SQL script with the schema for our new database.
Creation of a New User
For our schema we will create a new user. First, we will enter mclient with 'monetdb' privileges.
> monetdbd start /home/fffovde/DBfarm1
> mclient -u monetdb -d voc     -- password is monetdb
For the creation of a user, we need a username (USER), a password (WITH PASSWORD), the user's full name (NAME), and a default schema for that user (SCHEMA). The default schema is the schema that MonetDB will use as the current schema when the user logs in. For tables in the current schema, the user can type "SELECT * FROM Table1", but for tables in non-current schemas, the user must type "SELECT * FROM schema.Table1".
The "sys" schema is a built-in schema in MonetDB. We will use it temporarily so we can create a new user.
sql> CREATE USER "voc" WITH PASSWORD 'voc' NAME 'VOC Explorer' SCHEMA "sys";
As the 'monetdb' administrator we can create a new schema. We will say that the previously created "voc" user is the owner of that schema.
sql> CREATE SCHEMA "voc" AUTHORIZATION "voc";
Don't get confused. The name of our database is "voc", but the name of the new schema is also "voc", and the name of the user is "voc".
We will set the new schema as the default schema for our user.
sql> ALTER USER "voc" SET SCHEMA "voc";
sql> \q     -- we can exit mclient with "quit" or "\q"
Since the "voc" schema is the default schema for the "voc" user, this schema will be active when this user logs in to MonetDB. Everything the user does will be reflected in this schema, unless the user explicitly mentions that they want to work in a different schema.
Populating our Schema with Database Objects
Our "voc" schema is currently empty, but we have definitions of all the tables, view, indices … inside of our downloaded SQL script. We will use that script to populate our schema. We type:
> mclient -u voc -d voc ~/Desktop/voc_dump.sql
This command will log the "voc" user into the "voc" database. It will also execute all the SQL commands from the "voc_dump.sql" file. When the user logs in, and his default schema is "voc", he lands in the "voc" schema, so all the SQL commands will be executed inside that schema.
We will log in again:
mclient -u voc -d voc     # password is voc
We can use the mclient command "\d" to list all the tables and views inside our database.
sql> \d
TABLE voc.craftsmen
TABLE voc.impotenten
TABLE voc.invoices
TABLE voc.passengers
TABLE voc.seafarers
TABLE voc.soldiers
TABLE voc.total
TABLE voc.voyages
This is what we would get:
DBeaver Database Manager Program
We will install the DBeaver database manager program to browse our database.
> sudo snap install dbeaver-ce
This is a GUI program, so we will open it in our desktop environment. In the program we will click on the triangle (1), and then select "Other" (2) to open other servers. There we will find MonetDB. We will click on it (3), and a new dialog will open. In this dialog, the host and port are already entered. We just need to enter the Database/Schema (4) and the Username and Password (5) (all of which are "voc").
DBeaver will not have the driver for the MonetDB database installed, so it will offer to download it. Just accept that offer.
At the end, the objects of our database will appear in the pane on the left side of the program. There, we should double-click on the schema name (1). After that we can select the tab "ER Diagram" (2). There, after some rearrangement, we will see the ER diagram of our database (3). As we can see, the tables are organized in a star schema with "voyages" as the only fact table. All tables are connected with foreign key constraints, where the foreign key is also the primary key inside the dimension tables. The only exception is the "invoices" table, where the foreign key columns are not primary key columns, and that is why that relationship is shown with a dashed line (4).
Here is the download for the voc_dump.zip file used in this blog post:
People used to love playing games on old game consoles. They would happily continue to play those games, but those consoles are no longer for sale, so there are no devices available on which to play these old games.
To solve this problem, emulators are born. Emulators are programs that you install on your computer, and they imitate retro game consoles. In that way, people can still play their favorite video games, even without the game console.
An emulator simulates the actual hardware. For video games it doesn't matter whether they are running on an emulator or on the real device.
It is also possible to emulate an operating system with all the programs on that system. It is just like having several computers inside one computer case.
An emulator for an OS is called a hypervisor. One such hypervisor is VirtualBox. VirtualBox is an application that we install on our computer, and then we create "virtual machines". A virtual machine (VM) is a software-based emulation of a physical computer that runs its own operating system and applications. With VirtualBox, we can use virtual machines like any other application.
Container
A container is something in between a portable application and a virtual machine.
Portable application:
– Application that doesn't need installation.
– It should match the OS version.
– Lightest and fastest.
– Not isolated from other applications.
Container:
– Application with most of its dependencies.
– It should match the class of OS (Linux, Windows…).
– Middle weight.
– Mostly isolated.
Virtual machine:
– Full OS with all the applications.
– Should match the hypervisor.
– Heavy weight.
– Fully isolated.
We could say that a container is a stripped-down virtual machine. If we limit virtual machines to only one application, and we force them to share the kernel with the host machine, then we get containers.
VM – kernel – other apps = container
Containers don't use a hypervisor. They use the Docker engine. The Docker engine is just an application, just like VirtualBox.
Containers are similar to portable applications: they can be easily deployed, moved, and upgraded. In addition, containers are isolated from other applications and will not compete with each other for resources. Containers are much lighter and faster than virtual machines. Containers are immutable, so it is not possible to update parts of a container. This gives us consistency that we can rely on.
Usage of a Container
Images are templates used to create containers. An image contains everything needed to build a single container. Images are created by application developers. A developer would create an image and then provide it to users, who would then be able to create containers based on the image.
Let's say we want to publish our application as an image. We will package all our source code along with a "Dockerfile" (explained later) into a single folder. We would use the "Build" command to create a "Docker image". The image is the template from which containers are built. We will upload that image to a Docker image repository (registry). "Docker hub" is a well-known cloud image registry. Users can download our image from the registry. Based on that image, the user will launch the container. In the container, the user would find the application we created.
Anatomy of a Dockerfile
If we go to this git hub web page "link", we will find an example of files that are needed to create an image for MonetDB.
Here we can see source files for MonetDB, and among them there is a file with a name "Dockerfile". This file is an instruction for docker image creation.
"Dockerfile" for MonetDB is a more complex one, so I will show you another one that is simple. =>
"Node.js" is a framework for making web applications. The Dockerfile below is instructions on how to make docker image based on the "Node.js" application.
"FROM" command will download stripped down linux distro from the "Docker Hub" cloud. That stripped linux distro will give system API for our application. After that, we will define working folder, we will copy our application into that folder, and we will install all dependencies for our app. The last line will start our application when a user starts the container.
# ALPINE is a tiny Linux distro, with Node.js installed
FROM node:22-alpine
# we set the working directory inside of the ALPINE Linux
WORKDIR /app
# we copy from the current directory on the host machine to the "/app" directory inside of the ALPINE Linux
COPY . .
# we install the dependencies that are listed inside of our project into the image
RUN npm install
# this command will run when our container starts
CMD npm start
All Dockerfiles are made like this. They usually have the same steps (a fuller sketch follows the list):
1) get linux image for API
2) define a folder
3) copy your application
4) download dependencies
5) build/compile
6) create user
7) expose network ports
8) set some environs
9) define startup command
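A minimal, hypothetical Dockerfile that walks through all nine steps could look like the sketch below. The port number, the "build" script, and the user name are assumptions for illustration; they are not taken from the MonetDB Dockerfile.
# 1) get a Linux image for the API
FROM node:22-alpine
# 2) define a folder
WORKDIR /app
# 3) copy your application
COPY . .
# 4) download dependencies
RUN npm install
# 5) build/compile (assumes the project defines a "build" script)
RUN npm run build
# 6) create a non-root user
RUN adduser -D appuser
USER appuser
# 7) expose network ports (3000 is just an example)
EXPOSE 3000
# 8) set some environs
ENV NODE_ENV=production
# 9) define the startup command
CMD ["npm", "start"]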
A Dockerfile is basically the list of steps we would take on a physical machine to make our application operational. We will not go any deeper into docker technology. In this article we will only learn how to install docker, and how to download and use the MonetDB container.
At the bottom of that page, we will find a script. We can download and run this script, which means that we can install docker with two lines of code:
curl -fsSL https://get.docker.com -o get-docker.sh
sudo sh get-docker.sh
We should add our user to the docker group.
sudo usermod -aG docker $USER
After that we should log out and log in.
We can now confirm that docker is installed:
docker version
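Optionally, we can also run Docker's small test image; if it prints its welcome message, the installation works:
docker run hello-world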
Monetdb Container Installation
Getting a Monetdb Image
MonetDB containers are hosted on "Docker Hub". The MonetDB team creates a new image for each new version of MonetDB. We can find a list of all the MonetDB versions on this page: https://hub.docker.com/r/monetdb/monetdb/tags
The name of each version is made of three elements: nameOfDockerHubAccount/nameOfApplication:versionNumber. For "monetdb/monetdb:latest", the account is "monetdb", the application is "monetdb", and the version (tag) is "latest".
We can download the latest version to our computer using this command.
docker pull monetdb/monetdb:latest
We type this command to list our images. In the non-compressed state, our image is 536 MB.
docker images
"Tag" is a version of the image. If we don't provide a tag, we will always get the "latest" version.
Starting MonetDB Container
This is how we can start a MonetDB container. ALPINE Linux will expose the server on the default port 50000, but we must map that to a host port number. We will again use 50000 for the host port number. By default, our farm will be at the location "/var/monetdb5/dbfarm" inside of the container. The name of the default database will be "monetdb". We can set an environ with the administrator password, like "-e MDB_DB_ADMIN_PASS=monetdb2".
docker run -p 50000:50000 --restart unless-stopped -e MDB_DB_ADMIN_PASS=monetdb2 monetdb/monetdb:latest
The server will stay in the foreground in the terminal. We will open a new tab inside the terminal and there we will type:
docker ps   # this will list running containers
We can connect to bash in the new container. "28629" are the starting figures of the container ID. We will be connected as root.
docker exec -it 28629 bash
The administrator is "monetdb", and the default database is "monetdb". The password is "monetdb2" and it is set through the environ MDB_DB_ADMIN_PASS.
mclient -u monetdb -d monetdb # monetdb2 is password
What Will Happen If We Reboot Computer
I will exit bash, and then I will reboot my computer. We used the option below with the "docker run" command. This option will make our container start automatically when the computer starts; it can only be stopped manually.
--restart unless-stopped
If we don't want the container to start automatically, we use the default "--restart no".
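If we change our mind later, the restart policy of an existing container can be changed without recreating it, using the "docker update" command. A sketch, using the container ID from above:
docker update --restart no 28629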
After the restart, the "docker ps" command will show the containers active on our computer. We can see that our container's STATUS is "Up 43 seconds".
If we type "docker ps -a", we will get all the containers on our computer, both stopped and active.
I will stop the container.
docker stop 28629
There are now no active containers, just one stopped container.
We can now delete the stopped container:
docker rm 28629
This command would delete all the stopped containers:
docker container prune
We will now start the container again, but with fully customized options.
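Putting together the options explained in the sections below, the command looks roughly like this sketch. The values of MDB_FARM_PROPERTIES and MDB_DB_PROPERTIES are left out, because only the individual properties they set are shown later.
docker run -d -p 50000:50000 --restart unless-stopped \
    -v data-vol:/var/monetdb5/dbfarm2 \
    -e MDB_DB_ADMIN_PASS=monetdb2 \
    -e MDB_DAEMON_PASS=daemonPass \
    -e MDB_FARM_DIR=/var/monetdb5/dbfarm2 \
    -e MDB_CREATE_DBS=db1,db2 \
    -e MDB_SNAPSHOT_DIR=/snapshot \
    -e MDB_SNAPSHOT_COMPRESSION=.tar.bz2 \
    -e MDB_LOGFILE=/logfile \
    monetdb/monetdb:latest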
Above we use the "-d" option to detach our container. This means that after we run this command, docker will not remain in the foreground; we will get the command line back, so we can continue typing commands in the same shell.
At the bottom of the image, we can see the ID of the new container.
1666729293f….
MDB_FARM_DIR and MDB_CREATE_DBS
We can define in which directory inside of the container our farm will be created. That is defined with MDB_FARM_DIR. With MDB_CREATE_DBS, we can create one or several databases. Between the names of the databases there must be no spaces, just a comma.
I will again enter container's bash to check whether we have this folder and databases.
docker exec -it 166 bash
Inside of the bash, I will jump to the "dbfarm2" folder.
cd /var/monetdb5/dbfarm2
Inside this folder we will find databases db1 and db2.
ls -alh
We don't have to explicitly define MDB_FARM_DIR and MDB_CREATE_DBS. If we omit them, they take the default values: the farm directory defaults to "/var/monetdb5/dbfarm", and the only database created is "monetdb".
Volume
We used the option that will create a volume. A volume is a folder on the host system that is mounted into the file system of the container.
-v data-vol:/var/monetdb5/dbfarm2
Volumes can be found on the host system at the location: /var/lib/docker/volumes
We can open this folder on our host computer, and inside of it we will find our databases. The purpose of volumes is to provide data persistence. If our container is deleted, everything inside of it will be deleted as well. But this will not happen if the data is on the host computer and is only mounted into the container.
If we want the data to be safe and persistent, we should use volumes.
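As a quick sanity check, we can remove the container and look at the volume on the host; the data is still there. A sketch, assuming the default local volume driver and the container ID prefix "166" used above:
docker stop 166
docker rm 166
# the databases are still on the host, inside the named volume
sudo ls /var/lib/docker/volumes/data-vol/_data
# a new container started with "-v data-vol:/var/monetdb5/dbfarm2" will see db1 and db2 again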
MDB_SNAPSHOT_DIR, MDB_SNAPSHOT_COMPRESSION and MDB_LOGFILE
We set MDB_LOGFILE to "/logfile". That is where the MonetDB log is:
cat /logfile
By default, the logfile would be placed inside of the DBfarm, and would have the name "merovingian.log".
We have set the snapshot options:
-e MDB_SNAPSHOT_DIR=/snapshot
-e MDB_SNAPSHOT_COMPRESSION=.tar.bz2
I will make a snapshot with the default settings:
monetdb snapshot create db1
MonetDB will try but will fail because "/snapshot" directory doesn't exist. We must create it.
mkdir /snapshot
monetdb snapshot create db1
cd /snapshot
ls -alh
Snapshot will be created with the assigned compression "bz2".
Snapshots cannot be created if we don't provide the snapshot environs. It would be wise to place snapshots inside of the volume, to preserve them.
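One way to do that, as a sketch, is to mount a second volume and point the snapshot directory at it when starting the container (the volume name "snap-vol" is made up for this example):
-v snap-vol:/snapshot
-e MDB_SNAPSHOT_DIR=/snapshot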
MDB_FARM_PROPERTIES and MDB_DB_PROPERTIES
With the MDB_FARM_PROPERTIES environ we set two DBfarm properties.
monetdbd get all /var/monetdb5/dbfarm2 | egrep 'exittimeout|keepalive'
The possible properties are explained inside of this blog post "link". We cannot set the properties "listenaddr", "control" and "passphrase". "Listenaddr" is always set to "all". "Passphrase" is set by the environ MDB_DAEMON_PASS. If we set MDB_DAEMON_PASS, then "control" will be set to true.
If I try to read the database properties with "monetdb get all db1", I get an error "incomplete response from monetdb". I don't know why this is happening. We are also kicked out of the bash. This is a bug.
Alternatively, we can read from the system function "env()". I will log in again into bash and then into mclient.
docker exec -it 166 bash
mclient -u monetdb -d db1   # password is monetdb2
SELECT * FROM env() WHERE value = 10;
In the "env()" table, we can see that the values of the "nthreads" and "ncopyinthreads" properties are set.
Possible properties for a database are explained in this blog post "link".
MDB_DB_ADMIN_PASS and MDB_DAEMON_PASS
MDB_DB_ADMIN_PASS is a mandatory environ. We must set it in order to run MonetDB container.
MDB_DAEMON_PASS is a password for remote monetdbd control. Remote control of the daemon is explained in the blog post "link". In the context of containers, this password allows us to control the daemon in the container from the host operating system.
I will run this command from the host OS to control the monetdbd daemon inside of the container.
monetdb -h localhost -P daemonPass create db3
For this to work, some conditions must be fulfilled on the host operating system:
– We must have MonetDB installed on the host computer.
– A DBfarm must be started. Without that we cannot use the "monetdb" command.
– The DBfarm on the host computer must use a port number different from the port number of the MonetDB inside of the container.
monetdbd get port /home/clean/DBfarm1
The host computer has monetdbd on port 50001.
On the host computer we can visit "volume" folder. Inside of it we will see that the new database "db3" has been created.
How to Stop and Start Container?
We can stop the container with the command:
docker stop 16667
docker ps -a
We can start the container with the command:
docker start 16667
docker ps
Clean Up
Deleting Images and Containers
I will now delete everything we have done. First, I will stop and delete the container.
docker stop 16667
docker rm 16667
We can also delete the image:
docker images
docker rmi 9f2e
Removing Docker from Your System
Deleting Docker includes these steps:
We will first stop the docker services.
sudo systemctl stop docker.service
sudo systemctl stop docker.socket
We can now uninstall all parts of docker:
sudo apt purge -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
In the next step, we will remove the docker repository and GPG key.
sudo rm -f /etc/apt/sources.list.d/docker.list
sudo rm -f /etc/apt/keyrings/docker.gpg
We can update the list of available packages.
sudo apt update
We will remove orphaned dependencies.
sudo apt autoremove -y
sudo apt autoclean
We can also remove all the remaining docker files.
sudo rm -rf /var/lib/docker
sudo rm -rf /var/lib/containerd
sudo rm -rf /etc/docker
sudo rm -rf ~/.docker
We will delete the docker user group.
sudo groupdel docker
Now, we can confirm that docker is deleted; this command should report that docker is not found:
docker --version