MonetDB Evaluation 2022

I used a Docker image for the evaluation: https://hub.docker.com/r/monetdb/monetdb

docker pull monetdb/monetdb

Default credentials (username/password): monetdb/monetdb

Documentation: https://pymonetdb.readthedocs.io/en/latest/api.html

Some performance tips: https://www.monetdb.org/documentation-Jul2021/admin-guide/performance-tips/performance-tips/

Python Client pymonetdb: https://pymonetdb.readthedocs.io/en/latest/api.html
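A minimal sketch of connecting with pymonetdb, using the credentials noted above. The hostname `localhost`, the port `50000` (MonetDB's default) and the database name `demo` are assumptions about the local setup; the helper names `connection_settings` and `fetch_rows` are hypothetical.

```python
def connection_settings(database="demo", hostname="localhost", port=50000):
    """Keyword arguments for pymonetdb.connect().

    Username and password are the defaults noted above; hostname, port
    and database name are assumptions about the local setup.
    """
    return {
        "username": "monetdb",
        "password": "monetdb",
        "hostname": hostname,
        "port": port,
        "database": database,
    }


def fetch_rows(query, **overrides):
    """Run one query against a running server and return all rows."""
    # Imported lazily so connection_settings() works without the driver installed.
    import pymonetdb

    connection = pymonetdb.connect(**connection_settings(**overrides))
    try:
        cursor = connection.cursor()
        cursor.execute(query)
        return cursor.fetchall()
    finally:
        connection.close()
```

With the container running and its port published, `fetch_rows("SELECT 1")` should return a single row.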

Transfer performance is a blocker: https://github.com/gijzelaerr/pymonetdb/issues/94

Evaluating query plans / performance:

PLAN: shows the relational query plan
EXPLAIN: shows the SQL algebra [without costs]
TRACE: reports timings, but not per 'statement'
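The three keywords above are written in front of the query they should inspect. A small sketch of a helper that builds such statements so they can be sent through any client; the function name `with_modifier` is hypothetical.

```python
MODIFIERS = ("PLAN", "EXPLAIN", "TRACE")


def with_modifier(query, modifier):
    """Prefix a SQL query with PLAN, EXPLAIN or TRACE."""
    modifier = modifier.upper()
    if modifier not in MODIFIERS:
        raise ValueError(f"unknown modifier: {modifier!r}")
    # Normalise surrounding whitespace and the trailing semicolon.
    body = query.strip().rstrip(";").strip()
    return f"{modifier} {body};"


if __name__ == "__main__":
    for name in MODIFIERS:
        print(with_modifier("SELECT count(*) FROM sys.tables", name))
```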

There is an embedded version that does not need a server and stores the data on disk: https://www.monetdb.org/documentation-Jul2021/dev-guide/mbedded-python/

Activating Python 3 in the docker image

Install required packages

docker exec -it --user root monetdb /bin/bash
yum install MonetDB-python3 python3-numpy

Configure the database

monetdb stop demo
monetdb set embedpy3=yes demo
monetdb start demo
monetdb get embedpy3 demo

Restrictions

VACUUM: we now disallow vacuum on system tables. The vacuum function isn't safe enough for these tables. A better vacuum solution for the system tables is needed. [..]

Memory usage keeps growing: https://www.monetdb.org/documentation-Jul2021/admin-guide/system-resources/memory-footprint/ Bottom line: we need cgroups to limit the memory size (Docker does this automatically), and we need to prevent the process from being killed by the OOM killer.

Kludge: prevent the process from being killed by the OOM killer when it hits the memory limit:

echo -1000 > /proc/[monetdb_pid]/oom_score_adj
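The same kludge can be scripted. This is a sketch that assumes the Linux `/proc/<pid>/oom_score_adj` interface, which accepts values from -1000 to 1000 and needs root privileges to lower; the function name and the `proc_root` parameter are hypothetical and only there to make the helper testable.

```python
from pathlib import Path


def set_oom_score_adj(pid, value=-1000, proc_root="/proc"):
    """Write an OOM score adjustment for the given process.

    -1000 exempts the process from the OOM killer. Writing below the
    current value normally requires root.
    """
    if not -1000 <= value <= 1000:
        raise ValueError("oom_score_adj must be between -1000 and 1000")
    target = Path(proc_root) / str(pid) / "oom_score_adj"
    target.write_text(f"{value}\n")
    return target
```

Note that exempting the server only moves the problem: under memory pressure the kernel will pick another process instead.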

Snippets

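Define a Python function to calculate the Hamming distance between the needle (query) and the database rows (strings). The function sums the absolute element-wise differences, which equals the Hamming distance for 0/1 vectors. To avoid an error, the data has to be stored as a string and converted to numpy inside the function, which is not efficient.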

CREATE FUNCTION python_hamdist(strings CLOB, needle CLOB)
RETURNS INTEGER
LANGUAGE PYTHON {
    # parse the space-separated query vector
    c = numpy.fromstring(needle, sep=" ", dtype=numpy.float32)
    # parse each stored row into a float32 vector
    qq = numpy.array([numpy.fromstring(q, sep=" ") for q in strings], dtype=numpy.float32)
    # one distance per row: sum of absolute differences
    return numpy.abs(qq - c).sum(axis=1).tolist()
};
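To check expected values outside the database, the same computation can be mirrored in plain Python. This is a sketch without numpy, not the UDF itself; the function names are hypothetical, and the length check is an addition that the UDF above does not perform.

```python
def parse_vector(text):
    """Parse a space-separated string such as "1 0 1" into a list of floats."""
    return [float(token) for token in text.split()]


def abs_diff_distances(strings, needle):
    """Sum of absolute element-wise differences between needle and each row."""
    query = parse_vector(needle)
    distances = []
    for row in strings:
        values = parse_vector(row)
        if len(values) != len(query):
            raise ValueError("row and needle must have the same number of elements")
        distances.append(sum(abs(v - q) for v, q in zip(values, query)))
    return distances


if __name__ == "__main__":
    print(abs_diff_distances(["1 0 1 1", "0 0 0 0"], "1 1 0 1"))
```

For 0/1 vectors each differing position contributes exactly 1, so the result is the number of differing positions.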

Bottom Line

The array data types we need for the sim DB are not available. In addition, there are many recent issues about performance problems with the Python client or the Python interface, and I was not able to find any reports that were not written by the MonetDB team itself. The risk of running MonetDB in production is therefore high, since tests with smaller datasets already revealed limitations such as import speed and data transfer performance from the DB to the caller.

References:
https://github.com/MonetDB/MonetDB/issues/4048
https://stackoverflow.com/questions/65074614/monetdb-full-disk-how-to-manually-free-space
https://stackoverflow.com/questions/65079976/monetdb-set-specific-embedded-python-version
https://www.monetdb.org/documentation-Jul2021/dev-guide/mbedded-python/
