====== MonetDB Evaluation 2022 ======

I used the MonetDB Docker image from Docker Hub: https://hub.docker.com/r/monetdb/monetdb

<code>
docker pull monetdb/monetdb
</code>
Default username/password: monetdb/monetdb


Some performance tips: https://www.monetdb.org/documentation-Jul2021/admin-guide/performance-tips/performance-tips/

Python client pymonetdb (API documentation): https://pymonetdb.readthedocs.io/en/latest/api.html
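A minimal connection sketch with pymonetdb. It assumes the container publishes MonetDB's default port 50000 on localhost, that the database is called demo (the name used in the monetdb commands on this page) and that the default credentials are unchanged. connection_params() and run_query() are hypothetical helper names, not part of pymonetdb.

<code python>
# Sketch: connect to the MonetDB container with pymonetdb.
# Assumptions: port 50000 published on localhost, database "demo",
# default credentials monetdb/monetdb.

DEFAULTS = {
    "hostname": "localhost",  # assumption: container port published locally
    "port": 50000,            # MonetDB's default port
    "database": "demo",       # assumption: database name used on this page
    "username": "monetdb",
    "password": "monetdb",
}


def connection_params(**overrides):
    """Return keyword arguments for pymonetdb.connect() with overrides applied."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown connection parameter(s): {sorted(unknown)}")
    return {**DEFAULTS, **overrides}


def run_query(sql, **overrides):
    """Open a connection, run one query and return all result rows."""
    import pymonetdb  # imported lazily so connection_params() works without it

    conn = pymonetdb.connect(**connection_params(**overrides))
    try:
        cur = conn.cursor()
        cur.execute(sql)
        return cur.fetchall()
    finally:
        conn.close()
</code>

Against a running container, run_query("SELECT 1") returns the result rows as a list of tuples; pass e.g. hostname="..." to reach a different host.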

There is a blocker regarding the performance of transferring query results to the client: https://github.com/gijzelaerr/pymonetdb/issues/94
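To put numbers on the transfer problem, a small timing sketch can help. It only uses the generic DB-API cursor interface (execute, fetchmany, arraysize); timed_fetch() is a hypothetical helper. Whether pymonetdb also uses arraysize as its network batch size depends on the client version, so treat batch_size as a knob to experiment with rather than a guaranteed tuning parameter.

<code python>
import time


def timed_fetch(cursor, sql, batch_size=10_000):
    """Execute sql on a DB-API cursor and fetch all rows in batches.

    Returns (row_count, seconds, rows_per_second). Rows are only counted,
    not kept, so memory use stays bounded by one batch.
    """
    cursor.arraysize = batch_size  # DB-API: default batch size for fetchmany()
    start = time.perf_counter()
    cursor.execute(sql)
    total = 0
    while True:
        batch = cursor.fetchmany()
        if not batch:
            break
        total += len(batch)
    elapsed = time.perf_counter() - start
    rate = total / elapsed if elapsed > 0 else float("inf")
    return total, elapsed, rate
</code>

Usage: timed_fetch(conn.cursor(), "SELECT * FROM some_table") with a pymonetdb connection, where some_table is a placeholder for a realistically sized table.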

Evaluating query plans / performance:


  * PLAN shows the relational algebra plan
  * EXPLAIN shows the MAL (MonetDB Assembly Language) program that gets executed [without costs]
  * TRACE executes the query and reports timings per MAL instruction, not per SQL statement
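The PLAN and EXPLAIN output can also be fetched from Python. This sketch assumes both statements return one text column per output line, which is how they appear in the client; TRACE is left out because its output is not covered here. inspect_sql(), rows_to_text() and show_plan() are hypothetical helpers that work with any DB-API cursor, e.g. one from pymonetdb.

<code python>
# Sketch: fetch MonetDB's PLAN or EXPLAIN output through a DB-API cursor.
# Assumption: both statements return one text column per output line.

MODES = ("PLAN", "EXPLAIN")


def inspect_sql(query, mode="PLAN"):
    """Prefix a query with PLAN or EXPLAIN (trailing semicolon removed)."""
    mode = mode.upper()
    if mode not in MODES:
        raise ValueError(f"mode must be one of {MODES}")
    return f"{mode} {query.strip().rstrip(';').rstrip()}"


def rows_to_text(rows):
    """Join the single-column output rows into one printable string."""
    return "\n".join(str(row[0]) for row in rows)


def show_plan(cursor, query, mode="PLAN"):
    """Run PLAN/EXPLAIN for query and return the output as text."""
    cursor.execute(inspect_sql(query, mode))
    return rows_to_text(cursor.fetchall())
</code>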

There is an embedded version that does not need a server and stores the data on disk: https://www.monetdb.org/documentation-Jul2021/dev-guide/mbedded-python/


===== Activating Python 3 in the docker image =====

Install the required packages (as root inside the container):

<code>
docker exec -it --user root monetdb /bin/bash
yum install MonetDB-python3 python3-numpy
</code>
Enable embedded Python 3 for the demo database:

<code>
monetdb stop demo
monetdb set embedpy3=yes demo
monetdb start demo
monetdb get embedpy3 demo
</code>

===== Restrictions =====


  * VACUUM: we now disallow vacuum on system tables. The vacuum function isn't safe enough for these tables. A better vacuum solution for the system tables is needed. [..]

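Memory usage keeps growing; see the memory footprint page: https://www.monetdb.org/documentation-Jul2021/admin-guide/system-resources/memory-footprint/

Bottom line: we need cgroups to limit the memory size. Docker places every container in its own cgroup, so the limit can be set per container (e.g. with the --memory option of docker run). In addition, we need to prevent the process from being killed by the OOM killer.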
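To check which memory limit the cgroup actually imposes inside the container, the limit can be read from the cgroup filesystem. A sketch, assuming the usual mount points: /sys/fs/cgroup/memory.max for cgroup v2 and /sys/fs/cgroup/memory/memory.limit_in_bytes for cgroup v1. memory_limit() is a hypothetical helper; the root parameter only exists to make it testable.

<code python>
from pathlib import Path

# Assumed mount points: cgroup v2 first, then the cgroup v1 memory controller.
CANDIDATES = (
    "sys/fs/cgroup/memory.max",
    "sys/fs/cgroup/memory/memory.limit_in_bytes",
)


def memory_limit(root="/"):
    """Return the cgroup memory limit in bytes, or None if unlimited/unknown."""
    for rel in CANDIDATES:
        path = Path(root) / rel
        try:
            value = path.read_text().strip()
        except OSError:  # file missing or not readable: try the next layout
            continue
        if value == "max":  # cgroup v2: no limit configured
            return None
        limit = int(value)
        # cgroup v1 reports a value close to 2**63 when no limit is set
        return None if limit >= 2**62 else limit
    return None
</code>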

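Kludge: prevent the process from being killed by the OOM killer when it hits the memory limit (as root; -1000 in oom_score_adj disables OOM killing for that process, whereas the deprecated oom_adj file only accepts values from -17 to 15):

<code>
echo -1000 > /proc/[monetdb_pid]/oom_score_adj
</code>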
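The same kludge from Python, e.g. for a startup script. This is a sketch: set_oom_score_adj() is a hypothetical helper, it needs root, and finding the pid of the MonetDB process is left to the caller, as in the shell command. The proc_root parameter only exists for testing.

<code python>
from pathlib import Path

# Sketch: protect a process from the OOM killer via /proc/<pid>/oom_score_adj.
# Needs root; the pid of the MonetDB process is supplied by the caller.


def set_oom_score_adj(pid, score=-1000, proc_root="/proc"):
    """Write score to <proc_root>/<pid>/oom_score_adj and return the file path.

    -1000 disables OOM killing for the process; 1000 makes it the preferred victim.
    """
    if not -1000 <= score <= 1000:
        raise ValueError("oom_score_adj must be between -1000 and 1000")
    path = Path(proc_root) / str(int(pid)) / "oom_score_adj"
    path.write_text(f"{score}\n")
    return path
</code>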

===== Snippets =====

Define a Python function that calculates the Hamming distance between the needle (query vector) and the database rows (strings). Without an array type, the vectors have to be stored as strings and converted to numpy arrays on every call, which is not efficient.

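<code>
CREATE FUNCTION python_hamdist(strings CLOB, needle CLOB)
RETURNS INTEGER
LANGUAGE PYTHON {
    # needle: one space-separated vector; strings: one such vector per row
    c = numpy.fromstring(needle, sep=" ", dtype=numpy.float32)
    qq = numpy.array([numpy.fromstring(q, sep=" ", dtype=numpy.float32) for q in strings])
    # Hamming distance: number of positions in which each row differs from the needle
    return (qq != c).sum(axis=1).astype(numpy.int32)
};
</code>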
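Before registering the UDF it helps to have a plain-Python reference to compare its results against on a few rows. This sketch counts differing positions per row; for 0/1 vectors that is the same as the sum of absolute differences. hamming_distances() is a hypothetical helper that assumes the same space-separated string format as the CLOB column.

<code python>
def hamming_distances(strings, needle):
    """Reference for python_hamdist: count differing positions per row.

    Each row and the needle are space-separated numbers, as stored in the CLOB column.
    """
    target = [float(x) for x in needle.split()]
    result = []
    for row in strings:
        values = [float(x) for x in row.split()]
        if len(values) != len(target):
            raise ValueError("row and needle have different lengths")
        result.append(sum(a != b for a, b in zip(values, target)))
    return result
</code>

Usage: hamming_distances(["1 0 1", "0 0 0"], "1 0 1") gives [0, 2], which should match SELECT python_hamdist(...) on the same rows.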

===== Bottom Line =====

MonetDB has no array data types, which we need for the sim DB. In addition, there are many recent issues about the performance of the Python client and of the embedded Python interface, and I could not find any reports that were not written by the MonetDB developers. Therefore, the risk of running MonetDB live is high, since tests with smaller datasets already showed limitations such as import speed and the performance of transferring data from the DB to the caller.

References:\\
https://github.com/MonetDB/MonetDB/issues/4048\\
https://stackoverflow.com/questions/65074614/monetdb-full-disk-how-to-manually-free-space\\
https://stackoverflow.com/questions/65079976/monetdb-set-specific-embedded-python-version\\
https://www.monetdb.org/documentation-Jul2021/dev-guide/mbedded-python/\\