====== MonetDB Evaluation 2022 ======

I used a Docker image for it: https://hub.docker.com/r/monetdb/monetdb

  docker pull monetdb/monetdb

Username/password: monetdb/monetdb

  * Python client pymonetdb (API documentation): https://pymonetdb.readthedocs.io/en/latest/api.html
  * Some performance tips: https://www.monetdb.org/documentation-Jul2021/admin-guide/performance-tips/performance-tips/

There is a blocker with respect to the transfer performance: https://github.com/gijzelaerr/pymonetdb/issues/94

Evaluating query plans / performance:
  * PLAN shows the relational query plan (SQL algebra)
  * EXPLAIN shows the MAL (MonetDB Assembly Language) program the query compiles to [without costs]
  * TRACE reports timings, but not per 'statement'

There is an embedded version that does not need a server and stores the data on disk: https://www.monetdb.org/documentation-Jul2021/dev-guide/mbedded-python/

===== Activating Python 3 in the docker image =====

Install the required packages:

  docker exec -it --user root monetdb /bin/bash
  yum install MonetDB-python3 python3-numpy

Apply the database settings:

  monetdb stop demo
  monetdb set embedpy3=yes demo
  monetdb start demo
  monetdb get embedpy3 demo

===== Restrictions =====

  * VACUUM: we now disallow vacuum on system tables. The vacuum function isn't safe enough for these tables. A better vacuum solution for the system tables is needed. [..]

Memory usage keeps growing: https://www.monetdb.org/documentation-Jul2021/admin-guide/system-resources/memory-footprint/

Bottom line: we need cgroups to limit the memory size (Docker uses them automatically), and we need to keep the process from being killed by the OOM killer.

Kludge: prevent the process from being "oom"ed when it hits the memory limit:

  echo -1000 > /proc/[monetdb_pid]/oom_score_adj

===== Snippets =====

Define a Python function to calculate the Hamming distance between the needle (query) and the database rows (strings). Without an array type, the data needs to be stored as a string and converted to numpy on every call, which is not efficient.
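The computation can be tried outside the database first. The following is only a minimal sketch in plain NumPy, assuming each vector is stored as space-separated numbers; ''hamdist'' and the sample rows are hypothetical names and data used for illustration, and the parsing uses ''str.split'' in place of ''numpy.fromstring''. For 0/1 vectors the sum of absolute differences is the Hamming distance.

```python
import numpy as np


def hamdist(strings, needle):
    """Sum of absolute differences between each row vector and the needle.

    strings: iterable of space-separated number strings (one per row).
    needle:  a single space-separated number string (the query).
    """
    # Parse the query once into a float32 vector.
    c = np.array([float(x) for x in needle.split()], dtype=np.float32)
    # Parse every row; this per-row string parsing is the inefficient step.
    qq = np.array(
        [[float(x) for x in q.split()] for q in strings], dtype=np.float32
    )
    # Broadcasting subtracts the needle from every row.
    return np.abs(qq - c).sum(axis=1).astype(int).tolist()


# Hypothetical sample data: three stored vectors and one query.
rows = ["1 0 1 1", "0 0 1 0", "1 1 1 1"]
result = hamdist(rows, "1 0 1 0")
print(result)
```

The UDF below runs the same computation on a CLOB column inside MonetDB.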
  CREATE FUNCTION python_hamdist(strings CLOB, needle CLOB)
  RETURNS INTEGER
  LANGUAGE PYTHON {
      # needle: space-separated numbers, parsed once into a float32 vector
      c = numpy.fromstring(needle, sep=" ", dtype=numpy.float32)
      # strings: one space-separated vector per row, stacked into a 2-D array
      qq = numpy.array([numpy.fromstring(q, sep=" ") for q in strings], dtype=numpy.float32)
      # per-row sum of absolute differences to the needle
      return numpy.abs(qq - c).sum(axis=1).tolist()
  };

===== Bottom Line =====

MonetDB has no array data types, which we need for the sim DB. In addition, there are many recent issues about performance problems with the Python client and the embedded Python interface, and I was not able to find any reports that were not written by the MonetDB developers. The risk of running a live MonetDB is therefore high, since tests with smaller datasets already showed limitations such as import speed and data transfer performance from the DB to the caller.

References:\\
https://github.com/MonetDB/MonetDB/issues/4048\\
https://stackoverflow.com/questions/65074614/monetdb-full-disk-how-to-manually-free-space\\
https://stackoverflow.com/questions/65079976/monetdb-set-specific-embedded-python-version\\
https://www.monetdb.org/documentation-Jul2021/dev-guide/mbedded-python/\\