Table of Contents
Some of my random thoughts and insights with sqlite and python
getting started:
from sqlite3 import Connection db = Connection("/tmp/test.db") cur = db.cursor() cur.execute("CREATE TABLE test(col1 int, col2 text)") db.commit()
Einstellungen per PRAGMA (https://www.sqlite.org/pragma.html):
cur.execute("PRAGMA journal_mode=WAL") # probably only useful at the beginning of the programm
cursor.executemany:
data = [(1, "a"), (2, "b")] cur.executemany("INSERT INTO test (col1, col2) VALUES (?, ?)", data) db.commit() # <-- this seems to influence performance a lot (in a positive way, during feed updates)! # http://codereview.stackexchange.com/questions/26822/myth-busting-sqlite3-performance-w-pysqlite#comment41680_26822
Journal Modes:
- OFF: Änderungen direkt in die Datenbank
- DELETE: Default: nach einer Transaktion wird das Journal gelöscht
- WAL: Write Ahead Log: Erlaubt mehrere Leser und Schreiber
- MEMORY: Journal wird im RAM gehalten
Ein Test hat ergeben, dass journal_mode=WAL für die LillyX am besten geeignet ist.
python + sqlite3 mit der aktuellsten sqlite3-Version
- extract sources: tar xfvz pysqlite-2.8.3.tar.gz
- cd pysqlite-2.8.3
- Download most recent sqlite version from https://www.sqlite.org/download.html: wget https://www.sqlite.org/2017/sqlite-amalgamation-3160200.zip
- unzip sqlite-amalgamation-3160200.zip
- Die sqlite sources in das gleiche Verzeichnis wie die setup.py bewegen: mv sqlite-amalgamation-3160200/* .
- sqlite bauen: python setup.py build_static
- pip install –user .
- cp build/lib.linux-x86_64-2.7/pysqlite2/_sqlite.so ~/.local/lib/python2.7/site_packages/pysqlite2/
Prüfen, ob es funktioniert hat:
from pysqlite2 import dbapi2 as sqlite3 print "SQLITE version:", sqlite3.sqlite_version # -> 'SQLITE version: 3.16.2'
Corrupted Data / Applying Journals
In case the database is inconsistent due to an exception, there is an easy way to 'repair' the database and apply existing journals.
sqlite3 v5_store.db vacuum
This might take a while, but is the easiest way.
Read Only
sqlite3.connect('file:{}?mode=ro'.format(feat_db), uri=True) # read-only
In case you just want to fire read-only queries. This hopefully avoid unnecessary locks
Journal Mode: WAL
To avoid that SELECT queries are blocked while there is a running insert query for the same database, Write-Ahead-Logs should be used. A high-level overview is given in [1].
A few things should be noted:
- The option is remembered, meaning if you open a DB with 'wal' any connect will use it, even if the option is not given
- There are additional files for each db: *-shm, *-wal which means more space is required
To enable the write-ahead log mode
pragma journal_mode=wal
To switch back to the default journal mode, use
pragma journal_mode=delete
[1] https://charlesleifer.com/blog/going-fast-with-sqlite-and-python/
Inserts + Ignore
If one do not care if the primary key constraint is violated, the following construct can be used:
INSERT OR IGNORE INTO
This is like 'ON CONFLICT DO NOTHING' for PostgreSQL
Datetime functions
SELECT date(timestamp, 'unixepoch'), count(*) from qa group by date(timestamp, 'unixepoch');
convert unix epoch into a date, with time it is datetime(timestamp, 'unixepoch');