
Mongo Findings

COMPATIBILITY

Because we run different versions of MongoDB, we need a way to know which PyMongo version to use for each of them (Motor uses PyMongo as a dependency, so the same applies to Motor).

The first link goes to a table that shows each PyMongo version and its compatibility with the MongoDB server versions.

Even with a compatible version, authentication problems can still occur, as happened with the visualytics MongoDB. In that case you need to specify the authentication database and mechanism explicitly. To do this, append /?authSource=admin&authMechanism=SCRAM-SHA-1 to the connection string. For the visualytics coverage problem, we used PyMongo 3.12.

Ex:

from pymongo import MongoClient

def get_data_mongo():
    # authSource names the database that stores the user's credentials
    client = MongoClient("mongodb://picalike3:6fK5DnNFbhin@mongodb01.live.picalike.corpex-kunden.de/?authSource=picalike3&authMechanism=SCRAM-SHA-1")
    db = client['picalike3']
    return db
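
The connection string can also be assembled programmatically. The following is a minimal sketch, not part of the original setup: build_mongo_uri and the example user, password, and host are hypothetical. It percent-encodes the credentials with quote_plus so that characters such as "@" or ":" in a password do not break the URI.

```python
from urllib.parse import quote_plus


def build_mongo_uri(user, password, host, auth_source="admin",
                    auth_mechanism="SCRAM-SHA-1"):
    """Return a mongodb:// URI with explicit authentication options.

    Hypothetical helper for illustration; all argument values are assumptions.
    """
    # Percent-encode the credentials so reserved characters stay intact.
    credentials = f"{quote_plus(user)}:{quote_plus(password)}"
    return (f"mongodb://{credentials}@{host}/"
            f"?authSource={auth_source}&authMechanism={auth_mechanism}")


# Example values are placeholders, not real credentials.
uri = build_mongo_uri("example_user", "p@ss:word", "mongo.example.com")
print(uri)
```

The resulting string can be passed to MongoClient in the same way as the hard-coded one above.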

sources:

tags: mongo, compatibility, auth failed, Authentication failed, code 18

CURSORS

When iterating over large amounts of data, a cursor is typically used, e.g.:

import pymongo

client = pymongo.MongoClient(DB_URI)
collection = client[db_name][collection_name]
filters = {...}
projection = {...}
for doc in collection.find(filters, projection):
    process_document(doc)

This can cause problems when process_document(doc) takes a long time. Iterating over a cursor means that the driver fetches a batch of documents from MongoDB, returns them one by one, and fetches the next batch once the current one is exhausted, until the cursor has no more documents. If too much time passes between two fetches (10 minutes by default), the server closes the cursor automatically.

To prevent this, one can change the call to collection.find(filters, projection, no_cursor_timeout=True). HOWEVER, this is dangerous! If the script crashes before the cursor is exhausted, the cursor will not be closed automatically and stays open until the MongoDB server is restarted (effectively forever on production systems).

One problem remains: even with no_cursor_timeout, the session in which the cursor lives can expire. To prevent this, we need to use an explicit session and refresh it every x minutes (by default it is removed after 30 minutes).
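
The timeout behaviour can be illustrated without a database. The following is a toy model only: ToyCursor, its CursorNotFound stand-in, and the fake clock are hypothetical and do not reproduce the real server logic. It shows how slow per-document processing leaves too much time between two batch fetches.

```python
class CursorNotFound(Exception):
    """Stand-in for the driver error in this toy model."""


class ToyCursor:
    """Simulates a server-side cursor that is dropped after an idle timeout."""

    def __init__(self, docs, batch_size, idle_timeout, clock):
        self._docs = list(docs)
        self._batch_size = batch_size
        self._idle_timeout = idle_timeout
        self._clock = clock  # callable returning the current time in seconds
        self._pos = 0
        self._last_fetch = clock()

    def _fetch_batch(self):
        # The simulated server drops a cursor that was idle for too long.
        if self._clock() - self._last_fetch > self._idle_timeout:
            raise CursorNotFound("cursor was idle for too long")
        self._last_fetch = self._clock()
        batch = self._docs[self._pos:self._pos + self._batch_size]
        self._pos += len(batch)
        return batch

    def __iter__(self):
        while True:
            batch = self._fetch_batch()
            if not batch:
                return
            yield from batch


now = [0.0]  # fake clock so the example runs instantly
cursor = ToyCursor(range(4), batch_size=2, idle_timeout=600, clock=lambda: now[0])
processed = []
try:
    for doc in cursor:
        processed.append(doc)
        now[0] += 400  # each document takes 400 simulated seconds
except CursorNotFound:
    print("cursor lost after", processed)
```

In this run the first batch of two documents is processed, but 800 simulated seconds pass before the second fetch, so the cursor is already gone.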

Clean solution to always close the cursor and use a defined session:

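from datetime import datetime

# The explicit session and the context managers ensure that the cursor
# is closed even if process_document raises.
with client.start_session() as session:
    with collection.find(filters, projection, no_cursor_timeout=True, session=session) as cursor:
        last_refresh = datetime.now().timestamp()
        for doc in cursor:
            now = datetime.now().timestamp()
            # Refresh the session every 5 minutes, well before it expires.
            if now - last_refresh > 5 * 60:
                client.admin.command('refreshSessions', [session.session_id], session=session)
                last_refresh = now
            process_document(doc)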
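
The refresh timing in the loop above can be factored into a small helper so that it can be tested without a server. This is a sketch under assumptions: PeriodicRefresher is a hypothetical name, and the demonstration uses a fake clock and a list instead of a real refreshSessions call.

```python
import time


class PeriodicRefresher:
    """Calls `refresh` at most once per `interval` seconds (hypothetical helper)."""

    def __init__(self, refresh, interval=5 * 60, clock=time.monotonic):
        self._refresh = refresh
        self._interval = interval
        self._clock = clock
        self._last = clock()

    def tick(self):
        """Call once per document; returns True if a refresh was triggered."""
        now = self._clock()
        if now - self._last > self._interval:
            self._refresh()
            self._last = now
            return True
        return False


# Demonstration with a fake clock and a list instead of a real server.
fake_now = [0.0]
calls = []
refresher = PeriodicRefresher(lambda: calls.append(fake_now[0]),
                              interval=300, clock=lambda: fake_now[0])
for _ in range(10):
    fake_now[0] += 100  # pretend each document takes 100 seconds
    refresher.tick()
print(calls)
```

With PyMongo, the refresh callable would wrap the client.admin.command('refreshSessions', [session.session_id], session=session) call from the loop above, and refresher.tick() would be called before each process_document(doc).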

compare: https://developpaper.com/four-solutions-to-the-problem-of-mongodb-cursor-timeout/

tags: CursorNotFound, cursor not found, cursor timeout, cursor closed

