This article describes a possible way to debug / trace the memory consumption of dockerized Python FastAPI services, in particular services that expose endpoints and possibly use multiprocessing (multiple workers).
The scenario here is that we notice a docker container using too much RAM, e.g. by checking with docker stats, and then want to understand which Python objects / code lines are responsible. This provides a starting point for fixing the problem.
The goals when selecting a tracing method were detailed insight into the program, few third-party dependencies, small code changes in the leaking service and a low performance impact, since the service might still be in use while tracing.
An example service was created that simulates a memory leak, alongside code that uses Python's standard-library tracemalloc module to trace the RAM usage:
The project was deployed on dev01 for testing. It can be run using dev01:~/docker_bin/mem_test/build_and_run.sh (the number of workers can be changed in this script) and is then reachable at http://dev01.picalike.corpex-kunden.de:1235
The leaking API simply implements a 'storage' dictionary in the startup function and an endpoint that increases the storage size each time it is called.
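A minimal sketch of such a leaking service, assuming the module and endpoint names that appear in the traces below (mem_test_api_main.py, /add_random_data); everything else is illustrative:

# mem_test_api_main.py -- illustrative sketch of the leaking service
from random import random

from fastapi import FastAPI, Request

app = FastAPI()

@app.on_event("startup")
async def setup_storage():
    # the 'storage' dictionary lives as long as the worker process does
    app.state.storage = {}

@app.get("/add_random_data")
async def add_random_data(request: Request):
    # every call adds another large list to the storage, so memory keeps growing
    idx = len(request.app.state.storage)
    request.app.state.storage[idx] = list(str(random())) * 10000
    return {"storage_size": len(request.app.state.storage)}

Repeatedly calling the endpoint, e.g. with curl http://dev01.picalike.corpex-kunden.de:1235/add_random_data, grows the container's memory usage with every request.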
In the simpler case of single-worker APIs we can add very simple endpoints to start a measurement (/start_tracemalloc), get memory statistics (/top_mem_consumer_lines) and stop the measurement (/stop_tracemalloc); a sketch of these endpoints is shown further below. It is even possible to check the additional memory cost of using tracemalloc itself (/tracemalloc_self_mem_usage). The statistics look, for example, like this:
#1: /./app/mem_test_api_main.py:69: 2813.4 KiB (96.0%)
    request.app.state.storage[idx] = list(str(random())) * 10000
#2: /usr/local/lib/python3.8/asyncio/events.py:81: 4.1 KiB (0.1%)
    self._context.run(self._callback, *self._args)
#3: /usr/local/lib/python3.8/asyncio/locks.py:257: 1.9 KiB (0.1%)
    self._waiters = collections.deque()
We can clearly see here that most of the memory usage of the API (2813.4 KiB / 96%) is caused by the line request.app.state.storage[idx] = list(str(random())) * 10000
which keeps adding data to our storage.
The advantage of the endpoint solution is that we can freely start and stop measurements on a running service, collect statistics at any time and even get information on the additional memory cost of using tracemalloc itself.
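A possible sketch of these endpoints on top of the standard library's tracemalloc module (the response format is an assumption; only the endpoint names match the ones above):

import tracemalloc

from fastapi import FastAPI

app = FastAPI()  # or reuse the existing app instance of the service

@app.get("/start_tracemalloc")
async def start_tracemalloc():
    tracemalloc.start()
    return {"tracing": tracemalloc.is_tracing()}

@app.get("/top_mem_consumer_lines")
async def top_mem_consumer_lines(limit: int = 3):
    # group all currently traced allocations by source line and return the biggest ones
    snapshot = tracemalloc.take_snapshot()
    stats = snapshot.statistics("lineno")
    return {"top": [str(stat) for stat in stats[:limit]]}

@app.get("/tracemalloc_self_mem_usage")
async def tracemalloc_self_mem_usage():
    # memory used by tracemalloc itself to store the traces
    return {"tracemalloc_memory_bytes": tracemalloc.get_tracemalloc_memory()}

@app.get("/stop_tracemalloc")
async def stop_tracemalloc():
    tracemalloc.stop()
    return {"tracing": tracemalloc.is_tracing()}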
For multi-worker APIs the former solution cannot be applied, since we cannot control which worker handles a request to an endpoint. This could lead to starting tracemalloc on one worker and trying to take the measurement on another.
However, when starting a FastAPI application with uvicorn and multiple workers, the startup and shutdown events are triggered for each worker. Starting tracemalloc on startup and reporting on shutdown therefore yields information for every worker. Even if calls to the /add_random_data endpoint only reach a subset of the workers, we see the memory usage of all of them and can therefore draw the same conclusions.
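A sketch of this startup/shutdown approach, assuming the top consumers are simply written to the application log on shutdown, could look like this:

import logging
import tracemalloc

from fastapi import FastAPI

logger = logging.getLogger("mem_test")
app = FastAPI()

@app.on_event("startup")
async def start_tracing():
    # runs once per uvicorn worker, so every worker traces its own allocations
    tracemalloc.start()

@app.on_event("shutdown")
async def log_top_mem_consumers():
    # runs per worker on a graceful shutdown (see the kill command below)
    snapshot = tracemalloc.take_snapshot()
    for stat in snapshot.statistics("lineno")[:3]:
        logger.warning("tracemalloc: %s", stat)
    tracemalloc.stop()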
To trigger the shutdown event we need to stop the container gracefully, i.e. 'kill' it in a way that does not skip the shutdown handlers:
docker exec -it mem_test_container bash -c 'kill -SIGINT $(pgrep uvicorn)'
Doing so writes the statistics to the logs before the container exits.
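Since the container is only stopped and not removed, the reported statistics can afterwards still be read from the container logs, e.g. with docker logs mem_test_container.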