How to use AnalyzerEngine inline of a log stream #1515

gideonred · 2025-01-22T22:27:25Z

Hi team,

We're looking to use AnalyzerEngine in our Django backend to redact sensitive logs.

Django would usually have you set up a middleware where we would invoke something like the AnalyzerEngine for every log line. However, two things give me pause:

1. We'd need to download a language model such as en_core_web_lg ahead of time and package it with the app. However, that would increase our Django package size by half a gigabyte.
2. When the model is loaded, it would use up to about half a gigabyte of memory, which would be a significant increase in memory usage.

We also have a Vector pipeline where Vector can invoke a script per log line. However, doing a `from presidio_analyzer import AnalyzerEngine` in a script seems to take a few seconds, which would be a non-starter if we have to do it per log line.

We're trying not to set up additional services (e.g. a cluster of Presidio processes that can process the logs), as that adds maintenance overhead.

Is there a way forward? Is there something we're missing?

SharonHart · 2025-01-23T09:29:59Z

Hi @gideonred!
I would suggest experimenting with the Presidio Docker containers and having the Django middleware call Presidio through its REST API.
That would decouple the redaction and the language model from the Django application, so it won't affect the application's size. I don't know what infrastructure you're running on, but the containerized versions are easy to run with Docker or docker-compose, which keeps the maintenance overhead low.
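
As a rough sketch of what that could look like on the Django side (not an official Presidio sample): a `logging.Filter` that posts each message to the analyzer container's `/analyze` endpoint and replaces the returned spans before the record is emitted. The URL and port, the `RedactingFilter` and `redact` names, and the choice to drop the message when the service is unreachable are all assumptions for illustration; adjust them to your deployment.

```python
import json
import logging
import urllib.request

# Assumption: the analyzer container is published on this host/port.
ANALYZER_URL = "http://localhost:5002/analyze"


def fetch_findings(text, url=ANALYZER_URL, timeout=2.0):
    """POST the text to the analyzer's REST endpoint and return its findings."""
    payload = json.dumps({"text": text, "language": "en"}).encode("utf-8")
    request = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def redact(text, findings):
    """Replace each detected span with <ENTITY_TYPE>.

    Spans are applied right to left so earlier offsets stay valid; a span
    that overlaps one already replaced is skipped.
    """
    cursor = len(text)
    for finding in sorted(findings, key=lambda f: f["start"], reverse=True):
        if finding["end"] > cursor:
            continue  # overlaps a span that was already replaced
        placeholder = f"<{finding['entity_type']}>"
        text = text[: finding["start"]] + placeholder + text[finding["end"]:]
        cursor = finding["start"]
    return text


class RedactingFilter(logging.Filter):
    """Hypothetical filter that redacts a record's message via the REST service."""

    def __init__(self, fetch=fetch_findings):
        super().__init__()
        self._fetch = fetch

    def filter(self, record):
        message = record.getMessage()
        try:
            record.msg = redact(message, self._fetch(message))
        except (OSError, ValueError):
            # Fail closed: do not emit unredacted text if the service is down.
            record.msg = "<redaction unavailable>"
        record.args = ()
        return True
```

It can be attached to a handler with `handler.addFilter(RedactingFilter())`. Note that this makes one HTTP call per log line, so batching or an asynchronous handler may be worth considering for high-volume logs.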

We've also seen, and have some samples of, Presidio integrating with more robust, higher-level monitoring solutions such as the ELK stack (a Logstash plugin calling Presidio), or similar solutions that collect logs further down the application flow.
