Index messages from your Telegram account and find juicy content in them.
My use case
I'm in several leak channels and needed a way to quickly search for interesting files and download them
- Clone this repo
- Install dependencies with
poetry install
- Run
poetry run python indexer.py
to initialize the config.yaml file, then follow the instructions
- Run
poetry run python indexer.py
again to index your dialogs
- Go to your MongoDB, open the collection
channels
and enable the channels you want to index
- Run
poetry run python indexer.py
again to index the selected channels (the first run takes a while, depending on the number of messages and dialogs you have enabled)
Note
You can rerun the indexer at any time to update the index with new messages
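If you would rather enable channels from a script than from a GUI, the step above could look roughly like the sketch below. It is only an illustration: the field names `title` and `enabled` are guesses and are not taken from this repo, so look at a document in your own channels collection first and adjust them.

```python
def enable_channels(channels, titles, title_field="title", flag_field="enabled"):
    """Set the enable flag on every channel whose title is in `titles`.

    `channels` is a pymongo Collection (or any object exposing update_many).
    `title_field` and `flag_field` are HYPOTHETICAL names; check a real
    document in your `channels` collection and pass the actual ones.
    Returns the number of documents that were modified.
    """
    result = channels.update_many(
        {title_field: {"$in": list(titles)}},  # match the listed channels
        {"$set": {flag_field: True}},          # flip the (assumed) enable flag
    )
    return result.modified_count


# Possible usage (the database name "telegram" is an assumption):
# from pymongo import MongoClient
# db = MongoClient("mongodb://localhost:27017")["telegram"]
# print(enable_channels(db.channels, ["Some channel", "Another channel"]))
```

Passing the field names as parameters keeps the sketch usable even if the indexer stores its flag under a different key.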
- Go to your MongoDB, open the collection
messages
and find the type of content you want to download
- Write a query that matches this content and add it to the
mongodb_download_filter
field of the config.yaml file as YAML (see the example in the config.yaml file)
- Run
poetry run python downloader.py
to download the content
Note
You can rerun the downloader at any time to download only the new content
The Telegram API is rate limited, so your download speed is limited by Telegram
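Turning a MongoDB query into YAML by hand is easy to get wrong, so here is a small standard-library sketch that prints a query dict as a YAML block. It assumes that mongodb_download_filter is a top-level key holding the query as a mapping; the example inside config.yaml is the authoritative layout. The helper names `to_yaml` and `_yaml_key` are made up for this illustration.

```python
import json
import re

_PLAIN_KEY = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")
_RESERVED = {"true", "false", "null", "yes", "no", "on", "off", "y", "n"}


def _yaml_key(key):
    """Leave simple identifier keys bare; double-quote the rest (e.g. "$gt")."""
    if _PLAIN_KEY.match(key) and key.lower() not in _RESERVED:
        return key
    return json.dumps(key)


def to_yaml(mapping, indent=0):
    """Render a dict of scalars, lists and nested dicts as block-style YAML."""
    pad = " " * indent
    lines = []
    for key, value in mapping.items():
        if isinstance(value, dict) and value:
            lines.append(f"{pad}{_yaml_key(key)}:")
            lines.append(to_yaml(value, indent + 2))
        else:
            # JSON scalars, lists and {} are also valid YAML flow syntax.
            lines.append(f"{pad}{_yaml_key(key)}: {json.dumps(value)}")
    return "\n".join(lines)


query = {"type": "messagemediadocument", "mime_type": "text/plain"}
print(to_yaml({"mongodb_download_filter": query}))
# mongodb_download_filter:
#   type: "messagemediadocument"
#   mime_type: "text/plain"
```

Operator keys such as `$gt` come out quoted, which YAML accepts, so nested conditions like `{"size": {"$gt": 1024}}` can be rendered the same way.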
Query
db.messages.find({type: 'messagemediadocument', mime_type: 'text/plain'}, {_id: 0, filename: 1})
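The same query can be run from Python with pymongo, which is handy for checking a filter before putting it in config.yaml. This is a minimal sketch: the function name `list_filenames` and the database name in the usage comment are assumptions, while the filter and projection are copied from the shell query above.

```python
def list_filenames(messages, mime_type="text/plain"):
    """Return the filenames of document messages with the given MIME type.

    `messages` is a pymongo Collection (or any object with a compatible find()).
    """
    query = {"type": "messagemediadocument", "mime_type": mime_type}
    projection = {"_id": 0, "filename": 1}  # return only the filename field
    return [
        doc["filename"]
        for doc in messages.find(query, projection)
        if "filename" in doc  # skip documents that have no filename
    ]


# Possible usage (the database name "telegram" is an assumption):
# from pymongo import MongoClient
# db = MongoClient("mongodb://localhost:27017")["telegram"]
# for name in list_filenames(db.messages):
#     print(name)
```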
You can use the docker-compose file to run a MongoDB instance with a web interface
docker-compose up
- MongoDB port:
27017
- MongoDB data directory:
./data
- MongoExpress web port:
8081
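To confirm that both containers are reachable before running the indexer, a quick port check like the following may help. It is a sketch that only tests whether the two ports listed above accept TCP connections on localhost; it does not verify that MongoDB itself is healthy. The function names are made up for this example.

```python
import socket


def is_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_services(host="localhost"):
    """Check the MongoDB and MongoExpress ports from the docker-compose setup."""
    ports = {"mongodb": 27017, "mongo-express": 8081}
    return {name: is_port_open(host, port) for name, port in ports.items()}


if __name__ == "__main__":
    for name, reachable in check_services().items():
        print(f"{name}: {'reachable' if reachable else 'not reachable'}")
```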
- Poetry - Python dependency management
- MongodbClient - MongoDB GUI