
The 💌 Gmail Email Processor is a Python-based tool designed to process Gmail mbox files, extract email content, and save the processed emails into organized text files. It decodes MIME words, normalizes text to ensure a maximum of two consecutive line breaks, and cleans email bodies to remove unwanted characters.


WeMake-CX/gmail-email-processor


Gmail Email Processor

Overview

The Gmail Email Processor reads Gmail mbox files, extracts the content of each email, and saves the processed emails as text files. Along the way it decodes MIME-encoded words in headers, collapses excess blank lines, and strips unwanted characters from message bodies so the output stays clean and consistent.

Features

  • Decodes MIME words in email headers.
  • Normalizes text to ensure a maximum of two consecutive line breaks.
  • Cleans email bodies to remove unwanted characters.
  • Sorts emails by date and writes them to text files based on the sender's domain.
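The first three features can be sketched with the Python standard library alone. This is a minimal illustration, not the code in emailembed.py: the function names `decode_mime_words`, `normalize_text`, and `clean_body` are hypothetical, and treating "unwanted characters" as ASCII control characters is an assumption.

```python
import re
from email.header import decode_header, make_header


def decode_mime_words(value: str) -> str:
    """Decode RFC 2047 encoded words, e.g. '=?utf-8?q?Caf=C3=A9?='."""
    return str(make_header(decode_header(value)))


def normalize_text(text: str) -> str:
    """Unify line endings, then cap runs of line breaks at two."""
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    return re.sub(r"\n{3,}", "\n\n", text)


def clean_body(text: str) -> str:
    """Remove ASCII control characters except tab and newline.

    Assumption: this is only one plausible reading of "unwanted characters".
    """
    return re.sub(r"[\x00-\x08\x0b-\x1f\x7f]", "", text)
```

Calling `normalize_text` before `clean_body` matters in this sketch, because `clean_body` would otherwise delete bare carriage returns instead of converting them to line breaks.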

Setup

Prerequisites

  • Miniconda

Installation

  1. Clone the repository:

    git clone https://github.com/WEMAKE-CX/gmail-email-processor.git
    cd gmail-email-processor
  2. Run the setup script:

    ./start.sh

Usage

  1. Place your mbox files in the source/Gmail directory.

  2. Run the processing script:

    python emailembed.py

Code Overview

emailembed.py

Handles the processing of mbox files and extraction of email content.
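As a rough idea of what reading an mbox file involves, the sketch below uses the standard `mailbox` module. It is not taken from emailembed.py: the names `read_mbox` and `extract_plain_text` and the record keys are hypothetical, and picking the first `text/plain` part is an assumed strategy.

```python
import mailbox
from email.message import Message


def extract_plain_text(message: Message) -> str:
    """Return the first text/plain part of a message as str, or '' if none."""
    for part in message.walk():
        if part.get_content_type() != "text/plain":
            continue
        payload = part.get_payload(decode=True)  # bytes, or None for containers
        if payload is None:
            continue
        charset = part.get_content_charset() or "utf-8"
        try:
            return payload.decode(charset, errors="replace")
        except LookupError:
            # Unknown charset label: fall back to UTF-8.
            return payload.decode("utf-8", errors="replace")
    return ""


def read_mbox(path: str) -> list:
    """Collect sender, date, subject and body for each message in an mbox file."""
    records = []
    box = mailbox.mbox(path, create=False)  # raise instead of creating a new file
    try:
        for message in box:
            records.append({
                "from": str(message.get("From", "")),
                "date": str(message.get("Date", "")),
                "subject": str(message.get("Subject", "")),
                "body": extract_plain_text(message),
            })
    finally:
        box.close()
    return records
```

Passing `create=False` makes a mistyped path fail loudly rather than silently producing an empty mailbox.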

start.sh

Sets up the environment using Miniconda and installs required packages.

Example Output

Processed emails are saved in the output directory, with filenames based on the sender's domain.
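One way to derive per-domain, date-sorted output is sketched below. The record layout (dicts with `from` and `date` keys), the function names, the `unknown` fallback, and the `<domain>.txt` naming scheme are all assumptions made for illustration; the actual file names produced by the tool may differ.

```python
import re
from collections import defaultdict
from datetime import datetime, timezone
from email.utils import parseaddr, parsedate_to_datetime

# Assumed fallback so messages without a usable Date header sort first.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def sender_domain(from_header: str) -> str:
    """Extract the lower-cased domain from a From header, or 'unknown'."""
    _, address = parseaddr(from_header)
    if "@" not in address:
        return "unknown"
    return address.rpartition("@")[2].lower() or "unknown"


def parse_date(date_header: str) -> datetime:
    """Parse an RFC 2822 date into an aware datetime, falling back to EPOCH."""
    try:
        parsed = parsedate_to_datetime(date_header)
    except (TypeError, ValueError):
        return EPOCH
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def group_by_domain(records: list) -> dict:
    """Sort records by date, then bucket them by sender domain."""
    grouped = defaultdict(list)
    for record in sorted(records, key=lambda r: parse_date(r["date"])):
        grouped[sender_domain(record["from"])].append(record)
    return dict(grouped)


def output_filename(domain: str) -> str:
    """Build a filesystem-safe file name for a domain (hypothetical scheme)."""
    return re.sub(r"[^A-Za-z0-9.-]", "_", domain) + ".txt"
```

Sorting once before grouping keeps every per-domain list in chronological order without a second sort per file.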

License

This project is licensed under the MIT License.
