Extract Gmail messages into DuckDB for easy querying & analysis
- Clone this repository:
git clone https://github.com/TFMV/MailDuck.git
- Install the requirements:
pip install -r requirements.txt
- Create a Google Cloud project in the Google Cloud console.
- Under APIs & Services, open the Gmail API page and enable the Gmail API.
- Open the OAuth consent screen page and create a new consent screen. You only need to provide a name and contact information.
- Next, open Create OAuth client ID and create credentials for a Desktop app. Download the credentials file and save it as credentials.json in the root of this repository.
Here is a detailed guide on how to create the credentials: https://developers.google.com/gmail/api/quickstart/python#set_up_your_environment.
- Run the script:
python main.py sync --data-dir path/to/your/data
where --data-dir is the path to the directory where all data is stored. This creates a DuckDB database in <data-dir>/messages.db and stores the user credentials under <data-dir>/credentials.json.
- After the script has finished, you can query the database using, for example, the duckdb command line tool:
duckdb <data-dir>/messages.db
- You can run the script again to sync all new messages. Provide --full-sync to force a full sync. For messages that already exist in the database, a full sync only updates the read status, the labels, and the last indexed timestamp.
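The database can also be opened from Python instead of the duckdb command line tool. The following is a minimal sketch, assuming the duckdb Python package is installed and that a sync has already created messages.db. The helper names database_path and list_tables are hypothetical and not part of MailDuck. Because the table layout is not described here, the sketch only lists the tables rather than assuming any table or column names.

```python
from pathlib import Path


def database_path(data_dir: str) -> Path:
    """Return the location of the DuckDB file inside the data directory."""
    return Path(data_dir) / "messages.db"


def list_tables(data_dir: str) -> list[str]:
    """List the tables in the synced database without assuming a schema."""
    # Imported lazily so that database_path works without the package installed.
    import duckdb

    # read_only=True guards against modifying the synced data by accident.
    con = duckdb.connect(str(database_path(data_dir)), read_only=True)
    try:
        return [row[0] for row in con.execute("SHOW TABLES").fetchall()]
    finally:
        con.close()
```

Calling list_tables("path/to/your/data") returns the table names; each one can then be inspected with DESCRIBE <table> before writing queries against it.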
To sync a single message by its ID, run:
python main.py sync-message --data-dir path/to/your/data --message-id <message-id>
- --data-dir: Path to the directory where the data is stored.
- --full-sync: Force a full sync of all messages.
- --message-id: The ID of the message to sync.
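To run these commands from another Python script, for example a scheduled job, the argument list can be assembled programmatically. This is a minimal sketch that uses only the commands and options documented in this README. The helper names build_sync_command, build_sync_message_command, and run are hypothetical, and the sketch assumes that main.py is in the current working directory and that --full-sync is a flag that takes no value.

```python
import subprocess
import sys


def build_sync_command(data_dir: str, full_sync: bool = False) -> list[str]:
    """Assemble the argument list for `python main.py sync`."""
    cmd = [sys.executable, "main.py", "sync", "--data-dir", data_dir]
    if full_sync:
        # Assumption: --full-sync is a boolean flag without a value.
        cmd.append("--full-sync")
    return cmd


def build_sync_message_command(data_dir: str, message_id: str) -> list[str]:
    """Assemble the argument list for `python main.py sync-message`."""
    return [
        sys.executable, "main.py", "sync-message",
        "--data-dir", data_dir,
        "--message-id", message_id,
    ]


def run(cmd: list[str]) -> None:
    """Run the command; raises CalledProcessError on a non-zero exit status."""
    subprocess.run(cmd, check=True)
```

For example, run(build_sync_command("path/to/your/data", full_sync=True)) forces a full sync. Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.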
This project is licensed under the MIT License. See the LICENSE file for details.