A Python project that retrieves and processes GitHub repository traffic data, including views, clones, stars, and forks. The data can be stored in MongoDB or Azure Cosmos DB and optionally uploaded to Azure Blob Storage. The project also includes a migration script to transfer data from MongoDB to Azure Cosmos DB.
- Features
- Prerequisites
- Installation
- Configuration
- Usage
- Examples
- Data Storage
- Migration
- Logging
- Contributing
- License
- Fetch GitHub Repository Data:
  - Views and unique visitors
  - Clones and unique cloners
  - Stars and forks over time
  - Popular content and referring sites
- Data Storage:
  - Supports MongoDB and Azure Cosmos DB
  - Handles data storage and retrieval with `db.py`
- Output Formats:
  - Excel and JSON formats
  - Optionally uploads output files to Azure Blob Storage
- Migration Script:
  - `migrate.py` to transfer data from MongoDB to Azure Cosmos DB
- Authentication:
  - Uses GitHub Personal Access Token
  - Supports Azure Managed Identity for authentication
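
The views, clones, popular content, and referrer figures listed above are exposed by the traffic endpoints of the GitHub REST API. The following is a minimal sketch of how the request URLs and headers could be assembled. It is not taken from this project's code; the helper names `traffic_urls` and `auth_headers` are hypothetical.

```python
from typing import Dict

GITHUB_API = "https://api.github.com"


def traffic_urls(owner: str, repo: str) -> Dict[str, str]:
    """Return the GitHub REST API traffic endpoints for one repository."""
    base = f"{GITHUB_API}/repos/{owner}/{repo}/traffic"
    return {
        "views": f"{base}/views",
        "clones": f"{base}/clones",
        "popular_paths": f"{base}/popular/paths",
        "popular_referrers": f"{base}/popular/referrers",
    }


def auth_headers(token: str) -> Dict[str, str]:
    """Build headers that authenticate a request with a Personal Access Token."""
    return {
        "Authorization": f"Bearer {token}",
        "Accept": "application/vnd.github+json",
    }
```

These values can be passed to any HTTP client. The traffic endpoints only return data to users with push access to the repository, so the token needs to belong to such a user.
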
- Python 3.7 or higher
- GitHub Personal Access Token with appropriate permissions
- Database Access:
  - MongoDB or Azure Cosmos DB connection string
- Azure Services (optional):
  - Azure Blob Storage account
  - Azure account with permissions to use Managed Identity authentication
- Clone the repository:
  git clone https://github.com/yourusername/your-repo-name.git
  cd your-repo-name
- Install the required Python packages:
  pip install -r requirements.txt
Before running the scripts, set up the following configurations:
- GitHub Personal Access Token:
  - Create a token with the necessary permissions.
  - Store it in a text file (e.g., `token.txt`).
- Database Connection Strings:
  - MongoDB: Provide your MongoDB connection string.
  - Azure Cosmos DB: Provide your Cosmos DB connection string.
- Azure Storage Connection String (optional):
  - If you want to upload output files to Azure Blob Storage, provide the connection string.
- Managed Identity Authentication (optional):
  - If using Managed Identity for Azure services, ensure your environment is correctly configured.
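
As an illustration of the token file convention above, the sketch below reads a token from a text file and strips the trailing newline that most editors add. The function name `read_token` is hypothetical, and the project's scripts may load the token differently.

```python
from pathlib import Path


def read_token(token_file: str) -> str:
    """Read a GitHub Personal Access Token from a text file."""
    # Strip surrounding whitespace so a trailing newline does not end up
    # in the Authorization header.
    token = Path(token_file).read_text(encoding="utf-8").strip()
    if not token:
        raise ValueError(f"Token file {token_file!r} is empty")
    return token
```

Keep the token file out of version control, for example by listing it in `.gitignore`.
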
Run the `report.py` script to fetch and process GitHub repository traffic data:
python report.py --repo <repository_name> --owner <repository_owner> \
--token-file <path_to_token_file> --db-connection-string <db_connection_string> \
--db-type <mongodb_or_cosmosdb> [options]
Required arguments:
- `--repo`: Name of the GitHub repository.
- `--owner`: Owner (user or organization) of the repository.
- `--token-file`: Path to the file containing your GitHub Personal Access Token.
- `--db-connection-string`: Connection string for MongoDB or Azure Cosmos DB.
- `--db-type`: Type of the database (`mongodb` or `cosmosdb`).

Optional arguments:
- `--output-format`: Output format for the data (`excel`, `json`, or `all`). Default is `excel`.
- `--filename`: Custom filename for the output files.
- `--azure-storage-connection-string`: Azure Blob Storage connection string for storing output files.
- `--managed-identity-storage`: Use Managed Identity for Azure Blob Storage authentication.
- `--help`: Show help message and exit.
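
For readers who want to see how these options fit together, here is a sketch of an `argparse` parser that accepts the same flags. It is an approximation written from the list above, not the parser in `report.py`. In particular, treating `--managed-identity-storage` as a boolean switch is an assumption.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented report.py options."""
    parser = argparse.ArgumentParser(
        description="Fetch and process GitHub repository traffic data."
    )
    # Required arguments
    parser.add_argument("--repo", required=True,
                        help="Name of the GitHub repository.")
    parser.add_argument("--owner", required=True,
                        help="Owner (user or organization) of the repository.")
    parser.add_argument("--token-file", required=True,
                        help="Path to the file containing the GitHub token.")
    parser.add_argument("--db-connection-string", required=True,
                        help="Connection string for MongoDB or Azure Cosmos DB.")
    parser.add_argument("--db-type", required=True,
                        choices=["mongodb", "cosmosdb"],
                        help="Type of the database.")
    # Optional arguments (argparse adds --help automatically)
    parser.add_argument("--output-format", choices=["excel", "json", "all"],
                        default="excel", help="Output format for the data.")
    parser.add_argument("--filename",
                        help="Custom filename for the output files.")
    parser.add_argument("--azure-storage-connection-string",
                        help="Azure Blob Storage connection string.")
    # Assumption: this flag takes no value and simply enables the feature.
    parser.add_argument("--managed-identity-storage", action="store_true",
                        help="Use Managed Identity for Azure Blob Storage.")
    return parser
```

Restricting `--db-type` and `--output-format` with `choices` makes `argparse` reject unsupported values before any network or database call is made.
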
Store the data in MongoDB:
python report.py --repo my-repo --owner my-username \
--token-file token.txt --db-connection-string "mongodb://localhost:27017" \
--db-type mongodb

Store the data in Azure Cosmos DB:
python report.py --repo my-repo --owner my-username \
--token-file token.txt --db-connection-string "<cosmos_db_connection_string>" \
--db-type cosmosdb

Store the data in MongoDB and upload the output files to Azure Blob Storage:
python report.py --repo my-repo --owner my-username \
--token-file token.txt --db-connection-string "mongodb://localhost:27017" \
--db-type mongodb --azure-storage-connection-string "<storage_connection_string>"
The data fetched from GitHub is stored in the specified database with the following collections or containers:
- `About`: Repository description.
- `TrafficStats`: Views and unique visitors over time.
- `GitClones`: Clones and unique cloners over time.
- `Stars`: Stars over time.
- `Forks`: Forks over time.
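
This README does not describe the document schema used inside each collection. As a purely illustrative sketch, the function below flattens the response of GitHub's `traffic/views` endpoint (a `views` array of entries with `timestamp`, `count`, and `uniques`) into one document per day, which is one plausible shape for `TrafficStats`. The function name and all output field names, including the `_id` format, are assumptions.

```python
from typing import Any, Dict, List


def views_to_documents(
    owner: str, repo: str, payload: Dict[str, Any]
) -> List[Dict[str, Any]]:
    """Flatten a traffic/views response into one document per timestamp."""
    full_name = f"{owner}/{repo}"
    return [
        {
            # Hypothetical deterministic key: repository plus timestamp.
            "_id": f"{full_name}/{entry['timestamp']}",
            "repo": full_name,
            "timestamp": entry["timestamp"],
            "views": entry["count"],
            "unique_visitors": entry["uniques"],
        }
        for entry in payload.get("views", [])
    ]
```

GitHub reports traffic only for the most recent 14 days, so consecutive runs return overlapping days. A key derived from the repository and timestamp lets each run overwrite a day's document instead of inserting a duplicate.
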
The `migrate.py` script migrates data from MongoDB to Azure Cosmos DB.
python migrate.py
Before running the migration script, update the connection strings in `migrate.py`:
- `mongo_uri`: MongoDB connection string.
- `cosmos_uri`: Azure Cosmos DB connection string.
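
The core of such a migration is a copy loop over each collection. The sketch below shows one way to write it, assuming that both the MongoDB source and the Cosmos DB target are accessed as `pymongo`-style collections (that is, Cosmos DB is used through its API for MongoDB). This is an assumption, and `copy_collection` is a hypothetical helper, not a function from `migrate.py`.

```python
import logging
from typing import Any

logger = logging.getLogger("migration")


def copy_collection(source: Any, target: Any) -> int:
    """Copy every document from source to target; return the number copied.

    Both arguments are expected to behave like pymongo Collection objects,
    providing find() and replace_one().
    """
    copied = 0
    for doc in source.find():
        try:
            # Upsert by _id so re-running the migration does not
            # create duplicate documents in the target.
            target.replace_one({"_id": doc["_id"]}, doc, upsert=True)
            copied += 1
        except Exception:
            # Log the failure and continue with the remaining documents.
            logger.exception("Failed to migrate document %r", doc.get("_id"))
    return copied
```

With `pymongo`, the two arguments would be collections obtained from clients created with `mongo_uri` and `cosmos_uri`, called once per collection (`About`, `TrafficStats`, and so on).
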
The scripts use Python's `logging` module to provide detailed logs. Errors during migration are logged to `migration_errors.log`.
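
A logger that writes only errors to a file can be configured with a `logging.FileHandler`. The sketch below shows one possible setup. The logger name `migration`, the message format, and the function name `build_error_logger` are assumptions, so the scripts' actual configuration may differ.

```python
import logging


def build_error_logger(path: str = "migration_errors.log") -> logging.Logger:
    """Return a logger whose ERROR-and-above records are appended to path."""
    logger = logging.getLogger("migration")
    logger.setLevel(logging.INFO)

    handler = logging.FileHandler(path, encoding="utf-8")
    # Only errors reach the file; lower-level records are left to
    # whatever other handlers are configured.
    handler.setLevel(logging.ERROR)
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(message)s")
    )
    logger.addHandler(handler)
    return logger
```

Calling `logger.exception(...)` inside an `except` block records the message together with the traceback, which is usually what is needed when diagnosing a failed document.
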
Contributions are welcome! Please open an issue or submit a pull request for any improvements or bug fixes.
This project is licensed under the MIT License. See the LICENSE file for details.