GitHub - dsnehitha/content-moderation-system

Content Moderation System

A production-ready, AI-powered content moderation system that automatically detects toxic content in real time using AWS cloud services. The project demonstrates an end-to-end MLOps pipeline with enhanced, context-aware decision-making.

๐Ÿ—๏ธ Architecture Overview

User Content → API Gateway → Lambda (preprocessing) → SageMaker (ML prediction)
     ↓
Step Functions Orchestration → Bedrock (contextual analysis) → CloudWatch (monitoring) → SNS (alerts)

Architecture Graph View

Key Features:

  • Real-time toxicity detection with 85%+ accuracy
  • Enhanced AI decision-making using Amazon Bedrock (Claude Sonnet-4) for borderline cases
  • Automated MLOps pipeline with SageMaker Pipelines
  • Production monitoring with CloudWatch dashboards and SNS alerts
  • Step Functions orchestration for complex workflow management
  • REST API for seamless integration
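A minimal client sketch for the REST flow above. The URL, the `{"text": ...}` payload shape, and the response fields are assumptions for illustration; the actual endpoint details are written out by api_gateway_setup.py (see api_info.json in the project structure).

```python
import json
import urllib.request

# Hypothetical URL; the real one is produced by api_gateway_setup.py.
API_URL = "https://example.execute-api.us-east-1.amazonaws.com/prod/moderate"

def build_request(text: str) -> dict:
    """Shape the JSON payload the preprocessing Lambda is assumed to expect."""
    return {"text": text}

def moderate(text: str) -> dict:
    """POST content to the moderation API and return the parsed JSON verdict."""
    payload = json.dumps(build_request(text)).encode("utf-8")
    req = urllib.request.Request(
        API_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

Only the standard library is used here, so the same snippet works from any service that needs to call the API.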

🚀 Quick Start

Prerequisites: AWS CLI configured with appropriate permissions (see trust-policy.json)

# 1. Prepare training data
python data_preparation.py

# 2. Setup AWS infrastructure (S3, IAM roles, etc.)
python setup_infrastructure.py

# 3. Option A: Full MLOps Pipeline (Recommended)
python sagemaker_pipeline.py

# 3. Option B: Standalone Training (Fallback)
python launch_training.py

# 4. Deploy SageMaker endpoint
python deploy_endpoint.py

# 5. Setup API Gateway
python api_gateway_setup.py

# 6. Setup monitoring infrastructure
python cloudwatch_monitoring.py

# 7. Setup Step Functions orchestration
python step_functions_orchestration.py

# 8. Test the complete system
python test_system.py

# 9. Launch demo frontend
python app.py
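Once step 4 has deployed the endpoint, it can also be smoke-tested directly with boto3, bypassing API Gateway. The endpoint name and the `{"score": ...}` response shape are assumptions here; the deployed values live in endpoint_info.json.

```python
import json

ENDPOINT_NAME = "content-moderation-endpoint"  # hypothetical; see endpoint_info.json

def parse_verdict(body: str, threshold: float = 0.5) -> dict:
    """Turn the endpoint's raw JSON score into a toxic/clean verdict."""
    score = float(json.loads(body)["score"])  # assumed response shape
    return {"score": score, "toxic": score >= threshold}

def invoke(text: str) -> dict:
    """Call the SageMaker endpoint directly (no API Gateway in the loop)."""
    import boto3  # lazy import keeps parse_verdict() testable without AWS deps
    runtime = boto3.client("sagemaker-runtime")
    resp = runtime.invoke_endpoint(
        EndpointName=ENDPOINT_NAME,
        ContentType="application/json",
        Body=json.dumps({"text": text}),
    )
    return parse_verdict(resp["Body"].read().decode("utf-8"))
```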

📊 Model Performance

  • Training Accuracy: 95%+
  • Test Accuracy: 85%+
  • Cross-validation: 88% ± 2%
  • Inference Latency: <100ms
  • Enhanced with Bedrock: Improved contextual understanding for complex cases

๐Ÿ› ๏ธ Technical Stack

AWS Services

  • SageMaker: Model training, hosting, and MLOps pipelines
  • Lambda: Serverless functions for preprocessing and orchestration
  • API Gateway: RESTful API endpoints for real-time inference
  • Step Functions: Workflow orchestration and complex decision logic
  • Bedrock: Enhanced contextual analysis using Claude Sonnet-4
  • CloudWatch: Monitoring, logging, and alerting
  • SNS: Real-time notifications and alerts
  • S3: Data storage and model artifacts
  • IAM: Security and access management

Machine Learning

  • Algorithm: Logistic Regression with TF-IDF features
  • Framework: scikit-learn on SageMaker
  • Features: N-gram analysis (1-2), stop word removal, 10K feature limit
  • Training: Automated cross-validation and hyperparameter optimization
  • Evaluation: Comprehensive metrics including precision, recall, F1-score
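The feature settings above translate directly into a scikit-learn pipeline. This is an illustrative sketch, not the project's ml_scripts/train.py; the toy data exists only to make the snippet runnable.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Mirrors the stated features: 1-2 grams, stop word removal, 10K feature cap.
model = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2), stop_words="english",
                              max_features=10000)),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Toy data for illustration only (1 = toxic).
texts = ["you are wonderful", "I hate you, idiot",
         "have a great day", "shut up, loser"]
labels = [0, 1, 0, 1]
model.fit(texts, labels)

# predict_proba returns [P(clean), P(toxic)] per input.
probs = model.predict_proba(["thanks for your help"])[0]
```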

Architecture Patterns

  • Microservices: Modular Lambda functions for specific tasks
  • Event-driven: Step Functions for complex workflow orchestration
  • Monitoring: Real-time performance tracking and alerting
  • Scalability: Auto-scaling endpoints and serverless compute

๐Ÿ“ Project Structure

๐Ÿ—๏ธ Core Infrastructure

  • setup_infrastructure.py - AWS infrastructure setup (S3, IAM roles)
  • data_preparation.py - Training data generation and S3 upload

🤖 Machine Learning Pipeline

  • sagemaker_pipeline.py - Automated MLOps pipeline (Recommended)
  • launch_training.py - Standalone training job (Fallback)
  • deploy_endpoint.py - SageMaker endpoint deployment
  • ml_scripts/train.py - Core training script with evaluation
  • ml_scripts/model_evaluation.py - Pipeline model evaluation
  • ml_scripts/data_validation.py - Data quality checks

🔄 Processing & APIs

  • prediction_lambda.py - Text preprocessing and ML prediction
  • api_gateway_setup.py - REST API configuration
  • step_functions_orchestration.py - Complete workflow orchestration
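The preprocessing-plus-prediction Lambda can be sketched roughly as below. The cleaning rules, endpoint name, and event shape are assumptions, not the actual prediction_lambda.py.

```python
import json
import re

def preprocess(text: str) -> str:
    """Normalize input before scoring: lowercase, strip URLs, collapse whitespace."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)
    return re.sub(r"\s+", " ", text).strip()

def lambda_handler(event, context):
    """API Gateway proxy handler: preprocess, score via SageMaker, return JSON."""
    import boto3  # lazy import keeps preprocess() testable without AWS deps
    body = json.loads(event.get("body") or "{}")
    cleaned = preprocess(body.get("text", ""))
    resp = boto3.client("sagemaker-runtime").invoke_endpoint(
        EndpointName="content-moderation-endpoint",  # hypothetical name
        ContentType="application/json",
        Body=json.dumps({"text": cleaned}),
    )
    return {"statusCode": 200, "body": resp["Body"].read().decode("utf-8")}
```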

🧠 Enhanced AI Capabilities

  • bedrock_integration.py - Amazon Bedrock integration for contextual analysis
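A contextual-analysis call to Bedrock might look like the sketch below, using boto3's `converse` API. The prompt wording and model ID are assumptions; check bedrock_integration.py for the project's actual values.

```python
import json

def build_prompt(text: str, ml_score: float) -> str:
    """Prompt asking the LLM to re-judge a borderline ML score with context."""
    return (
        f"A toxicity classifier scored the following content {ml_score:.2f} "
        "on a 0-1 scale. Considering context, sarcasm, and intent, reply with "
        f'JSON {{"toxic": true|false, "reason": "..."}}.\n\nContent: {text}'
    )

def analyze_with_bedrock(text: str, ml_score: float) -> str:
    """Send a borderline case to Bedrock and return the model's reply text."""
    import boto3  # lazy import: build_prompt() stays pure and testable
    client = boto3.client("bedrock-runtime")
    resp = client.converse(
        modelId="anthropic.claude-sonnet-4-20250514-v1:0",  # assumed model ID
        messages=[{"role": "user",
                   "content": [{"text": build_prompt(text, ml_score)}]}],
    )
    return resp["output"]["message"]["content"][0]["text"]
```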

📊 Monitoring & Operations

  • cloudwatch_monitoring.py - Production monitoring setup
  • monitor_pipeline.py - Training pipeline monitoring

🧪 Testing & Demo

  • test_system.py - Basic system testing
  • test_complete_system.py - Comprehensive system validation
  • app.py - Flask demo frontend
  • moderate_content.py - Direct content moderation utility

📋 Configuration Files

  • aws_config.json - AWS configuration
  • endpoint_info.json - SageMaker endpoint details
  • api_info.json - API Gateway configuration
  • orchestration_config.json - Step Functions setup
  • pipeline_model_info.json - MLOps pipeline model info

🔧 Key Features Explained

1. Dual-Mode AI Decision Making

  • Primary: ML model provides initial toxicity scoring
  • Enhanced: Bedrock (Claude Sonnet-4) analyzes borderline cases for context
  • Fallback: Graceful degradation when Bedrock is unavailable
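The primary/enhanced/fallback routing can be sketched as a pure function. The band thresholds here are illustrative, not the project's actual values.

```python
def route(ml_score: float, low: float = 0.3, high: float = 0.7) -> str:
    """Route a toxicity score: confident cases are decided by the ML model,
    borderline cases are escalated to Bedrock for contextual review."""
    if ml_score < low:
        return "allow"      # clearly benign
    if ml_score >= high:
        return "block"      # clearly toxic
    return "escalate"       # borderline: send to Bedrock

def decide(ml_score: float, bedrock_available: bool = True) -> str:
    """Graceful degradation: without Bedrock, fall back to a plain threshold."""
    action = route(ml_score)
    if action == "escalate" and not bedrock_available:
        return "block" if ml_score >= 0.5 else "allow"
    return action
```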

2. Production-Ready MLOps

  • Automated Pipeline: SageMaker Pipelines with data validation, training, and evaluation
  • Model Registry: Versioned model management with approval workflows
  • Continuous Monitoring: Real-time performance tracking and drift detection

3. Scalable Architecture

  • Serverless: Lambda functions auto-scale based on demand
  • Step Functions: Complex workflow orchestration with error handling
  • API Gateway: Rate limiting, authentication, and request validation

4. Comprehensive Monitoring

  • Real-time Metrics: Toxicity detection rates, latency, confidence scores
  • Custom Dashboards: CloudWatch dashboards for system health
  • Automated Alerts: SNS notifications for performance degradation
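Publishing those custom metrics boils down to CloudWatch `put_metric_data` calls, roughly as below. The namespace and metric names are assumptions; cloudwatch_monitoring.py defines the real ones.

```python
from datetime import datetime, timezone

NAMESPACE = "ContentModeration"  # assumed namespace

def build_metric(name: str, value: float, unit: str = "Count") -> dict:
    """One CloudWatch MetricDatum entry for put_metric_data."""
    return {
        "MetricName": name,
        "Value": value,
        "Unit": unit,
        "Timestamp": datetime.now(timezone.utc),
    }

def publish(toxic_count: int, latency_ms: float) -> None:
    """Push detection-rate and latency metrics to CloudWatch."""
    import boto3  # lazy import keeps build_metric() testable offline
    boto3.client("cloudwatch").put_metric_data(
        Namespace=NAMESPACE,
        MetricData=[
            build_metric("ToxicDetections", toxic_count),
            build_metric("InferenceLatency", latency_ms, unit="Milliseconds"),
        ],
    )
```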

🔮 Future Enhancements

Planned Features

  • Image Moderation: Computer vision for inappropriate image detection
  • Multi-language Support: Extend to non-English content
  • Automated Retraining: Model updates based on performance drift
  • Advanced Analytics: Content trend analysis and reporting

Scalability Improvements

  • Caching Layer: Redis for frequently moderated content
  • Edge Deployment: CloudFront integration for global latency reduction
  • Batch Processing: SQS integration for high-volume scenarios

🚨 Troubleshooting

Common Issues

  • Model Not Found: Run sagemaker_pipeline.py first (preferred) or launch_training.py
  • Bedrock Access: Ensure Bedrock model access is enabled in AWS console
  • API Gateway 5xx: Check Lambda function logs in CloudWatch
  • Step Functions Failures: Verify IAM permissions and Lambda timeouts

Debugging Commands

# Check system status
python test_system.py

# Validate complete pipeline
python test_complete_system.py

# Test individual components
python moderate_content.py "test content"

๐Ÿ“ License

This project is licensed under the MIT License - see the LICENSE file for details.

๐Ÿค Contributing

Contributions are welcome! This project demonstrates:

  • End-to-end MLOps best practices
  • Production-ready AWS architecture
  • Scalable content moderation solutions

Perfect for learning cloud-based machine learning deployments and preparing for MLOps roles.
