A production-ready AI-powered content moderation system that automatically detects toxic content in real-time using AWS cloud services. This comprehensive MLOps pipeline demonstrates end-to-end machine learning deployment with enhanced decision-making capabilities.
User Content → API Gateway → Lambda (preprocessing) → SageMaker (ML prediction)
        ↓
Step Functions Orchestration → Bedrock (contextual analysis) → CloudWatch (monitoring) → SNS (alerts)
Key Features:
- Real-time toxicity detection with 85%+ accuracy
- Enhanced AI decision-making using Amazon Bedrock (Claude Sonnet-4) for borderline cases
- Automated MLOps pipeline with SageMaker Pipelines
- Production monitoring with CloudWatch dashboards and SNS alerts
- Step Functions orchestration for complex workflow management
- REST API for seamless integration
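
Once the API Gateway endpoint is deployed, a client can call it over plain HTTPS. Below is a hedged, illustrative sketch; the endpoint URL and the request/response field names ("text", "is_toxic", "toxicity_score") are assumptions, not the project's confirmed contract — check api_info.json for the values written by api_gateway_setup.py.

```python
# Illustrative client call to the moderation REST API.
# The URL and JSON field names are placeholders; see api_info.json for the
# actual values produced by api_gateway_setup.py.
import json
import requests

API_URL = "https://abc123.execute-api.us-east-1.amazonaws.com/prod/moderate"  # placeholder

def moderate(text: str) -> dict:
    """Send text to the moderation API and return the parsed JSON verdict."""
    response = requests.post(API_URL, json={"text": text}, timeout=10)
    response.raise_for_status()
    return response.json()

if __name__ == "__main__":
    result = moderate("You are a wonderful person!")
    print(json.dumps(result, indent=2))  # e.g. {"is_toxic": false, "toxicity_score": 0.03}
```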
Prerequisites: AWS CLI configured with appropriate permissions (see trust-policy.json)
# 1. Prepare training data
python data_preparation.py
# 2. Setup AWS infrastructure (S3, IAM roles, etc.)
python setup_infrastructure.py
# 3. Option A: Full MLOps Pipeline (Recommended)
python sagemaker_pipeline.py
# 3. Option B: Standalone Training (Fallback)
python launch_training.py
# 4. Deploy SageMaker endpoint
python deploy_endpoint.py
# 5. Setup API Gateway
python api_gateway_setup.py
# 6. Setup monitoring infrastructure
python cloudwatch_monitoring.py
# 7. Setup Step Functions orchestration
python step_functions_orchestration.py
# 8. Test the complete system
python test_system.py
# 9. Launch demo frontend
python app.py
- Training Accuracy: 95%+
- Test Accuracy: 85%+
- Cross-validation: 88% ± 2%
- Inference Latency: <100ms
- Enhanced with Bedrock: Improved contextual understanding for complex cases
- SageMaker: Model training, hosting, and MLOps pipelines
- Lambda: Serverless functions for preprocessing and orchestration
- API Gateway: RESTful API endpoints for real-time inference
- Step Functions: Workflow orchestration and complex decision logic
- Bedrock: Enhanced contextual analysis using Claude Sonnet-4
- CloudWatch: Monitoring, logging, and alerting
- SNS: Real-time notifications and alerts
- S3: Data storage and model artifacts
- IAM: Security and access management
- Algorithm: Logistic Regression with TF-IDF features (see the sketch after this list)
- Framework: scikit-learn on SageMaker
- Features: N-gram analysis (1-2), stop word removal, 10K feature limit
- Training: Automated cross-validation and hyperparameter optimization
- Evaluation: Comprehensive metrics including precision, recall, F1-score
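
A minimal sketch of the configuration described above: TF-IDF with 1-2 grams, English stop-word removal, a 10K-feature cap, and Logistic Regression with cross-validated accuracy. The CSV path and column names ("text", "label") are placeholders; the project's actual training logic lives in ml_scripts/train.py.

```python
# Sketch of the TF-IDF + Logistic Regression setup described above.
# Dataset path and column names are placeholders.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline

df = pd.read_csv("train.csv")  # placeholder dataset

model = Pipeline([
    ("tfidf", TfidfVectorizer(
        ngram_range=(1, 2),      # unigram and bigram analysis
        stop_words="english",    # stop word removal
        max_features=10_000,     # 10K feature limit
    )),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Cross-validated accuracy, in the spirit of the metrics reported above
scores = cross_val_score(model, df["text"], df["label"], cv=5, scoring="accuracy")
print(f"CV accuracy: {scores.mean():.2%} ± {scores.std():.2%}")

model.fit(df["text"], df["label"])
```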
- Microservices: Modular Lambda functions for specific tasks (see the Lambda sketch after this list)
- Event-driven: Step Functions for complex workflow orchestration
- Monitoring: Real-time performance tracking and alerting
- Scalability: Auto-scaling endpoints and serverless compute
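
As a hedged illustration of the microservices pattern, here is what a preprocessing-and-prediction Lambda might look like. The endpoint name, environment variable, and payload shape are assumptions; the project's real handler is prediction_lambda.py.

```python
# Sketch of a Lambda handler that cleans incoming text and calls the SageMaker
# endpoint. Endpoint name, env var, and payload format are assumptions.
import json
import os
import re

import boto3

ENDPOINT_NAME = os.environ.get("SAGEMAKER_ENDPOINT", "toxicity-endpoint")  # placeholder
runtime = boto3.client("sagemaker-runtime")

def preprocess(text: str) -> str:
    """Lowercase, strip URLs, and collapse whitespace before inference."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)
    return re.sub(r"\s+", " ", text).strip()

def lambda_handler(event, context):
    body = json.loads(event.get("body") or "{}")
    cleaned = preprocess(body.get("text", ""))

    response = runtime.invoke_endpoint(
        EndpointName=ENDPOINT_NAME,
        ContentType="application/json",
        Body=json.dumps({"text": cleaned}),
    )
    prediction = json.loads(response["Body"].read())

    return {"statusCode": 200, "body": json.dumps(prediction)}
```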
- setup_infrastructure.py - AWS infrastructure setup (S3, IAM roles)
- data_preparation.py - Training data generation and S3 upload
- sagemaker_pipeline.py - Automated MLOps pipeline (Recommended)
- launch_training.py - Standalone training job (Fallback)
- deploy_endpoint.py - SageMaker endpoint deployment
- ml_scripts/train.py - Core training script with evaluation
- ml_scripts/model_evaluation.py - Pipeline model evaluation
- ml_scripts/data_validation.py - Data quality checks
- prediction_lambda.py - Text preprocessing and ML prediction
- api_gateway_setup.py - REST API configuration
- step_functions_orchestration.py - Complete workflow orchestration
- bedrock_integration.py - Amazon Bedrock integration for contextual analysis
- cloudwatch_monitoring.py - Production monitoring setup
- monitor_pipeline.py - Training pipeline monitoring
- test_system.py - Basic system testing
- test_complete_system.py - Comprehensive system validation
- app.py - Flask demo frontend
- moderate_content.py - Direct content moderation utility
- aws_config.json - AWS configuration
- endpoint_info.json - SageMaker endpoint details
- api_info.json - API Gateway configuration
- orchestration_config.json - Step Functions setup
- pipeline_model_info.json - MLOps pipeline model info
- Primary: ML model provides initial toxicity scoring
- Enhanced: Bedrock (Claude Sonnet-4) analyzes borderline cases for context
- Fallback: Graceful degradation when Bedrock is unavailable
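
A hedged sketch of this escalation flow (the project's version lives in bedrock_integration.py). The Bedrock model ID, score thresholds, and prompt wording below are assumptions made for illustration.

```python
# Sketch of escalating borderline ML scores to Bedrock for contextual review.
# Model ID, thresholds, and prompt are assumptions; see bedrock_integration.py.
import json

import boto3

bedrock = boto3.client("bedrock-runtime")
MODEL_ID = "anthropic.claude-sonnet-4-20250514-v1:0"  # placeholder model ID

def enhanced_decision(text: str, ml_score: float) -> bool:
    """Return True if the content should be flagged as toxic."""
    if ml_score >= 0.8:          # clearly toxic: trust the ML model
        return True
    if ml_score <= 0.3:          # clearly benign: trust the ML model
        return False

    # Borderline case: ask Claude for a contextual yes/no judgement.
    try:
        response = bedrock.invoke_model(
            modelId=MODEL_ID,
            body=json.dumps({
                "anthropic_version": "bedrock-2023-05-31",
                "max_tokens": 10,
                "messages": [{
                    "role": "user",
                    "content": f"Is the following message toxic? Answer YES or NO.\n\n{text}",
                }],
            }),
        )
        answer = json.loads(response["body"].read())["content"][0]["text"]
        return answer.strip().upper().startswith("YES")
    except Exception:
        # Graceful degradation when Bedrock is unavailable
        return ml_score >= 0.5
```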
- Automated Pipeline: SageMaker Pipelines with data validation, training, and evaluation
- Model Registry: Versioned model management with approval workflows
- Continuous Monitoring: Real-time performance tracking and drift detection
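
For orientation, here is a minimal sketch of how a pipeline like this might be assembled with the SageMaker Python SDK: a single training step wrapping the repo's training script, registered and started as a named pipeline. The bucket, role ARN, instance type, and framework version are placeholders; the project's actual definition is in sagemaker_pipeline.py.

```python
# Minimal SageMaker Pipelines sketch: one training step, upserted and started.
# Bucket, role ARN, instance type, and framework version are placeholders.
from sagemaker.inputs import TrainingInput
from sagemaker.sklearn.estimator import SKLearn
from sagemaker.workflow.pipeline import Pipeline
from sagemaker.workflow.pipeline_context import PipelineSession
from sagemaker.workflow.steps import TrainingStep

session = PipelineSession()
ROLE_ARN = "arn:aws:iam::123456789012:role/SageMakerExecutionRole"  # placeholder
BUCKET = "content-moderation-data"                                  # placeholder

estimator = SKLearn(
    entry_point="ml_scripts/train.py",
    framework_version="1.2-1",
    instance_type="ml.m5.large",
    instance_count=1,
    role=ROLE_ARN,
    sagemaker_session=session,
)

train_step = TrainingStep(
    name="TrainToxicityModel",
    step_args=estimator.fit(
        inputs={"train": TrainingInput(f"s3://{BUCKET}/data/train.csv")}
    ),
)

pipeline = Pipeline(
    name="content-moderation-pipeline",
    steps=[train_step],
    sagemaker_session=session,
)
pipeline.upsert(role_arn=ROLE_ARN)   # create or update the pipeline definition
execution = pipeline.start()         # kick off a run
```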
- Serverless: Lambda functions auto-scale based on demand
- Step Functions: Complex workflow orchestration with error handling
- API Gateway: Rate limiting, authentication, and request validation
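
As a hedged illustration of the error-handling side of the orchestration, the sketch below registers a tiny state machine with retry/catch logic via boto3. The Lambda and SNS ARNs, role ARN, and state names are placeholders; the project's real workflow is built by step_functions_orchestration.py.

```python
# Sketch of a small Step Functions state machine with retry/catch handling.
# All ARNs and names are placeholders.
import json

import boto3

sfn = boto3.client("stepfunctions")

definition = {
    "StartAt": "Predict",
    "States": {
        "Predict": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-east-1:123456789012:function:prediction_lambda",  # placeholder
            "Retry": [{"ErrorEquals": ["States.TaskFailed"], "IntervalSeconds": 2, "MaxAttempts": 2}],
            "Catch": [{"ErrorEquals": ["States.ALL"], "Next": "NotifyFailure"}],
            "End": True,
        },
        "NotifyFailure": {
            "Type": "Task",
            "Resource": "arn:aws:states:::sns:publish",
            "Parameters": {
                "TopicArn": "arn:aws:sns:us-east-1:123456789012:moderation-alerts",  # placeholder
                "Message.$": "$.Cause",
            },
            "End": True,
        },
    },
}

sfn.create_state_machine(
    name="content-moderation-workflow",                                   # placeholder
    definition=json.dumps(definition),
    roleArn="arn:aws:iam::123456789012:role/StepFunctionsExecutionRole",  # placeholder
)
```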
- Real-time Metrics: Toxicity detection rates, latency, confidence scores
- Custom Dashboards: CloudWatch dashboards for system health
- Automated Alerts: SNS notifications for performance degradation
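
A hedged sketch of the monitoring pattern: publish a custom metric per request and alarm to SNS when a latency metric degrades. The namespace, metric/alarm names, and topic ARN are placeholders; the project's monitoring is configured by cloudwatch_monitoring.py.

```python
# Sketch of custom-metric publishing and an SNS-backed CloudWatch alarm.
# Namespace, metric/alarm names, and topic ARN are placeholders.
import boto3

cloudwatch = boto3.client("cloudwatch")
ALERT_TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:moderation-alerts"  # placeholder

# Publish a per-request toxicity score as a custom metric
cloudwatch.put_metric_data(
    Namespace="ContentModeration",
    MetricData=[{"MetricName": "ToxicityScore", "Value": 0.42, "Unit": "None"}],
)

# Alarm when average inference latency exceeds 100 ms over two 5-minute periods
cloudwatch.put_metric_alarm(
    AlarmName="HighInferenceLatency",
    Namespace="ContentModeration",
    MetricName="InferenceLatency",
    Statistic="Average",
    Period=300,
    EvaluationPeriods=2,
    Threshold=100.0,
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=[ALERT_TOPIC_ARN],   # SNS notification on breach
)
```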
- Image Moderation: Computer vision for inappropriate image detection
- Multi-language Support: Extend to non-English content
- Automated Retraining: Model updates based on performance drift
- Advanced Analytics: Content trend analysis and reporting
- Caching Layer: Redis for frequently moderated content
- Edge Deployment: CloudFront integration for global latency reduction
- Batch Processing: SQS integration for high-volume scenarios
- Model Not Found: Run sagemaker_pipeline.py first (preferred) or launch_training.py
- Bedrock Access: Ensure Bedrock model access is enabled in AWS console
- API Gateway 5xx: Check Lambda function logs in CloudWatch
- Step Functions Failures: Verify IAM permissions and Lambda timeouts
# Check system status
python test_system.py
# Validate complete pipeline
python test_complete_system.py
# Test individual components
python moderate_content.py "test content"
This project is licensed under the MIT License - see the LICENSE file for details.
Contributions are welcome! This project demonstrates:
- End-to-end MLOps best practices
- Production-ready AWS architecture
- Scalable content moderation solutions
Perfect for learning cloud-based machine learning deployments and preparing for MLOps roles.