A production-ready AI-powered content moderation system that automatically detects toxic content in real-time using AWS cloud services. This comprehensive MLOps pipeline demonstrates end-to-end machine learning deployment with enhanced decision-making capabilities.
User Content → API Gateway → Lambda (preprocessing) → SageMaker (ML prediction)
        ↓
Step Functions Orchestration → Bedrock (contextual analysis) → CloudWatch (monitoring) → SNS (alerts)
Key Features:
- Real-time toxicity detection with 85%+ accuracy
- Enhanced AI decision-making using Amazon Bedrock (Claude Sonnet-4) for borderline cases
- Automated MLOps pipeline with SageMaker Pipelines
- Production monitoring with CloudWatch dashboards and SNS alerts
- Step Functions orchestration for complex workflow management
- REST API for seamless integration
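
Once the API Gateway endpoint is deployed, a client can call it over plain HTTPS. Below is a hedged, illustrative sketch; the endpoint URL and the request/response field names ("text", "is_toxic", "toxicity_score") are assumptions, not the project's confirmed contract — check api_info.json for the values written by api_gateway_setup.py.

```python
# Illustrative client call to the moderation REST API.
# The URL and JSON field names are placeholders; see api_info.json for the
# actual values produced by api_gateway_setup.py.
import json
import requests

API_URL = "https://abc123.execute-api.us-east-1.amazonaws.com/prod/moderate"  # placeholder

def moderate(text: str) -> dict:
    """Send text to the moderation API and return the parsed JSON verdict."""
    response = requests.post(API_URL, json={"text": text}, timeout=10)
    response.raise_for_status()
    return response.json()

if __name__ == "__main__":
    result = moderate("You are a wonderful person!")
    print(json.dumps(result, indent=2))  # e.g. {"is_toxic": false, "toxicity_score": 0.03}
```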
Prerequisites: AWS CLI configured with appropriate permissions (see trust-policy.json)
# 1. Prepare training data
python data_preparation.py
# 2. Setup AWS infrastructure (S3, IAM roles, etc.)
python setup_infrastructure.py
# 3. Option A: Full MLOps Pipeline (Recommended)
python sagemaker_pipeline.py
# 3. Option B: Standalone Training (Fallback)
python launch_training.py
# 4. Deploy SageMaker endpoint
python deploy_endpoint.py
# 5. Setup API Gateway
python api_gateway_setup.py
# 6. Setup monitoring infrastructure
python cloudwatch_monitoring.py
# 7. Setup Step Functions orchestration
python step_functions_orchestration.py
# 8. Test the complete system
python test_system.py
# 9. Launch demo frontend
python app.py
- Training Accuracy: 95%+
- Test Accuracy: 85%+
- Cross-validation: 88% ± 2%
- Inference Latency: <100ms
- Enhanced with Bedrock: Improved contextual understanding for complex cases
- SageMaker: Model training, hosting, and MLOps pipelines
- Lambda: Serverless functions for preprocessing and orchestration
- API Gateway: RESTful API endpoints for real-time inference
- Step Functions: Workflow orchestration and complex decision logic
- Bedrock: Enhanced contextual analysis using Claude Sonnet-4
- CloudWatch: Monitoring, logging, and alerting
- SNS: Real-time notifications and alerts
- S3: Data storage and model artifacts
- IAM: Security and access management
- Algorithm: Logistic Regression with TF-IDF features (see the sketch after this list)
- Framework: scikit-learn on SageMaker
- Features: N-gram analysis (1-2), stop word removal, 10K feature limit
- Training: Automated cross-validation and hyperparameter optimization
- Evaluation: Comprehensive metrics including precision, recall, F1-score
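
A minimal sketch of the configuration described above: TF-IDF with 1-2 grams, English stop-word removal, a 10K-feature cap, and Logistic Regression with cross-validated accuracy. The CSV path and column names ("text", "label") are placeholders; the project's actual training logic lives in ml_scripts/train.py.

```python
# Sketch of the TF-IDF + Logistic Regression setup described above.
# Dataset path and column names are placeholders.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline

df = pd.read_csv("train.csv")  # placeholder dataset

model = Pipeline([
    ("tfidf", TfidfVectorizer(
        ngram_range=(1, 2),      # unigram and bigram analysis
        stop_words="english",    # stop word removal
        max_features=10_000,     # 10K feature limit
    )),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Cross-validated accuracy, in the spirit of the metrics reported above
scores = cross_val_score(model, df["text"], df["label"], cv=5, scoring="accuracy")
print(f"CV accuracy: {scores.mean():.2%} ± {scores.std():.2%}")

model.fit(df["text"], df["label"])
```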
- Microservices: Modular Lambda functions for specific tasks (see the Lambda sketch after this list)
- Event-driven: Step Functions for complex workflow orchestration
- Monitoring: Real-time performance tracking and alerting
- Scalability: Auto-scaling endpoints and serverless compute
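
As a hedged illustration of the microservices pattern, here is what a preprocessing-and-prediction Lambda might look like. The endpoint name, environment variable, and payload shape are assumptions; the project's real handler is prediction_lambda.py.

```python
# Sketch of a Lambda handler that cleans incoming text and calls the SageMaker
# endpoint. Endpoint name, env var, and payload format are assumptions.
import json
import os
import re

import boto3

ENDPOINT_NAME = os.environ.get("SAGEMAKER_ENDPOINT", "toxicity-endpoint")  # placeholder
runtime = boto3.client("sagemaker-runtime")

def preprocess(text: str) -> str:
    """Lowercase, strip URLs, and collapse whitespace before inference."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)
    return re.sub(r"\s+", " ", text).strip()

def lambda_handler(event, context):
    body = json.loads(event.get("body") or "{}")
    cleaned = preprocess(body.get("text", ""))

    response = runtime.invoke_endpoint(
        EndpointName=ENDPOINT_NAME,
        ContentType="application/json",
        Body=json.dumps({"text": cleaned}),
    )
    prediction = json.loads(response["Body"].read())

    return {"statusCode": 200, "body": json.dumps(prediction)}
```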
- setup_infrastructure.py - AWS infrastructure setup (S3, IAM roles)
- data_preparation.py - Training data generation and S3 upload
- sagemaker_pipeline.py - Automated MLOps pipeline (Recommended)
- launch_training.py - Standalone training job (Fallback)
- deploy_endpoint.py - SageMaker endpoint deployment
- ml_scripts/train.py - Core training script with evaluation
- ml_scripts/model_evaluation.py - Pipeline model evaluation
- ml_scripts/data_validation.py - Data quality checks
- prediction_lambda.py - Text preprocessing and ML prediction
- api_gateway_setup.py - REST API configuration
- step_functions_orchestration.py - Complete workflow orchestration
- bedrock_integration.py - Amazon Bedrock integration for contextual analysis
- cloudwatch_monitoring.py - Production monitoring setup
- monitor_pipeline.py - Training pipeline monitoring
- test_system.py - Basic system testing
- test_complete_system.py - Comprehensive system validation
- app.py - Flask demo frontend
- moderate_content.py - Direct content moderation utility
- aws_config.json - AWS configuration
- endpoint_info.json - SageMaker endpoint details
- api_info.json - API Gateway configuration
- orchestration_config.json - Step Functions setup
- pipeline_model_info.json - MLOps pipeline model info
- Primary: ML model provides initial toxicity scoring
- Enhanced: Bedrock (Claude Sonnet-4) analyzes borderline cases for context
- Fallback: Graceful degradation when Bedrock is unavailable
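
A hedged sketch of this escalation flow (the project's version lives in bedrock_integration.py). The Bedrock model ID, score thresholds, and prompt wording below are assumptions made for illustration.

```python
# Sketch of escalating borderline ML scores to Bedrock for contextual review.
# Model ID, thresholds, and prompt are assumptions; see bedrock_integration.py.
import json

import boto3

bedrock = boto3.client("bedrock-runtime")
MODEL_ID = "anthropic.claude-sonnet-4-20250514-v1:0"  # placeholder model ID

def enhanced_decision(text: str, ml_score: float) -> bool:
    """Return True if the content should be flagged as toxic."""
    if ml_score >= 0.8:          # clearly toxic: trust the ML model
        return True
    if ml_score <= 0.3:          # clearly benign: trust the ML model
        return False

    # Borderline case: ask Claude for a contextual yes/no judgement.
    try:
        response = bedrock.invoke_model(
            modelId=MODEL_ID,
            body=json.dumps({
                "anthropic_version": "bedrock-2023-05-31",
                "max_tokens": 10,
                "messages": [{
                    "role": "user",
                    "content": f"Is the following message toxic? Answer YES or NO.\n\n{text}",
                }],
            }),
        )
        answer = json.loads(response["body"].read())["content"][0]["text"]
        return answer.strip().upper().startswith("YES")
    except Exception:
        # Graceful degradation when Bedrock is unavailable
        return ml_score >= 0.5
```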
- Automated Pipeline: SageMaker Pipelines with data validation, training, and evaluation
- Model Registry: Versioned model management with approval workflows
- Continuous Monitoring: Real-time performance tracking and drift detection
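
For orientation, here is a minimal sketch of how a pipeline like this might be assembled with the SageMaker Python SDK: a single training step wrapping the repo's training script, registered and started as a named pipeline. The bucket, role ARN, instance type, and framework version are placeholders; the project's actual definition is in sagemaker_pipeline.py.

```python
# Minimal SageMaker Pipelines sketch: one training step, upserted and started.
# Bucket, role ARN, instance type, and framework version are placeholders.
from sagemaker.inputs import TrainingInput
from sagemaker.sklearn.estimator import SKLearn
from sagemaker.workflow.pipeline import Pipeline
from sagemaker.workflow.pipeline_context import PipelineSession
from sagemaker.workflow.steps import TrainingStep

session = PipelineSession()
ROLE_ARN = "arn:aws:iam::123456789012:role/SageMakerExecutionRole"  # placeholder
BUCKET = "content-moderation-data"                                  # placeholder

estimator = SKLearn(
    entry_point="ml_scripts/train.py",
    framework_version="1.2-1",
    instance_type="ml.m5.large",
    instance_count=1,
    role=ROLE_ARN,
    sagemaker_session=session,
)

train_step = TrainingStep(
    name="TrainToxicityModel",
    step_args=estimator.fit(
        inputs={"train": TrainingInput(f"s3://{BUCKET}/data/train.csv")}
    ),
)

pipeline = Pipeline(
    name="content-moderation-pipeline",
    steps=[train_step],
    sagemaker_session=session,
)
pipeline.upsert(role_arn=ROLE_ARN)   # create or update the pipeline definition
execution = pipeline.start()         # kick off a run
```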
- Serverless: Lambda functions auto-scale based on demand
- Step Functions: Complex workflow orchestration with error handling
- API Gateway: Rate limiting, authentication, and request validation
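
As a hedged illustration of the error-handling side of the orchestration, the sketch below registers a tiny state machine with retry/catch logic via boto3. The Lambda and SNS ARNs, role ARN, and state names are placeholders; the project's real workflow is built by step_functions_orchestration.py.

```python
# Sketch of a small Step Functions state machine with retry/catch handling.
# All ARNs and names are placeholders.
import json

import boto3

sfn = boto3.client("stepfunctions")

definition = {
    "StartAt": "Predict",
    "States": {
        "Predict": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-east-1:123456789012:function:prediction_lambda",  # placeholder
            "Retry": [{"ErrorEquals": ["States.TaskFailed"], "IntervalSeconds": 2, "MaxAttempts": 2}],
            "Catch": [{"ErrorEquals": ["States.ALL"], "Next": "NotifyFailure"}],
            "End": True,
        },
        "NotifyFailure": {
            "Type": "Task",
            "Resource": "arn:aws:states:::sns:publish",
            "Parameters": {
                "TopicArn": "arn:aws:sns:us-east-1:123456789012:moderation-alerts",  # placeholder
                "Message.$": "$.Cause",
            },
            "End": True,
        },
    },
}

sfn.create_state_machine(
    name="content-moderation-workflow",                                   # placeholder
    definition=json.dumps(definition),
    roleArn="arn:aws:iam::123456789012:role/StepFunctionsExecutionRole",  # placeholder
)
```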
- Real-time Metrics: Toxicity detection rates, latency, confidence scores
- Custom Dashboards: CloudWatch dashboards for system health
- Automated Alerts: SNS notifications for performance degradation
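
A hedged sketch of the monitoring pattern: publish a custom metric per request and alarm to SNS when a latency metric degrades. The namespace, metric/alarm names, and topic ARN are placeholders; the project's monitoring is configured by cloudwatch_monitoring.py.

```python
# Sketch of custom-metric publishing and an SNS-backed CloudWatch alarm.
# Namespace, metric/alarm names, and topic ARN are placeholders.
import boto3

cloudwatch = boto3.client("cloudwatch")
ALERT_TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:moderation-alerts"  # placeholder

# Publish a per-request toxicity score as a custom metric
cloudwatch.put_metric_data(
    Namespace="ContentModeration",
    MetricData=[{"MetricName": "ToxicityScore", "Value": 0.42, "Unit": "None"}],
)

# Alarm when average inference latency exceeds 100 ms over two 5-minute periods
cloudwatch.put_metric_alarm(
    AlarmName="HighInferenceLatency",
    Namespace="ContentModeration",
    MetricName="InferenceLatency",
    Statistic="Average",
    Period=300,
    EvaluationPeriods=2,
    Threshold=100.0,
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=[ALERT_TOPIC_ARN],   # SNS notification on breach
)
```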
- Image Moderation: Computer vision for inappropriate image detection
- Multi-language Support: Extend to non-English content
- Automated Retraining: Model updates based on performance drift
- Advanced Analytics: Content trend analysis and reporting
- Caching Layer: Redis for frequently moderated content
- Edge Deployment: CloudFront integration for global latency reduction
- Batch Processing: SQS integration for high-volume scenarios
- Model Not Found: Run sagemaker_pipeline.py first (preferred) or launch_training.py
- Bedrock Access: Ensure Bedrock model access is enabled in AWS console
- API Gateway 5xx: Check Lambda function logs in CloudWatch
- Step Functions Failures: Verify IAM permissions and Lambda timeouts
# Check system status
python test_system.py
# Validate complete pipeline
python test_complete_system.py
# Test individual components
python moderate_content.py "test content"
This project is licensed under the MIT License - see the LICENSE file for details.
Contributions are welcome! This project demonstrates:
- End-to-end MLOps best practices
- Production-ready AWS architecture
- Scalable content moderation solutions
Perfect for learning cloud-based machine learning deployments and preparing for MLOps roles.