Emotion Detection with Convolutional Neural Networks (CNN)

 

Overview:

This project enhances an existing emotion detection system by expanding its capabilities to recognize more emotions and improving performance on both static images and video streams. The system uses a Convolutional Neural Network (CNN) trained on an augmented dataset combining FER-2013 and AffectNet to classify facial expressions into 8 emotion categories: Anger, Contempt, Disgust, Fear, Happiness, Neutral, Sadness, and Surprise.

    Key Features & Technical Achievements

     

    1- Enhanced Dataset & Data Augmentation

      • Dataset Expansion: Combined FER-2013 (35,887 grayscale images) with AffectNet (450,000 annotated images) to improve diversity and coverage.
      • Class Balancing: Addressed imbalance (e.g., fewer samples for Contempt & Disgust) using selective augmentation (rotations, flips, brightness adjustments).

      • Augmentation Techniques: Applied Gaussian noise, random rotations (±15°), brightness/contrast shifts, and horizontal flipping to improve model robustness.

    2. CNN Architecture & Training

    • Model Structure:

      • Input Layer: 48×48 grayscale images (normalized).

      • Convolutional Blocks: Four blocks (Conv2D + BatchNorm + ReLU + MaxPooling) for hierarchical feature extraction.

      • Dropout Layers: Prevent overfitting by randomly deactivating neurons (25-50% dropout rate).

      • Output Layer: Softmax activation for 8-class classification.

    • Training Process:

      • Optimizer: Adam (learning rate = 0.001).

      • Loss Function: Categorical Crossentropy.

      • Early Stopping: Halted training if validation accuracy plateaued.

      • Result: Achieved 72.3% test accuracy (surpassing FER-2013 benchmark of 65.2%).

    3. Static Image Emotion Detection

    • Face Detection: Used Haar Cascade classifier to locate faces in images.

    • Preprocessing: Converted to grayscale, resized to 48×48, and normalized pixel values.

    • Inference: CNN predicts emotions, visualized with bounding boxes and labels.

    • Performance: Handles multiple faces per image with real-time processing.

     

    4. Real-Time Video Emotion Analysis

    • Frame Processing: Analyzed videos at 15 FPS, skipping frames when necessary for speed.

    • Multi-Face Tracking: Detected and classified emotions for up to 3 faces simultaneously.

    • Temporal Smoothing: Applied 3-frame moving average to stabilize predictions.

    • Visual Output: Overlaid emotion labels on video and generated bar graphs summarizing emotion distribution.

     

     

    Technical Challenges & Solutions

    Challenge Solution
    Class Imbalance Augmented underrepresented emotions (Disgust, Contempt) more aggressively.
    Real-Time Performance Reduced processing to 15 FPS, cached face detection regions.
    Cross-Dataset Variance Standardized preprocessing (grayscale conversion, histogram equalization).
    Overfitting Added Dropout and BatchNorm layers, used early stopping.

    Results & Impact

    • Accuracy: 72.3% (outperforming baseline models).

    • Strongest Predictions: Happiness (85% precision), Anger (71% precision).

    • Weaknesses: Confusion between Fear/Surprise (23% misclassification) due to similar facial cues.

    • Applications: Mental health tools, human-computer interaction, marketing analytics.

    Future Improvements

    1. Better Face Detection: Replace Haar Cascade with MTCNN for improved accuracy.

    2. Temporal Modeling: Add LSTM layers to analyze emotion transitions in videos.

    3. Deployment: Optimize for edge devices using TensorFlow Lite or ONNX.

    Conclusion

    This project demonstrates advanced CNN-based emotion detection with improvements in dataset diversity, model accuracy, and real-time video processing. It showcases expertise in deep learning, computer vision, and performance optimization, making it suitable for applications in AI-driven behavioral analysis.