Complete NLP system for automatic emotion detection in Dutch video content
This NLP pipeline was developed for a Content Intelligence Agency to automatically analyze emotions in YouTube videos. The system takes a video URL as input and generates a CSV with timestamps, transcriptions, translated sentences, and emotion labels based on the Ekman model (happiness, sadness, anger, surprise, fear, disgust), plus a neutral class. The pipeline consists of four steps: audio download (PyTubeFix), speech-to-text conversion (Whisper Large-v3-Turbo), Dutch → English translation (MarianMT), and emotion classification (RobBERT). Eight different models were tested for emotion classification.
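The flow from transcribed segments to the CSV can be sketched as follows. This is a minimal illustration, not the project's actual code: `Segment`, `run_pipeline`, and the column names are hypothetical, and the `translate` and `classify` callables merely stand in for the MarianMT and RobBERT stages. It also assumes the classifier runs on the Dutch sentence, with the translation kept as an extra column.

```python
import csv
import io
from dataclasses import dataclass
from typing import Callable, Iterable, TextIO

# The seven labels named in the text (six Ekman emotions plus neutral).
EKMAN_LABELS = {
    "happiness", "sadness", "anger", "surprise", "fear", "disgust", "neutral",
}


@dataclass
class Segment:
    """One transcribed sentence with its position in the audio (hypothetical type)."""
    start: float  # seconds from the start of the video
    end: float
    text_nl: str


def format_timestamp(seconds: float) -> str:
    """Render seconds as HH:MM:SS, truncating fractions of a second."""
    hours, rest = divmod(int(seconds), 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def run_pipeline(
    segments: Iterable[Segment],
    translate: Callable[[str], str],   # stand-in for the translation stage
    classify: Callable[[str], str],    # stand-in for the emotion classifier
    out: TextIO,
) -> int:
    """Write one CSV row per segment and return the number of rows written."""
    writer = csv.writer(out)
    writer.writerow(["start", "end", "sentence_nl", "sentence_en", "emotion"])
    rows = 0
    for seg in segments:
        emotion = classify(seg.text_nl)  # assumption: classify the Dutch text
        if emotion not in EKMAN_LABELS:
            raise ValueError(f"unexpected label: {emotion!r}")
        writer.writerow([
            format_timestamp(seg.start),
            format_timestamp(seg.end),
            seg.text_nl,
            translate(seg.text_nl),
            emotion,
        ])
        rows += 1
    return rows


if __name__ == "__main__":
    demo = [Segment(0.0, 3.5, "Ik ben blij")]
    buffer = io.StringIO()
    run_pipeline(demo, lambda s: "I am happy", lambda s: "happiness", buffer)
    print(buffer.getvalue())
```

Passing the stages in as callables keeps the sketch runnable without downloading any models; in a real run they would wrap the speech-to-text, translation, and classification models named above.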
Little Dutch emotion data was available, so the entire class collectively labeled 4,000 sentences for training. The dataset was also strongly imbalanced: neutral and positive emotions were very common, while fear, disgust, and anger were rare. Data augmentation through synonym replacement, using Dutch WordNet and SpaCy, expanded the rare emotion classes sixfold. An important choice concerned the model: RobBERT (native Dutch, F1-score 85%) versus BERT on translated data (F1-score 92%), a trade-off between accuracy and system complexity.
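Synonym replacement can be illustrated with a small sketch. The synonym table below is a hypothetical toy dictionary, not Dutch WordNet, and `augment` and `expand_class` are illustrative names. The sketch also assumes that a sixfold expansion means keeping each original sentence and adding five variants of it.

```python
import random
from typing import Dict, List

# Hypothetical toy synonym table; the project drew synonyms from Dutch WordNet.
SYNONYMS: Dict[str, List[str]] = {
    "bang": ["angstig", "bevreesd"],
    "boos": ["kwaad", "woedend"],
    "vies": ["smerig", "walgelijk"],
}


def augment(sentence: str, synonyms: Dict[str, List[str]], rng: random.Random) -> str:
    """Replace one randomly chosen word that has a synonym; else return unchanged."""
    tokens = sentence.split()
    candidates = [i for i, tok in enumerate(tokens) if tok.lower() in synonyms]
    if not candidates:
        return sentence
    i = rng.choice(candidates)
    tokens[i] = rng.choice(synonyms[tokens[i].lower()])
    return " ".join(tokens)


def expand_class(
    sentences: List[str],
    synonyms: Dict[str, List[str]],
    factor: int,
    seed: int = 0,
) -> List[str]:
    """Return the originals followed by (factor - 1) variants per sentence.

    Variants may repeat when few synonyms exist; a real pipeline would
    probably deduplicate them.
    """
    rng = random.Random(seed)
    expanded = list(sentences)
    for sentence in sentences:
        for _ in range(factor - 1):
            expanded.append(augment(sentence, synonyms, rng))
    return expanded


if __name__ == "__main__":
    rare = ["ik ben bang", "wat een vies gevoel"]
    for line in expand_class(rare, SYNONYMS, factor=6):
        print(line)
```

Augmented sentences inherit the label of the sentence they were derived from, which is why the technique only makes sense when the substituted word does not change the emotion expressed.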
The RobBERT model achieved an 85% F1-score on Dutch emotion classification, well above the traditional models (LSTM ~65%, SVM ~58%). Although BERT on English translations reached 92%, we chose direct Dutch processing to avoid nuance loss and translation errors. The pipeline is fully operational and processes YouTube URLs into a structured CSV file. The systematic model comparison showed that Transformers score roughly 20 to 30 percentage points higher than traditional NLP methods for emotion classification.
The result is a fully automated system that processes a YouTube URL into a structured CSV file with timestamps, transcriptions, translations, and emotion labels.
We systematically tested and evaluated 8 different models on our 4,000-sentence dataset.
| Model | Type | F1-Score | Status |
|---|---|---|---|
| BERT (English) | Transformer on translated data | 92% | Highest Score |
| RobBERT (NL) | Dutch Transformer | 85% | Used |
| LSTM | Recurrent Network | ~65% | Baseline |
| RNN | Basic Recurrent | ~62% | Baseline |
| SVM | Support Vector Machine | ~58% | Baseline |
| Logistic Regression | Linear | ~55% | Baseline |
| Naive Bayes | Probabilistic | ~53% | Baseline |
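
For readers unfamiliar with the metric, the sketch below computes per-class and macro-averaged F1 in plain Python. The source does not say which averaging the reported scores use, so macro averaging is an assumption here, and the labels in the demo are made-up toy data rather than project results.

```python
from typing import Dict, Sequence


def per_class_f1(
    y_true: Sequence[str], y_pred: Sequence[str], labels: Sequence[str]
) -> Dict[str, float]:
    """Compute F1 = 2*TP / (2*TP + FP + FN) separately for each label."""
    scores: Dict[str, float] = {}
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # A label that never occurs and is never predicted scores 0.0 here.
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(
    y_true: Sequence[str], y_pred: Sequence[str], labels: Sequence[str]
) -> float:
    """Unweighted mean of the per-class F1 scores (assumed averaging)."""
    scores = per_class_f1(y_true, y_pred, labels)
    return sum(scores.values()) / len(scores)


if __name__ == "__main__":
    # Toy data for illustration only.
    truth = ["neutral", "neutral", "fear", "anger"]
    preds = ["neutral", "fear", "fear", "neutral"]
    labels = ["neutral", "fear", "anger"]
    print(per_class_f1(truth, preds, labels))
    print(round(macro_f1(truth, preds, labels), 3))
```

With imbalanced classes, macro averaging gives rare emotions such as fear or disgust the same weight as neutral, so a model cannot score well by predicting only the common classes.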
Although English BERT scored highest (92%), we chose RobBERT (85%) because processing Dutch directly avoids nuance loss and translation errors and keeps the system simpler.
Below are concrete examples of how the pipeline works on a 40-minute test video:
RobBERT achieved good results for all emotion categories: