Hate Speech & Offensive Language Detection
An automated system that classifies tweets as hate speech, offensive language, or neutral. It was trained on 24,783 tweets, and the best model (a Neural Network) reached 95.89% accuracy.
Stack
Python · Scikit-learn · SMOTE · TF-IDF · SVM · Random Forest · Neural Network
My Role
Final project for the Machine Learning course at ITS Surabaya, built with a team of 6. My contributions:
- Data Preprocessing Pipeline — end-to-end text cleaning, stopword removal, and normalization
- Feature Engineering — TF-IDF vectorization with `max_features=5000`
- Model Benchmarking — evaluated SVM, Random Forest, Naive Bayes, Logistic Regression, and Neural Network
- Hyperparameter Tuning — Grid Search over 24 combinations with 5-fold cross-validation
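To make the feature engineering and tuning steps above concrete, here is a minimal sketch that chains TF-IDF (`max_features=5000`) with a grid search using 5-fold cross-validation. The toy corpus, the choice of `SVC`, and the parameter grid are all assumptions for illustration; the grid is only sized to yield 24 combinations and is not the one used in the project.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV, ParameterGrid
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Hypothetical toy corpus with placeholder wording (the project used 24,783 tweets).
# Labels: 0 = hate speech, 1 = offensive, 2 = neutral (class 2 is an assumption).
texts = [
    "group targeted threat", "rude insult word", "nice sunny day",
    "targeted abuse against group", "crude rude remark", "great match today",
] * 5
labels = [0, 1, 2, 0, 1, 2] * 5

pipeline = Pipeline([
    ("tfidf", TfidfVectorizer(max_features=5000)),  # cap vocabulary at 5000 terms
    ("clf", SVC()),
])

# Illustrative grid (assumption): 4 * 2 * 3 = 24 combinations.
# Note that gamma has no effect when kernel="linear".
param_grid = {
    "clf__C": [0.1, 1, 10, 100],
    "clf__kernel": ["linear", "rbf"],
    "clf__gamma": ["scale", "auto", 0.01],
}
n_combinations = len(ParameterGrid(param_grid))

# cv=5 uses stratified 5-fold splitting for classifiers.
search = GridSearchCV(pipeline, param_grid, cv=5, scoring="accuracy")
search.fit(texts, labels)

print("combinations:", n_combinations)
print("best params:", search.best_params_)
```

Putting the vectorizer inside the pipeline means it is refit on each training fold, so the vocabulary never sees the held-out fold.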
Data pipeline overview
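As a rough picture of the cleaning stage of this pipeline, the sketch below lowercases a tweet, strips URLs, mentions, and non-letter characters, and removes stopwords. The function name `clean_tweet`, the regular expressions, and the tiny stopword set are assumptions for illustration, not the project's actual rules.

```python
import re

# Small illustrative stopword set (assumption); a real pipeline would use a fuller list.
STOPWORDS = {"a", "an", "the", "is", "are", "to", "and", "of", "in", "rt"}


def clean_tweet(text: str) -> str:
    """Normalize a raw tweet into a space-separated string of tokens."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)  # drop URLs
    text = re.sub(r"@\w+", " ", text)          # drop user mentions
    text = text.replace("#", "")               # keep the hashtag word, drop '#'
    text = re.sub(r"[^a-z\s]", " ", text)      # drop digits, punctuation, emoji
    tokens = [t for t in text.split() if t not in STOPWORDS]
    return " ".join(tokens)


cleaned = clean_tweet("RT @user: The weather is GREAT today!!! https://t.co/abc #sunny")
print(cleaned)  # weather great today sunny
```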
What I Learned
- How to handle severe class imbalance using SMOTE combined with undersampling — not just oversampling blindly
- Why traditional sparse representations (TF-IDF, bag-of-words) can outperform Word2Vec on small, domain-specific datasets
- The tradeoff between model complexity and consistency: Neural Network had the highest accuracy, but Random Forest had the lowest variance (CV Std 0.0012)
- Hate speech vs. offensive language is genuinely hard to separate — the confusion matrix confirmed the model struggled most at that boundary
Results
Model performance comparison
| Model | Accuracy | CV Mean | CV Std |
|---|---|---|---|
| Neural Network | 95.89% | 96% | 0.0015 |
| Random Forest | 95.36% | 96% | 0.0012 |
| SVM | 92.99% | 93% | 0.0022 |
| Logistic Regression | 91.47% | 91% | 0.0024 |
| Naive Bayes | 86.13% | 86% | 0.0026 |
Neural Network achieved the highest accuracy at 95.89%, making it the best-performing model by that metric. Random Forest came a close second at 95.36% and had the lowest CV standard deviation (0.0012), meaning it was the most consistent model across folds. SVM was a reliable middle ground at 92.99% but was outperformed by both Random Forest and the Neural Network. Logistic Regression and Naive Bayes lagged behind; Naive Bayes in particular is limited by its feature-independence assumption, which rarely holds for natural language.
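For reference, CV Mean and CV Std columns like the ones in the table can be computed from per-fold scores as sketched below. The synthetic dataset and the model settings are assumptions, so the printed numbers will not match the table.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic 3-class data standing in for the TF-IDF features (assumption).
X, y = make_classification(
    n_samples=300, n_features=20, n_informative=6, n_classes=3, random_state=0
)

models = {
    "Random Forest": RandomForestClassifier(n_estimators=50, random_state=0),
    "Logistic Regression": LogisticRegression(max_iter=1000),
}

results = {}
for name, model in models.items():
    # One accuracy score per fold; the mean summarizes performance,
    # the standard deviation summarizes consistency across folds.
    scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    results[name] = (scores.mean(), scores.std())
    print(f"{name}: mean={scores.mean():.4f} std={scores.std():.4f}")
```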
The hardest boundary to learn was hate speech (class 0) vs. offensive language (class 1) — the confusion matrix showed these two classes cross-contaminating the most, which reflects how genuinely ambiguous the distinction is even for humans.
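A small sketch of how that boundary can be read off a confusion matrix is shown below. The labels and predictions are made up for illustration, and treating class 2 as neutral is an assumption.

```python
from sklearn.metrics import confusion_matrix

# Hypothetical labels: 0 = hate speech, 1 = offensive, 2 = neutral (assumed).
y_true = [0, 0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2]
y_pred = [0, 1, 1, 0, 1, 1, 0, 1, 1, 2, 2, 1]

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_true, y_pred, labels=[0, 1, 2])
print(cm)

# Errors on the hate/offensive boundary: hate predicted as offensive,
# plus offensive predicted as hate.
hate_offensive_errors = int(cm[0, 1] + cm[1, 0])
total_errors = int(cm.sum() - cm.trace())
print(f"{hate_offensive_errors} of {total_errors} errors fall on the class 0 / class 1 boundary")
```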