Python implementation of NLP models for classifying the sentiment of Persian comments.
See `requirements.txt` and run `pip install -r requirements.txt` to install the dependencies:

- `numpy` for numerical calculations
- `pandas` for reading and writing data
- `scikit-learn` for the classifiers
- `gensim` for the Word2Vec model
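Word2Vec gives one vector per word, so a comment needs some way of combining its word vectors into a single feature vector. This README does not say how the project does that; a common choice is averaging the vectors of the words the model knows, sketched below. The function name `average_embedding` and the toy 3-dimensional vectors are hypothetical stand-ins for a trained gensim model.

```python
def average_embedding(tokens, word_vectors, dim):
    """Return the mean vector of the in-vocabulary tokens (all zeros if none are known)."""
    known = [word_vectors[t] for t in tokens if t in word_vectors]
    if not known:
        return [0.0] * dim
    # Average each dimension over the known word vectors.
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]

# Hypothetical toy vectors standing in for a trained Word2Vec model.
toy_vectors = {
    "خوب": [1.0, 0.0, 0.5],
    "عالی": [0.8, 0.2, 0.6],
    "بد": [-1.0, 0.1, 0.0],
}

# "گوشی" is out of vocabulary here, so only the first two tokens contribute.
print(average_embedding(["خوب", "عالی", "گوشی"], toy_vectors, dim=3))
```

With a trained gensim model, `model.wv` can be passed as `word_vectors` (it supports `token in model.wv` and `model.wv[token]`) with `dim=model.vector_size`.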
Sentiment analysis (also known as opinion mining or emotion AI) is the use of natural language processing, text analysis, computational linguistics, and biometrics to systematically identify, extract, quantify, and study affective states and subjective information. It is widely applied to voice-of-the-customer materials such as reviews and survey responses, online and social media, and healthcare materials, for applications ranging from marketing to customer service to clinical medicine [Wiki]. The following figure shows the workflow of sentiment analysis [monkeylearn].
This directory contains the data used in the project.

- `train.csv` contains labeled comments for training (we split it into training and validation sets to find the best model).
- `test.csv` contains unlabeled comments; we predict the label of each row with our proposed model and save the result in `test-labeled.csv`.
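The train/validation split of `train.csv` can be sketched as a seeded shuffle followed by a cut. This is a minimal, dependency-free sketch; the 20% validation ratio, the seed, the `(comment, label)` row layout, and the name `train_validation_split` are assumptions, not taken from the project. In scikit-learn the same step is `sklearn.model_selection.train_test_split`.

```python
import random

def train_validation_split(rows, val_ratio=0.2, seed=42):
    """Shuffle labeled rows reproducibly and split them into training and validation lists."""
    shuffled = list(rows)  # copy so the caller's list is not reordered
    random.Random(seed).shuffle(shuffled)
    n_val = int(len(shuffled) * val_ratio)
    return shuffled[n_val:], shuffled[:n_val]

# Hypothetical (comment, label) rows standing in for train.csv.
rows = [(f"comment {i}", i % 3) for i in range(10)]
train, val = train_validation_split(rows)
print(len(train), len(val))  # 8 2
```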
This directory contains files used by the NLP models (e.g., stop words).
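Stop words are frequent words that carry little sentiment and are usually dropped before building features. The sketch below assumes a plain-text file with one stop word per line and whitespace tokenization; both are assumptions, and the small `STOP_WORDS` set is a hypothetical sample rather than the project's list.

```python
# Hypothetical sample of common Persian stop words; the project's list is in its resource files.
STOP_WORDS = {"و", "در", "به", "از", "که", "این", "است", "را"}

def load_stopwords(path):
    """Read a stop-word file, assuming one word per line in UTF-8."""
    with open(path, encoding="utf-8") as f:
        return {line.strip() for line in f if line.strip()}

def remove_stopwords(comment, stop_words=STOP_WORDS):
    """Split a comment on whitespace and drop tokens found in the stop-word set."""
    return [token for token in comment.split() if token not in stop_words]

print(remove_stopwords("این گوشی خیلی خوب است"))
# ['گوشی', 'خیلی', 'خوب']
```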
This file contains the steps for finding the best method to apply to the data. Here we use only `train.csv`, split into training and validation sets. The final results of all 18 models (9 classifiers, each with TF-IDF and Word2Vec features) employed in this phase are:
| Classifiers | TF-IDF | Word2Vec |
|---|---|---|
| KNN(n=4) | 0.4619 | 0.6131 |
| KNN(n=8) | 0.6940 | 0.6298 |
| KNN(n=16) | 0.7524 | 0.6238 |
| SVM(linear) | 0.7905 | 0.4774 |
| SVM(poly) | 0.6417 | 0.6452 |
| SVM(rbf) | 0.7905 | 0.6810 |
| XGB(n=50) | 0.7214 | 0.6571 |
| XGB(n=100) | 0.7298 | 0.6786 |
| XGB(n=150) | 0.7381 | 0.6714 |
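To make the TF-IDF + KNN rows concrete, here is a dependency-free sketch of both steps. The IDF formula mirrors scikit-learn's `TfidfVectorizer` defaults (`smooth_idf=True`, L2 norm); reading `n` in `KNN(n=...)` as the number of neighbors is an assumption. Neighbors are ranked by cosine similarity, which on unit-length vectors gives the same order as Euclidean distance. The toy documents and the `"pos"`/`"neg"` labels are hypothetical.

```python
import math
from collections import Counter

def fit_idf(docs):
    """Smoothed IDF over tokenized docs: idf(t) = ln((1 + N) / (1 + df(t))) + 1."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}

def tfidf(doc, idf):
    """Raw term counts times IDF, L2-normalized; terms unseen during fitting are ignored."""
    counts = Counter(t for t in doc if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm else vec

def cosine(a, b):
    # Both vectors are unit length, so their dot product is the cosine similarity.
    return sum(v * b.get(t, 0.0) for t, v in a.items())

def knn_predict(query, train_vecs, train_labels, k):
    """Majority label among the k training vectors most similar to the query."""
    ranked = sorted(range(len(train_vecs)),
                    key=lambda i: cosine(query, train_vecs[i]), reverse=True)
    votes = Counter(train_labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical tokenized comments and labels.
train_docs = [["خوب", "عالی"], ["خیلی", "خوب"], ["بد", "افتضاح"], ["خیلی", "بد"]]
train_labels = ["pos", "pos", "neg", "neg"]
idf = fit_idf(train_docs)
train_vecs = [tfidf(d, idf) for d in train_docs]
print(knn_predict(tfidf(["عالی", "خوب"], idf), train_vecs, train_labels, k=3))  # pos
```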
This file implements the proposed method found in phase1. The result of applying the proposed model to `test.csv` can be found in `test-labeled.csv`.
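Producing `test-labeled.csv` amounts to reading each row of `test.csv`, predicting a label for its comment, and writing the row back with the label added. The sketch below uses the standard `csv` module instead of pandas to stay dependency-free; the column names `id`, `comment`, and `label`, the function name `label_comments`, and the rule-based predictor are hypothetical placeholders for the real file layout and trained model.

```python
import csv
import io

def label_comments(in_file, out_file, predict, text_column="comment", label_column="label"):
    """Copy every row of the input CSV to the output CSV with a predicted label column added."""
    reader = csv.DictReader(in_file)
    fieldnames = list(reader.fieldnames or []) + [label_column]
    writer = csv.DictWriter(out_file, fieldnames=fieldnames)
    writer.writeheader()
    for row in reader:
        row[label_column] = predict(row[text_column])
        writer.writerow(row)

# In-memory stand-ins for test.csv and test-labeled.csv; real files would be opened
# with open(path, newline="", encoding="utf-8").
test_csv = io.StringIO("id,comment\n1,خیلی خوب بود\n2,بد بود\n")
output = io.StringIO()
# Hypothetical keyword rule in place of the trained model's predict step.
label_comments(test_csv, output, lambda text: "neg" if "بد" in text else "pos")
print(output.getvalue())
```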
