Until now, I have used newspaper articles in my experiments. I decided to try texts extracted from other platforms, so I collected texts from the Ekşisözlük platform. Ekşisözlük is a kind of local Reddit. I ran a comparison experiment using Turkish gerunds as features.
Here are my experiment components:

Corpus: An Ekşisözlük dataset of 5 authors, identified by their nicknames, with 100 texts per author. The average word count is 461. 80% of the dataset is used as training data and 20% as test data.
Features: The features are Turkish gerunds. These words are derived from verbs but used as nouns, adjectives, or adverbs in a sentence. I listed the most widely used verbs in Turkish and then derived gerunds from them with the gerund suffixes. In the end, I obtained 590 verbal nouns, 587 verbal adjectives and 916 verbal adverbs (including the vowel-harmony variants).
Algorithms: Algorithms are LinearSVM, Multi-Layer Perceptron (MLP), Naive Bayes (NB), k-Nearest Neighbor (kNN) and Decision Tree.
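
The 80/20 split described above can be sketched as follows. This is only a minimal illustration, not the code I ran: it assumes the split is done separately for each author (which matches the 20 test documents per author reported below), and the function name `split_per_author`, the seed and the placeholder corpus are all hypothetical.

```python
import random
from collections import defaultdict

def split_per_author(samples, test_ratio=0.2, seed=0):
    """Split (author, text) pairs so every author keeps the same train/test ratio."""
    by_author = defaultdict(list)
    for author, text in samples:
        by_author[author].append(text)

    rng = random.Random(seed)  # fixed seed so the split is repeatable
    train, test = [], []
    for author, texts in sorted(by_author.items()):
        texts = texts[:]       # copy so the caller's list is not shuffled
        rng.shuffle(texts)
        n_test = round(len(texts) * test_ratio)
        test += [(author, t) for t in texts[:n_test]]
        train += [(author, t) for t in texts[n_test:]]
    return train, test

# Hypothetical corpus: 5 nicknames with 100 placeholder texts each.
corpus = [(f"author{a}", f"text {a}-{i}") for a in range(5) for i in range(100)]
train, test = split_per_author(corpus)
print(len(train), len(test))
```

Splitting per author rather than over the whole pool keeps the test set balanced, so every author is judged on the same number of documents.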

Now, the results are below.

SVM

The performance of SVM with gerund frequencies as features is not satisfactory: it correctly matched at least 12 of the 20 test documents for only 3 of the 5 authors.

MLP

The performance of MLP with gerund features is slightly better than SVM: it correctly matched at least 12 of the 20 test documents for 4 of the 5 authors.

NB

The performance of NB is average and close to the other results: it correctly matched at least 12 of the 20 test documents for 3 of the 5 authors.

Decision Tree

The performance of the decision tree is not sufficient: the average F1-score is 0.39, and it did not produce a satisfactory number of correct matches.

kNN

The performance of kNN is also not sufficient, though slightly better than the decision tree: the average F1-score is 0.44. It correctly matched 16 of the 20 test documents for only one of the 5 authors.

In conclusion, NB, kNN and the decision tree are not suitable algorithms for this approach. SVM and MLP performed better than the other algorithms.
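
For reference, the two measures used in the results above (how many authors reach at least 12 correct test documents, and the average F1-score) can be computed roughly like this. It is a sketch under assumptions: the function names are hypothetical, the labels in the example are toy data rather than my results, and the average is taken as an unweighted (macro) mean, which equals the weighted mean when every author has the same number of test documents.

```python
from collections import Counter

def authors_meeting_threshold(true_labels, predicted_labels, min_correct=12):
    """Return the authors with at least `min_correct` correctly matched documents."""
    correct = Counter(t for t, p in zip(true_labels, predicted_labels) if t == p)
    return sorted(a for a in set(true_labels) if correct[a] >= min_correct)

def macro_f1(true_labels, predicted_labels):
    """Unweighted mean of the per-author F1 scores, F1 = 2TP / (2TP + FP + FN)."""
    pairs = list(zip(true_labels, predicted_labels))
    scores = []
    for author in sorted(set(true_labels)):
        tp = sum(t == author and p == author for t, p in pairs)
        fp = sum(t != author and p == author for t, p in pairs)
        fn = sum(t == author and p != author for t, p in pairs)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)

# Toy example (not real results): two authors with 20 test documents each.
y_true = ["a"] * 20 + ["b"] * 20
y_pred = ["a"] * 16 + ["b"] * 4 + ["b"] * 10 + ["a"] * 10
print(authors_meeting_threshold(y_true, y_pred))
print(round(macro_f1(y_true, y_pred), 2))
```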

Gerunds are words derived from verbs but used as nouns in a sentence. In Turkish, gerunds are created by adding derivational suffixes to verbs. Depending on the derivational suffix, a gerund can be used as a noun, an adjective or an adverb in the sentence.

Turkish lends itself to deriving gerunds because the language has many gerund suffixes. Starting from this point, I listed the most widely used verbs in Turkish and then derived gerunds from them with the gerund suffixes. In the end, I obtained 590 verbal nouns, 587 verbal adjectives and 916 verbal adverbs (including the vowel-harmony variants).
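
To illustrate what deriving gerunds with vowel variants means, here is a minimal sketch. The suffix set is my own small selection for the example (-mAk, -mA, -(y)Iş for verbal nouns, -(y)An for verbal adjectives, -(y)ArAk and -(y)Ip for verbal adverbs) and is not the full list behind the counts above. The function `derive_gerunds` is hypothetical, expects a lowercase verb stem, and ignores irregular stems (for example "git" becoming "giden").

```python
BACK_VOWELS = "aıou"
FRONT_VOWELS = "eiöü"
VOWELS = BACK_VOWELS + FRONT_VOWELS

# Four-way vowel harmony: suffix vowel selected by the last vowel of the stem.
FOUR_WAY = {"a": "ı", "ı": "ı", "e": "i", "i": "i",
            "o": "u", "u": "u", "ö": "ü", "ü": "ü"}

def last_vowel(stem):
    for ch in reversed(stem):
        if ch in VOWELS:
            return ch
    raise ValueError(f"no vowel in stem: {stem!r}")

def derive_gerunds(stem):
    """Derive a few gerund forms from a lowercase verb stem (regular stems only)."""
    v = last_vowel(stem)
    two = "a" if v in BACK_VOWELS else "e"       # two-way harmony (a/e)
    four = FOUR_WAY[v]                            # four-way harmony (ı/i/u/ü)
    buffer = "y" if stem[-1] in VOWELS else ""   # buffer consonant after a vowel-final stem
    return {
        "noun": [stem + "m" + two + "k", stem + "m" + two, stem + buffer + four + "ş"],
        "adjective": [stem + buffer + two + "n"],
        "adverb": [stem + buffer + two + "r" + two + "k", stem + buffer + four + "p"],
    }

for stem in ("gel", "oku", "gör"):
    print(stem, derive_gerunds(stem))
```

For the stem "gel" this yields gelmek, gelme, geliş (nouns), gelen (adjective) and gelerek, gelip (adverbs). For the vowel-final stem "oku" the buffer consonant gives okuyuş, okuyan, okuyarak and okuyup.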

I implemented some functions that process the gerunds as features for the classification method. I applied these functions with SVM on the Radikal dataset. The program produced 2662 features on the Radikal dataset.
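
The idea of turning gerunds into features can be sketched as below. This is not my actual implementation: it assumes whole-word matching against a gerund list and relative frequencies (count divided by the number of tokens), and the names `turkish_lower` and `gerund_frequencies` as well as the tiny vocabulary are hypothetical.

```python
import re
from collections import Counter

def turkish_lower(text):
    """Lowercase with Turkish casing: 'I' maps to 'ı' and 'İ' maps to 'i'."""
    return text.replace("I", "ı").replace("İ", "i").lower()

def gerund_frequencies(text, gerund_vocabulary):
    """Return one relative frequency per gerund in the vocabulary, in order."""
    tokens = re.findall(r"\w+", turkish_lower(text))
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero on empty text
    return [counts[g] / total for g in gerund_vocabulary]

# Hypothetical three-word vocabulary and sentence, for illustration only.
vocabulary = ["gelmek", "okuyan", "gelerek"]
text = "Okuyan adam gelerek oturdu, gelerek konuştu."
print(gerund_frequencies(text, vocabulary))
```

Exact matching misses gerunds that carry further suffixes (for example "okuyanlar"), so a fuller version would need some stemming or suffix-aware matching.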

Here are my first results:

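| Author | Precision | Recall | F1-Score |
|--------|-----------|--------|----------|
| AH     | 0.87      | 0.67   | 0.76     |
| AO     | 0.76      | 0.76   | 0.76     |
| BO     | 0.72      | 0.78   | 0.75     |
| EB     | 0.66      | 0.70   | 0.68     |
| FT     | 0.79      | 0.84   | 0.82     |
| OC     | 0.71      | 0.80   | 0.75     |
| TE     | 0.83      | 0.76   | 0.79     |
| AVG.   | 0.76      | 0.76   | 0.76     |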

In this first practice implementation, gerunds give F1-scores between 0.68 and 0.82. Compared with the Turkish studies I reviewed, these results are promising, because the average F1-score is 0.76 and it was obtained from gerund frequencies alone.
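
As a quick check on how these numbers relate, the sketch below recomputes F1 = 2PR / (P + R) from the reported precision and recall and averages the reported F1 column. The values are copied from the results above. Because they are already rounded to two decimals, a recomputed F1 can differ from the reported one in the last digit. The variable names are only for this example.

```python
# (precision, recall, reported F1) per author, copied from the results above.
reported = {
    "AH": (0.87, 0.67, 0.76),
    "AO": (0.76, 0.76, 0.76),
    "BO": (0.72, 0.78, 0.75),
    "EB": (0.66, 0.70, 0.68),
    "FT": (0.79, 0.84, 0.82),
    "OC": (0.71, 0.80, 0.75),
    "TE": (0.83, 0.76, 0.79),
}

def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

for author, (p, r, f_reported) in reported.items():
    print(author, round(f1(p, r), 2), f_reported)

# Unweighted average of the reported F1 column.
average_f1 = sum(f for _, _, f in reported.values()) / len(reported)
print(round(average_f1, 2))
```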