Мигнуть / подмигнуть usage

Our case

Our case study examines how the sphere of usage of the verbs ‘mignut’ (to blink) and ‘podmignut’ (to wink) has changed. The material for analysis comes from the corpus of 19th-century texts collected within the “Taman Today” project, namely the annotated novel “A Hero of Our Time” (1840) by M. Yu. Lermontov.

In the “Princess Mary” chapter of the novel “A Hero of Our Time” we found the following examples

Our data

We used data from the Russian National Corpus. For the verb ‘mignut’, we found 877 occurrences in the corpus, of which 35% were selected by random sampling. The sampled entries were classified by meaning: 1 – ‘to blink’, 2 – ‘to sign’, 3 – ‘to twinkle’. The meaning under consideration, ‘to sign’ (2), accounts for 120 occurrences.

For the verb ‘podmignut’, we found 2372 occurrences in the corpus; a 35% random sample gave us 830 occurrences for further analysis.
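The sampling step can be sketched as follows. This is only an illustration: the lists of hits are hypothetical placeholders, the seed and the rounding rule are assumptions, and only the totals (877 and 2372) and the 35% share come from the text.

```python
import random


def sample_fraction(items, fraction, seed=0):
    """Return a simple random sample of round(fraction * len(items)) items."""
    rng = random.Random(seed)  # fixed seed (an assumption) so the sample is reproducible
    k = round(fraction * len(items))
    return rng.sample(items, k)  # sampling without replacement


# Hypothetical stand-ins for corpus hits; only the totals come from the text.
mignut_hits = [f"mignut_{i}" for i in range(877)]
podmignut_hits = [f"podmignut_{i}" for i in range(2372)]

mignut_sample = sample_fraction(mignut_hits, 0.35)
podmignut_sample = sample_fraction(podmignut_hits, 0.35)

print(len(mignut_sample), len(podmignut_sample))  # prints "307 830"
```

With this rounding rule the ‘podmignut’ sample has 830 items, as reported above; the size of 307 for ‘mignut’ follows from the same rule and is not a figure stated in the text.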

Hypothesis

The null hypothesis: there is no relationship between the usage of these two verbs. The alternative hypothesis: verbs with prefixes appear later and displace their unprefixed equivalents.

At first glance, these diagrams suggest that our hypothesis tends to hold. We should take into account that the corpus contains far fewer texts from the 1850s than from recent decades, so the low values in that period are to be expected. On the other hand, the usage of ‘мигнуть’ peaks in the 1920s, whereas the usage of ‘подмигнуть’ peaks in the 1990s. This suggests that ‘подмигнуть’ tends to replace ‘мигнуть’.
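One common way to account for the unequal number of texts per period is to convert raw counts into relative frequencies. The sketch below is not part of our analysis; it shows the arithmetic on invented numbers, and both the per-million unit and the decade grouping are assumptions.

```python
def per_million(hits_by_decade, tokens_by_decade):
    """Convert raw hit counts into occurrences per million tokens for each decade."""
    return {
        decade: 1_000_000 * hits / tokens_by_decade[decade]
        for decade, hits in hits_by_decade.items()
    }


# Hypothetical numbers, chosen only to illustrate the arithmetic.
hits = {1850: 4, 1920: 60, 1990: 90}
tokens = {1850: 2_000_000, 1920: 20_000_000, 1990: 60_000_000}

relative = per_million(hits, tokens)
for decade in sorted(relative):
    print(decade, relative[decade])
```

In this invented example the raw count is highest in the 1990s, while the relative frequency is highest in the 1920s, which shows why raw peaks should be read with the corpus size in mind.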

But these are only empirical observations. Next, we applied statistical tests to our data.

Preprocessing the data

Before applying the statistical tests, we have to preprocess our data, as we mentioned. For the tests to give adequate results, the data sets being compared must be comparable; in our case, this means selecting a time interval that begins at the same date for both verbs, ‘mignut’ and ‘podmignut’.
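The alignment described here can be sketched as follows. The counts are invented, and two choices are assumptions: the common start is taken as the later of the two first attestations, and periods with no hits are filled with 0.

```python
def align_from_common_start(series_a, series_b, step=10):
    """Trim two {period: count} series so that both start at the same period.

    The common start is the later of the two first periods (an assumption);
    periods missing from a series are filled with 0 (also an assumption).
    """
    start = max(min(series_a), min(series_b))
    end = max(max(series_a), max(series_b))
    periods = list(range(start, end + 1, step))
    aligned_a = [series_a.get(p, 0) for p in periods]
    aligned_b = [series_b.get(p, 0) for p in periods]
    return periods, aligned_a, aligned_b


# Hypothetical decade counts, for illustration only.
mignut_counts = {1830: 2, 1840: 3, 1850: 1, 1860: 4}
podmignut_counts = {1850: 1, 1860: 2, 1870: 5}

periods, mignut_aligned, podmignut_aligned = align_from_common_start(
    mignut_counts, podmignut_counts
)
print(periods)            # [1850, 1860, 1870]
print(mignut_aligned)     # [1, 4, 0]
print(podmignut_aligned)  # [1, 2, 5]
```

After this step the two lists have the same length and refer to the same periods, which is what a paired comparison requires.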

Applying statistical methods

Data can be either ranged (categorical) or distributed (continuous). Distributed data are values of a random variable taken from some continuous set. Ranged data are assignments to categories. For distributed values a hypothesis is tested with a t-test, whereas for ranged values a chi-squared test is used.
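For the ranged (categorical) case, which we did not use, the chi-squared statistic for a contingency table can be sketched as follows. The table is invented (rows stand for the two verbs, columns for two periods), and the function returns only the statistic and the degrees of freedom, not a p-value.

```python
def chi_squared(table):
    """Pearson chi-squared statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


# Hypothetical counts: rows = verbs, columns = periods.
table = [
    [10, 20],
    [20, 10],
]
chi2_value, chi2_dof = chi_squared(table)
print(round(chi2_value, 3), chi2_dof)
```

The statistic is then compared with the chi-squared distribution for the given degrees of freedom to decide whether the row and column categories are independent.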

We have two distributed (continuous) variables, which is why we used a paired t-test.
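A paired t-test works on the differences between matched observations, here the counts of the two verbs in the same period. The sketch below computes only the t statistic and the degrees of freedom using the standard library; the numbers are invented and are not our corpus data.

```python
import math
import statistics


def paired_t_statistic(xs, ys):
    """Return (t, degrees of freedom) for a paired t-test on two equal-length samples."""
    if len(xs) != len(ys):
        raise ValueError("paired samples must have the same length")
    diffs = [x - y for x, y in zip(xs, ys)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1 in the denominator)
    t = mean_diff / (sd_diff / math.sqrt(n))
    return t, n - 1


# Hypothetical per-period counts for the two verbs, for illustration only.
verb_a = [5, 7, 9, 6, 8]
verb_b = [3, 4, 8, 5, 5]

t_value, dof = paired_t_statistic(verb_a, verb_b)
print(round(t_value, 3), dof)  # prints "4.472 4"
```

The t value is then compared with the t distribution for the given degrees of freedom. If SciPy is available, `scipy.stats.ttest_rel` performs the same test and also returns a p-value.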

According to these diagrams, the number of texts has increased in recent times, which is why the count of ‘mignut’ increases too. The case of ‘podmignut’ is different: its count increases steadily.

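The most important point here is that the alpha parameter of the forecast for the ‘mignut’ series is closer to 0 than that of the ‘podmignut’ series. Under the usual convention for exponential smoothing, an alpha near 0 means that the forecast reacts slowly to the most recent observations and relies on the longer history of the series, whereas an alpha near 1 follows the latest values closely.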
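The effect of alpha can be illustrated with simple exponential smoothing. This is a sketch under an assumption: it uses the textbook recursion s_t = alpha * x_t + (1 - alpha) * s_(t-1), while the text does not say which forecasting tool or parameterisation produced the alpha values. The series is invented.

```python
def exponential_smoothing(values, alpha):
    """Simple exponential smoothing: s_t = alpha * x_t + (1 - alpha) * s_(t-1)."""
    if not 0 <= alpha <= 1:
        raise ValueError("alpha must lie between 0 and 1")
    smoothed = [values[0]]  # the first smoothed value is the first observation
    for x in values[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed


# Hypothetical series with a jump in the last period.
series = [10, 10, 10, 20]

slow = exponential_smoothing(series, alpha=0.1)  # small alpha: the jump is barely followed
fast = exponential_smoothing(series, alpha=0.9)  # large alpha: the jump is followed closely

print(round(slow[-1], 2), round(fast[-1], 2))
```

With alpha = 0.1 the last smoothed value stays close to the earlier level of 10, and with alpha = 0.9 it moves almost all the way to 20.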

Results

As a result, our hypothesis was confirmed: for the available occurrences we observed a significant correlation. However, the data set is not large enough for an ideal statistical analysis.