Apertium aims for high-quality part-of-speech tagging, but in many cases its taggers fall below the state of the art (around 97% tagging accuracy). This page collects a comparison of the tagging systems in Apertium and gives some ideas of what could be done to improve them.

In the following two tables, values of the form x±y give the sample mean (x) and sample standard deviation (y) of the results of 10-fold cross-validation.
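As a sketch of how the x±y figures can be produced (the fold scores below are made-up numbers, not values from the tables):

```python
import statistics

def cross_validation_summary(fold_scores):
    """Format per-fold scores as 'mean±sd' (sample standard deviation),
    matching the x±y entries in the tables below."""
    mean = statistics.mean(fold_scores)
    sd = statistics.stdev(fold_scores)  # sample sd, n-1 denominator
    return f"{mean:.2f}±{sd:.2f}"

# Hypothetical recall percentages from the 10 folds of one run:
folds = [91.2, 92.5, 90.8, 93.1, 91.9, 92.2, 90.5, 93.4, 91.7, 92.0]
print(cross_validation_summary(folds))
```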

In the following table, the values represent tagger recall as a percentage (= [true positives] / [total tokens]):
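A minimal sketch of this recall computation, with hypothetical counts (not taken from the table):

```python
def tagger_recall(true_positives, total_tokens):
    """Plain tagger recall: correctly tagged tokens over all tokens
    in the corpus, as a percentage."""
    return 100.0 * true_positives / total_tokens

# Hypothetical: 21,000 correctly tagged tokens in the 23,673-token Catalan corpus.
print(f"{tagger_recall(21000, 23673):.2f}")
```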

{| class="wikitable"
! System !! Catalan<br>23,673 !! Spanish<br>20,487 !! Serbo-Croatian<br>20,071 !! Russian<br>1,052 !! Kazakh<br>13,714 !! Portuguese<br>6,725 !! Swedish<br>369 !! Italian<br>5,201
|-
| 1st || 86.50 || 90.34 || 44.99±1.20 || 38.19 || 72.08 || 76.70 || 34.70 || 82.28±3.05
|-
| Bigram (unsup, 0 iters) || 88.96±1.12 || 88.49±1.54 || 47.31±1.24 || || || 81.41±5.78 || || 79.16±3.12
|-
| Bigram (unsup, 50 iters) || 91.74±1.15 || 91.13±1.52 || 48.28±1.33 || || || 81.09±5.99 || || 84.93±2.71
|-
| Bigram (unsup, 250 iters) || 91.51±1.16 || 90.85±1.48 || 48.05±1.47 || || || 80.31±6.60 || || 84.52±2.78
|-
| Lwsw (0 iters) || 92.73±0.89 || 92.86±0.95 || 43.56±1.20 || || || 83.01±5.47 || || 86.12±2.96
|-
| Lwsw (50 iters) || 92.98±0.85 || 93.01±1.02 || 45.09±1.15 || || || 82.70±5.76 || || 86.07±2.68
|-
| Lwsw (250 iters) || 92.99±0.84 || 93.06±1.02 || 45.13±1.17 || || || 82.75±5.79 || || 86.08±2.67
|-
| CG→1st || 88.05 || 91.10 || 64.01±1.04 || 39.81 || 81.56 || 87.99 || 42.90 || 83.29±3.07
|-
| CG→Bigram (unsup, 0 iters) || 91.83±1.03 || 91.39±1.42 || 60.37±1.45 || || || 86.77±6.33 || || 81.31±3.10
|-
| CG→Bigram (unsup, 50 iters) || 93.16±1.39 || 92.53±1.29 || 60.91±1.65 || || || 87.48±6.16 || || 86.11±2.46
|-
| CG→Bigram (unsup, 250 iters) || 92.99±1.38 || 92.50±1.23 || 60.88±1.66 || || || 87.20±6.72 || || 86.01±2.59
|-
| CG→Lwsw (0 iters) || 93.17±1.08 || 92.72±1.09 || 59.93±1.46 || || || 86.60±6.20 || || 85.64±2.83
|-
| CG→Lwsw (50 iters) || 93.37±1.02 || 92.74±1.16 || 60.38±1.57 || || || 86.54±6.21 || || 85.55±2.72
|-
| CG→Lwsw (250 iters) || 93.38±1.05 || 92.77±1.18 || 60.42±1.53 || || || 86.54±6.20 || || 85.54±2.72
|-
| Unigram model 1 || 93.86±1.13 || 93.96±0.98 || 63.96±0.92 || 39.11±8.91 || 80.63±3.87 || 86.00±6.63 || 46.48±5.78 || 89.37±1.63
|-
| Unigram model 2 || 93.90±1.09 || 93.69±0.94 || 67.51±0.67 || 40.36±8.59 || 82.19±3.70 || 87.13±6.23 || 47.12±8.29 || 89.23±0.97
|-
| Unigram model 3 || 93.88±1.08 || 93.67±0.94 || 67.47±0.64 || 40.36±8.59 || 82.45±3.80 || 87.11±6.13 || 47.12±8.29 || 89.00±0.95
|-
| Bigram (sup) || 96.00±0.87 || 95.47±1.07 || 55.26±0.87 || || || 88.07±6.50 || ||
|-
| CG→Unigram model 1 || 94.34±1.11 || 94.73±0.88 || 68.42±0.69 || 40.71±9.39 || 84.54±3.29 || 88.42±6.55 || 46.84±5.48 || 89.04±1.45
|-
| CG→Unigram model 2 || 94.11±1.09 || 94.33±0.82 || 68.93±0.72 || 41.43±9.21 || 84.62±3.47 || 88.64±6.13 || 47.07±7.39 || 88.67±0.93
|-
| CG→Unigram model 3 || 94.09±1.08 || 94.31±0.81 || 68.88±0.72 || 41.43±9.21 || 84.71±3.54 || 88.63±6.07 || 47.07±7.39 || 88.45±0.94
|-
| CG→Bigram (sup) || 96.00±1.13 || 94.88±1.18 || 65.66±1.16 || || || 88.73±6.36 || ||
|-
| Percep (coarsebigram) || 94.02±1.26 || 94.79±0.86 || 55.64±1.17 || || || 87.04±6.23 || || 90.87±0.87
|-
| Percep (kaztags) || 93.66±0.76 || 94.28±0.93 || 70.44±0.92 || || 91.41±2.09 || 87.07±6.16 || 99.70±0.96 || 90.64±1.13
|-
| Percep (spacycoarsetags) || 95.06±1.01 || 95.23±0.66 || 56.34±1.21 || || || 87.32±6.22 || || 90.96±0.76
|-
| Percep (spacyflattags) || 95.25±0.85 || 95.46±0.64 || 73.02±1.12 || || 91.91±2.13 || 87.45±6.24 || 99.70±0.96 || 90.13±1.37
|-
| Percep (unigram) || 93.59±0.77 || 94.09±0.96 || 70.11±0.97 || || 91.08±2.13 || 87.16±6.22 || 99.70±0.96 || 90.23±0.95
|-
| CG→Percep (coarsebigram) || 94.01±1.28 || 94.75±0.69 || 67.32±0.96 || || || 88.70±6.29 || || 89.25±1.17
|-
| CG→Percep (kaztags) || 93.91±0.90 || 94.72±0.88 || 72.79±1.11 || || 87.73±3.12 || 88.72±6.23 || 94.34±3.16 || 89.82±1.29
|-
| CG→Percep (spacycoarsetags) || 94.93±1.12 || 95.16±0.78 || 67.81±1.11 || || || 88.83±6.13 || || 89.88±1.03
|-
| CG→Percep (spacyflattags) || 95.19±0.98 || 95.40±0.66 || 72.80±0.76 || || 87.62±2.83 || 88.85±6.21 || 94.34±3.16 || 89.34±1.24
|-
| CG→Percep (unigram) || 93.87±0.92 || 94.73±0.77 || 72.42±0.86 || || 87.52±3.09 || 88.81±6.28 || 94.34±3.16 || 89.39±1.24
|}

Blank cells indicate that no result is available for that system and language.

In the following table, the values represent availability-adjusted tagger recall (= [true positives] / [words with a correct analysis from the morphological analyser]). This data is also available in box plot form.
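The difference from plain recall is only the denominator: tokens for which the analyser supplied the correct analysis, rather than all tokens, so analyser coverage gaps no longer count against the tagger. A sketch with hypothetical counts (not taken from the tables):

```python
def adjusted_recall(true_positives, tokens_with_correct_analysis):
    """Availability-adjusted recall: correctly tagged tokens divided by only
    those tokens for which the morphological analyser offered the correct
    analysis, as a percentage."""
    return 100.0 * true_positives / tokens_with_correct_analysis

# Hypothetical counts: 21,000 correctly tagged tokens; 22,500 of the corpus's
# 23,673 tokens have the correct analysis available from the analyser,
# so the adjusted figure is higher than the plain recall of 21000/23673.
print(f"{adjusted_recall(21000, 22500):.2f}")
```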

{| class="wikitable"
! System !! Catalan<br>23,673 !! Spanish<br>20,487 !! Serbo-Croatian<br>20,071 !! Russian<br>1,052 !! Kazakh<br>13,714 !! Portuguese<br>6,725 !! Swedish<br>369 !! Italian<br>5,201
|-
| 1st || 87.86 || 91.82 || 52.56±1.53 || 75.93 || 77.72 || 83.00 || 64.47 || 82.77±3.09
|-
| Bigram (unsup, 0 iters) || 90.35±1.17 || 89.95±1.45 || 55.27±1.63 || || || 89.72±2.06 || || 79.64±3.11
|-
| Bigram (unsup, 50 iters) || 93.17±1.21 || 92.63±1.40 || 56.40±1.70 || || || 89.35±1.99 || || 85.45±2.78
|-
| Bigram (unsup, 250 iters) || 92.94±1.22 || 92.35±1.33 || 56.13±1.87 || || || 88.45±2.51 || || 85.03±2.87
|-
| Lwsw (0 iters) || 94.18±0.91 || 94.40±0.77 || 50.88±1.54 || || || 91.51±1.22 || || 86.64±3.15
|-
| Lwsw (50 iters) || 94.44±0.81 || 94.54±0.83 || 52.67±1.46 || || || 91.14±1.62 || || 86.59±2.82
|-
| Lwsw (250 iters) || 94.44±0.79 || 94.60±0.84 || 52.72±1.50 || || || 91.20±1.64 || || 86.60±2.81
|-
| CG→1st || 89.44 || 92.60 || 74.77±1.32 || 79.10 || 87.95 || 95.22 || 79.70 || 83.79±3.08
|-
| CG→Bigram (unsup, 0 iters) || 93.27±1.10 || 92.90±1.30 || 70.52±1.71 || || || 95.61±1.77 || || 81.80±3.08
|-
| CG→Bigram (unsup, 50 iters) || 94.62±1.49 || 94.05±1.13 || 71.15±1.94 || || || 96.41±1.38 || || 86.63±2.51
|-
| CG→Bigram (unsup, 250 iters) || 94.45±1.48 || 94.03±1.09 || 71.11±1.95 || || || 96.06±2.05 || || 86.53±2.62
|-
| CG→Lwsw (0 iters) || 94.63±1.08 || 94.25±0.91 || 70.00±1.74 || || || 95.43±1.52 || || 86.16±2.97
|-
| CG→Lwsw (50 iters) || 94.83±1.01 || 94.27±0.97 || 70.53±1.86 || || || 95.36±1.54 || || 86.07±2.79
|-
| CG→Lwsw (250 iters) || 94.84±1.03 || 94.30±0.99 || 70.58±1.81 || || || 95.36±1.53 || || 86.06±2.79
|-
| Unigram model 1 || 95.33±1.05 || 95.51±0.84 || 74.72±1.43 || 77.54±6.51 || 87.03±3.03 || 94.74±2.44 || 89.26±7.32 || 89.91±1.93
|-
| Unigram model 2 || 95.37±1.04 || 95.23±0.77 || 78.87±1.05 || 80.06±6.11 || 88.72±2.76 || 96.01±1.70 || 89.82±7.70 || 89.77±1.23
|-
| Unigram model 3 || 95.35±1.03 || 95.22±0.79 || 78.82±1.06 || 80.06±6.11 || 88.99±2.83 || 95.99±1.52 || 89.82±7.70 || 89.54±1.25
|-
| Bigram (sup) || 97.50±0.93 || 97.04±0.86 || 64.55±1.33 || || || 97.03±1.75 || ||
|-
| CG→Unigram model 1 || 95.82±1.06 || 96.30±0.68 || 79.92±0.95 || 80.56±6.70 || 91.25±2.01 || 97.42±1.76 || 90.00±6.99 || 89.58±1.75
|-
| CG→Unigram model 2 || 95.58±1.07 || 95.89±0.59 || 80.51±0.95 || 82.06±6.50 || 91.33±2.15 || 97.70±1.32 || 89.97±7.50 || 89.21±1.13
|-
| CG→Unigram model 3 || 95.56±1.05 || 95.86±0.60 || 80.46±0.99 || 82.06±6.50 || 91.43±2.26 || 97.69±1.28 || 89.97±7.50 || 88.98±1.18
|-
| CG→Bigram (sup) || 97.51±1.21 || 96.45±0.93 || 76.70±1.46 || || || 97.78±1.52 || ||
|-
| Percep (coarsebigram) || 95.71±1.36 || 96.60±0.75 || 61.99±1.24 || || || 95.92±1.60 || || 92.89±1.10
|-
| Percep (kaztags) || 95.34±0.77 || 96.08±0.69 || 78.47±0.99 || || 91.41±2.08 || 95.95±1.69 || 99.70±0.96 || 92.67±1.31
|-
| Percep (spacycoarsetags) || 96.76±1.06 || 97.05±0.56 || 62.77±1.29 || || || 96.22±1.52 || || 92.99±0.93
|-
| Percep (spacyflattags) || 96.96±0.87 || 97.28±0.58 || 81.35±1.19 || || 91.92±2.12 || 96.37±1.53 || 99.70±0.96 || 92.14±1.44
|-
| Percep (unigram) || 95.27±0.76 || 95.89±0.74 || 78.11±1.03 || || 91.08±2.12 || 96.05±1.64 || 99.70±0.96 || 92.24±1.11
|-
| CG→Percep (coarsebigram) || 95.70±1.37 || 96.55±0.55 || 75.00±1.04 || || || 97.75±1.47 || || 91.25±1.50
|-
| CG→Percep (kaztags) || 95.59±0.92 || 96.53±0.66 || 81.10±1.20 || || 87.74±3.11 || 97.78±1.41 || 94.34±3.16 || 91.83±1.50
|-
| CG→Percep (spacycoarsetags) || 96.64±1.17 || 96.98±0.64 || 75.54±1.31 || || || 97.90±1.30 || || 91.89±1.20
|-
| CG→Percep (spacyflattags) || 96.90±1.02 || 97.22±0.51 || 81.10±0.86 || || 87.62±2.82 || 97.92±1.38 || 94.34±3.16 || 91.34±1.42
|-
| CG→Percep (unigram) || 95.55±0.92 || 96.54±0.52 || 80.68±0.93 || || 87.52±3.08 || 97.87±1.47 || 94.34±3.16 || 91.38±1.40
|}

Blank cells indicate that no result is available for that system and language.

In the following table, the intervals represent the [low, high] values from 10-fold cross-validation.