Project Description

This project is a language model toolkit built on the .NET Framework. It currently supports two language modeling algorithms: n-gram language modeling with Kneser-Ney smoothing, and recurrent neural network language modeling ported from RNNLM by Tomas Mikolov.

With this project, users can train a language model with the pipeline tool and compute the probability of a sentence with the decoder.

<input file> : input text file to be processed by the language model

<output file> : output file containing the text processed by the language model

Example:

lm_score.exe wordbreak_dict.txt chsLM.txt 4 input.txt output.txt

The format of <output file> is as follows:

Text \t Probability \t Number of OOV (out-of-vocabulary) tokens \t Perplexity
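
For downstream processing, a line in this format can be split back into its four fields. The following is a minimal sketch, not part of the toolkit: the names ScoredLine and ScoreOutputParser are hypothetical, and it assumes that the text itself contains no tab characters and that numbers are written with "." as the decimal separator.

```csharp
using System;
using System.Globalization;

// Hypothetical holder for one line of <output file>; not part of the toolkit.
public sealed class ScoredLine
{
    public string Text;
    public double Probability;
    public int Oovs;
    public double Perplexity;
}

public static class ScoreOutputParser
{
    // Splits one line on tabs and converts the three numeric fields.
    // Assumption: exactly four tab-separated fields per line.
    public static ScoredLine Parse(string line)
    {
        string[] fields = line.Split('\t');
        if (fields.Length != 4)
            throw new FormatException(
                "Expected 4 tab-separated fields, got " + fields.Length);

        return new ScoredLine
        {
            // Trim tolerates optional spaces around the tab separators.
            Text = fields[0].Trim(),
            Probability = double.Parse(fields[1].Trim(), CultureInfo.InvariantCulture),
            Oovs = int.Parse(fields[2].Trim(), CultureInfo.InvariantCulture),
            Perplexity = double.Parse(fields[3].Trim(), CultureInfo.InvariantCulture)
        };
    }
}
```

If the tool writes numbers using the machine's regional settings instead, replace CultureInfo.InvariantCulture with the matching culture.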

API for developers

The toolkit provides an API so that developers can use language models in their own projects. The steps below describe how to use it.

1. Add LMDecoder.dll as a reference to your project.

2. Create an LMDecoder.LMDecoder instance.

3. Call LoadLM(string strFileName) to load a language model from a file. strFileName specifies the path and file name of the language model.

4. Call LMResult GetSentProb(string strText, int order) to score a string. strText is the string to score and order is the maximum order to use. The return value is of type LMResult.

LMResult contains the predicted result. Its structure is as follows:
public class LMResult
{
    public double logProb;    // the probability score of the given string
    public int oovs;          // the number of OOV tokens
    public double perplexity; // the perplexity of the given string
}
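
Put together, steps 1 to 4 might look like the sketch below. It compiles only with LMDecoder.dll referenced. Several details are assumptions rather than documented behavior: that LMDecoder.LMDecoder has a parameterless constructor, that the model file name "chsLM.txt" and the order 4 (both borrowed from the lm_score.exe example) suit your model, and that the input string needs no preprocessing before scoring.

```csharp
using System;

public static class DecoderExample
{
    public static void Main()
    {
        // Step 2: assumes a parameterless constructor is available.
        var decoder = new LMDecoder.LMDecoder();

        // Step 3: "chsLM.txt" is an illustrative path; use your own model file.
        decoder.LoadLM("chsLM.txt");

        // Step 4: score one string; 4 is an illustrative max order.
        var result = decoder.GetSentProb("this is a test", 4);

        // Read the three public fields of LMResult.
        Console.WriteLine("logProb    = " + result.logProb);
        Console.WriteLine("oovs       = " + result.oovs);
        Console.WriteLine("perplexity = " + result.perplexity);
    }
}
```

Loading a model is likely the expensive step, so it usually makes sense to call LoadLM once and reuse the same decoder instance for many GetSentProb calls.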