A Walk-through Example of Modeling Knowledge Tracing with BNT-SM

Knowledge Tracing (KT) is an established technique for student modeling and
was first used in the ACT Programming Languages Tutor. The goal of KT is to
estimate a student's knowledge from his or her observed actions. At each
successive opportunity to apply a skill, KT updates its estimated probability
that the student knows the skill, based on skill-specific learning and
performance parameters and the observed student performance (the evidence). Reye
showed that KT is a special case of a dynamic Bayesian network (DBN) whose
parameters do not change across time slices. More specifically, the conditional
independence graph of KT can be drawn as in the following figure. (We rename the
original KT variables L0 and t as already know and learn, for consistent naming
with other sections.)
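The update described above can be written out explicitly. What follows is a sketch of the standard two-state KT recursion, not something taken from the BNT-SM source. The symbols are notation assumed here: P(L_n) is the probability that the student knows the skill before the n-th opportunity (so P(L_1) is already know), C_n is the observed performance, and g, s, l and f stand for guess, slip, learn and forget. Classic KT fixes f = 0.

```latex
\begin{align*}
P(C_n = 1) &= P(L_n)\,(1 - s) + \bigl(1 - P(L_n)\bigr)\,g \\
P(L_n \mid C_n = 1) &= \frac{P(L_n)\,(1 - s)}{P(L_n)\,(1 - s) + \bigl(1 - P(L_n)\bigr)\,g} \\
P(L_n \mid C_n = 0) &= \frac{P(L_n)\,s}{P(L_n)\,s + \bigl(1 - P(L_n)\bigr)\,(1 - g)} \\
P(L_{n+1}) &= P(L_n \mid C_n)\,(1 - f) + \bigl(1 - P(L_n \mid C_n)\bigr)\,l
\end{align*}
```

The first line is the predicted performance, the next two are the Bayes update after a correct or incorrect observation, and the last carries the updated estimate forward to the next opportunity.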

In the following sections, we briefly go over the format of the input and
output files. Additional details are in the file README.txt at the top level of the package.

evidence.xls

Tab-delimited text file. This is very important: despite the .xls extension, it is not an Excel file.

Two fields are essential: user and skill

BNT-SM trains one Bayes net for each skill, pooling the data from all
users. That is, BNT-SM trains skill-specific models.

Evidence data should be complete. That is, no data point should be
left out.

Hidden variables and missing observations should be marked with
NULL

For discrete variables, the values cannot be 0, because Matlab uses them as
array subscripts, which start at 1. Therefore, we usually increment
discrete variables by 1. In the case of a binary variable, 1 is used
for 0 or false values, whereas 2 is used for 1 or true values.
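The conventions above can be sketched in a few lines of Python. This is an illustration only, not part of BNT-SM (which is written in Matlab): the helper names read_evidence, to_one_based and group_by_skill are hypothetical, and the three-column sample is a made-up excerpt rather than a real evidence file.

```python
import csv
import io

NULL_MARKER = "NULL"  # marker used for hidden variables and missing observations


def read_evidence(stream):
    """Parse tab-delimited evidence rows into dicts, mapping NULL to None."""
    reader = csv.DictReader(stream, delimiter="\t")
    rows = []
    for record in reader:
        rows.append({name: (None if value == NULL_MARKER else value)
                     for name, value in record.items()})
    return rows


def to_one_based(zero_based_value):
    """Shift a discrete value so it can serve as a 1-based Matlab subscript."""
    return zero_based_value + 1


def group_by_skill(rows):
    """Collect rows per skill, pooling all users (one model per skill)."""
    groups = {}
    for row in rows:
        groups.setdefault(row["skill"], []).append(row)
    return groups


# Hypothetical excerpt with only three fields; a real file carries more.
sample = (
    "user\tskill\tasr_accept\n"
    "fBS7-7-1990-02-03\tWORLD\t2\n"
    "fDL7-5-1993-11-28\tWORLD\tNULL\n"
)
rows = read_evidence(io.StringIO(sample))
by_skill = group_by_skill(rows)
```

Here to_one_based(0) gives 1 (false) and to_one_based(1) gives 2 (true), matching the binary encoding described above.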

Example (shown transposed for readability: each column below is one row of the file):

field                 row 1                row 2
user                  fBS7-7-1990-02-03    fDL7-5-1993-11-28
machine_name          LISTEN01-308-04      LISTEN01-334-04
utterance_start_time  2000-05-12 17:52:27  2004-10-13 13:56:49
utterance_sms         640                  421
target_word_number    8                    13
skill                 WORLD                WORLD
help                  1                    2
knowledge             NULL                 NULL
correct               NULL                 NULL
transcript_key        NULL                 NULL
trn_correct           NULL                 NULL
asr_accept            2                    2
confidence_score      0.0668763            0.0430581
asr_confidence        1                    1

param_table.xls

Since BNT-SM estimates skill-specific models, a Bayes net is trained for
each skill in the training dataset.

BNT-SM outputs the parameters that are specified in the property_xml
file.

skill        num_users  num_cases  ll          L1        guess     slip      t         forget
skill_HELLO  14         23         -3.609297   0.744855  0.721432  0.000005  0.982517  0.000001
skill_WORLD  46         218        -90.177505  0.695366  0.634124  0.113612  0.256071  0.000001

inference_result.xls

The format of inference_result.xls is identical to that of
evidence.xls, except that BNT-SM performs inference on the hidden variables
and fills in their estimated values.

If you instead want to infer the prior probability (that is, before observing
the evidence, as in classic Knowledge Tracing), you can change a switch in
RunBnet.m where it calls inference_bnet.m.

Example (shown transposed for readability: each column below is one row of the file):

field                 row 1                row 2
user                  fCA8-5-1994-06-27    fCA7-5-1994-06-27
machine_name          LISTEN01-302-04      LISTEN01-315-04
utterance_start_time  2004-11-09 11:25:24  2005-01-24 11:33:12
utterance_sms         218                  468
target_word_number    1                    1
skill                 HELLO                HELLO
help                  1                    1
knowledge             0.801846             0.997498
correct               NULL                 NULL
transcript_key        NULL                 NULL
trn_correct           NULL                 NULL
asr_accept            2                    2
confidence_score      0.124751             0.0785152
asr_confidence        1                    1

Additional Output

For efficiency and debugging purposes, BNT-SM also outputs the following files
for your convenience:

log.txt - Shows which skill the model is currently training or running
on. This helps because Matlab does not update the screen often when running
intensively on a large data set. The name of this file can be set in the
property.xml file.

hash_bnet.mat - an object that stores all the trained Bayes nets in Matlab
binary format.

evidence.mat - an object that stores the processed data set in Matlab
binary format. As a preprocessing step, BNT-SM first constructs a
structure from the tab-delimited evidence file. When the data set is large,
this can take Matlab quite a while. Therefore, the loaded
structure is saved in binary form for fast reloading.

Project LISTEN

Training a Student Model to be used in the Reading Tutor

Specify the Bayes nets in the property_xml file.

Provide the data.

Execute RunBnet.m

Retrieve the model parameters from the param_table.

Derive the equation for student models.

Infer pknow and pcorrect using the equations derived by hand.

Verify the correctness of your derivation by comparing the pknow and
pcorrect you derived to what BNT-SM estimated in inference_result.
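The last three steps can be sketched in Python using the sample values shown earlier. This is a minimal illustration of the standard KT recursion, not BNT-SM code: the function names are hypothetical, L1 and t from the param_table are read as already know and learn, and asr_accept = 2 is treated as the observed "correct" outcome. The second check further assumes that the second sample row of inference_result.xls is a second accepted attempt on the skill, which the excerpt itself does not show.

```python
def p_correct(p_know, guess, slip):
    """Predicted probability of a correct response before observing it."""
    return p_know * (1.0 - slip) + (1.0 - p_know) * guess


def posterior_know(p_know, observed_correct, guess, slip):
    """Bayes update of P(know) after observing one response."""
    if observed_correct:
        numerator = p_know * (1.0 - slip)
        evidence = p_correct(p_know, guess, slip)
    else:
        numerator = p_know * slip
        evidence = 1.0 - p_correct(p_know, guess, slip)
    return numerator / evidence


def next_prior(p_know_posterior, learn, forget):
    """Carry the posterior across one opportunity to get the next prior."""
    return p_know_posterior * (1.0 - forget) + (1.0 - p_know_posterior) * learn


# skill_HELLO parameters copied from the param_table.xls sample.
hello = {"already_know": 0.744855, "guess": 0.721432, "slip": 0.000005,
         "learn": 0.982517, "forget": 0.000001}

# First accepted attempt: posterior P(know) given the observation.
first = posterior_know(hello["already_know"], True,
                       hello["guess"], hello["slip"])

# Assumed second accepted attempt: transition, then update again.
prior_2 = next_prior(first, hello["learn"], hello["forget"])
second = posterior_know(prior_2, True, hello["guess"], hello["slip"])

print(f"{first:.6f} {second:.6f}")  # prints: 0.801846 0.997498
```

Both numbers agree with the knowledge values in the inference_result.xls sample, which suggests that the file reports the posterior estimate, that is, the estimate after the evidence has been observed.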

Training the Out of Vocabulary (OOV) model

Note that this task can be done by preprocessing the
evidence.xls file.

That is, sort the evidence.xls file by number of cases in each skill.

Then, for those skills whose number of cases falls below a certain threshold
(for example, fewer than 10 cases or 5 students), replace the skill name with
"OOV".