Running Test Suites

The following steps show how to run a batch test on your bot and obtain a detailed analytical report on the utterances based on the test results. To get started, click Batch Testing in the Testing section of the builder.

Note: Prior to testing, it is essential to add a considerable number of utterances to your bot and train it using Machine Learning.

To run a Test Suite, for example Developer Defined Utterances, click Developer Defined Utterances and then Run Test Suite. This initiates the batch test for the developer-defined utterances.

The test displays the results as explained below. Each test run creates a test report record and displays a summary of the test result. The batch test result in the screenshot below includes the following information:

Last Run Date: The date and time of the latest test run.

F1 Score: The harmonic mean of Precision and Recall.

Precision: The number of correctly classified utterances divided by the total number of utterances that were classified (correctly or incorrectly) to any existing task.

Recall: The number of correctly classified utterances divided by the sum of the utterances classified correctly to any existing task and the utterances incorrectly classified as matching no existing task.

Intent Success %: The percentage of intents that were correctly recognized in the test.

Entity Success %: The percentage of entities that were correctly recognized in the test.
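As an illustration of how the three scores above relate, here is a minimal Python sketch. It assumes the definitions given in this section: "correctly classified" utterances are counted as true positives, utterances classified to a wrong task as false positives, and utterances wrongly classified as matching no task as false negatives. The function name and its parameters are hypothetical and are not part of the product.

```python
from typing import Tuple


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Return (precision, recall, f1) from raw counts.

    tp: utterances classified to the correct task
    fp: utterances classified to a wrong task
    fn: utterances wrongly classified as matching no task
    """
    # Guard against division by zero when nothing was classified.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    if precision + recall == 0:
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    p, r, f1 = precision_recall_f1(tp=8, fp=2, fn=2)
    print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

Because F1 is a harmonic mean, it stays low whenever either Precision or Recall is low, which is why it is a useful single number for comparing test runs.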

There are three possible outcomes from each test run:

Success – when all records present in the file are processed.

Success with a warning – when one or more records present in the suite are discarded from detection due to a system error.

Failed – when a system error occurred and the test could not be resumed after recovery.

Hovering over the warning/error icon will display a message suggesting the reason.

To get a detailed analysis of the test run, click the Download icon to download the test report in CSV format. The top section of the report comprises the summary with the following fields:

Last Tested: Date of the latest test run for developer-defined utterances.

Utterance Count: Total number of utterances included in the test run.

Success/Failure Ratio: Total number of successfully predicted utterances divided by the total count of utterances, then multiplied by 100.

True Negative (TN): Percentage of utterances that were not expected to match any intent and did not match any.

False Positive (FP): Percentage of utterances that have matched an unexpected intent.

False Negative (FN): Percentage of utterances that have not matched the expected intent.
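The summary percentages above can be recomputed from the per-utterance results, which is a handy sanity check when comparing reports. The sketch below is only an approximation under stated assumptions: the result labels are taken to be the four Result Type values named in this section, and a "successfully predicted" utterance is assumed to be a True Positive or a True Negative. The function and key names are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def summarize(result_types: List[str]) -> Dict[str, float]:
    """Recompute summary fields from a list of Result Type labels."""
    counts = Counter(result_types)
    total = len(result_types)

    def pct(n: int) -> float:
        # Percentage of the total; 0.0 for an empty test suite.
        return 100.0 * n / total if total else 0.0

    # Assumption: success means True Positive or True Negative.
    successes = counts["True Positive"] + counts["True Negative"]
    return {
        "utterance_count": total,
        "success_ratio": pct(successes),
        "true_negative_pct": pct(counts["True Negative"]),
        "false_positive_pct": pct(counts["False Positive"]),
        "false_negative_pct": pct(counts["False Negative"]),
    }


if __name__ == "__main__":
    labels = ["True Positive", "True Positive", "True Negative", "False Positive"]
    print(summarize(labels))
```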

The report also provides detailed information on each of the test utterances and the corresponding results.

Utterances – Utterances used in the corresponding test suite.

Expected Intent – The intent expected to match for a given utterance.

Matched Intent – The intent that is matched for an utterance during the batch test.

Parent Intent – The parent intent considered for matching an utterance against an intent.

Task State – The status of the intent or task against which the intent is identified. Possible values include Configured and Published.

Result Type – Result categorized as True Positive, True Negative, False Positive, or False Negative.

Entity Name – The name of the entity detected from the utterance.

Expected EntityValue – The entity value expected to be determined during the batch test.

Matched EntityValue – The entity value identified from an utterance.

Entity Result – Result categorized as True or False to indicate whether the expected entity value is the same as the actual entity value.

Matched Intent’s Score – For False Positives and False Negatives, the confidence scores from the FM, ML, and/or KG engines are displayed for the intent matched from the utterance. Scores are given only by the engines that detect the intent, so you may not always see scores from all three engines.

Expected Intent’s Score – For False Positives, the confidence scores for the intent expected to match the given utterance are displayed. Again, scores are given only by the engines that detect the intent.
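Once the report is downloaded, the per-utterance rows can be filtered to find where the bot needs more training. The following Python sketch groups misclassified utterances by their expected intent. It makes several assumptions: the column headers are taken from the field names listed above and may not match the actual CSV headers, the summary section at the top of the real report is assumed to have been removed first, and the sample rows are invented purely for illustration.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List

# Invented sample data; the real report's headers and layout may differ.
SAMPLE_REPORT = """Utterances,Expected Intent,Matched Intent,Result Type
book a flight,Book Flight,Book Flight,True Positive
get me a ticket,Book Flight,Check Status,False Positive
where is my plane,Check Status,,False Negative
tell me a joke,,,True Negative
"""


def failures_by_expected_intent(report_text: str) -> Dict[str, List[str]]:
    """Group False Positive / False Negative utterances by expected intent."""
    failures = defaultdict(list)
    for row in csv.DictReader(io.StringIO(report_text)):
        if row["Result Type"].startswith("False"):
            # Utterances with no expected intent are grouped under "(none)".
            key = row["Expected Intent"] or "(none)"
            failures[key].append(row["Utterances"])
    return dict(failures)


if __name__ == "__main__":
    for intent, utterances in failures_by_expected_intent(SAMPLE_REPORT).items():
        print(intent, "->", utterances)
```

Intents that collect many failing utterances are natural candidates for additional or revised training utterances.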

Tip: For any of the batch tests, if the results indicate that your bot is unable to recognize the correct intents, you can improve its performance by adding utterances to the Machine Learning model or modifying the existing ones.