Bitext Blog

Speed Up Your Bot Training with Artificial Data

If you want your chatbot to recognize a specific intent, you need to provide it with a large number of sentences that express that intent. That doesn’t seem that easy, right? Now you can do it in a matter of seconds. Keep reading!

One of the main problems with the current generation of chatbots is that they require large amounts of training data. Until now, variations of a specific intent like “find a summer dress”, such as “can you find a dress for summer?” or “I need you to find dresses for summer season”, had to be written manually. One or more people had to write many different sentences for each intent, vertical and language. This was a time-consuming task rather than a creative one, and it made bot development very costly.

At Bitext, we solved this problem with our own Artificial Data Generation technology, which automatically generates many different sentences with the same meaning as the original, automating the most resource-intensive part of the bot creation process.

Natural Language Generation Process

The NLG process takes the output of the preceding NLU analysis and generates a number of sentences with the same meaning. These sentences can vary in several ways:

Word order: “summer dress” can be changed to “dress for the summer”

Singular/plural: “summer dress” can be changed to “summer dresses”

Questions: “find a summer dress” can be changed to “can you find a summer dress?”

Negation: “find a dress” can be changed to “find a dress, but not for summer”

Politeness: “find a summer dress” can be changed to “can you please find a summer dress?”

Moreover, you can ask for only some of these variants: for example, just “word order” and “questions” but no “negation”. You can also choose between only the least complex sentences (10-20) and the whole range of them (which can be hundreds of sentences), depending on your needs or the capacity of your chatbot. The result is a powerful but flexible system that can be configured to generate as many, or as few, variants of a sentence as your system needs.
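To make the idea of selectable variant types concrete, here is a minimal Python sketch. It is not Bitext's implementation or API: the function name generate_variants, the variant type labels and the simple string templates are all assumptions made for illustration, and a real NLG system would rely on linguistic analysis rather than fixed templates.

```python
def generate_variants(action, modifier, noun,
                      variant_types=("word_order", "plural", "question", "politeness"),
                      limit=None):
    """Toy, template-based generator (hypothetical, for illustration only)."""
    # Build the noun-phrase variants first.
    noun_phrases = [f"a {modifier} {noun}"]
    if "word_order" in variant_types:
        noun_phrases.append(f"a {noun} for the {modifier}")
    if "plural" in variant_types:
        # Naive English pluralization, good enough for this example.
        suffix = "es" if noun.endswith(("s", "x", "sh", "ch")) else "s"
        noun_phrases.append(f"{modifier} {noun}{suffix}")

    # Then the sentence frames that wrap each noun phrase.
    frames = ["{action} {np}"]
    if "question" in variant_types:
        frames.append("can you {action} {np}?")
    if "politeness" in variant_types:
        frames.append("can you please {action} {np}?")

    sentences = [frame.format(action=action, np=np)
                 for frame in frames
                 for np in noun_phrases]

    # Negation is off by default because it changes what the user is asking for.
    if "negation" in variant_types:
        sentences.append(f"{action} a {noun}, but not for {modifier}")

    # limit=None keeps the whole range; a number keeps only the first few.
    return sentences[:limit]


if __name__ == "__main__":
    for sentence in generate_variants("find", "summer", "dress",
                                      variant_types=("word_order", "question")):
        print(sentence)
```

Passing a shorter variant_types tuple or a small limit corresponds to asking for fewer, simpler sentences; passing all types and no limit yields the full set this toy generator can produce.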

This approach provides a human-in-the-loop solution that automates the most resource-intensive part of the bot creation process, allowing humans to focus on providing domain expertise rather than on repetitive manual work. The service also increases bot accuracy significantly, by around 2x.

What’s more, the data generated is private and independent of the learning platform, which avoids conflating the training and machine learning processes. These aspects will increase portability, reduce vendor lock-in and provide you with control over your own data.

Because Bitext NLG supports the output formats of many different platforms, you can generate your training data for all of them at once. This lets you quickly switch from one platform to another, or even maintain several chatbots at the same time, all with the same training data, and see which one works best for you. Ownership of the training data also ensures that it can be reused at any point beyond an initial proof of concept (POC) or engagement, providing complete control over the training process and allowing customers to select the machine learning vendor best suited to their needs.
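As a rough illustration of keeping training data independent of any one platform, the Python sketch below writes the same set of sentences to two generic formats. The JSON and CSV layouts, the field names and the export_all helper are assumptions chosen for the example; they do not reproduce the import format of any particular chatbot platform or the output of Bitext NLG.

```python
import csv
import io
import json


def to_json(intent, sentences):
    # Hypothetical layout: one object per intent with a list of utterances.
    return json.dumps({"intent": intent, "utterances": list(sentences)}, indent=2)


def to_csv(intent, sentences):
    # Hypothetical layout: one row per utterance, labeled with its intent.
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["intent", "utterance"])
    for sentence in sentences:
        writer.writerow([intent, sentence])
    return buffer.getvalue()


# Registry of output formats; supporting another platform means adding one entry.
EXPORTERS = {"json": to_json, "csv": to_csv}


def export_all(intent, sentences):
    """Serialize the same training data once per registered format."""
    return {name: exporter(intent, sentences) for name, exporter in EXPORTERS.items()}


if __name__ == "__main__":
    outputs = export_all("find_summer_dress",
                         ["find a summer dress", "can you find a summer dress?"])
    for name, text in outputs.items():
        print(f"--- {name} ---")
        print(text)
```

Because the sentences live in one place and each format is only a serializer, the same data can feed several bots at once or be moved to a different vendor later.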