Thursday, August 17, 2017

Implementing a Toy Chatbot using Machine Learning

Chatbots are all the rage these days. There are numerous companies offering chatbots as a service. To an outsider it may look like magic how these things work, but for an ML practitioner they are nothing more than simple classifier models. About a year back I made an attempt to create a weather bot + travel bot (a bot which could tell you the weather and also help you book flights). It was a fun learning experiment with some interesting output. A year is long enough that I don't remember much about the code, but in this post I will explain the general design of the bot that I created and show some demos.

Essentially a chatbot is like a very simple REPL (Read-Eval-Print-Loop): you read input from a human one sentence at a time, evaluate it and decide what to do with it, print a response, and go back to step 1. We will talk about all three steps in detail below, in the context of implementing a weather + travel bot, i.e. a bot which tells you the weather of a place and also helps you plan your travel.
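To make the loop concrete, here is a minimal sketch of that REPL skeleton. The `respond` function and its canned replies are purely illustrative placeholders, not the actual bot's code (the real bot routes the sentence through trained classifiers, as described below):

```python
def respond(sentence):
    """Toy stand-in for the 'Eval' step: keyword rules instead of real classifiers."""
    text = sentence.lower().strip()
    if text in ("hi", "hi!", "hello"):
        return "Hello! Ask me about the weather or flights."
    if "weather" in text:
        return "Which place's weather would you like to know?"
    if "flight" in text or "travel" in text:
        return "Where would you like to travel?"
    return "Sorry, I didn't understand that."

def repl():
    """Read-Eval-Print-Loop: read a sentence, evaluate it, print a response, repeat."""
    while True:
        sentence = input("you> ")
        if sentence.lower().strip() in ("bye", "quit"):
            print("bot> Goodbye!")
            break
        print("bot>", respond(sentence))

# To chat interactively, call repl()
```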

For a weather bot, the most important thing is to understand which place the user is asking about. So, if we can simply train a model which extracts the location name from a sentence, we are good to go, right? Evidently, not quite! Since these things are called chatbots (bots capable of chatting), expectations from them are greater.
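In practice one would use a proper named-entity recognizer for this (spaCy ships one, for example). As a stand-in, here is a toy extractor that just matches words against a small hard-coded set of place names; the gazetteer is made up for this sketch:

```python
# Toy gazetteer for illustration only; a real bot would use an NER model instead.
KNOWN_PLACES = {"london", "paris", "delhi", "tokyo"}

def extract_location(sentence):
    """Return the first known place name found in the sentence, or None."""
    cleaned = sentence.lower().replace("?", " ").replace(",", " ").replace(".", " ")
    for word in cleaned.split():
        if word in KNOWN_PLACES:
            return word
    return None
```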

It is not necessary that the first sentence the user enters is asking about the weather. It might just be a simple greeting, such as "Hi!" or "Hello". Our bot should be able to understand these and respond accordingly. Similarly, the user may also try to make other sorts of conversation, such as asking the bot its name, or telling the bot their (the user's) name. These are just two examples of the types of conversations we might want our bot to handle apart from the regular weather or travel questions.

So, in essence, we can't just expect that every sentence entered by the user is about weather. We need to first understand the sentence (i.e. greeting, asking name, or asking weather) and then generate a response. This means every input sentence has to go through a classifier, which classifies the sentence into one of the classes telling you what the sentence is about, e.g., is the user just greeting you, is the user asking a question, is the user saying something off topic, and finally is the user talking about weather or travel.  Based on this the bot can decide what response to generate.
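To make the classification step concrete, here is a toy version of such a sentence classifier. It scores the input against a few example sentences per class by word overlap; the example sentences are made up for this sketch, and the real bot used trained models rather than this heuristic:

```python
# A few illustrative example sentences per class (invented for this sketch).
EXAMPLES = {
    "greeting": ["hi there", "hello bot", "good morning"],
    "weather":  ["what is the weather like", "will it rain today"],
    "travel":   ["book me a flight", "i want to travel to paris"],
}

def classify(sentence):
    """Pick the class whose example sentences share the most words with the input."""
    words = set(sentence.lower().split())
    def score(cls):
        return max(len(words & set(example.split())) for example in EXAMPLES[cls])
    return max(EXAMPLES, key=score)
```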

I started with this design but I didn't have any training data. To test the idea out, I just wrote some sample sentences about weather, travel, greetings, and some questions (e.g. asking the bot's name) in a text file. But I could only produce some 40-odd sentences overall, with 5-6 sentences of each individual sentence type that I wanted the bot to recognize. This was clearly too small a dataset to train any kind of machine learning model. Most models would end up badly overfitting it, resulting in a terribly confused bot.

So, I decided to simplify the problem. I created a hierarchy of sentence types. The first class was intent: the types of sentences the bot is designed to respond to (such as telling the weather, planning travel, or answering the user's greetings). The second class was non-intent: anything the bot was not designed to answer, but for which we could provide some hard-coded responses if we understood what the user said. See the figure below to get an idea of the hierarchy of sentence classes.

Hierarchy of sentence class types 
I gave this second design a try, training individual classifiers for each of these different sentence classes. How the whole thing works is better expressed in the following pseudocode than in any amount of prose I could write:

for each input sentence:
    if sentence not of type 'intent':
        if sentence of type 'question':
            question_class = question_classifier.predict(sentence)
            respond with the canned answer for question_class
        if sentence of type 'sentiment':
            sentiment = sentiment_classifier.predict(sentence)
            if sentiment == 'happy': respond cheerfully, else sympathize
    else:
        intent = intent_classifier.predict(sentence)
        if intent of type 'weather':
            if location not in sentence: ask location, else: tell weather
        if intent of type 'travel':
            if location not in sentence: ask location, else: get flights
        if intent of type 'greeting':
            respond with a greeting
        if intent of type 'goodbye':
            say goodbye
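Translated into runnable Python, the dispatch might look like the sketch below. Every classifier here is a hard-coded stub standing in for the trained models, and the gazetteer and responses are invented for illustration:

```python
def intent_type_classifier(sentence):
    """Stub: decide whether the sentence is an 'intent' the bot handles."""
    s = sentence.lower()
    if any(w in s for w in ("weather", "flight", "travel", "hello", "bye")):
        return "intent"
    return "non-intent"

def intent_classifier(sentence):
    """Stub for the trained intent classifier."""
    s = sentence.lower()
    if "weather" in s:
        return "weather"
    if "flight" in s or "travel" in s:
        return "travel"
    if "bye" in s:
        return "goodbye"
    return "greeting"

def extract_location(sentence):
    """Toy location lookup; a real bot would use an NER model."""
    places = {"london", "delhi", "paris"}
    for word in sentence.lower().strip("?!.").split():
        if word in places:
            return word
    return None

def handle(sentence):
    """One pass through the sentence-type hierarchy."""
    if intent_type_classifier(sentence) == "non-intent":
        return "Sorry, I can only help with weather and travel."
    intent = intent_classifier(sentence)
    if intent in ("weather", "travel"):
        location = extract_location(sentence)
        if location is None:
            return "Which place are you asking about?"
        action = "Here's the weather for" if intent == "weather" else "Searching flights to"
        return f"{action} {location} (hard-coded response)."
    if intent == "goodbye":
        return "Goodbye!"
    return "Hello! How can I help?"
```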

The above pseudocode describes how the bot uses the various individually trained classifiers to navigate the sentence-type hierarchy and decide what actions to take. In my implementation I did not actually integrate with any third-party services to get weather or show flights. I just hard-coded some fixed responses, but those could easily be replaced with actual integrations with live services.

As far as training the models is concerned, I simply tried various models, such as logistic regression, support vector machines, random forests, and neural networks, and used whichever showed the best performance. Given the small size of the dataset, all of them were prone to overfitting; I just chose the one which seemed least confused when tested on sentences with a different structure than those in the training dataset.

To vectorize the sentences, I used word2vec-style embeddings (GloVe vectors via spaCy). To vectorize a sentence, I simply converted all the words of the sentence to their word vectors and summed those up to get a single vector. Many people suggest averaging the vectors instead, but I did not try that. There are perhaps better ways to vectorize a sentence, such as stacking all the word vectors instead of summing them up.
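The summing scheme is easy to illustrate without spaCy. Here is the same idea with a made-up three-dimensional embedding table (real GloVe vectors are hundreds of dimensions, and the values below are invented for this sketch):

```python
# Toy 3-d "word vectors", invented for illustration; the real bot used spaCy's GloVe vectors.
EMBEDDINGS = {
    "nice":    [0.5, 0.0, 0.25],
    "weather": [0.25, 1.0, 0.5],
    "today":   [0.0, 0.5, 0.25],
}

def sentence_vector(sentence, dim=3):
    """Sum the word vectors of all known words; unknown words are simply skipped."""
    total = [0.0] * dim
    for word in sentence.lower().split():
        vector = EMBEDDINGS.get(word)
        if vector is not None:
            total = [t + v for t, v in zip(total, vector)]
    # To average instead of sum, divide each component by the number of known words.
    return total
```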

I wrote this code more than a year back just as a proof of concept, so it is not super clean, commented, or documented (this blog post is the best documentation I've got). The chatbot is present in the file The code for training the individual classifiers is in a bunch of IPython notebooks (I worked with those to experiment easily with various models but never got around to moving them into actual Python files). I manually created the datasets for the individual sentence classes, which are present in text files. Feel free to check out the code at: I have added pickle files of the pre-trained classifiers to the repo, so that just runs and you don't have to train the classifiers yourself.

Following are some demo conversations (click them to enlarge):