Thursday, August 17, 2017

Implementing a Toy Chatbot using Machine Learning

Chatbots are all the rage these days. There are numerous companies offering chatbots as a service (wit.ai, api.ai, etc.). To an outsider it may look like magic how these things work, but to an ML practitioner they are nothing more than simple classifier models. About a year back I made an attempt to create a weather + travel bot (a bot which could tell you the weather and also help you book flights). It was a fun learning experiment with some interesting output. A year is long enough that I don't remember much about the code, but in this post I will explain the general design of the bot I created and show some demos.

Essentially a chatbot is a very simple REPL (Read-Eval-Print Loop): read one sentence of input from the human, evaluate it and decide what to do with it, print a response, and go back to step 1. We will talk about each of these steps in detail below, in the context of implementing a weather + travel bot, i.e. a bot which tells you the weather of a place and also helps you plan your travel.
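
In Python, the skeleton of that loop is only a handful of lines. Here is a rough sketch (generate_response is a hypothetical placeholder for the classification and response logic described in the rest of this post):

# A minimal read-eval-print loop for the bot. generate_response() is a
# hypothetical stand-in for the classification + response logic below.
def chat_loop():
    while True:
        sentence = input("you> ")                # read
        if sentence.strip().lower() in ("quit", "exit"):
            break
        response = generate_response(sentence)   # eval
        print("bot>", response)                  # print, then loop back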

For a weather bot, the most important thing is to understand which place you are asking about the weather for. So, if we can simply train a model which extracts the location name from a sentence, we are good to go, right? Evidently, not quite! Since these things are called chatbots (bots capable of chatting), expectations from them are greater.

It is not necessary that the first sentence the user enters is about the weather. It might just be a simple greeting, such as "Hi!" or "Hello". Our bot should be able to understand these and respond accordingly. Similarly, the user may also try to make other sorts of conversation, such as asking the bot its name, or telling the bot their own name. These are just two examples of the kinds of conversations we might want our bot to handle apart from the regular weather or travel questions.

So, in essence, we can't expect that every sentence entered by the user is about the weather. We first need to understand the sentence (is it a greeting, is it asking our name, is it asking about the weather?) and then generate a response. This means every input sentence has to go through a classifier which tells us what the sentence is about: is the user just greeting us, asking a question, saying something off topic, or talking about weather or travel? Based on this the bot can decide what response to generate.

I started with this design but I didn't have any training data. To test the idea out, I just wrote some sample sentences about weather, travel, greetings, and some questions (e.g. asking the bot's name) in a text file. But I could only produce some 40-odd sentences overall, with 5-6 sentences for each of the sentence types I wanted the bot to recognize. This was clearly too small a dataset to train any kind of machine learning model. Most models would end up badly overfitting it, resulting in a terribly confused bot.
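
For the record, the "dataset" was nothing fancier than plain text files with one sentence per line. Loading something like that takes only a few lines (a sketch, assuming one file per sentence type; the layout in the repo may differ slightly):

import os

def load_dataset(data_dir):
    """Load sentences from one text file per sentence class, e.g. weather.txt, greeting.txt."""
    sentences, labels = [], []
    for fname in os.listdir(data_dir):
        if not fname.endswith(".txt"):
            continue
        label = fname[:-len(".txt")]
        with open(os.path.join(data_dir, fname)) as f:
            for line in f:
                line = line.strip()
                if line:
                    sentences.append(line)
                    labels.append(label)
    return sentences, labels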

So, I decided to simplify the problem. I created a hierarchy of sentence types. The first class was intent: the types of sentences the bot is designed to respond to (such as telling the weather, planning travel, answering the user's greetings). The second class was non-intent: anything the bot was not designed to answer, but for which we could provide some hard-coded responses if we understood what the user said. See the figure below to get an idea of the hierarchy of the sentence classes.

[Figure: Hierarchy of sentence class types]
I gave this second design a try, training an individual classifier for each of these sentence classes. The way the whole thing works is better expressed in the following pseudo code than by any amount of prose I could write:


for each input sentence:
    if sentence is not of type 'intent':
        if sentence is of type 'question':
            question_class = question_classifier.predict(sentence)
            generate_question_response(question_class)
        elif sentence is of type 'sentiment':
            sentiment = sentiment_classifier.predict(sentence)
            if sentiment == 'happy':
                generate_happy_response()
            else:
                generate_sad_response()
    else:
        intent = intent_classifier.predict(sentence)
        if intent == 'weather':
            ask for the location if it is missing, otherwise tell the weather
        elif intent == 'travel':
            ask for the location if it is missing, otherwise look up flights
        elif intent == 'goodbye':
            say_goodbye()
        elif intent == 'greeting':
            greet_user()


The above pseudo code describes how the bot uses the various individually trained classifiers to navigate the hierarchy of sentence types and decide what action to take. In my implementation I did not actually integrate with any 3rd-party services to get the weather or show flights. I just hard-coded some fixed responses, but those could easily be replaced with integrations with live services.
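
For the weather and travel branches, checking whether the sentence already contains a location is a standard named-entity-recognition task. With spaCy it could look something like this (a sketch of one possible approach, not necessarily the exact code in the repo):

import spacy

nlp = spacy.load("en_core_web_sm")  # any English model with an NER component

def extract_location(sentence):
    """Return the first place name (GPE/LOC entity) found in the sentence, if any."""
    doc = nlp(sentence)
    for ent in doc.ents:
        if ent.label_ in ("GPE", "LOC"):
            return ent.text
    return None

extract_location("What's the weather like in Bangalore today?")  # -> "Bangalore"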

As far as training the models is concerned, I simply tried various models, such as logistic regression, support vector machines, random forests, and neural networks, and used whichever showed the best performance. Given the small size of the dataset, all of them were prone to overfitting; I just chose the one which seemed least confused when tested on sentences with a different structure than those in the training dataset.
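
For a dataset this small the whole comparison fits in a few lines of scikit-learn. Something along these lines (a sketch; X is the matrix of sentence vectors described next and y the corresponding sentence-class labels, both assumed to be already built):

from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import cross_val_score

# X: one row per sentence (the summed word vectors described below)
# y: the sentence-class label for each sentence
models = {
    "logistic regression": LogisticRegression(),
    "svm": SVC(),
    "random forest": RandomForestClassifier(),
    "neural network": MLPClassifier(max_iter=1000),
}

for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=3)  # small dataset, so few folds
    print(name, scores.mean())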

To vectorize the sentences, I used word vectors (GloVe vectors via spaCy). To vectorize a sentence, I converted every word of the sentence to its word vector and summed those up to get a single vector. Many people suggest averaging the vectors instead, but I did not try that. There are probably better ways to vectorize a sentence as well, such as stacking all the word vectors instead of summing them up.
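
With spaCy, the summing approach boils down to something like this (a sketch; the code in the repo may differ slightly):

import numpy as np
import spacy

nlp = spacy.load("en_core_web_md")  # a model that ships with word vectors

def sentence_vector(sentence):
    """Vectorize a sentence by summing the word vectors of its tokens."""
    doc = nlp(sentence)
    return np.sum([token.vector for token in doc], axis=0)

# Averaging instead of summing would just be:
# np.mean([token.vector for token in doc], axis=0)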

I wrote this code more than a year back just as a proof of concept, so it is not super clean, commented, or documented (this blog post is the best documentation I've got). The chatbot itself is in the file bot.py. The code for training the individual classifiers is in a bunch of IPython notebooks (I worked in those to experiment with various models easily but never got around to moving that code to actual Python files). I manually created the datasets for the individual sentence classes, which are present in text files. Feel free to check out the code at: https://github.com/abhinav-upadhyay/chatbot-poc. I have added pickle files of the pre-trained classifiers to the repo, so bot.py just runs and you don't have to train the classifiers yourself.

Following are some demo conversations:

[Screenshots of demo conversations]

Monday, October 10, 2016

Understanding Deep Learning as a Stack of Logistic Regression Models



So, I had an interesting self-realization today. I sat down to implement a multi-class classification system (the details of which shall remain classified). I was working with text data and there was no way to directly map it to one of the target classes. So I decided to build a series of classifiers: starting by classifying at a broader level, I would drill down towards a more specific set of classes with each classifier. Essentially it was like a chain of UNIX shell pipes: you take the output of one classifier, feed it to the next, and so on. For example, first I detect one of the broader classes, then move towards more specific ones, until I get to one of the leaf nodes of this tree of classes.
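
In code, that chain is nothing more than feeding one classifier's prediction into the choice of the next classifier. A hypothetical sketch (the class names and classifiers here are made up for illustration):

def classify(sentence_vector):
    # Level 1: the broadest distinction.
    top_class = top_level_classifier.predict([sentence_vector])[0]

    # Level 2: drill down with a classifier specialised for that branch.
    if top_class == "branch_a":
        return branch_a_classifier.predict([sentence_vector])[0]
    else:
        return branch_b_classifier.predict([sentence_vector])[0]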

After getting done, I realized that the deep, layered neural networks in vogue these days essentially do the same thing for you automatically. For example, a deep convolutional network for face recognition starts by detecting edges in the early layers, then moves on to detecting contours and curves, and then to more complex features. It all makes sense now :-D

Moral of the story: if you have a ton of data, just give it to a deep neural network and it will do all the feature engineering for you. And if you don't have enough data, then you need to do the feature engineering by hand and build a stack of classifiers, like I had to do.