Building Nelly, my DIY voice assistant for Android

Few things in the mobile technology space have been as rewarding and useful as the time I spent learning how to use Tasker. My phone is so different now than it was a month ago, in a good way. From barely being able to make the speakerphone turn on while the phone is flat on a table during a call, I’ve worked my way by trial and error to a sleep mode that can even turn off my PC monitors remotely. With the ability to make basically anything I want in Tasker, the big question was what to make.

I’ve been wishing for a good voice assistant for a while, but have been disappointed by the accuracy, capability, and overall design of the ones available on Android – and on iOS for that matter, as Siri hasn’t exactly impressed me.

The name “Nelly”, and the choice of voice for her

Voice assistants often have personalities, so why should mine be any different? Nelly is a reference that I doubt anyone will pick up on without an explanation, because it’s a reference to a science fiction book series that is unfortunately not as popular as Harry Potter. Written by Mike Shepherd (Mike Moscoe), the series of books about the space heroine Kris Longknife features a range of original characters, where one of my favorites is Nelly, the main character’s personal computer. Featuring the best hardware money can buy, Nelly’s abilities are far beyond even many full sized computers in the Kris Longknife universe. Later in the series she even has kids, which happens by borrowing her owner’s credit card to go shopping for parts to build more of herself. Point being, it’s an awesome personal computer, just what I want my Nelly to be.

As for the voice, well, I started the book series back when I consumed audio books left and right while working, and have continued on with the series in audio format ever since. The narrator, Dina Pearlman, has a voice she uses for Nelly that is very consistent throughout the books. I wish I could say it was on purpose, but the truth is that the similarity of that voice to IVONA’s Amy British English TTS (Text To Speech) engine (which I had already installed before thinking of my Nelly) is entirely coincidental. That doesn’t mean I get any less satisfaction out of Nelly sounding like Nelly of course. IVONA has some awesome TTS engines, and I have to say that its Amy engine puts anything else I’ve tried to shame. Siri sounds like a robot in comparison, and that’s with Amy still being in beta!

How Nelly works

Tasker is able to tap into Android’s speech recognition system through its Get Voice action. This pops up a voice input window to indicate that it is listening, and then stores the converted text in a variable named %VOICE. If you say hello world, the value of %VOICE will be hello world. Pretty much all of Tasker’s other actions can use an If condition, which means that they will only run if a certain condition is fulfilled. These conditions use variables and various types of comparisons, such as pattern matching and math, to control whether an action can run.

For Nelly, most of her responses are very simple: use Tasker’s Say action and the IVONA Amy TTS engine to speak a specified text If %VOICE matches *trigger*. The trigger depends on what I want it to react to, and the asterisks are there so that the term can be part of a much longer phrase. For instance, Nelly’s answer to what the best smartphone is uses a very simple trigger: *best smartphone*. While this means that in theory it would be fooled by asking “what is not the best smartphone”, keeping things simple also means that it’s less picky about what you ask it. Sup dawg, so, yeah, um, I was kinda just wondering, you know, what be the best smartphone would work just as well as what is the best smartphone.

For a voice assistant that is custom made for a single person, me, this is definitely the way to go. I know the triggers, which means that I know how to use them correctly. I can however make them as specific or non-specific as I want, should there ever be a problem.

Using the If system, Nelly goes through a list of potential actions and checks if they apply for the current situation based on what I just told it. Since it won’t do anything if it can’t find any matches, I naturally won’t have to cover every single possibility out there.
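To make the matching idea concrete outside of Tasker, here is a rough Python sketch of a trigger list checked against the recognized text. It is only an illustration and not how Tasker works internally: the function names and the placeholder reply are made up for the example, and lowercasing both sides is an assumption of the sketch.

```python
from fnmatch import fnmatchcase

# Hypothetical trigger table: (wildcard pattern, spoken reply).
# The first reply is a placeholder, not Nelly's actual line.
TRIGGERS = [
    ("*best smartphone*", "placeholder reply about smartphones"),
    ("*am i in danger*", "Not unless there's a Longknife nearby"),
]


def matches(voice, pattern):
    """Return True if the recognized text fits the wildcard pattern.

    Both sides are lowercased (an assumption of this sketch), and
    * stands for any run of characters, including none.
    """
    return fnmatchcase(voice.lower(), pattern.lower())


def responses_for(voice):
    """Check every trigger in order and collect the replies that apply."""
    return [reply for pattern, reply in TRIGGERS if matches(voice, pattern)]
```

An input that matches nothing simply produces an empty list, which mirrors Nelly doing nothing when no trigger applies.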

Nelly’s features

This is a list of all the features – or responses if you will – that Nelly has. Anything you see here can be triggered with voice input by pressing the invisible Nelly activation icon I have on my home screen, and the voice input box will pop up as an overlay to the home screen – so no need to enter any apps.

Best smartphone

Already covered this one, but basically, the trigger is best smartphone and that trigger is used in two other actions: a Say action to give the response, and a Browse Url action that opens Pocketables once the spoken response finishes.

Google search

In order to use Nelly to search Google, I made a separate task called Nelly search. This task contains four actions, and what’s special about them is that none of them have any If triggers. Instead, the main Nelly task has a Perform Task action that is tied to a *find something* trigger. This means that if I tell Nelly anything that contains find something, it then starts the separate task. The reason for this is to tie several independent actions to a single trigger, as well as be able to overwrite %VOICE without that affecting the following tasks’ ability to trigger.

The first action in the Nelly search task is a normal Say, with Sure, what do you need as the spoken text. Next you have a new Get Voice action, which then overwrites the original %VOICE that was created when I first asked Nelly to find something. Then there’s a new Say, this time with Here you go. If this doesn’t do it for you, blame google. Finally, there’s a Browse Url action with http://www.google.com/search?q=%VOICE as the URL.

What happens in practice when I ask Nelly to find something is that she replies Sure, what do you need, then records a response, replies Here you go. If this doesn’t do it for you, blame Google, and then does a Google search for whatever I told her to search for. This will bring up a normal Google search results page in the browser.
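For anyone who wants to see the URL step outside of Tasker, this is roughly what it could look like in Python. The function name is hypothetical, and the escaping done by quote_plus is an addition of this sketch; the Tasker action described above just drops %VOICE into the URL as it is.

```python
from urllib.parse import quote_plus


def google_search_url(voice):
    """Build the search URL from the recognized text.

    quote_plus escapes characters that are not URL-safe and turns
    spaces into +, so multi-word queries survive intact.
    """
    return "http://www.google.com/search?q=" + quote_plus(voice)
```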

Sleep Mode on/off

My Sleep Mode Tasker profile is a rather complicated set of actions in itself, and this feature lets me turn Sleep Mode on and off using Nelly. The triggers are *night* and *morning*, which use the Set Variable action to set %Sleepmode to on or off respectively. These triggers are rather generic, but I’m unlikely to use them elsewhere (if so I can change things around), and keeping them simple means that Nelly will both respond to variations like night night, Nelly and be less sensitive to sleepy mumbling.

The actual sleep mode itself is an independent profile outside of the task that Nelly runs in. Its context – the way the entire profile triggers – is simply Variable Value: %Sleepmode matches on. This makes the profile active when I’ve told Nelly to make it active, and not active when I’ve told Nelly to deactivate it.

The profile has both an enter and an exit task, meaning a task that is run when it’s turned on, and one that is run when it turns off. The enter task starts off by running a separate task called Screen Off. Screen Off isn’t a complicated task in terms of number of actions; instead, the reason for having it as a separate task and using Perform Task is to be able to quickly access it from other tasks as well. What it actually does is append the date to a file called sleepmode.txt, which is then synced automatically using Dropsync. On my computer, I use a program called RoboTask to monitor the Dropbox folder that sleepmode.txt is synced to, and trigger its own task when the file changes. The task it triggers runs a UI-less .exe file called nircmd with the screen off parameter. In practice, my phone creates a file that my PC reads and uses to turn my two monitors off. I tell Nelly to activate sleep mode, and about 15 seconds later my PC monitors go black.
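The file-as-signal trick is easy to picture in code. The sketch below is a Python stand-in for both halves, not what Tasker or RoboTask actually run: the function names, the timestamp format, and the polling approach are all assumptions made for the illustration.

```python
import os
import time
from datetime import datetime


def append_date(path):
    """Phone side: add a line with the current date and time to the file."""
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(datetime.now().isoformat() + "\n")


def wait_for_change(path, last_mtime, poll_seconds=1.0, timeout=60.0):
    """PC side: poll until the file's modification time differs from last_mtime.

    Returns the new modification time, or None if the timeout expires first.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        mtime = os.path.getmtime(path)
        if mtime != last_mtime:
            return mtime
        time.sleep(poll_seconds)
    return None
```

Whatever should happen on the PC, such as launching nircmd, would go right after wait_for_change returns a new value.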

The next two actions maximize the alarm volume and set the screen brightness to 20, about 10% of max. The task then sets the variable %Lastsleep to SM is on, and runs a plugin that passes the value of that variable to Make Your Clock Widget. In practice, when I activate sleep mode, my homescreen clock widget displays SM is on. Next it sets %Smactivation to %TIMES, where %TIMES is the current date and time in seconds – the only way to really make date and time compatible with normal math, as you can’t simply ask a calculator to subtract June 20th 2011 1:43 AM from July 30th 2012 3:10 PM. You can however make one subtract a large number of seconds that corresponds to the first date from an even larger number of seconds that corresponds to the second date, and then convert it back. Think of it as a real life example of stardates from Star Trek…or something like that.
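The seconds trick can be tried out with the two example dates from the paragraph above. This is a small Python sketch rather than anything Tasker does itself; it treats both dates as plain local times and ignores time zones and daylight saving.

```python
from datetime import datetime

# The two example timestamps from the text, treated as naive local times.
first = datetime(2011, 6, 20, 1, 43)
second = datetime(2012, 7, 30, 15, 10)

# Subtracting the datetimes is the same idea as subtracting two %TIMES
# values: each date is reduced to a count of seconds and the counts are
# compared.
elapsed_seconds = int((second - first).total_seconds())

# Convert the plain number of seconds back into something readable.
days, remainder = divmod(elapsed_seconds, 86400)
hours, remainder = divmod(remainder, 3600)
minutes = remainder // 60
```

For these two dates the difference works out to 406 days, 13 hours and 27 minutes.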

Next up is the actual response. These tasks run so quickly that the delay is unnoticeable, with the exception of the Say action, which has to wait for the TTS engine to actually say the text before continuing. That’s why the response is so far down the list of actions, allowing the process of turning off the monitor to run while Nelly is talking. The actual response is simple: Sleep mode activated. Good night.

The next action writes Andreas has been sleeping since %TIME to status.txt, a file that is synced to the web using Dropsync and available to friends and family who can then quickly check if I’m asleep, home, or away (other profiles change this file too), and for how long I’ve been sleeping.

Finally, there’s a 15 second wait, and then it triggers a complete Dropsync sync session, rather than the partial ones triggered by monitoring file changes. This is both a way to make sure I end the day with a full sync, and a way to make sure that the status.txt and sleepmode.txt files are really synced, even if I turn off my screen and mess up the file monitor syncs. Waiting for this complete sync to finish is why the sleep mode system seems slow in the video above, as sleep mode isn’t designed to be turned off right after being turned on.

The exit task is basically reversed. It writes Andreas is home to the status.txt file, writes to a wakeup.txt file that wakes my computer, and updates the widget to display LS %TIME, meaning the time it was deactivated and in turn the time I woke up. It also creates a variable called %Smduration that subtracts the %Smactivation variable written in the enter task from the current time in seconds, which means the result is how long sleep mode was active, in seconds. It also divides this by 3600 to get hours instead of seconds.

If %Smduration is a number greater than 9 (hours), it writes You lazy bastard to %Lazy. If %Smduration is lower than 9, %Lazy will simply retain its original value, which is a space. There’s also a Set Variable action at the very end of the exit task that sets %Lazy to a space so that the default for next time is a space even if %Smduration was higher than 9 this time.

The Say duration utilizes data from two of the variables that were just created. The Say text is Good morning. You slept for %Smduration hours. %Lazy. %Smduration is the time sleep mode was active, and %Lazy is either nothing or You lazy bastard depending on the value of %Smduration. %Lazy needs to default to a space rather than nothing, otherwise the TTS engine will actually read the word %Lazy (as in “percentage lazy”) if %Smduration is lower than 9. In practice, this results in two types of responses based on whether or not I slept for more than 9 hours. Examples are Good morning. You slept for 5.443 hours and Good morning. You slept for 9.724 hours. You lazy bastard. Just a little bit of an automated pep talk if I sleep for too long.
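Put together, the duration math and the conditional insult can be sketched like this in Python. The function name is hypothetical, rounding to three decimals is an assumption chosen to match the example responses, and the sketch trims the trailing space when there is no insult to add.

```python
def sleep_report(activated_at, now, limit_hours=9.0):
    """Build the wake-up message from two timestamps given in seconds."""
    # Seconds asleep, converted to hours (the %Smduration step).
    hours = round((now - activated_at) / 3600, 3)

    # Default to a space so there is always something to substitute,
    # then overwrite it only when the limit is exceeded (the %Lazy step).
    lazy = " "
    if hours > limit_hours:
        lazy = "You lazy bastard"

    return f"Good morning. You slept for {hours} hours. {lazy}".rstrip()
```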

The answer to life, the universe, and everything

There are quite a few questions that people always end up asking voice assistants, and what is the answer to life, the universe, and everything is a classic. It refers to the brilliant book series The Hitch Hiker’s Guide to the Galaxy. The answer according to the book is 42, which is what you’ll get from most voice assistants. That means that mine can’t give such a “boring” reply though, so I set it to instead reply Ask the dolphins. Read the books to find out why. As for the trigger, *life*universe*everything* handles that nicely.

Am I in danger?

Another novelty response like the one above, but much, much less common. Since my Nelly is from the Kris Longknife series, I wanted something that actually hints to that, even if I’m probably the only one to ever use my Nelly that would understand the reference. Anyways, the trigger is am I in danger, and the response is Not unless there’s a Longknife nearby. Again, read the books to understand why ;)

Universal fixing substance

Another internal joke for readers of science fiction books. If asked to give the “universal fixing substance”, Nelly will reply “That’s the wrong book series, you moron. There aren’t any spiderwolves here”. References this book series.

Ask the boffins

Another actually useful feature, but with a twist from the book series Nelly is from. Boffin is a British slang term for scientist, and is used in the books to describe the scientists on board. The trigger here is *have a question*/*boston*, which means it will trigger on any mention of have a question or Boston. Why Boston? That’s what the speech to text system thinks I say when I say boffins. When presented with the pronunciation of the word from the Oxford dictionary iPad app I have, it instead heard it as office. Either way, I couldn’t get it to recognize boffin, so I adapted. For the record, if you have trouble making triggers work, you can create a new task with two actions: Get Voice, and Alert – Flash with %VOICE as text. This makes a message pop up on screen for a few seconds with the text that the voice recognition system thought you said after you read it in, and you can trigger the task manually when you need to test something. In my case, I wanted it to trigger on I have a question for the boffins, so if it consistently thinks I ask for Boston, the result is the same.
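The slash in that trigger acts as an OR between two wildcard patterns. As a rough Python illustration of the idea (again not Tasker's own code, with a made-up function name and with lowercasing as an assumption of the sketch):

```python
from fnmatch import fnmatchcase


def matches_any(voice, trigger):
    """Treat / as an OR between wildcard patterns.

    The trigger is split on /, and the text only has to fit one of
    the resulting patterns.
    """
    text = voice.lower()
    return any(fnmatchcase(text, part.lower()) for part in trigger.split("/"))
```

With this, both I have a question for the boffins and its misheard Boston variant end up on the same branch.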

As for what this triggers, well, it’s another case of Perform Task where the actions are in a separate task. First Nelly replies Let me know what it is and I’ll pass it along, and then you have another Get Voice. I then use Variable Set %Wolfram to %VOICE to get a custom variable that I can work with in ways that Tasker won’t allow you to work with built in ones. The %Wolfram variable is then split using Variable Split, and then Variable Join puts it back together with + as the joiner. This is to convert it to a format that Wolfram Alpha can read, specifically %Wolfram now looks something like word1+word2+word3 instead of word1 word2 word3.

Nelly then says The surviving boffins asked me to give you this, a reminder of the events of the latest book. The final action is Browse Url, with http://www.wolframalpha.com/input/?i=%Wolfram as the URL. That brings up a search result page on Wolfram Alpha, and since WA is a mathematical search engine that can do calculations this way, it truly does ask the scientists. If you don’t go via the Split and Join system, any multi-word search phrase will only have the first word actually make it to WA’s search box.
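The Split and Join step is the kind of thing that is a one-liner in most programming languages. A Python sketch of the same transformation, with a hypothetical function name, could look like this:

```python
def wolfram_url(voice):
    """Turn spoken words into a +-joined query and build the URL."""
    words = voice.split()      # the Variable Split step
    query = "+".join(words)    # the Variable Join step, with + as joiner
    return "http://www.wolframalpha.com/input/?i=" + query
```

Like the Tasker version, this only deals with spaces; symbols such as & or ? in the spoken text would need proper URL escaping on top of it.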

Norwegian Yellow Pages

If I need to find any business or similar things nearby, nothing based on English is going to help me. I need to search a Norwegian website with Norwegian text, so an English voice recognition system will be less than useless. As such, my yellow pages search trigger and action is much simpler than you’d think: trigger on *yellow pages* and browse the URL. Then I enter whatever I need manually, because that’s frankly the only way to do it with the language difference.

Music

Another simple one. A separate Task has Load App: Poweramp and Music Control: Toggle Pause as actions. A Perform Task with *music* as trigger in the Nelly task then starts Poweramp and starts playback when the word music is heard.

Notification note

The final feature in this first version of Nelly lets you create a note in the notification menu of your device by using your voice. It’s a separate task that is triggered in the Nelly task with *a note*, e.g. Add a note or Nelly, I need you to make a note of something for me. The separate task starts out with Say: What do you want it to say. Then it uses Get Voice to get the content of the note, does a Say: Done for good measure, and then uses the Notify action under Alerts with %VOICE as name. It also has a note pad as the notification icon just for good measure. I already have a fairly complex todo system set up on my device, and if I find a way to control that with Tasker I will add that to Nelly too, but for now this is sort of a “quick note” system for something that I need to do ASAP and so having a notification to remind whenever I look at the device works well.

In conclusion

There are clear advantages to making your own voice assistant, as well as disadvantages. While you’re stuck having to do everything yourself, and in many ways are limited compared to what you can do with true app programming, you only need one feature that you’ll actually use to make it worthwhile to skip 10 that you probably won’t. For Nelly, sleep mode and notification note are the two features you’re unlikely to find in other voice assistants, aside from all the comic relief responses. Even so, being able to activate those using voice isn’t really 100% necessary, as it could be just as easily done with buttons, and in case of the notification note, a text input field. A voice assistant really becomes the most useful when it reads incoming messages, lets you dictate outgoing ones, and does it all in one go. I could easily program Nelly to do that if I wanted to, as it’s just a matter of transferring %VOICE to variables which go into Send SMS actions and so on. Problem is I don’t use SMS very often, and most definitely not in English.

Point being that you shouldn’t look at this article as something to be duplicated, but rather as an idea to be adapted for other uses, and more importantly: custom uses. I can tell my phone “good night” and it turns off my computer monitors for me. Try doing that with Siri, which won’t even let you look for a replacement phone without overriding web searches with propaganda. To send you off, here’s an undocumented Nelly easter egg for Mass Effect fans:

About the Author

Andreas Ødegård was an associate editor at Pocketables. He’s more interested in aftermarket (and user created) software and hardware than chasing the latest gadgets and tends to stick with his choice of device for a long time as a result of that. Currently that includes an iPad mini and a Samsung Galaxy S II.

Brian B

So I’m trying to make my own version of this and it’s going pretty well but I’m having trouble with multiple triggers. For my search trigger I want to be able to use “search, find, and Google” for triggers so I tried both of these and they didn’t work.

Tasker uses / as the OR separator, so use a slash instead of your commas and you’re golden :) great to see someone else make a voice assistant!

Brian B

So I’ve gotten most of my assistance down but I want more. I’m trying to have it give me random responses so I don’t hear the same thing every time. So for searching it might say “I’ll ask Google” one time and another time say “I bet Google knows the answer to that”.

My initial idea is to write each one of these to a separate file and name them ‘response1.txt’ and ‘response2.txt’ then my task would be task: Say: response%RANDOM.txt

I haven’t tried this out yet but I’m wondering if it should work in theory. What about putting all the responses in one file, how would I randomize which is said?

Andreas Ødegård

You have the right idea, but there’s actually a built in system to do it a bit simpler. There’s a Variable Randomize option under Variables that will allow you to create random variables. You can then put lots of responses in a text file and read that into a variable which you split into many variables %Response1, 2, 3 etc using Variable Split.

The Tasker wiki has an example profile that uses the same system for something else, see http://tasker.wikidot.com/fileproc
Scroll down to “responder task” and you’ll see how it uses Variable Randomize, Read Paragraph and Variable Split to achieve this :)
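For comparison, the same read, split and randomize idea takes only a few lines in Python. This is just an illustration of the approach, not the wiki’s Tasker task; the function name is made up and it assumes one response per line.

```python
import random


def pick_response(text, rng=None):
    """Split the file contents into lines and return one at random."""
    # One response per line; blank lines are ignored.
    options = [line.strip() for line in text.splitlines() if line.strip()]
    if not options:
        raise ValueError("no responses to choose from")
    return (rng or random).choice(options)
```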

Rob Hayes

Please share these tasks! I am not talented enough to do it myself.

Andreas Ødegård

There are a lot of features unique to my setup in Nelly at this point, like the ability to interact with other Tasker creations I have and references to books that not many people have read. The basic premise of Nelly is very simple, using Get Voice and then IF matching to get what you need. I think you should try to build it from that yourself, otherwise you’re going to end up with my personal voice assistant on your phone. Alternatively, you can try a voice assistant called “utter!”, which is made by a guy who’s been heavy into Tasker for ages. That app also shares a lot of similarities with the way Nelly works, sort of like a commercial alternative to doing it completely from scratch.

Rob Hayes

Ok thanks, I’ll try to find the time to study up!

Regor99

I have been creating exactly the same project over the last few weeks before seeing this posting. Thank you so much for the tip using / rather than the comma. It has speeded up the loop process immensely. I use two text files, one for statements and one for responses and after the GET VOICE command I match the two lines in the two files by an id number within Tasker to create the AI responses. I also use the Random variable to produce different responses.
I am hoping to get the Nexus 7 tablet this month. The online voice engine will help the speed. I am also hoping Pent, the creator of Tasker, will eventually create a $variable for the Task Action “Variable Search Replace”, which would again help response time.
We all have big hopes for Brandall and Utter. I think we are not far from a major update to wow all of us!

Andreas Ødegård

That’s awesome! I have to say I’m not using my Nelly that much on a daily basis, having shifted more to a scene with buttons for doing various things, but she still gets to say hi every now and then

brandall

Nice article, Tasker is where utter! was born! Tasker does have the ability to process the voice data to a higher level with the use of arrays and loops, but it’s pretty tricky stuff. When I get some time I will be making a tutorial on how to do this!

Huehohuu

Having some problems with the Perform Task task not behaving properly when I place it within a loop. I’ve tried variations of For, Goto, and Stop to create different types of loops, but still no dice.

I’m guessing it has something to do with parent and child tasks within a loop? Not exactly sure. Any help is appreciated.

Andreas Ødegård

You can try the following:
-set the priority of the performed task higher than the main task
-add a Wait equal to how long it takes the performtask to run, after the performtask

The latest feature I added to my home-grown virtual assistant is a task that reads off a list of functions that the VA is able to perform. That way when I’m showing it off to people the VA (I named her Holly) can actually teach them how to use it. And as an added bonus the task ends with a spoken inquiry as to any new features/suggestions that should be incorporated into the next “version” of the VA. So that %VOICE input writes to a new file, which the next time the task runs will be accessed to tell the user which features are in the pipeline.

Andreas Ødegård

That’s a great idea! I never hand mine away, but I do have a “tell us about yourself” trigger that spits off a bit of info

Sirion

Hi,

after reading your articles I tried to make a voice assistant myself, it works great so far but the problem is the default google voice search. It works only with internet connection and I need a way without internet connection.
Do you have an idea?

Andreas Ødegård

I don’t even think it’s possible to switch it out at all and still have it work with Tasker, so I believe you’re out of luck :(

Sirion

Guess i have to give utter! a try.

Andreas Ødegård

I didn’t know that Utter was offline capable, but if it is, that’s a very good alternative! You can run Tasker tasks with it

Luticus

Building a DIY personal assistant with Tasker as well. Mine is extremely powerful and can automate things based on voice input, the position of the phone, my geographic location, what wifi/bluetooth signals it’s near, etc. My question: Any ideas on how to generate a good current list of apps installed on the phone so I can parse them with regex? The problem is that if I say “launch dolphin” it doesn’t know to look for “Dolphin Browser HD”. If it were just one app then I could specifically set it up to know that but I want Tasker to be able to launch any app on my phone even if I just downloaded it from the app store. To do this I need a list of current apps on the phone so Tasker can parse for the nearest match. Also I’m interested in facial recognition with Tasker. Any ideas or tips would be very useful! Thanks!

Sam

This is a really cool article!!

I am so impressed with this idea that I’m gonna stop my work and start working on this right away :D

Andreas – First off, thanks a lot for your articles (especially the Tasker ones) – they’re very informative and I’ve enjoyed reading them. Also, after reading about “Nelly,” you’ve inspired me to try my hand at building my own voice assistant in Tasker. I’m in the preliminary stages right now, and I had a question for you – when evaluating the initial voice command, I can’t decide whether to have a huge set of if / else if / else statements, or to build an array with all the possible commands I plan to handle and loop through them with a “for” loop. Do you know if there is much of a difference between these two methods, as far as speed / efficiency goes? Thanks in advance for any insight you or others might be able to provide!

Andreas Ødegård

I’ve only ever used if statements for this particular thing, so I don’t know. That works fine, and it’s fairly fast, so I never saw a reason to experiment with anything else

Chris

Alright, thank you. I also posted to the Tasker Google Group, just to see what others (or possibly even Pent) think. Here’s the topic if you or anybody else that sees this is interested in the outcome:

Also, do you know if it’s possible to automatically receive an email when somebody responds to your comment on Pocketables? I only knew that you responded because I happened to come back and check this page.

Andreas Ødegård

Unfortunately I don’t think it is, I seem to recall it’s been brought up before :/ downside of this commenting system

Chris

Ok. FYI – It does look like this functionality may be available through a WP plugin:

Dgak, you could try this: make a separate profile that triggers the assistant by the proximity sensor.

If you really want it completely automated, you could make yet another profile, that triggers when the display is on (state-variable value, not the event), and the phone orientation is face up.
That way, when the phone is placed on a flat surface, and the display is on, you can just wave your hand over the phone and say your command!

Janus

Hey Andreas, thanks for your description. I’m making a voice control for when i’m biking, but can’t really get the triggers to work properly.
e.g. if I say only “radio” it works, but if I say: “play radio” or “turn on the radio” it does not recognize the trigger “radio”. Do you split all of your %VOICE variables??

Andreas Ødegård

You need to use wildcards, *. *radio* will match the word “radio” anywhere in the sentence

Lisa

I have a question that isn’t directly related to Nelly… Is there any way you can share the process of your “fairly complex todo system”? I am completely frustrated with the todo apps on the market (as well as the voice assistant apps on the market which is why I am here in the first place!). Since you have given me a great starting point for working on my own assistant app, I was hoping I could get some inspiration on the todo front as well! :)

Andreas Ødegård

I talk about it from time to time in my Tasker articles, and have a couple of tutorials for some (now) older versions of the thing in the beginner’s guide to Tasker (follow the banner in the right sidebar). However, the currently in-use version has grown a lot since this was written, and has a ton of new things, including smartwatch integration. Unfortunately that complexity means it’s both too intertwined with other systems and too complicated to understand if you haven’t built it from scratch and understand how it works

regor99

HI

I have used your basic premise to build my own Nelly however I want to speed up her response to input.