Did they actually use this before they sent it out?

About three weeks ago I picked up one of the new Motorola Q cell phones (PDA/camera/MP3 player).

So far, so good. The synchronization with Outlook, email on the go and the browser are all great, but the one spot where they really missed the boat is the voice dialling software. As I mentioned in another post, I’ve been reading “The Inmates are Running the Asylum”. Inmates focuses on the concept of “Interaction Design” and in particular the practice of “Goal-oriented Design”. The basic principle is to figure out who your users are and what they want, find the user who is closest to the common denominator (and most critical to your product’s success) and begin designing for that user. It’s the philosophy that you can make everyone happy none of the time, some people happy some of the time, and one person happy all of the time.

In the case of this phone’s voice recognition, it’s clear they started at roughly the other end of the spectrum. From what I can see, the developers were told “It needs voice recognition”, so they went out and built something with all the bells & whistles that interested them.

Here’s the problem – it’s useless (I’ve used several other less blog-friendly terms for it in recent days trying to make it work).

On my previous cell phone, voice recognition was the one thing that actually worked. The process was simple: as you added a number to the phone book, you had the option of also enabling it for voice dialling. On selecting the option, it prompted you to say whatever you wanted to use as the voice command for that number (e.g. “The Office”). You repeated it, and if the system was confident it could recognize it, you were done (if there was background noise or you said it two different ways, you might have had to repeat it a third time).

To use the voice dial feature, you pushed a button on the side of the phone, and after the beep you simply spoke the command for the person you wanted to dial. Well over 90% of the time it got it right the first time.

On the flip side you’ve got the Q.

Because the Q has a mobile version of Outlook right on the phone, it also has access to all your contacts. When you use voice dial, it looks at all of your contacts and tries to find matches based on the names. It’s true untrained voice recognition, which is really cool. Or rather, it would be if it worked.

Challenge #1: Untrained Recognition
I’ve got a lot of contacts, which means a lot of names to choose from. Compounding the problem, I’ve got a healthy-sized extended family, so there are a lot of Colemans in the mix. The challenge with an untrained voice recognition system is that it has to have certain tolerances built into it to account for accents and different tones of voice. A few years back I interviewed for a usability consulting role at a company building voice recognition systems for IVRs. I didn’t get the job, but the guy showed me some of the ins and outs of voice recognition.

Essentially, their goal with an untrained system is to reduce the recognition to as simple a sound as possible and to create options with high contrast between the sounds. This makes it much easier for the system to differentiate between the options. Take “Yes” and “No”: they would essentially teach the system to recognize the “hiss” versus “oooh” difference between the two. You can actually just make a “ssss” sound into the phone and many systems will accept it as yes.

Now take a contact book with several hundred contacts, many of them with similar names. You can guess just how accurate the system is. The options I get when I ask it to dial my wife or home are almost comical. This also means I get asked to confirm what number it’s dialling EVERY TIME.

Challenge #2: Options!
The next challenge is even worse. First, back to my old phone. Voice commands were tied to a specific phone number, and I could assign them myself. So, for example, if I wanted to call my wife, I just said “Erin Work” or “Erin Cell”.

On the Q it’s associated with the contact name. So now I have to say “Call Erin Coleman” but of course, which number? Here’s how the prompts go:

Q: “Say a command!” (in a really bitchy commanding tone – I already hate that voice)

Me: Call Erin Coleman

Q: Did you say call Ryan Coleman?

Me: No

Q: Did you say Kevin Coleman?

Me: NO

Q: Did you say Erin Coleman?

Me: Yes. (Thank god!)

Q: Which number? (At which point it displays a list of options on the screen, you know, the one next to my ear because I’m on the phone. It also always displays all of the options and asks, even if the person has only one number assigned.)

Me: (Ugh.) Mobile

Q: Mobile. Connecting.

Needless to say, by the time I get it to connect I’m just hoping the phone landed on something soft as I threw it out the window of my moving car.

To their credit, there does appear to be a little “Easter egg” shortcut. If you feel like complicating things even more, you can be brave and pile the commands together: “Call Erin Coleman Mobile”. I’ve had almost no luck getting this to work, though.

Challenge #3: Hey there’s a kitchen sink in here!
Back to the issue of telling developers “It needs voice recognition”. You’re not just limited to voice dialling. Oh no! You can also start applications, create a new text message or look up a contact. These are all great except for one overlooked fact.

With the exception of the “Voice Notes” application, none of the other apps respond to voice commands. So basically they’ve spent all kinds of time creating a feature that lets you open or start things with your voice, after which you have no choice but to look at the phone to continue. Why bother???? Especially when the phone doesn’t seem to recognize any of the application names.

In the end, what they’ve managed to create is a sub-par, barely usable piece of software that does everything, but nothing well. Bigger isn’t always better, guys.