Nuance Communications CTO talks about new Dragon Go! iPhone app

The magic of mobile applications is that continual innovation in the arena can lead to delightful user experiences. This is the case with the new Dragon Go! app developed by voice-recognition pioneer Nuance Communications.

While there are an increasing number of voice-recognition apps available for iOS and Android devices, the genius of Dragon Go! is how it seamlessly integrates with other sites and applications.

Notes Appolicious Advisor Kathryn Swartz in her review of Dragon Go!:

“If you ask the app for directions, it will launch them in your Maps app. If you tell it to call the Apple store, it will dial the location near you. Want to hear a band on Pandora? Ask for it, and if you have Pandora on your device, Dragon Go! will launch it for you. The app also offers integration with iPod, Last.fm and Yelp. Dragon Go! also features support for specific websites, so you can restrict your searches. The supported list is lengthy, and includes sites such as Amazon, CNN, Craigslist, ESPN and the Wall Street Journal.”

In this edition of Meet the Makers, we tap into the insight of Nuance Chief Technology Officer Vlad Sejnoha. In addition to revealing a lot of the development vision and methodology behind Dragon Go!, Sejnoha wears his marketing hat and talks through the four best ways for app developers to drive downloads.

Appolicious: We see literally hundreds of apps each week, but rarely are wowed like we were when we first tapped into Dragon Go! Nice work. From your vantage point, what is the most significant innovation of the app?

Vlad Sejnoha: First – thank you for the kind words. We are very passionate about improving the way people use their devices, apps and the Web, by simultaneously simplifying interactions and making them more powerful. And there really is nothing better than having our ideas validated by the kind of overwhelmingly positive reception that Dragon Go! has received since its launch.

Dragon Go! marks an evolution in the use of natural language understanding in mobile search by allowing users to speak, have their intention understood, and be taken directly to desired information, services, and content. Dragon Go! minimizes the number of intervening steps required to obtain useful results, and thus reduces the “distance” between the user’s intent, and the outcome they desire.

In setting out to accomplish this goal, we realized that we would need to explicitly address a number of important distinct behaviors that are typically lumped together when we speak about “mobile search”.

First, when people search on their mobile device, they are frequently looking to access specific information from a source that is known and trusted, versus just exploratory searching for information. For example, users might be looking for information about a specific restaurant on their favorite local review site, or trying to find the stats of last night’s baseball game. And that’s what makes Dragon Go! so powerful: people can speak a query for content from a site they want. So you can ask for “debt ceiling on CNN.com,” and the app will retrieve exactly that without any unnecessary intermediate steps.

It’s also important to understand that mobile search typically involves sequential browsing – a user may want to check information related to a particular item like a business or a product – from several complementary sources. Visiting a number of sites of interest one after another can be slow and tedious, and so we thought, why not retrieve these results simultaneously? That led to the concept of the tabbed carousel in Dragon Go! that allows users to scroll from side to side to quickly and easily see additional results from complementary content providers.

If a query is not specific to a particular web site, we apply our natural language processing to automatically determine the meaning of a query, and make sure to display results from a number of relevant and popular content sites. So if you say “Buy tickets to Cowboys and Aliens near me” the app is able to discern that you want to buy movie tickets for Cowboys and Aliens at a local theater – so we take you to Fandango, but also provide results for additional insight on the movie from various sources like IMDb, Twitter and others. And of course, people have the option to rely on their favorite general search portal, Bing, Google, or Yahoo!, as we’re not looking to limit what information can be accessed from our app.

That’s actually an important point: Dragon Go! was conceived to enhance versus replace existing Web site and application interfaces. The app doesn’t force people to adopt a completely new interaction model to the exclusion of all else. We believe that doing so would be quixotic – there are thousands of talented designers working on optimizing their Web and app interfaces and functionality, and complementing those advancements only benefits the broader ecosystem.

Finally, we also wanted to provide quick and easy access to some of the most popular native applications on the iPhone to offer a better user experience, like playing music or getting directions. So for queries like “Call Whole Foods near me” or “Get directions to 1 Wayside Road in Burlington, Massachusetts”, Dragon Go! automatically engages the phone or Maps apps respectively.

What is also innovative about Dragon Go! is its simplicity. The advancements in language understanding and processing on the backend are quite sophisticated, but using the app is easy. We knew that Dragon Go! would have to happily handle a full range of query types, all the way from the simplest keywords (“weather”), to complex natural language queries (“reservation for two at six o’clock on Friday at your favorite restaurant”), without imposing a steep learning curve on the user.
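As a rough illustration of the routing Sejnoha describes above (site-specific queries, hand-offs to native apps, intent-based results in the carousel, and a general search fallback), here is a minimal sketch in Python. It is not Nuance's implementation: the function name `route_query`, the lookup tables, and the simple prefix matching are all assumptions made for this example, and a real system would rely on natural language understanding rather than string patterns.

```python
import re

# Hypothetical lookup tables, for illustration only (not Nuance's data or API).
KNOWN_SITES = {"cnn.com", "espn.com", "amazon.com"}
NATIVE_ACTIONS = {"call": "phone", "get directions to": "maps"}
INTENT_PROVIDERS = {"buy tickets to": ["fandango", "imdb", "twitter"]}


def route_query(query, default_engine="bing"):
    """Return (kind, targets, remainder) for an already-transcribed query."""
    text = query.strip()
    lowered = text.lower()

    # 1. Site-specific search: "<terms> on <known site>".
    match = re.match(r"^(.*)\s+on\s+(\S+)$", lowered)
    if match and match.group(2) in KNOWN_SITES:
        return ("site", [match.group(2)], match.group(1))

    # 2. Hand-off to a native app such as the phone or Maps.
    for prefix, app in NATIVE_ACTIONS.items():
        if lowered.startswith(prefix + " "):
            return ("native", [app], text[len(prefix) + 1:])

    # 3. Recognized intent: show several complementary providers side by side.
    for prefix, providers in INTENT_PROVIDERS.items():
        if lowered.startswith(prefix + " "):
            return ("carousel", providers, text[len(prefix) + 1:])

    # 4. Fallback: the user's general search engine of choice.
    return ("search", [default_engine], text)


print(route_query("debt ceiling on CNN.com"))
print(route_query("Call Whole Foods near me"))
print(route_query("Buy tickets to Cowboys and Aliens near me"))
```

The ordering mirrors the interview: an explicit site wins first, then native actions, then inferred intent, with general search as the last resort.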

APPO: How long was the app in development and how many people worked on the project?

VS: Dragon Go! has its roots in one of our earlier apps, Dragon Search, which was launched in December 2009 and was quite successful. However, with our advancements in language processing, we knew there was potential for much more. Contributions to Dragon Go! came from across the company, including researchers from our core speech recognition team, the natural language processing team (which also does work on projects such as the automatic extraction of medical facts from reports dictated by doctors in order to automatically create electronic health records), client application and server engineering teams, as well as user interface designers and usability testers. The design process was “agile” and iterative, with many rounds of user testing. The fact that Nuance has both considerable depth and breadth in speech technology made it possible to pull together such a varied and multidisciplinary team – something which would have been undoubtedly much harder in a smaller company.

APPO: Tell us the truth, how much attention do you pay to other voice recognition developers in the mobile space – specifically Siri, Bing and Google?

VS: Well, first and foremost, we are members of the speech technology community, and therefore know a lot of the developers and scientists working at these companies, and have a lot of respect for what they do. Competition in this market is very much alive and healthy. It’s also worth noting that Siri’s voice capabilities are powered by Nuance as part of the Nuance Mobile Developer Program, through which we make our Dragon Mobile SDK and text-to-speech capabilities available to all app developers. The program currently has over three thousand participants, and the Dragon Mobile SDK powers a number of popular apps, including Price Check by Amazon, Ask for iPhone, Merriam-Webster, and many others.

APPO: What was your methodology for identifying third-party sites and applications to connect to within the app, and should we expect more partners in updates over time?

VS: We approached select content providers in the most popular mobile search and app categories, including business listings, music, movies, restaurants, sports, news and social networks. We will continue to add to the list of third-party sites and apps to give users the best possible experience. In addition, you will see new functionality and more seamless interaction with our current content providers as we, and they, offer new user capabilities. Part of that process will be through feedback we get from the users themselves, as well as apps and content providers that share an interest in working with us.

Dragon Go! supports a range of content integration models, from the most light-weight (where the app simply routes a user’s query to a desired destination), to deeper integrations that allow users to convey multiple pieces of information in one utterance (like saying a restaurant name, reservation time, and number of people while making restaurant reservations). We therefore also looked for opportunities where we could bring greater user value by building more tightly integrated user experiences with providers’ mobile sites and applications.

Dragon Go! is inherently a brand- and publisher-friendly application, in that it presents content and apps without altering them in any way, and its primary role is making it easier for users to access and make use of that content. These core characteristics of Dragon Go! make it appealing to content providers.

APPO: Can you estimate the percentage of overall resources Nuance Communications allocates to mobile-specific initiatives?

VS: Nuance’s Mobile division is among the fastest growing in the Company, with speech and touch technologies that improve the user interface for handsets, automobiles and consumer devices. It’s also important to note that our mobile initiatives span across the entire organization – including mobile applications for the healthcare industry and mobile customer care offerings. So while we don’t cite specifics, Nuance is certainly heavily invested in mobile initiatives.

APPO: Explain how you go about creating different app experiences on different mobile platforms – specifically iOS and Android? Talk about the technical challenges and opportunities that exist creating apps for each environment.

VS: Our approach to building applications is to leverage common components of the native platform so users are already familiar with the tactile controls (e.g., widgets, keyboard, settings layout), but in doing so preserving the core speech interaction model across platforms.

Each platform has unique strengths to be leveraged, as well as limitations that result in tough design tradeoffs. For example, while on iOS there are limitations to how fully integrated our solution can be with other applications on the phone and what system resources we can gain access to (e.g., text messages), there are very powerful system components that can be instantiated inside of our own application to enrich the experience. iOS apps tend to put a high emphasis on design, and Apple’s frameworks make it easy to create rich, immersive experiences without writing a ton of code. The minimal device fragmentation in the platform means our development team can focus on getting every pixel right without having to worry about many screen sizes and device capabilities.

On Android, one advantage is that we can create fully integrated speech solutions, because of the deep integration possibilities available on the platform. For instance, in our FlexT9 product, we are able to add a custom keyboard to the system that integrates our whole suite of text input technologies – Dragon Dictation, T9 Trace, T9 Write and XT9.

APPO: Wear your marketing hat for a second and describe what you believe are the most important characteristics of an app like yours to drive downloads.

VS: It just needs to work. Legacy speech recognition systems have not always lived up to their promise of offering highly accurate and easy-to-use experiences, but all that is changing with applications like Dragon Go! where you can just push a button and say anything that comes to mind. This type of open-ended natural interface will not only transform how users go about interacting with their device, but it will also change users’ confidence in and overall comfort with using speech as a primary means of input into the device.

It needs to offer a clear purpose and value. Many applications available for download offer no meaningful value to consumers, or that value is often hidden under poorly designed interfaces. Our approach to Dragon Go! was to build an experience that would revolutionize how people gain access to information, as up until now we realized this has been a major pain point on mobile devices. To be successful, applications need to offer a richer and more efficient means for going about your daily business and interacting with your device.

It needs to be simple and elegant. Robust speech recognition and natural language understanding are only half the battle. The application itself must engage consumers and be incredibly easy to use. Tremendous creative design and usability efforts have gone into making Dragon Go! a simple, yet elegant experience. The application needs to be intuitive and work like users expect it to, with virtually no learning curve. It also needs to work like they do, with a similar “workflow” for the things they want to accomplish.

It needs to provide an integrated experience. Whether the user is getting something done within our app or we’re providing a transition to a partner’s site or application where the user completes their task, the experience needs to be entirely seamless and integrated with other complementary elements of their lifestyle. We need to make the hard stuff easy, and the easy stuff invisible.

APPO: What are the three biggest things going on right now in the mobile media space keeping you up at night?

VS: Ironically, what keeps me up at night are good things: specifically, what sometimes seems like an exponential growth in the number of new opportunities and mobile use cases for speech technology, and how to do them justice.

We are now clearly well past the era where speech was viewed narrowly as an occasional alternative to typing, and there’s now a broad understanding that speech understanding of the sort used in Dragon Go! – if designed in as a core element of a device’s user interface – can be transformative. Imagine being able – through natural language queries – to ‘reach’ out for information and functionality that are not presently visible on the device screen, in a way that complements and enhances the task at hand.

Consider if, while reading a review of a restaurant or current news, you could say “what did my friends think about this?” and, using speech understanding in combination with your implicit social graph, the system could perform a directed topical search and pop up a summary window with the desired information, which might also allow you to start a direct conversation. Similar technology may provide users with additional “ambient information” via whatever device or channel is available – mobile, car, TV (all areas where we are actively introducing speech understanding, by the way). Language understanding should thus be viewed as providing a whole new dimension to user interactions with visual interfaces and I expect to see dramatic new UIs as designers take on this challenge.

In my view, the opportunity to simultaneously tap into multiple relevant information sources, understand their content, and extract and synthesize results which are useful to the user in a given context, will give rise to new solutions to application integration, interoperability, and semantic annotation, and over the coming few years bring us that much closer to the promise of the “semantic Web.”
