Google – What are they up to with Machine Translation?

Since we started Clay Tablet, an on-and-off debate around the office has been “What is Google’s end game in the translation space?” Since partnering with Language Weaver a couple of months ago, we’ve of course been paying a little extra attention to what Google is up to.

The big debate revolves around whether or not Google is truly a “competitor”. I think there’s no question that in just about any web-centric information management space these days, Google is one of those inescapable gorillas hovering nearby, and everyone secretly hopes and prays that the gorilla doesn’t decide to shift his weight in their direction.

The reality is that Google’s “Language Tools” page sits on top of a very hot Statistical Machine Translation System (SMTS) – it has scored very well in BLEU evaluations and, along with Language Weaver, tests well above many traditional machine translation systems. The question is, “What’s their motivation?” To date there’s no (public) API for it, no “appliance” that you can buy and bring in house, and no obvious revenue model (although with Google that isn’t surprising).

Friend, Foe or Indifferent?
So why even have it? And if they aren’t selling it, are they really a competitor? In my head the answer is mixed. Google is certainly competitive as an “alternative”, but I’m not entirely convinced they’re a direct competitor. For translation in general (services and technology), tools like Google’s or Babelfish come up all the time as prospects ask “Why wouldn’t I just use xxxx?” (“FREE” is certainly attractive).

I think it’s important to flip the question around and ask “Does Google consider you a competitor?” – not in the context of a specific business, but in the sense that if Google looked at your technology (SMTS, for example), would they acknowledge it as a competitive product?

In the case of SMTS my hunch is actually “No”. I suspect if you asked Google if they played in the translation industry they’d say “No”. When you strip back all the layers, Google is, at the core, a company fixated on aggregating, storing and organizing every piece of content it can get its hands on, then turning around and making it easy for users to locate, manipulate and access that content. The challenge is that the content isn’t only in one language and, more importantly, language is a huge barrier to the exchange of information.

Language as a Barrier to Accessing Information
Ask 10 North Americans what the primary language of the web is and I’ll bet they’ll overwhelmingly respond “English”. The reality is that for every one English speaker online there are three who don’t speak English. And if you step back to think (outside the context of the translation industry), how often do you actually encounter content online in a language other than English?

The way the web works today, language is actually the top-level organizer. Before content can be organized or sorted into themes or topics, it gets sorted into silos by language. Why? Largely because of how we access content. Content gets accessed in several ways, the main methods being direct URL, hyperlink and search. Typically a direct URL is something you’ve received from a referral or piece of marketing material – almost always in the language you speak. A hyperlink comes from another page of content that you are viewing, and in many cases those links will take you to another page or site in the same language.

And finally, search: when you go to Google (or any other search engine), you put English (or your primary language) words into the query. So what does it return? Results in the same language.

With company or product names you’ll cross the language barrier occasionally, but consider what would happen if I searched “Replace chain on my bicycle” (all English results). Now what if I search “Remplacer la chaîne sur ma bicyclette”? All French. There are probably some great tips or suggestions in one silo or the other, but to an English speaker/searcher the French ones essentially don’t exist. A search for “bicycle” would never return “bicyclette”. Or would it?

My hunch is this is why Google has its SMTS. Imagine the day when you can enter “Replace chain on my bicycle” into Google and it returns content from multiple languages, automatically translating your keywords (bicycle = bicyclette) for the query and then translating the results.
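
To make that idea a bit more concrete, here is a toy sketch of the flow: translate the keywords, search each language silo, then translate the hits back. Everything in it is made up for illustration (the three-word glossary, the tiny corpus, and the function names are all hypothetical), and a word-for-word lookup stands in for a real statistical engine. It says nothing about how Google actually does or would do this.

```python
from dataclasses import dataclass

# Hypothetical three-word glossary standing in for a real MT engine.
EN_TO_FR = {"replace": "remplacer", "chain": "chaîne", "bicycle": "bicyclette"}
FR_TO_EN = {fr: en for en, fr in EN_TO_FR.items()}


@dataclass
class Doc:
    lang: str
    text: str


# Made-up corpus: one English page, two French pages.
CORPUS = [
    Doc("en", "how to replace a bicycle chain"),
    Doc("fr", "remplacer la chaîne de ma bicyclette"),
    Doc("fr", "recette de crêpes"),
]


def translate(words, glossary):
    """Word-for-word lookup; unknown words pass through unchanged."""
    return [glossary.get(w, w) for w in words]


def search(words, lang):
    """Return documents in one language silo that contain every query word."""
    return [
        d for d in CORPUS
        if d.lang == lang and all(w in d.text.split() for w in words)
    ]


def cross_language_search(query):
    """Search the English silo, then the French silo via translated keywords."""
    words = query.lower().split()
    hits = [("en", d.text) for d in search(words, "en")]
    fr_words = translate(words, EN_TO_FR)
    for d in search(fr_words, "fr"):
        # Translate the French hit back so an English reader can use it.
        hits.append(("fr", " ".join(translate(d.text.split(), FR_TO_EN))))
    return hits


if __name__ == "__main__":
    for lang, text in cross_language_search("replace bicycle chain"):
        print(lang, "|", text)
```

Searching the French silo with the untranslated English keywords finds nothing, which is the silo problem described above; translating the keywords first is what lets the French page surface at all.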

Personally, I don’t think Google had/has any intention of splashing around in the translation pool, but they’re a big enough gorilla that we’ll see the ripples come down from upstream for a long while. And those ripples can still be big enough to swamp a few boats, intentionally or not.

I’d be keen to hear your thoughts.

– Ryan
