Google – What are they up to with Machine Translation?

Since we started Clay Tablet, an on-and-off debate around the office has revolved around one question: “What is Google’s end game in the translation space?” Since partnering with Language Weaver a couple of months ago, we’ve of course been paying a little extra attention to what Google is up to.

The big debate revolves around whether or not Google is truly a “competitor”. I think there’s no question that in just about any web-centric information management space these days, Google is one of those inescapable gorillas hovering nearby, and everyone secretly hopes and prays that the gorilla doesn’t decide to shift his weight in their direction.

The reality is that Google’s “Language Tools” page sits on top of a very hot Statistical Machine Translation System (SMTS). It has scored very high in BLEU evaluations and, along with Language Weaver, tests well above many traditional machine translation systems. The question is, “What’s their motivation?” To date there’s no (public) API for it, no “appliance” that you can buy and bring in house, no obvious revenue model (although with Google that isn’t surprising), etc.

Friend, Foe or Indifferent?
So why even have it? And if they aren’t selling it are they really a competitor? In my head the answer is mixed. Google is certainly competitive as an “Alternative” – but I’m not entirely convinced they’re a direct competitor. For translation in general (services and technology) tools like Google’s or Babelfish come up all the time as prospects ask “Why wouldn’t I just use xxxx?” (“FREE” is certainly attractive).

I think it’s important to flip the question around and ask “Does Google consider you a competitor?” – not even in the context of a specific business but if Google looked at your technology (SMTS for example) would it be something they acknowledge as a competitive product?

In the case of SMTS my hunch is actually “No”. I suspect if you asked Google if they played in the translation industry they’d say “No”. When you strip back all the layers, Google is, at the core, a company fixated on aggregating, storing & organizing every piece of content it can get its hands on, then turning around and making it easy for users to locate, manipulate and access that content. The challenge is that the content isn’t only in one language and, more importantly, language is a huge barrier to the exchange of information.

Language as a Barrier to Accessing Information
Ask 10 North Americans what the primary language of the web is and I’ll bet they’ll overwhelmingly respond “English”. The reality is for every one English speaker online there are three who don’t speak English. And if you step back to think (outside the context of the translation industry) how often do you actually encounter content online in a language other than English?

The way the web works today, language is actually the top-level organizer. Before content can be organized or sorted into themes or topics, it gets sorted into silos by language. Why? Largely it’s because of how we access content. Content gets accessed in several ways, the main methods being direct URL, hyperlink or search. Typically a direct URL is something you’ve received from a referral or piece of marketing material – almost always in the language you speak. A hyperlink comes from another page of content that you are viewing, and in many cases it will take you to another page or site in the same language.

And finally, search: when you go to Google (or any other search engine) you put English (or your primary language) words into the query. So what does it return? Results in the same language.

With company or product names you’ll cross the language barrier occasionally, but consider what would happen if I searched “Replace chain on my bicycle” (all English results). Now what if I search “Remplacer la chaîne sur ma bicyclette”? All French. There are probably some great tips or suggestions in one silo or the other, but to an English speaker/searcher the French ones essentially don’t exist. A search for “bicycle” would never return “bicyclette”. Or would it?

My hunch is this is why Google has its SMTS. Imagine the day when you can enter “Replace chain on my bicycle” into Google and it returns results with content from multiple languages, automatically translating your keywords (bicycle = bicyclette) for the query and then translating the results.
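
To make the idea concrete, here is a toy sketch of that flow in Python: translate the query terms, search each language silo with the matching terms, then translate the foreign hits back. Everything in it is made up for illustration. The word tables, the three sample documents and the function names are my own stand-ins, and a hand-written dictionary is nowhere near a real statistical system. It says nothing about how Google actually does (or would do) this.

```python
# Toy stand-in for a machine translation system: a hand-written word table.
# (Hypothetical data; a real system would use a statistical model.)
EN_TO_FR = {"replace": "remplacer", "chain": "chaîne", "bicycle": "bicyclette"}
FR_TO_EN = {fr: en for en, fr in EN_TO_FR.items()}

# Toy document collection, each entry tagged with its language.
DOCUMENTS = [
    ("en", "how to replace a bicycle chain"),
    ("fr", "remplacer la chaîne de votre bicyclette"),
    ("en", "choosing a bicycle helmet"),
]


def translate(text, table):
    """Word-by-word lookup; unknown words pass through unchanged."""
    return " ".join(table.get(word, word) for word in text.lower().split())


def cross_language_search(query):
    """Search the English and French silos, showing every hit in English."""
    en_terms = set(query.lower().split())
    fr_terms = set(translate(query, EN_TO_FR).split())
    results = []
    for lang, text in DOCUMENTS:
        words = set(text.split())
        terms = en_terms if lang == "en" else fr_terms
        if terms <= words:  # every query term must appear in the document
            shown = text if lang == "en" else translate(text, FR_TO_EN)
            results.append((lang, shown))
    return results


if __name__ == "__main__":
    for lang, text in cross_language_search("replace chain bicycle"):
        print(lang, "|", text)
```

Even in this tiny version the French page turns up for an English query, and the rough word-for-word rendering of the result hints at why the quality of the translation engine matters so much.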

Personally I don’t think Google had/has any intention of splashing around in the translation pool, but they’re a big enough gorilla that we’ll see the ripples spread for a long while. And those ripples can still be big enough to swamp a few boats, intentionally or not.

I’d be keen to hear your thoughts.

– Ryan


Time flies when you’re….

…rebuilding your system
I came into work a few weeks ago to find that my hard drive, which admittedly had been making all kinds of horrible sounds for a couple of weeks, had decided to pack it in overnight. From my searches it appears that it died in the worst possible way. It now sits on my desk as a reminder to improve my backup process (mostly done, with a bit still to do). Thankfully I had backed up all the important stuff at the first sign of trouble, so it was really only a few days of work that went up in virtual smoke.

…on a never ending stream of conference calls
Momentum is a great thing, but one of the challenges of taking a global solution to market is that in most cases all your communication is done in emails & conference calls. There was a day a few weeks ago that was truly an experience, as we were basically in calls all day, following the time zones around the world as we talked with prospects & partners. By the end of the day we’d done calls in Qatar, Italy, Norway, Belgium, Portugal, Toronto, St. Louis & Los Angeles.

The Qatar call was interesting as our contact was actually walking through the streets on her cell phone, and we could clearly hear the call to prayer sounding out behind her – some very interesting background noise to the conversation and something that was neat to hear in “real-time”.

…featured in Backbone Magazine
Last week the latest issue of Backbone magazine came out with a one-page profile of Clay Tablet and the CEO of The Branham Group’s thoughts on what we’re doing. This interview was done as a follow-up to our being named to the Branham Up & Comers list this year.

…attending the first “Mesh 2.0” Conference
I had the opportunity to attend the second day of Mesh 2.0 yesterday. Mesh is a new conference centered on “Web 2.0” technologies and on encouraging discussion and co-operation in pushing the concept and technologies forward. Overall I really enjoyed the show. The panels were great and had some really good moderators & speakers, including Jason Fried of 37signals.com. I still find “Web 2.0” an odd term, since it seems everyone defines it a little differently.

In the end the only thing that struck me as odd was the obsession over having laptops on and in use throughout all of the sessions. For an event that was all about meeting with each other, to interact and discuss etc. it was strange to see half the crowd disengaged with their heads down and focused on their computer. A few people did seem to be live-blogging the event (a phenomenon I don’t really understand) but the vast majority were checking email & surfing the web. My only suggestion for next year? Ban the laptops from the session halls – people can update their blog & check email during the breaks.

…doing much much more
Overall there is just a tonne of stuff going on. I’ve got some good ideas for blog posts that I’m working on now – as I can share them I’ll post ’em. As you can probably tell the blog slid by the wayside during all of this. I’m trying hard to get back to the routine of at least one post a week.

– Ryan
