May 9, 2014

The Innards of Our New Internationalization Systems

Last week we posted about some internationalization changes. You may not have noticed a ton, but this was actually a pretty major code change that affected most pages on the site and touched four code bases. This week, we’re diving below the surface to take a look at the rest of the internationalization iceberg.

What We Had

Our previous system for deciding which language to show you was fairly simple: we looked at your subdomain. If you were on de.twitch.tv, we’d serve you German wherever we could. This worked okay for a few years, but it had some significant downsides.

First, link sharing across languages was basically broken. Got a German friend who shares a link with you? Welcome to German Twitch. Oops.

Second, our old system was modeled on sites like Wikipedia, where the content of a page varies between languages. For instance, the Wikipedia page for San Francisco has different information on it at de.wikipedia.org than at en.wikipedia.org, even given a perfect translation. For that sort of site, subdomains make sense; you’re actually at a different version of the website when you visit it in a different language. Our content, though, is video. If you visit www.twitch.tv/360chrism, it doesn’t really matter whether the page around the stream is in German or English; you’re there for the video. For us, language is more of a display preference than a separate version of the site, so subdomains don’t really fit.

What We Wanted

Apart from wanting something more efficient that leaned more heavily on existing libraries, we had two specific goals in mind for our new system.

Time To Choose

It turns out a big part of serving the right language is deciding which one to serve! A system that can pick a sensible language for you on its own, with reasonable accuracy, is an important piece of this.

Caching

Our caching layer used a hash of the page URL as its cache key. This let pages like de.twitch.tv/snarfybobo and www.twitch.tv/snarfybobo be cached separately without any extra work. Removing language subdomains takes away that free solution: the English and German localizations now both live at www.twitch.tv/snarfybobo, so if we kept our caching behavior as-is, the two pages would produce the same cache key despite having different content. Oops!
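
Here’s a minimal Python sketch that makes the collision concrete. It models the cache key as a SHA-256 hash of the full URL. This is a simplifying assumption for illustration, since the caching layer’s real hashing details aren’t described here, and `url_cache_key` is a hypothetical name.

```python
import hashlib


def url_cache_key(url: str) -> str:
    """Hypothetical model of a URL-only cache key: SHA-256 hex digest of the URL."""
    return hashlib.sha256(url.encode("utf-8")).hexdigest()


# With language subdomains, German and English pages get distinct keys for free.
german_old = url_cache_key("http://de.twitch.tv/snarfybobo")
english_old = url_cache_key("http://www.twitch.tv/snarfybobo")

# Without subdomains, the German page lives at the same URL as the English one,
# so a URL-only key can no longer tell them apart.
german_new = url_cache_key("http://www.twitch.tv/snarfybobo")

print(german_old != english_old)  # True
print(german_new == english_old)  # True
```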

What We Made

Accept-Language

The Accept-Language header is a piece of information that your browser sends with every page request, listing the languages you prefer.

{% raw %}
Accept-Language: en, de, sv
{% endraw %}

It’s essentially a priority-ordered list of allowable languages (more on order later), and it’s pretty nice to work with because:

  • it’s defined in the HTTP spec and sent by all major browsers, so we can generally expect it to be there;

  • when used correctly, users don’t have to set their preferred language on a site-specific basis (with a couple of exceptions); and

  • it gives us a variable-size list of options, which isn’t something subdomains can do reasonably.

Accept-Language for You

There are a few problems preventing the Accept-Language header from being our complete package, though.

Not all sites have perfect (or even complete) translations for the languages they support. So if you’re a fluent Japanese speaker and a semi-fluent English speaker, and our Japanese translations are abhorrent, you might just prefer to see English.

It’s important to note that the Accept-Language header actually supports this. Each language in it can carry a quality value (the q parameter, from 0 to 1) that says how strongly the user prefers that language, so a website can weigh those preferences when making intelligent decisions about which language to serve.

{% raw %}
Accept-Language: en, de;q=0.8, sv;q=0.7
{% endraw %}

This is what an ordered Accept-Language header really looks like. A language with no q parameter gets the default value of 1, and it’s the q values, not the positions, that set the priority. But that part of the spec is a little too old and complicated for browsers to be asking users for numeric values, so browsers typically set them semi-arbitrarily to match the language preference order that the user specifies.
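
Here’s a minimal Python sketch of turning a header like the one above into a ranked list and then picking a supported language from it. It follows the HTTP rules that a missing q defaults to 1 and that q=0 means “not acceptable.” Everything else is an assumption for illustration, not Twitch’s actual logic: the fallback from a regional tag like de-AT to its primary language, the handling of `*`, the function names, and the `supported` set.

```python
from typing import List, Set, Tuple


def parse_accept_language(header: str) -> List[Tuple[str, float]]:
    """Return (language_tag, q) pairs sorted from most to least preferred.

    A tag without a q parameter gets q=1.0. Entries whose q is 0, out of
    range, or unparseable are dropped. Python's sort is stable, so tags
    with equal q keep their order from the header.
    """
    prefs: List[Tuple[str, float]] = []
    for item in header.split(","):
        parts = [part.strip() for part in item.split(";")]
        tag = parts[0].lower()
        if not tag:
            continue
        q = 1.0
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        if 0.0 < q <= 1.0:
            prefs.append((tag, q))
    prefs.sort(key=lambda pref: pref[1], reverse=True)
    return prefs


def best_supported(header: str, supported: Set[str], default: str = "en") -> str:
    """Pick the most preferred language we support (lowercase codes in `supported`).

    Tries the full tag first, then its primary subtag ("de-at" -> "de").
    A wildcard "*" or no match at all falls back to `default`.
    """
    for tag, _q in parse_accept_language(header):
        if tag == "*":
            return default
        if tag in supported:
            return tag
        primary = tag.split("-")[0]
        if primary in supported:
            return primary
    return default


print(parse_accept_language("en, de;q=0.8, sv;q=0.7"))
# [('en', 1.0), ('de', 0.8), ('sv', 0.7)]
print(best_supported("ja, de-AT;q=0.8", {"en", "de"}))
# de
```

Sorting on q rather than trusting position means a header like `sv;q=0.7, de;q=0.8, en` ranks the same way as the one above.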

On the flip side, some users are fluent in multiple languages and enjoy using different languages for different sites. Maybe you’re a native German speaker but also speak English fluently; it’s possible that you frequent English-speaking streams and prefer Twitch in English. It’s also possible that you frequent German-speaking streams and prefer Twitch in German. Both are valid options, but you wouldn’t want to have to change your browser’s global behavior to allow this.

Cookies are a natural solution to both of these problems. If you’ve got a language cookie, we skip our Accept-Language logic entirely and just give you the language your cookie asks for. Easy, right?
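
Here’s a sketch of that override rule in a few lines of Python, using the standard library’s `http.cookies.SimpleCookie`. It assumes the cookie is named `language`, as in the Varnish config later in this post. Three other pieces are hypothetical and only there for illustration: the `SUPPORTED_LANGUAGES` set, the check against it, and the `infer_from_header` callback.

```python
from http.cookies import SimpleCookie
from typing import Callable, Optional

# Hypothetical set of languages the site has translations for.
SUPPORTED_LANGUAGES = {"en", "de", "sv", "ja"}


def choose_language(cookie_header: Optional[str], infer_from_header: Callable[[], str]) -> str:
    """Return the cookie's language if it names a supported one; otherwise
    fall back to Accept-Language-based inference."""
    if cookie_header:
        jar = SimpleCookie()
        jar.load(cookie_header)
        morsel = jar.get("language")
        if morsel is not None and morsel.value in SUPPORTED_LANGUAGES:
            return morsel.value  # explicit choice wins, whatever the header says
    return infer_from_header()


# A German-preferring browser, but the user picked English on this site.
print(choose_language("language=en; theme=dark", lambda: "de"))  # en
print(choose_language(None, lambda: "de"))  # de
```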

Accept-Language for Our Cache

Earlier I mentioned that our cache keys are URL hashes. Without subdomains, we have identical URLs (and thus identical cache keys) for different languages. So we need to add something else to this hash to vary it on language.

We could mix in our final decision about your preferred language, which would give us the same caching behavior that subdomains did. But reaching that decision means parsing your Accept-Language header, which takes precious time we don’t have on our caching layer.

However, we do have time to pattern-match cookies. And running the full language-detection logic isn’t a big deal if we can do it behind the scenes with an API call. Plus, if we save the result in a cookie, we’ll typically only have to do it once.

With this in mind, we made the following Varnish change:

{% raw %}
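sub vcl_hash {
  # Add the "language" cookie's value to the key; the built-in vcl_hash still adds the URL and Host.
  if (req.http.Cookie ~ "(^|;\s*)language=") {
    hash_data(regsub(req.http.Cookie, "^(.*;\s*)?language=([^;]*).*$", "\2"));
  }
}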
{% endraw %}
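
For anyone who doesn’t read VCL, here’s a rough Python model of the idea behind the config above: hash the URL, and if there’s a `language` cookie, mix its value into the key too. This model only matches a cookie named exactly `language`. Two details are simplifying assumptions rather than Varnish’s exact internals: SHA-256 as the hash and a NUL byte separating the URL from the cookie value.

```python
import hashlib
import re
from typing import Optional

# Matches a cookie named exactly "language" (not e.g. "mylanguage") and captures its value.
LANGUAGE_COOKIE = re.compile(r"(?:^|;\s*)language=([^;]*)")


def cache_key(url: str, cookie_header: Optional[str]) -> str:
    """Hash the URL, plus the language cookie's value when one is present."""
    digest = hashlib.sha256(url.encode("utf-8"))
    if cookie_header:
        match = LANGUAGE_COOKIE.search(cookie_header)
        if match:
            # Separator so the URL and cookie value can't run together ambiguously.
            digest.update(b"\x00" + match.group(1).encode("utf-8"))
    return digest.hexdigest()


url = "http://www.twitch.tv/snarfybobo"
print(cache_key(url, "language=de") != cache_key(url, "language=en"))  # True
print(cache_key(url, "session=abc; language=de") == cache_key(url, "language=de"))  # True
print(cache_key(url, None) == cache_key(url, "session=abc"))  # True
```

Only the language cookie affects the key. Other cookies, like a session ID, leave it unchanged, so users who share a language still share cached pages.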

When you visit us for the first time, you get the best of both worlds: we do all the work needed to detect which language to serve you, then save the result in a cookie. On your next page load, that cookie gets mixed into the cache key along with the URL of the page you’re on, and caching works just as it did before.
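
Saving the result comes down to a Set-Cookie header on the response. Here’s a minimal Python sketch using `http.cookies.SimpleCookie`. The post doesn’t say how the cookie is configured, so the one-year lifetime, the site-wide path, and the `language_set_cookie` name are all assumptions.

```python
from http.cookies import SimpleCookie

ONE_YEAR_SECONDS = 365 * 24 * 60 * 60  # assumed lifetime, not from the post


def language_set_cookie(language: str) -> str:
    """Build a Set-Cookie header value that remembers the chosen language,
    so later requests can be served (and cached) without re-running inference."""
    jar = SimpleCookie()
    jar["language"] = language
    jar["language"]["path"] = "/"  # send it with every page on the site
    jar["language"]["max-age"] = ONE_YEAR_SECONDS
    return jar["language"].OutputString()


print("Set-Cookie: " + language_set_cookie("de"))
# Set-Cookie: language=de; Max-Age=31536000; Path=/
```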

Parting Thoughts

Despite being smarter about language, our new internationalization system actually simplifies our code base quite a bit. In fact, the change removed more lines of code than it added. We can also ditch a good amount of our nginx config, and adding support for new languages has become easier as well.

We’re always looking to improve user experience for Twitch users around the world. Our internationalization changes are going to make it much easier to update, improve, and maintain localization on Twitch. If you have any questions or comments, feel free to speak up down below!

(P.S. If you’re curious about tackling other interesting engineering problems like internationalization, check out our jobs page!)
