Wednesday, 19 December 2012
Facebook StumbleUpon Twitter Google+ Pin It

Putting Zürich before Århus

Author Photo
By Mark Davis, International Software Architect

Until now, it has been very difficult for web application designers to do something as simple as sort names correctly according to the user's language. And it matters: English readers wouldn’t expect Århus to sort below Zürich, but Danish speakers would.

Because linguistic sorting requires a sophisticated algorithm and lots of data, it was impractical to do this natively in JavaScript. Until now, the only full solution for sorting on the client side was to generate on a server a sortKey for every string that needed to be sorted, and send the sortkeys — base64-encoded — down to the client along with the strings. Pretty ugly! And what’s doubly frustrating is that the underlying operating systems have all been able to handle this, whether through International Components for Unicode (ICU) or Windows APIs.

The new internationalization specification for ECMAScript (the “official” name for JavaScript) changes this picture. It is already in the production version of Chrome, and is on track for other major browsers.

Linguistic sorting is not the only benefit—not only will users be able to see names sorted correctly, but also correct numeric values (“1,234.56” in English, but “1.234,56” in German), dates (“March 10, 2012” vs “10. März 2012”), and so on. While the results might not be precisely the same in every browser, they should be appropriate to the language, and are returned using a uniform API.

On any enabled browser — in its supported languages — web application developers can:
  • compare strings correctly: choosing whether or not to ignore accents, case differences, etc.
  • format numbers correctly: choosing decimal places, currencies, whether to use thousands-separator, etc.
  • format dates and times correctly: choosing decimal places, numeric vs named months, etc.
  • match locales: comparing the user’s desired locales (say Arabic and French) against the supported locales (say French, German, and English), to get the best match.
The API also allows for linguistic support in offline web applications, which wasn’t practical before. It builds on the industry standards BCP47 (for identifying languages and locales) and LDML (part of the Unicode Common Locale Data Repository (CLDR) project). For the gory details of the spec, see ECMA-402: ECMAScript Internationalization API Specification (just approved by the Ecma General Assembly).


Mark Davis is president and cofounder of the Unicode consortium, and founder of ICU and CLDR. Mark is fond of food, film, travel, and RPGs. Mark lived for 4 years in Switzerland, and is moving back in February.

Posted by Scott Knaster, Editor

No comments: