Making Cents of Yens and Euros: Web 2.0 Internationalization
Achim Ruopp
achim@digitalsilkroad.net
http://www.digitalsilkroad.net/
© Copyright 2007 Achim Ruopp
Web 2.0 Expo 2007

Demo: A Currency Converter Application - before and after

Web 2.0 Internationalization: Agenda
• Introduction to Web Internationalization (i18n)
  • Selecting and Persisting User Preferences
  • Locales and Locale Identifiers
  • Unicode
  • Localization - Model and Tools
• Multi-lingual Syndication
  • RSS
  • Atom
• Client-side Scripting
  • Javascript Internationalization
  • Ajax
• International Web Services Design
  • REST
  • SOAP

Intro to Web Internationalization: Language and Location
[Figure: example language preferences: fr, en;q=0.8, en-US, da-DK]

Intro to Web Internationalization: User Preferences
• Language
  • HTTP Accept-Language header
  • E.g.: en, fr-CA;q=0.8, fr;q=0.6
  • Language negotiation with the server
• Locale
  • Cultural preferences for formatting, sorting etc.
  • Infer from Accept-Language header
  • Map IPv4 address to ccTLD (country code top-level domain)
    • Public information accessible through libraries
      • E.g. Perl IP::Country CPAN module
    • Commercial services offer more precision
• Always provide an option to change defaults
• Store preferences in cookies

Intro to Web Internationalization: Internet Language Tags
• IETF Language Tags (BCP 47)
  Language[-Language]*3[-Script][-Region][-Variant]*[-Extension]*[-PrivateUse]*
• Examples
  • en-CA: English in Canada
  • zh-Hant-TW: Chinese written in the traditional Chinese script as used in Taiwan
• Obsoletes RFC 3066 & RFC 1766
  • Often still used in products/earlier standards

Internationalization Changes

Intro to Web Internationalization: POSIX Locales
• Cross-platform API
  • Locale identifiers can have variations
    • Un*x: en_US
    • Windows: English_United States
  • Results can be platform-dependent
• Basis for locale functionality in all scripting languages
• Provides functionality for
  • Number formatting: 1,000,000.23
  • Date/time formatting: 8 Μάρτιος 2007 12:00:00 μμ
  • Sorting
  • String processing (e.g. upper-/lower-casing)
  • Some translated strings like weekdays, yes/no messages

Intro to Web Internationalization: International Components for Unicode
• IBM Open Source project
• Extensive locale data and APIs
  • Data vetted as part of the Common Locale Data Repository (CLDR) project
• Java and C++ APIs
• Wrappers for scripting languages
  • PyICU (Python)
  • ICU4R (Ruby) - abandoned?
  • DIY - difficult because of API complexity and character encoding issues

Intro to Web Internationalization: Microsoft Internationalization APIs
• Windows NLS API
• Microsoft .NET Framework System.Globalization namespace
• Similar set of data to ICU
  • Data vetted by Microsoft subsidiaries
• APIs accessible from all Microsoft programming languages

Intro to Web Internationalization: Unicode 5.0
• 99,024 of 1,114,112 code points (U+0000 to U+10FFFF) defined
[Figure: map of the Unicode code space. Planes: Basic Multilingual Plane (00000), Dead Languages & Math (10000), Han Characters (20000), Language Tags (E0000), Private Use (F0000, 100000). Blocks within the Basic Multilingual Plane, 0000 to F000: Alphabets, Punctuation, Asian Languages, Han Characters, Yi, Hangul, Surrogates, Private Use, Legacy/Compatibility.]

Intro to Web Internationalization: Unicode Encoding Forms
• Variable length: UTF-8/UTF-16
• Fixed length: UTF-32
• U+2122: ™: Trade Mark Sign

  Encoding form   Code units          Binary
  UTF-8           0xE2 0x84 0xA2      11100010 10000100 10100010
  UTF-16          0x2122              00100001 00100010
  UTF-32          0x00002122          0…00100001 00100010

Intro to Web Internationalization: Unicode on the Web
• XML processors are required to process UTF-8/UTF-16
• Encoding declaration precedence
  1. HTTP Content-Type header charset declaration
  2. XML encoding declaration (XHTML)
  3. meta charset declaration in (X)HTML
  4. link element charset attribute
• Approx. 4% of pages have encoding errors*
• No real need for character references
  • ü: &uuml; or &#252;
  • Exceptions: <, >, &, "
• Use styles to control font selection
* Source: Google presentation at IUC30

Demo: A Currency Converter Application - globalized but not localized

Intro to Web Internationalization: Localization Recommendations
• Avoid translatable text in graphics
• Make sure graphics are culturally neutral
• Avoid absolute sizing
• Use HTML flow layout
• Write complete sentences

Intro to Web Internationalization: Localization Model and Tools
• Text translation
  • Localization formats
    • HTML with template library
      • W3C Internationalization Tag Set (tool support?)
    • GNU gettext/PO
    • XLIFF - XML Localization Interchange File Format
  • Localization tools
    • OmegaT
    • Open Language Tools (Sun)
    • The WordForge Project: Pootle
    • …
• Searchability - Links/Sitemap

Demo: A Currency Converter Application - fully internationalized Web 1.0 application

Client-side Scripting: Javascript Internationalization
• ECMAScript edition 3 (1999) added a range of internationalization features
  • Good support for Unicode processing
  • Set of locale-sensitive functions
    • Dependent on host locale (i.e. browser)
  • Set of locale-insensitive functions
  • No number or date/time parsing
• Javascript libraries with additional internationalization functionality
  • dojo Toolkit (i18n contributed by IBM)
  • Microsoft AJAX Library

Client-side Scripting: AJAX Recommendations
• Late globalization
  • Transmit data in locale-independent form with XMLHttpRequest
  • Might require some creative parsing/UI
• Early localization
  • Text localization server-side
  • Browsers are missing a message-catalog facility
  • Dynamically created page content is invisible to search engines

Multi-lingual Syndication: RSS 2.0
• Character encoding
  • RSS 2.0 is an XML application
  • XML encoding rules apply
• Language
  • Element only on channel (feed), not on item
    • Create one channel per language
  • Specified to comply with RFC 1766 language tags
• Date/Time
  • In standard RFC 822 format (including 4-digit years)
    • E.g. "Wed, 02 Oct 2002 08:00:00 EST"

Multi-lingual Syndication: Atom Syndication
• More granular language marking
  • xml:lang can be applied to any human-readable text in the format
  • Aggregators need to deal with this
• Better date/time format: RFC 3339
  • E.g. "2003-12-13T18:30:02-05:00"
• Acknowledgement: Tim Bray

Demo: A Currency Converter Application - adding a syndication feed with exchange rate information

International Web Services Design: Service Patterns

  Pattern             Description
  Locale Neutral      Neutral data formats
  Client Influenced   Service reacts to client locale, e.g. HTTP Accept-Language
  Service Determined  Service is locale-specific and ignores client preference
  Data Driven         Service adjusts formatting and language to the locale the data refers to

  Request data                 Return data
  CAD                          1.1785
  CAD (Accept-Language: de)    Kanadischer Dollar
                               03/08/2007 12:00pm EST
  NOK                          norske kroner
  CHF                          ?
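
The Client Influenced pattern from the Service Patterns slide can be sketched in a few lines of Python. This is a minimal illustration, not the demo code from the talk: the function names and the CURRENCY_NAMES table are hypothetical, and the English currency name is an assumed value (only "Kanadischer Dollar" appears in the slides). The sketch parses the q-values of an Accept-Language header and falls back to a default language when no preference matches.

```python
# Hypothetical display names keyed by language subtag (illustrative data only).
CURRENCY_NAMES = {
    "en": {"CAD": "Canadian Dollar"},    # assumed English name
    "de": {"CAD": "Kanadischer Dollar"},  # taken from the slide
}


def parse_accept_language(header):
    """Return the language tags of an Accept-Language header, best first."""
    prefs = []
    for position, item in enumerate(header.split(",")):
        parts = item.strip().split(";")
        tag = parts[0].strip()
        if not tag:
            continue
        q = 1.0  # a tag without an explicit q-value has quality 1
        for param in parts[1:]:
            name, _, value = param.strip().partition("=")
            if name.strip() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q-value as "not acceptable"
        if q > 0:
            # Sort by descending quality, then by position in the header.
            prefs.append((-q, position, tag))
    return [tag for _, _, tag in sorted(prefs)]


def currency_name(code, accept_language, default="en"):
    """Client Influenced: choose the display name by the client's language."""
    for tag in parse_accept_language(accept_language):
        lang = tag.split("-")[0].lower()
        if code in CURRENCY_NAMES.get(lang, {}):
            return CURRENCY_NAMES[lang][code]
    return CURRENCY_NAMES[default][code]


print(parse_accept_language("en, fr-CA;q=0.8, fr;q=0.6"))
print(currency_name("CAD", "de"))
```

A Locale Neutral service would skip the lookup entirely and return only the ISO currency code and the plain decimal rate, leaving all formatting to the client.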

International Web Services Design: REST
• REST naturally ties into i18n features in HTTP/HTML/XML
  • Locale indicated with HTTP Accept-Language
  • Encoding and language marking in markup
• Special caution for HTTP GET parameters
  • Locale-independent formatting recommended
  • Text parameters
    • Encode in UTF-8 and escape in URIs
    • IRI (Internationalized Resource Identifier) functionality might provide this for you

International Web Services Design: SOAP
• Locale can be communicated in
  • Transport header (e.g. HTTP)
  • SOAP header
  • SOAP message body
• Beware of automatically generated SOAP interfaces
  • Might be locale-dependent without allowing the locale to be specified
• Use of XML Schema data types promotes locale-independence
• Also consider localization of error messages

Conclusions
• Unification
  • One code base
• Customization
  • Localization and adaptation for locales
• Next step: cross-language "leakage"
  • Provide views in multiple languages to the same (user-generated) data
  • Translate user-generated content
    • Volunteers
    • Machine Translation

Call for Contributions
• Presentation and Perl CGI demo code
  • http://www.digitalsilkroad.net/web2expo
• Add a version in your preferred language
  • Ruby on Rails
  • PHP
  • Python
  • …
• Similar ASP.NET application
  • http://quickstarts.asp.net/QuickStartv20/aspnet/doc/localization/default.aspx

References
• W3C Internationalization Activity
  • http://www.w3.org/International/
• POSIX Locale
  • http://www.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap07.html
• International Components for Unicode
  • http://www-306.ibm.com/software/globalization/icu/
• Unicode/Common Locale Data Repository
  • http://www.unicode.org/
• Microsoft Internationalization APIs
  • http://msdn2.microsoft.com/en-us/library/ms776254.aspx
  • http://msdn2.microsoft.com/en-us/library/system.globalization.aspx
• OmegaT
  • http://www.omegat.org/omegat/omegat_en/omegat.html
• Open Language Tools
  • https://open-language-tools.dev.java.net/
• The WordForge Project
  • http://www.wordforge.org/drupal/
• Javascript Internationalization
  • http://www.icu-project.org/docs/papers/internationalization_support_for_javascript.html
• RSS 2.0
  • http://www.rssboard.org/rss-specification
• Atom Syndication
  • http://www.atomenabled.org/developers/syndication
• RSS 1.0
  • http://web.resource.org/rss/1.0/spec
• W3C Web Services Internationalization Usage Scenarios
  • http://www.w3.org/TR/ws-i18n-scenarios/

Additional Slides

Multi-lingual Syndication: RSS 1.0
• Character encoding
  • RSS 1.0 is an XML application
  • XML encoding rules apply
• Complies with the RDF (Resource Description Framework) specification
  • Definition of language and date/time formats is left to RDF metadata formats
    • Dublin Core Metadata Element Set
    • Language: RFC 1766/ISO 639-2
    • Date/Time: ISO 8601 (superset of RFC 3339)
  • Dublin Core also allows time periods to be specified!
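
The date/time formats named on the syndication slides (RFC 822 for RSS 2.0, RFC 3339 for Atom, ISO 8601 for Dublin Core) can be produced with Python's standard library. The following is a minimal sketch assuming Python 3.6 or later; the function names and the fixed UTC-05:00 offset are illustrative assumptions. Note that it emits a numeric zone offset rather than a zone name such as "EST".

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def rss2_date(dt):
    """RFC 822-style date with a 4-digit year, as RSS 2.0 expects."""
    # email.utils always uses English day and month names,
    # independent of the process locale.
    return format_datetime(dt)


def atom_date(dt):
    """RFC 3339 timestamp for Atom; the same string is valid ISO 8601."""
    return dt.isoformat(timespec="seconds")


eastern = timezone(timedelta(hours=-5))  # assumed fixed offset for the example

print(rss2_date(datetime(2002, 10, 2, 8, 0, 0, tzinfo=eastern)))
# prints: Wed, 02 Oct 2002 08:00:00 -0500
print(atom_date(datetime(2003, 12, 13, 18, 30, 2, tzinfo=eastern)))
# prints: 2003-12-13T18:30:02-05:00
```

Both functions are locale-independent by design, which matches the recommendation to transmit data in a neutral form and localize only for display.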