Content Localization: Tips You Need To Know

29th November 2017

With online retail continuing to expand globally, many online retailers are looking for ways to overcome the barriers of content localization so that they can expand their presence worldwide and increase their revenue. They can do this either through aggregators such as Amazon, eBay, Lazada, Alibaba and Rakuten, or by setting up their own online presence, directly or indirectly, in the target markets of their choice.

Whatever the chosen mechanism, content localization (translation) immediately becomes both a need and an issue. Unlike traditional localization projects, however, eCommerce content comes in volumes that make human translation cost-prohibitive and far too slow. Machine translation is therefore often the only viable option.

Successfully machine translating eCommerce content, however, means taking several key challenges into account. They can be summarized as follows:

This is not a ‘general’ localization initiative

In many cases, eCommerce content localization projects are approached the same way as a general localization project for a website or a set of documents. For most eCommerce localization projects, however, this is not advisable: they are really data processing projects that happen to include machine localization, and they need to be treated as such. They require different expertise and a different deployment process from standard human localization projects.

Volume and Timelines

The most obvious difference between general localization projects done by humans and eCommerce content localization is the sheer volume. In eCommerce, it is not unusual for more than 100 million words to need processing within a short period of time. Projects usually start with a backfile of historical content at initial deployment, followed by ongoing corrections and updates. Very often, content also needs to be localized into multiple target languages simultaneously. In addition, updates need to be processed promptly, which makes efficient workflow design, priority-based processing and capacity planning key requirements.
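To make priority-based processing and capacity planning a little more concrete, here is a minimal Python sketch of a job queue. It assumes three hypothetical job kinds (corrections, updates and backfile, served in that order) and treats a per-cycle word capacity as a stand-in for capacity planning. The names `LocalizationQueue`, `PRIORITY` and `next_batch` are illustrative only and do not belong to any particular platform.

```python
import heapq
from dataclasses import dataclass, field
from itertools import count

# Hypothetical priority levels: a lower number is processed first.
PRIORITY = {"correction": 0, "update": 1, "backfile": 2}


@dataclass(order=True)
class Job:
    priority: int
    seq: int  # tie-breaker: keeps submission order within one priority level
    item_id: str = field(compare=False)
    target_lang: str = field(compare=False)
    words: int = field(compare=False)


class LocalizationQueue:
    """Priority-ordered queue of machine translation jobs (illustrative sketch)."""

    def __init__(self) -> None:
        self._heap: list[Job] = []
        self._seq = count()

    def submit(self, item_id: str, kind: str, target_langs: list[str], words: int) -> None:
        # One job per target language, since content is often localized
        # into several languages at the same time.
        for lang in target_langs:
            heapq.heappush(self._heap, Job(PRIORITY[kind], next(self._seq), item_id, lang, words))

    def next_batch(self, word_capacity: int) -> list[Job]:
        """Pop jobs in priority order until this cycle's word capacity is used up.

        At least one job is always returned (if any are queued), so a single
        oversized item cannot block the queue forever.
        """
        batch: list[Job] = []
        used = 0
        while self._heap:
            if batch and used + self._heap[0].words > word_capacity:
                break
            job = heapq.heappop(self._heap)
            batch.append(job)
            used += job.words
        return batch


q = LocalizationQueue()
q.submit("sku-1", "backfile", ["ja", "de"], 500)
q.submit("sku-2", "correction", ["ja"], 40)
print([(j.item_id, j.target_lang) for j in q.next_batch(word_capacity=600)])
# [('sku-2', 'ja'), ('sku-1', 'ja')]
```

The late-arriving correction jumps ahead of the backfile, while the German backfile job waits for the next cycle because the word capacity is exhausted.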

User-Generated Content

Usually, online retail platforms let sellers upload and sell their products through the platform, shifting the responsibility for content generation onto the seller. While this approach has its merits, especially for platform providers who act as aggregators, it also leads to widely varying content quality and a lack of control, which in turn affects the localization workflow.

Metadata Handling and Generation

Most online retail data comes with complex markup and nested structures that need to be handled correctly. This data often includes additional metadata tags to support search, and these tags need to be retained in place. When translating between languages with very different structures, for example Japanese and English, words are reordered over long distances during translation, so the localization process must ensure that the tags travel along with their respective words.
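One common way to keep tags attached to their words is to swap each tag for a placeholder before translation and restore it afterwards. The Python sketch below assumes that the MT engine passes placeholders of the hypothetical form `[[T0]]` through unchanged and moves them along with the surrounding words; whether a given engine actually does so needs to be verified. The Japanese "translation" in the example is hand-written for illustration, not real engine output.

```python
import re

TAG_RE = re.compile(r"<[^>]+>")                  # inline markup such as <b> or <span class="kw">
PLACEHOLDER_RE = re.compile(r"\[\[T(\d+)\]\]")   # hypothetical placeholder format [[T0]], [[T1]], ...


def protect_tags(text: str) -> tuple[str, list[str]]:
    """Swap every tag for a numbered placeholder before machine translation."""
    tags: list[str] = []

    def _swap(match: re.Match) -> str:
        tags.append(match.group(0))
        return f"[[T{len(tags) - 1}]]"

    return TAG_RE.sub(_swap, text), tags


def restore_tags(translated: str, tags: list[str]) -> str:
    """Put the original tags back wherever the engine moved their placeholders."""
    found = {int(n) for n in PLACEHOLDER_RE.findall(translated)}
    missing = set(range(len(tags))) - found
    if missing:
        # A dropped placeholder means a tag was lost: flag it instead of publishing broken markup.
        raise ValueError(f"placeholders lost in translation: {sorted(missing)}")
    return PLACEHOLDER_RE.sub(lambda m: tags[int(m.group(1))], translated)


masked, tags = protect_tags("Wallet in <b>red</b>")
print(masked)  # Wallet in [[T0]]red[[T1]]

# Hand-written stand-in for MT output: in Japanese the colour moves in front of
# the noun, and the placeholders move with it.
print(restore_tags("[[T0]]赤[[T1]]の財布", tags))  # <b>赤</b>の財布
```

Raising an error on a missing placeholder is a deliberate choice: at eCommerce volumes it is usually better to route a damaged item to review than to publish it with broken markup.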

Style, Measurements and Generation

Many online retailers set a specific style in which they prefer content to be formatted. For example, the preferred structure of a title may be ‘brand’, followed by ‘product’, followed by ‘product code and measurements’. Supporting these styles while also producing a good translation requires the system to recognize brands and products (and their combinations). For example, the system must know when to translate the word ‘apple’ and when not to. Measurement conversions, currency conversions, etc. are also part of the localization process and have to be supported by the platform.
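As a rough illustration of style handling and measurement conversion, the Python sketch below assembles a title in the brand, product, code and measurements order described above and converts inches to centimetres. The `TitleParts` fields, the product code `XY-123` and the assumption that the product name has already been translated are all hypothetical.

```python
import re
from dataclasses import dataclass

# Matches inch measurements such as "15.6 in", "15.6in" or "32 inches".
INCH_RE = re.compile(r"(\d+(?:\.\d+)?)\s*(?:inches|inch|in)\b")


@dataclass
class TitleParts:
    brand: str        # kept verbatim: brands are not translated
    product: str      # assumed to be machine translated already
    code: str         # product codes pass through unchanged
    measurement: str  # source-market measurement, e.g. "15.6 in"


def inches_to_cm(text: str) -> str:
    """Rewrite inch values as centimetres, rounded to one decimal place."""
    return INCH_RE.sub(lambda m: f"{float(m.group(1)) * 2.54:.1f} cm", text)


def format_title(parts: TitleParts, metric: bool = True) -> str:
    """Assemble a title in the hypothetical 'brand, product, code and measurements' order."""
    measurement = inches_to_cm(parts.measurement) if metric else parts.measurement
    return f"{parts.brand} {parts.product} {parts.code} {measurement}"


title = TitleParts(brand="Apple", product="ノートパソコン", code="XY-123", measurement="15.6 in")
print(format_title(title))  # Apple ノートパソコン XY-123 39.6 cm
```

The regular expression is deliberately naive: a product name such as "5 in 1 charger" would be misread as a measurement, which is exactly why the platform needs to understand products and their context.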

Domains, Brands, Products, Codes

Understanding the domains (for example electronics, household goods, garden…) as well as the respective brands, products, codes and measurements is key to any good-quality translation, whether by human or by machine. If machine translation is the choice, the machine needs to be made aware of the respective brands, products etc., or else the translation will be poor.
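One way to make the machine aware of brands and domain terms is a domain-scoped glossary that is turned into constraints for the MT step. In the Python sketch below, the glossary contents and the `glossary_constraints` function are hypothetical, and how such constraints are handed to a specific engine depends on the platform and is not shown. It reuses the ‘apple’ example: a brand in electronics, a fruit in groceries.

```python
import re

# Hypothetical domain glossary. None marks a brand that must stay untranslated;
# a dict gives the approved translation per target language.
GLOSSARY = {
    "electronics": {"apple": None},
    "grocery": {"apple": {"de": "Apfel", "ja": "りんご"}},
}


def glossary_constraints(text: str, domain: str, target_lang: str) -> dict[str, str]:
    """Map each glossary term found in `text` to the output the MT step must produce."""
    entries = GLOSSARY.get(domain, {})
    constraints: dict[str, str] = {}
    for token in re.findall(r"\w+", text):
        key = token.lower()
        if key not in entries:
            continue
        approved = entries[key]
        if approved is None:
            constraints[token] = token                  # brand: keep the source spelling
        elif target_lang in approved:
            constraints[token] = approved[target_lang]  # term: force the approved translation
    return constraints


print(glossary_constraints("Apple iPhone case", "electronics", "de"))  # {'Apple': 'Apple'}
print(glossary_constraints("Fresh apple juice", "grocery", "de"))      # {'apple': 'Apfel'}
```

Keying the glossary by domain first is what lets the same word receive different treatment, which is why categorizing content by domain matters before translation starts.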


Finally, as for any computer-based system, the golden rule applies here as well: garbage in, garbage out (GIGO). Ensuring that the workflow feeds the platform good-quality data is key. While it sounds obvious, a lot of online retail content, specifically from platforms that allow user-generated content to be loaded, is of poor quality, with mixed languages, mixed encodings, etc. Unless repaired, all of this has a very negative effect on machine translation.
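A first cleaning pass can catch some of the encoding problems mentioned above. The Python sketch below assumes the most common kind of mojibake, UTF-8 text that was wrongly decoded as Latin-1, and repairs it only when the round trip succeeds. It then applies Unicode NFC normalization and strips control characters and surplus whitespace. It does not detect mixed languages, and a dedicated library such as ftfy covers far more encoding errors than this heuristic.

```python
import re
import unicodedata

# Control characters other than tab, newline and carriage return.
CONTROL_RE = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]")


def repair_mojibake(text: str) -> str:
    """Undo UTF-8 text that was decoded as Latin-1 (e.g. 'CafÃ©' -> 'Café').

    Heuristic: if the text cannot be re-encoded as Latin-1, or the bytes are not
    valid UTF-8, the text is assumed to be fine and returned unchanged.
    """
    try:
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


def clean_text(text: str) -> str:
    text = repair_mojibake(text)
    text = unicodedata.normalize("NFC", text)  # unify composed and decomposed accents
    text = CONTROL_RE.sub("", text)            # drop stray control characters
    return re.sub(r"[ \t]+", " ", text).strip()  # collapse runs of spaces and tabs


print(clean_text("  CafÃ©\x00  au   lait "))  # Café au lait
```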

While the challenges of machine translating this type of content are numerous, a structured, best-practice approach to eCommerce content can help address the issues listed above. It can be summarized as a simple four-step process:

  1. Understand your content – The challenges faced by a large marketplace provider with a wide range of sellers differ from those of a well-controlled eCommerce site. Understanding domains (e.g. watches vs. electronics vs. shoes) helps to categorize data assets and to isolate and address challenges per domain. Also, make sure you understand the structure and relevance of your data, e.g. titles vs. what’s in the box.
  2. Understand your quality requirements – Some sites are merely interested in the buyer’s ability to make a purchase decision whereas other sites want perfect quality. Different requirements drive different approaches.
  3. Select the right platform – If you are not that concerned with any of the above, an API to one of the cloud providers will likely be sufficient. But if you want control and quality, and possibly an integrated project management and editing environment, the proposition has to be different.
  4. Measure, improve, measure – eCommerce is all about maximizing revenue. The quality of the localization correlates strongly with the revenue achieved, and since the content is never static, measuring and measuring again is key to maximizing returns. Be sure, therefore, to select a platform that allows you to improve and measure as you go along.
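For the "measure" step, one simple proxy is how much reviewers have to change the machine output. The Python sketch below computes a word-level post-edit rate: the Levenshtein distance between the MT output and its post-edited version, divided by the length of the post-edited text. It is a simplified relative of TER (Translation Edit Rate) without shift operations, offered as an assumption rather than as the metric any particular platform uses.

```python
def edit_distance(a: list[str], b: list[str]) -> int:
    """Word-level Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, wa in enumerate(a, start=1):
        curr = [i]
        for j, wb in enumerate(b, start=1):
            cost = 0 if wa == wb else 1
            curr.append(min(prev[j] + 1,          # delete wa
                            curr[j - 1] + 1,      # insert wb
                            prev[j - 1] + cost))  # keep or substitute
        prev = curr
    return prev[-1]


def post_edit_rate(mt_output: str, post_edited: str) -> float:
    """Share of words the reviewer changed; lower means less editing was needed."""
    mt_words, ref_words = mt_output.split(), post_edited.split()
    if not ref_words:
        return 0.0 if not mt_words else 1.0
    return edit_distance(mt_words, ref_words) / len(ref_words)


print(post_edit_rate("red leather wallet for man", "red leather wallet for men"))  # 0.2
```

Tracking the average rate per domain over successive batches gives a trend that can then be set against revenue figures.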

eCommerce content localization is challenging but doesn’t need to be difficult. All that is required is to recognize the needs and challenges and to address them in a controlled and structured manner.

For further information on Omniscien Technologies, please visit or contact

To learn more about topics like this, come to Retail Global Las Vegas 2018!