Training a Neural Machine Translation (NMT) Engine for League of Legends

Introduction

Imagine diving into the dynamic world of League of Legends, where each patch brings thrilling updates and strategic shifts. Now, envision the challenge of translating these detailed patch notes for a global audience, ensuring every player, no matter their language, stays in the loop.

Welcome to our Machine Translation (MT) pilot program. Alongside my teammates, I am embarking on a project to train a neural machine translation (NMT) engine specifically for translating League of Legends patch notes from English into Korean. Our goal is to push the boundaries of NMT technology to see if we can achieve high-quality translations with minimal human intervention.

Through this pilot, we seek to understand the complexities of game-specific terminology and the nuances of patch note content. By training and refining our NMT engine, we aim to reduce the reliance on human translators, speeding up the process and maintaining the excitement and clarity of each update. Join us as we explore the potential of NMT to revolutionize game localization and enhance the player experience for millions of League of Legends fans.

Pilot Project Proposal

You can view the PDF of the proposal here.

Our initial proposal includes a detailed plan for the MT pilot project, outlining the objectives for quality, timing, and pricing. It specifies the project timelines and processes required for training the MT engines, provides detailed information about the datasets, and includes the expected quotes for the project.

Project Process

File Preparation

1. Developing Crawlers to Extract Text for Translation

Thanks to a successful collaboration between one of my teammates and a skilled software engineer, we developed crawlers to extract text from the League of Legends website. These automated tools gather text content from web pages and save it as TXT files ready for translation. Using crawlers streamlined extraction and significantly reduced the time needed to prepare the text. With our TXT files ready, we converted them into TMX format so we could align them in TMX editors.
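For readers curious about what the TXT-to-TMX step involves, here is a minimal Python sketch. It is not the tool we used. It pairs line i of an English TXT file with line i of a Korean one and writes a TMX 1.4 document. It assumes one segment per line, in matching order. The function name `lines_to_tmx` and the `creationtool` value are hypothetical. Crawled files rarely line up this neatly, so a pass in an alignment editor is still needed afterward.

```python
import xml.etree.ElementTree as ET

# ElementTree serializes this qualified name as the xml:lang attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def lines_to_tmx(source_lines, target_lines, src_lang="en-US", tgt_lang="ko-KR"):
    """Naively pair source line i with target line i and return a TMX 1.4 string.

    zip() stops at the shorter list, and no real alignment is attempted;
    the output is a starting point for an alignment/TMX editor.
    """
    tmx = ET.Element("tmx", version="1.4")
    ET.SubElement(tmx, "header", {
        "creationtool": "txt2tmx-sketch",  # hypothetical tool name
        "creationtoolversion": "0.1",
        "segtype": "sentence",
        "o-tmf": "plaintext",
        "adminlang": "en-US",
        "srclang": src_lang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for src, tgt in zip(source_lines, target_lines):
        src, tgt = src.strip(), tgt.strip()
        if not src or not tgt:
            continue  # skip pairs where either side is blank
        tu = ET.SubElement(body, "tu")  # one translation unit per segment pair
        for lang, text in ((src_lang, src), (tgt_lang, tgt)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text
    return ET.tostring(tmx, encoding="unicode")
```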

2. Data Cleaning and Aligning

The process of data cleaning and aligning was crucial to ensure the quality of our translations. We used TMX editors like Wordfast AutoAligner and Okapi Olifant to align the extracted text. Despite the capabilities of these tools, the alignment process was time-consuming and required significant manual effort to achieve accurate results.

To clean the data, we used simple regular expressions (regex) to remove unwanted elements and make the formatting consistent. We also deleted extremely long sentences that could compromise the quality of the trained engine. These steps were vital for producing clean, well-aligned datasets to train our NMT engine.
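To illustrate this kind of cleanup, here is a minimal Python sketch. It is not our actual script. It strips leftover HTML tags, collapses whitespace, and drops two kinds of pairs: those where either side is empty, and those whose English side exceeds a length limit. The 60-word `max_words` threshold and the function names are hypothetical. The right cutoff depends on the engine and the data.

```python
import re

TAG_RE = re.compile(r"<[^>]+>")  # leftover HTML tags from crawling
SPACE_RE = re.compile(r"\s+")    # runs of spaces, tabs, or newlines


def clean_segment(text):
    """Remove HTML tags and normalize whitespace in one segment."""
    text = TAG_RE.sub("", text)
    return SPACE_RE.sub(" ", text).strip()


def clean_pairs(pairs, max_words=60):
    """Clean (English, Korean) pairs; drop empty or overlong ones.

    max_words is a hypothetical threshold counted on the English side.
    """
    cleaned = []
    for src, tgt in pairs:
        src, tgt = clean_segment(src), clean_segment(tgt)
        if not src or not tgt:
            continue  # nothing to train on
        if len(src.split()) > max_words:
            continue  # extremely long sentence: likely to hurt training
        cleaned.append((src, tgt))
    return cleaned
```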

First Round of Training & Post-Editing

Finally, we were ready for our first round of MT training. To compare performance, we tested both Microsoft Custom Translator and Systran. After evaluating both models, we chose Systran for its higher BLEU scores and user-friendly interface.
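BLEU drove most of our decisions, so here is a minimal standard-library Python sketch of how a corpus-level BLEU score is computed. It takes clipped n-gram precision for n = 1 to 4, combines the precisions by geometric mean, and scales the result by a brevity penalty. The sketch assumes one reference per segment, plain whitespace tokenization, and no smoothing. Systran and Custom Translator report scores from their own tokenization and settings, and Korean tokenization choices change the numbers noticeably. Scores from this sketch are therefore not comparable to the platforms' figures.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count every n-gram (as a tuple of tokens) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU on a 0-100 scale (one reference per segment, no smoothing)."""
    matches = [0] * max_n  # clipped n-gram matches, per order n
    totals = [0] * max_n   # n-grams in the MT output, per order n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()  # naive whitespace tokenization
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_counts, r_counts = ngram_counts(h, n), ngram_counts(r, n)
            # Clip: an n-gram is credited at most as often as it appears in the reference.
            matches[n - 1] += sum(min(c, r_counts[g]) for g, c in h_counts.items())
            totals[n - 1] += sum(h_counts.values())
    if min(matches) == 0:
        return 0.0  # without smoothing, one zero precision makes BLEU zero
    # Geometric mean of the n-gram precisions, computed in log space.
    log_p = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: output shorter than the references is penalized.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * bp * math.exp(log_p)
```

The brevity penalty explains why a short, precise translation can still score below 100. Every n-gram in "the cat sat on the" appears in the reference "the cat sat on the mat", yet the score is about 81.9 because the output is one word short.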

Next came post-editing of the machine translation (PEMT). Three team members conducted human evaluations, and we tracked how many words each of them post-edited in a two-hour period. To keep the quality assessment consistent, we used the LISA QA Model.

Iterative MT Training (Systran)

We embarked on a series of 10 training rounds with Systran. After each round, we huddled together to plot our next move, all in the name of boosting those BLEU scores. Our tactics varied from adding more data to the test dataset, to improving our cleaning and aligning processes, and even to using only the crème de la crème of relevant data.

We started strong with relatively high BLEU scores, which set our expectations sky-high. But, to our surprise, more data didn’t always translate to better results. In some cases, things got messy, and the quality scores actually dropped.

We experimented with every trick in the book to train our MT engine. Along the way, we learned that quality data and meticulous refinement are the real magic ingredients.

Final PEMT and Human Evaluation

After the final round of MT training, we rolled up our sleeves for another round of PEMT and human evaluation. To keep things consistent, our trusty team of evaluators was back at it. We wanted to see just how efficient our MT engine had become, so we crunched the numbers on how many more words we could edit post-training.
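The comparison itself is simple arithmetic: average speed in words per hour, then the percentage change. Here is a small sketch, assuming each evaluator's word count comes from the same two-hour session before and after training. The function names are illustrative, not from our spreadsheets.

```python
from statistics import mean


def words_per_hour(word_counts, hours=2.0):
    """Average post-editing speed of several evaluators over one timed session."""
    return mean(word_counts) / hours


def throughput_gain(before_counts, after_counts, hours=2.0):
    """Percentage increase in words post-edited per hour after MT training."""
    before = words_per_hour(before_counts, hours)
    after = words_per_hour(after_counts, hours)
    return (after - before) / before * 100
```

With made-up numbers (not our results), suppose three evaluators post-edited 800, 1,000, and 1,200 words before training and 1,200, 1,500, and 1,800 after. The average speed would rise from 500 to 750 words per hour, and `throughput_gain` would return 50.0.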

And the results? Drumroll, please… The quality of the MT engine soared! We can now save a ton of time and money on translations.

Want the juicy details? Our updated proposal has all the good stuff: the final BLEU scores, the cash we’ll save, and the hours needed for the full training. Intrigued? Check it out here!

Enjoy our video presentation!

TMS (Translation Management System) Improvement Proposal for Bloodify

Introduction

What is a Translation Management System (TMS)?

A Translation Management System (TMS) is designed to streamline and manage the localization and translation of content at scale. It helps businesses handle the complexities of translating large amounts of content into multiple languages and dialects, ensuring consistency and quality. A TMS facilitates collaboration by organizing and managing translated assets effectively. It reduces manual work, improves turnaround times, and ensures high-quality translations—essential for global businesses.

As a simulated project, I chose to provide TMS consulting to a fictitious company named Bloodify, which offers a music and video streaming service positioned between Spotify and YouTube. Given Bloodify's rapid global growth, enhancing its TMS with Smartling will boost productivity and streamline localization.

Let’s explore Bloodify’s key business requirements, current challenges, and how Smartling can bridge these gaps to support its expansion.

Bloodify’s Key Business Requirements

  1. Comprehensive Content Management: Bloodify, a music and video streaming platform, hosts diverse multimedia content including music, podcasts, videos, and app interfaces. A TMS should streamline the translation of all this content into 62 languages.
  2. Consistency: Maintaining a consistent tone and style across millions of tracks and thousands of playlists and podcast summaries is crucial. A TMS should ensure this using translation memory and glossaries.
  3. Efficiency and Speed: Frequent updates require quick localization turnarounds. A TMS should automate the translation workflow, reducing time-to-market for new content and features.
  4. High-Level Automation and Customization: The TMS should offer high-level automation and customization to increase productivity and reduce time and costs in the localization process.

Problems with the Current TMS

  1. Inconsistency in Quality:
    • Using different vendors and external linguists without a unified quality assessment platform leads to inconsistencies.
  2. Delayed Feedback:
    • The reliance on weekly reports delays the feedback loop, hindering timely improvements.
  3. Limited Automation:
    • The current TMS setup lacks automation, resulting in extensive manual data handling.

Proposed Solution: Implementing Smartling

To address these challenges, I recommend implementing Smartling. Based on my research, Smartling offers a range of features and benefits that will significantly enhance Bloodify’s localization processes.

Key Features of Smartling:

  1. Cloud-Based Platform:
    • Centralized system accessible from anywhere, ensuring all team members can collaborate in real-time.
  2. Real-Time Collaboration:
    • Immediate feedback and collaboration between translators, editors, and project managers to improve efficiency.
  3. Automated Workflows:
    • Reduces manual tasks by automating the translation process, from content submission to final delivery.
  4. In-Context Translation:
    • Provides translators with context for each piece of content, ensuring accurate and culturally relevant translations.
  5. Translation Memory and Glossaries:
    • Maintains consistency by reusing previously translated content and standardizing terminology.
  6. Robust Reporting and Analytics:
    • Detailed insights into translation quality, project progress, and performance metrics.
  7. API Integration:
    • Seamlessly integrates with other tools like CMS, JIRA, Tableau, and machine translation engines to streamline workflows.

Benefits of Implementing Smartling:

  1. Improved Quality and Consistency:
    • Unified platform for quality assessment ensures consistent translations across all content.
  2. Faster Turnaround Times:
    • Real-time collaboration and automated workflows significantly reduce the time required to localize content.
  3. Reduced Manual Work:
    • Automation minimizes manual data handling, freeing up resources for more strategic tasks.
  4. Enhanced Productivity:
    • Efficient processes and advanced features increase overall productivity, allowing for quicker updates and releases.
  5. Cost Savings:
    • By automating tasks and improving efficiency, Smartling helps reduce the costs associated with the localization process.
  6. Scalability:
    • Smartling’s robust infrastructure supports Bloodify’s rapid global growth, easily accommodating increasing volumes of content.

By implementing Smartling, Bloodify will overcome the limitations of its current TMS, ensuring high-quality, efficient, and cost-effective localization.

Conclusion

You can view the PDF of the presentation here.

In this proposal video, I analyze Bloodify’s translation management challenges and propose Smartling as the solution to enhance productivity and streamline localization processes. Watch the video to see how Smartling can support Bloodify’s rapid global growth.

Team CAT Project: Otter Awareness Month

Introduction

At MIIS, our approach to learning is hands-on, especially in the Translation Technology course, where we practice through projects that mirror the industry’s real-world demands. As the culmination of our studies, we formed a team under the banner of Amazing Five Translation. We embarked on a project that would see us partnering with Monterey Bay Aquarium during Otter Awareness Month, translating vital campaign materials. Our mission was to adapt these materials for an international audience, ensuring the message was as impactful and engaging as the original. Amazing Five took on the task, providing translations in Chinese, Japanese, Korean, and Portuguese, and managing the workflow with professional tools like SDL Trados, from the drafting of our proposal to the final delivery of our work.

Statement of Work (SOW)

The SOW for our "Otter Awareness Month" project captures the essence of Amazing Five Translation's (AFT's) services.

The document lays out each stage of the workflow: preparation, production, and finalization. It also outlines communication protocols, pricing, and payment details, and profiles the expert linguists and tools engaged in the project. It served as a comprehensive guide to keep our client fully informed, and it played a crucial role in securing the green light for our project during the initial kick-off meeting with the client.

Kick-Off Meeting with the Client

After meticulously crafting our proposal, our team held a kick-off meeting with the client, played by Prof. Wooten. He took on the role of a client unfamiliar with translation work, and we walked him through our plan in detail. He seemed impressed, but he suggested having a minimally bilingual employee review our work, which challenged us to defend our professional integrity.

As professionals, we were confident in our skills, but this mock scenario was a lesson in client dynamics. We advocated for our expertise, yet learned that clients often seek outside opinions. Although I have spent a good amount of time in the language industry, I am still discovering new things to learn. This project was a brilliant simulation, offering valuable insights into client relations and the real-world challenges of translation.

Deliverables

Leveraging our amazing teamwork, we successfully delivered all the required materials to our client on schedule. The deliverables consisted of thoroughly translated documents, Translation Memory, glossaries, and pseudo-translations for Chinese, Japanese, Korean, and Portuguese, ensuring a comprehensive and well-rounded translation package.

Future Improvements

Reflecting on our group translation project, I recognized key improvement areas for more professional and effective project management in real-world scenarios. The significance of having a dedicated project manager became apparent, especially for larger projects. A project manager could optimize workflow in the CAT tool, improving the use of translation memory and glossary management. I also realized the importance of a task tracker such as JIRA to streamline communication, as relying solely on Teams chat proved inefficient. These realizations have equipped me with a deeper understanding of how to enhance both our management and communication strategies for future translation projects.

Reflections

Regular Expressions for CAT Tools

Helpful rules for EN-KO translation on Trados

When localizing to a different language, it is essential to adapt the text for the target audience. Using the proper typographic conventions, such as curly quotes, aligns the translated content with the local language’s standard writing style.

1. Rule: Switching from " " to “ ” (straight quotes to curly quotes)

Find: "([^"]*)"

Replace: “$1”

This RegEx rule finds text enclosed in straight quotation marks (") and replaces it with the same text enclosed in curly quotes, also called smart quotes (“ ”), for better typographic presentation.

Switching from straight to curly quotes in English-to-Korean translation matters for several reasons. Curly quotes make the text look more polished and professional and suit the conventions of the target language. Using them consistently keeps the text cohesive and avoids distracting inconsistencies, which matters especially in technical or legal content. Overall, this change improves the quality, readability, and cultural fit of the translated material.
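The same rule can be tested outside Trados. Here is a minimal Python sketch with the straight-quote pattern written out. Python writes the backreference as `\1`, whereas the Trados replacement field uses `$1`. Unpaired straight quotes are left untouched, so they still need a manual check. The function name is hypothetical.

```python
import re

# A straight double quote, any run of non-quote characters, then a closing straight quote.
STRAIGHT_QUOTED = re.compile(r'"([^"]*)"')


def to_curly_quotes(text):
    """Replace each "..." pair with “...”, keeping the quoted text as-is."""
    return STRAIGHT_QUOTED.sub(r"“\1”", text)
```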

2. Rule: Removing cents (two digits after the decimal point)

Find: \.\d{2}(?=\D|$)

Replace with: (leave empty)

This regular expression finds a dot followed by exactly two digits that are not followed by another digit, such as the cents in 4.99. Replacing each match with nothing removes them.

In Korean, prices are typically written without decimal places, and removing them by hand during translation is tedious. With this RegEx, we can remove them in one pass and save valuable time.
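Here is a Python sketch of the same find-and-delete, using the pattern from the rule. The rule truncates rather than rounds, so $4.99 becomes $4. It also cannot tell prices from other numbers: a version number such as 14.10 loses its last two digits as well. On patch-note-style text, reviewing the matches one by one is safer than Replace All. The function name is hypothetical.

```python
import re

# A dot, exactly two digits, then a non-digit or the end of the text.
CENTS = re.compile(r"\.\d{2}(?=\D|$)")


def drop_cents(text):
    """Delete two-digit decimal parts such as the cents in 9,900.00."""
    return CENTS.sub("", text)
```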

3. Rule: Changing the US date format to the Korean date format

Find: \b(\d{2})/(\d{2})/(\d{4})\b

Replace with: $3-$1-$2

When translating content, you may encounter dates in various formats, including the common English formats MM/DD/YYYY (US) and DD/MM/YYYY. Korean dates, however, are written year first, as in YYYY-MM-DD. This rule captures the month, day, and year of an MM/DD/YYYY date and reorders them into YYYY-MM-DD. For DD/MM/YYYY sources, use $3-$2-$1 as the replacement instead.
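Here is a Python sketch of the conversion described above, assuming month-first US dates. It captures the month, day, and year and reorders them. Unlike a plain text replacement, it also accepts single-digit months and days and zero-pads them, so 5/7/2024 becomes 2024-05-07. The function name is hypothetical.

```python
import re

# Month/day/year with a four-digit year, e.g. 5/15/2024 or 12/31/2023.
US_DATE = re.compile(r"\b(\d{1,2})/(\d{1,2})/(\d{4})\b")


def us_to_korean_date(text):
    """Rewrite MM/DD/YYYY dates as YYYY-MM-DD (assumes the month comes first)."""
    def repl(m):
        month, day, year = m.groups()
        return f"{year}-{int(month):02d}-{int(day):02d}"
    return US_DATE.sub(repl, text)
```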

4. Rule: Identifying incorrect spacing

Find: [\s][.,?!]+[\s]

This regular expression helps locate spacing errors. It matches a punctuation mark (period, comma, question mark, or exclamation mark), or a run of them, that has whitespace both before and after it, as in "word , word". Punctuation should normally attach directly to the preceding word, so each match usually marks a stray space that you can review and correct.
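Here is a Python sketch that applies the rule as a checker. It also includes a looser variant, `\s+([.,?!]+)`, which is an addition and not part of the rule above. The original pattern requires whitespace after the mark, so it misses a stray space before punctuation at the end of a segment. The variant catches that case as well and removes the space. As with any Replace All, review the matches first. An ellipsis typed after a space, for example, would also be pulled onto the preceding word. The function names are hypothetical.

```python
import re

# The rule from the text: punctuation with whitespace on both sides.
SPACED_PUNCT = re.compile(r"[\s][.,?!]+[\s]")
# Looser variant (an addition): any whitespace directly before punctuation.
SPACE_BEFORE_PUNCT = re.compile(r"\s+([.,?!]+)")


def find_spacing_errors(text):
    """Return (offset, matched text) for each hit of the original rule."""
    return [(m.start(), m.group()) for m in SPACED_PUNCT.finditer(text)]


def fix_space_before_punct(text):
    """Remove whitespace that directly precedes punctuation."""
    return SPACE_BEFORE_PUNCT.sub(r"\1", text)
```

On "Hello , world. Bye !", the original rule flags only " , ", because the final "!" has no whitespace after it. The variant fixes both spots and produces "Hello, world. Bye!".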

Translation Technology

From Babel to AI: The Future of Human Translators in the Age of Advanced Translation Technology

Charting the future of translation: A vibrant depiction of technology's role in the language industry.

What exactly is translation technology?

The quest to understand different languages dates back to ancient times, with the story of the Tower of Babel often symbolizing the genesis of linguistic diversity. Traditionally, translation relied heavily on individual translators, with the quality of translation directly linked to the skill of these linguists.

Today, modern translation is intricately intertwined with technology, aiming to make the process faster and more efficient. This technological evolution in the field of translation includes tools like Computer-Assisted Translation (CAT), Machine Translation (MT) software, and Translation Management Systems (TMS).

One of the first questions that crossed my mind when I began the “Translation Technology” course at the Middlebury Institute of International Studies at Monterey was: Will these technologies replace human translators, or will they become invaluable allies? This course, led by Professor Adam Wooten, started with this intriguing query and guided us through a comprehensive exploration of how to use technology both effectively and ethically. We explored the strengths and limitations of translation technology, understanding its goals and capabilities.

From the pioneering IBM-Georgetown machine translation experiment in 1954 to the advent of Large Language Models and Generative AI, we examined how these technologies are applied in the world of translation. Join me as I share some key insights and takeaways from this enlightening class.