
As businesses tighten their belts, and pressure mounts to reach a global audience faster, you may be considering automated translation tools for your multilingual content. The big question is: can you trust the quality?
You don’t want to risk your reputation by putting out content that’s inaccurate, confusing, or off-brand. At the same time, if every machine-translated word needs to be reviewed by a human linguist, you’ll soon wipe out the time and cost savings of automation.
Quality estimation offers a way forward: by automatically predicting the quality of machine translations, it helps us assess whether the standard will be acceptable and where human input may be required. This is key to Rubric’s approach of balancing the complementary strengths of humans and AI, to make your localization budget work harder.
What is quality estimation?
Quality estimation is not the same as quality evaluation, in which humans review and assess completed machine translation outputs against specific guidelines (for example, adherence to agreed terminology and style, as well as accuracy and readability). Quality estimation (QE), in contrast, is forward-looking: it uses machine learning to predict potential translation issues.
QE automatically reviews text translated by machine translation (MT) tools or large language models (LLMs) and gives it a quality score based on how well the machine has performed against various criteria. These can include accuracy, fluency, readability, and custom factors such as how particular terminology and brand terms are handled.
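As an illustration, segment-level QE scoring can be done with open-source tooling such as the COMET framework and its reference-free CometKiwi model. The sketch below shows that approach; it is one publicly available option, not necessarily the tooling Rubric uses, and the 0.8 review threshold is an assumption made purely for the example:

```python
# Minimal sketch of reference-free QE scoring with the open-source
# COMET framework (pip install unbabel-comet). The model choice and
# review threshold are illustrative assumptions.
from comet import download_model, load_from_checkpoint

# CometKiwi predicts a quality score from the source text and the MT
# output alone; no human reference translation is needed. (The model
# is gated on Hugging Face, so a login may be required.)
model = load_from_checkpoint(download_model("Unbabel/wmt22-cometkiwi-da"))

segments = [
    {"src": "Click Save to keep your changes.",
     "mt": "Klicken Sie auf Speichern, um Ihre Änderungen zu behalten."},
    {"src": "The device must be earthed before use.",
     "mt": "Das Gerät muss vor Gebrauch geerdet werden."},
]

# Scores fall roughly between 0 and 1; higher means better quality.
output = model.predict(segments, batch_size=8, gpus=0)
for segment, score in zip(segments, output.scores):
    flag = "OK" if score >= 0.8 else "REVIEW"  # assumed threshold
    print(f"{score:.2f} [{flag}] {segment['mt']}")
```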
Scoring can help determine whether content is suitable for MT. In some cases, low scores can indicate a problem with the source content—it may be ambiguous, inconsistently worded, or poorly formatted. Rubric can help you optimize your content to be more MT-friendly. QE scores are also helpful in comparing MT engines and selecting the right one for the job. Different tools have different strengths, and performance varies by language pair, text type, and subject area. (In a recent case, we selected four different MT engines across a client’s 10 target languages.)
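To make engine comparison concrete, here is a small sketch of how per-segment QE scores might be aggregated to choose an engine for each language pair. The engine names and scores are invented purely for illustration:

```python
# Illustrative only: choose the best MT engine per language pair by
# average QE score. Engine names and scores are made up for the example.
from collections import defaultdict
from statistics import mean

# (language pair, engine) -> segment-level QE scores
qe_scores = {
    ("en-de", "engine_a"): [0.86, 0.91, 0.78],
    ("en-de", "engine_b"): [0.74, 0.80, 0.69],
    ("en-ja", "engine_a"): [0.62, 0.70, 0.66],
    ("en-ja", "engine_b"): [0.81, 0.77, 0.84],
}

averages = defaultdict(dict)
for (pair, engine), scores in qe_scores.items():
    averages[pair][engine] = mean(scores)

for pair, engines in averages.items():
    winner = max(engines, key=engines.get)
    print(f"{pair}: {winner} ({engines[winner]:.2f})")
# en-de: engine_a (0.85)
# en-ja: engine_b (0.81)
```

The same aggregation, run per text type or subject area, is how score data can reveal that no single engine wins everywhere.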
QE is also an important part of the quality assurance process, highlighting potential risks that can be mitigated by human oversight.
Pros and cons of quality estimation
By reducing manual effort in the localization process, QE makes it possible to scale your global content within your budget. It provides a level of confidence in MT outputs, drives continuous improvement, and targets human intervention where it’s really needed, maximizing value from your localization spend.
But QE is not perfect; after all, it only provides an estimate. However advanced the machine learning algorithms, they can only follow the rules and patterns they’ve been trained on. They cannot replicate human curiosity and insight, and scores can lack nuance. So, while QE provides a useful indicator, we don’t rely on it alone for quality control.
Rubric’s approach to quality estimation
At Rubric, we tailor our approach to every client and project, making sure you get the best possible results within your timeframe and budget. We spend time understanding your content, how it’s used, and what quality level your customers really need. Our priority is always to reduce risk, and we do that by balancing automation with human expertise.
Where content is highly visible or heavily used—or where it has specific legal, technical, or brand significance—we class it as “high risk”. In this case, we combine automated QE with human review for a more rounded picture of MT quality. Depending on time and budget, we may recommend human translation for high-risk elements, or have the MT outputs fully post-edited.
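As a rough illustration of how QE scores can feed into that triage, the sketch below routes each segment based on its score and risk class. The thresholds and step names are assumptions for the example, not Rubric’s actual workflow settings:

```python
# Illustrative triage of MT output by QE score and content risk.
# Thresholds and step names are assumptions, not Rubric's real settings.
def route_segment(qe_score: float, high_risk: bool) -> str:
    """Recommend a next step for one machine-translated segment."""
    if high_risk:
        # High-risk content always gets human attention; very low
        # scores may justify translating from scratch instead of
        # post-editing the MT output.
        return "human_translation" if qe_score < 0.5 else "full_post_edit"
    # Lower-risk content: only weaker segments go for review.
    return "spot_check" if qe_score >= 0.8 else "light_post_edit"

print(route_segment(0.92, high_risk=False))  # spot_check
print(route_segment(0.65, high_risk=True))   # full_post_edit
```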
For lower-risk content, where conveying the general meaning is enough, our RubricCatcher tool highlights inconsistencies or anomalies in translation. We can customize reporting to flag specific issues relevant to your content, ready for expert review.
For example, depending on your style of content, we can configure RubricCatcher to:
- check that references to UI strings are used accurately and consistently across your help documentation
- detect inconsistencies in formatting tags (in HTML or MadCap Flare, for instance), which most MT engines still struggle with (see the sketch after this list)
- check your URLs for consistency against a database of multilingual URL strings
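RubricCatcher itself is proprietary, but the idea behind a check like the tag-consistency one is easy to sketch: extract the formatting tags from source and target and compare the two sets. The code below is a simplified illustration, not RubricCatcher’s actual implementation:

```python
import re
from collections import Counter

# Simplified tag-consistency check, in the spirit of the second bullet
# above; this is not RubricCatcher's actual code.
TAG_PATTERN = re.compile(r"<[^>]+>|\{\d+\}")  # HTML tags or {0} placeholders

def tag_mismatches(source: str, target: str) -> Counter:
    """Return tags whose counts differ between source and translation."""
    src_tags = Counter(TAG_PATTERN.findall(source))
    tgt_tags = Counter(TAG_PATTERN.findall(target))
    # Counter subtraction keeps positives only, so we add both directions:
    # tags missing from the target plus tags the target invented.
    return (src_tags - tgt_tags) + (tgt_tags - src_tags)

src = "Click <b>Save</b> to store the file {0}."
mt = "Klicken Sie auf Speichern, um die Datei {0} zu speichern."
print(tag_mismatches(src, mt))  # Counter({'<b>': 1, '</b>': 1})
```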
Technology with a human touch
As well as increasing efficiency for you, our QE approach improves our translators’ experience—something that’s very important to us at Rubric. We want to empower linguists to work with technology as it evolves. With QE, translators don’t have to review large volumes of MT outputs—instead, they can apply their creativity and cultural insight to solving genuine problems for your customers.
As your localization partner, we’re completely transparent about our use of automation. We only use MT or LLMs after discussing and agreeing the approach with you, and always with human oversight. We constantly measure the quality of our translations, aiming to improve with each delivery, and we openly discuss any challenges and proposed solutions with you. QE gives us a quality baseline to measure against and helps us fine-tune our tools for better results.
Combining human expertise with AI is the foundation of Rubric’s Assured AI solution, built to help you automate translation with confidence. Contact us to get more for your localization budget, without gambling on quality.