First questions to ask before training your LLM: Strategy and readiness

Are you frustrated with AI translation tools not giving the results you need? Are you spending valuable time and resources fixing AI outputs to align with your brand?

It could be time for a machine translation (MT) solution that’s customized for your business. But training an LLM (large language model)—or developing your own—is not something to undertake lightly. You need to be sure your business is ready, and that you have a clear strategy in place that will deliver measurable results.

We’ve pulled together some initial questions to help you think through what’s involved in training your own LLM, and make an informed choice about whether this would benefit your business.

But first, let’s set the scene.

What is an LLM?

LLM stands for “large language model”: it’s an AI program that has been “trained”—by analyzing very large quantities of existing text—to process, interpret, and generate human language. Using “deep learning” to identify patterns and make connections across the billions of words they’ve encountered, LLMs can answer questions, draft text, or propose translations.

But, while LLMs are becoming increasingly powerful, their outputs are only as good as the content they’ve been trained on.

Benefits and challenges of training your LLM

Generic LLMs won’t be familiar with your internal terminology and brand voice, so translations can require lengthy post-editing. Training your LLM on your own content can produce higher-quality results, and running your own in-house model gives you greater control over your data and its security.

But not every organization is ready to build or train its own LLM. It’s a significant investment requiring specialist skills and ongoing maintenance. This means it’s crucial to understand the long-term time, cost, and resource implications.

Here are five questions to consider before you jump in.

1. What problem are you trying to solve?

Is your goal to improve translation quality or increase automation? Cut your costs or speed up time to market? Most likely, it’s a combination of these—and, while a custom LLM could help, it’s not the only solution.

2. Do you have the right data, and enough of it?

Good results depend on high-quality training inputs. How much data do you have, and what’s the quality like? It needs to be accurate, clear, and unambiguous. It should also be balanced and diverse, so the model doesn’t learn biased translations. And you’ll need to ensure the data is clean, without distractions like typos, placeholders, missing or duplicate text, or graphics.
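To make the idea of clean data concrete, here is a minimal sketch of a hygiene pass over translation pairs. It assumes your data is a list of (source, target) strings; the function name `clean_pairs` and the placeholder pattern are illustrative assumptions, not a standard tool, and a real pipeline would need checks tuned to your own content.

```python
import re

# Hypothetical placeholder patterns such as {0}, {name}, or %s; adjust to your content.
PLACEHOLDER = re.compile(r"\{[^{}]*\}|%[sd]")

def clean_pairs(pairs):
    """Keep only (source, target) pairs that pass basic hygiene checks."""
    seen = set()
    kept = []
    for source, target in pairs:
        source, target = source.strip(), target.strip()
        # Drop pairs with missing text on either side.
        if not source or not target:
            continue
        # Drop pairs that still contain unresolved placeholders.
        if PLACEHOLDER.search(source) or PLACEHOLDER.search(target):
            continue
        # Drop exact duplicates, ignoring case.
        key = (source.lower(), target.lower())
        if key in seen:
            continue
        seen.add(key)
        kept.append((source, target))
    return kept

sample = [
    ("Save changes", "Enregistrer les modifications"),
    ("Save changes", "Enregistrer les modifications"),  # duplicate
    ("Hello {name}", "Bonjour {name}"),                  # placeholder
    ("Cancel", ""),                                      # missing target
]
print(clean_pairs(sample))
```

Only the first pair survives in this sample. Typos and bias are much harder to catch automatically, so a pass like this complements human review rather than replacing it.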

A well-structured and up-to-date glossary for every target language is essential: it defines the correct terminology to use in different scenarios, so users get a consistent experience.
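As a rough illustration of how a glossary supports consistency, the sketch below flags translations that contain a source term but not its approved equivalent. The glossary entries and the function name `glossary_violations` are hypothetical, and simple substring matching like this will miss inflected forms, so treat it as a starting point only.

```python
def glossary_violations(source, target, glossary):
    """Return (term, approved) entries whose source term appears in `source`
    but whose approved translation is missing from `target`."""
    src, tgt = source.lower(), target.lower()
    return [
        (term, approved)
        for term, approved in glossary.items()
        if term.lower() in src and approved.lower() not in tgt
    ]

# Hypothetical English-to-French glossary.
glossary = {
    "dashboard": "tableau de bord",
    "workspace": "espace de travail",
}

issues = glossary_violations(
    "Open the dashboard in your workspace",
    "Ouvrez le tableau de bord dans votre espace",
    glossary,
)
print(issues)
```

Here the approved term for "workspace" is missing from the translation, so that entry is reported for review.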

Rubric can help prepare your training data, including custom glossaries, to give your LLM the best chance of success.

3. What level of customization do you need?

There are various options for customizing your LLM, with differing levels of complexity.

Prompt engineering is a light-touch approach, where requests are refined to generate better responses.
Fine-tuning continues the training of a generic model on a curated set of your own examples, such as approved translations that follow your glossary, so the outputs align better with your brand.
Domain adaptation and full customization are more resource-intensive options that retrain the model, in part or in full, on much larger volumes of your own data.
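The lightest of these options, prompt engineering, can be sketched in a few lines. The example below only assembles the text of a translation request that carries your tone and glossary; the function name `build_prompt`, the wording of the instructions, and the sample terms are all assumptions for illustration, and how you send the prompt depends on the model you use.

```python
def build_prompt(text, target_language, glossary, tone="friendly and concise"):
    """Assemble a translation request that includes tone and approved terms."""
    lines = [
        f"Translate the text below into {target_language}.",
        f"Use a {tone} tone.",
        "Always use these approved term translations:",
    ]
    # One line per glossary entry, so the model sees every approved term.
    lines += [f"- {src} -> {tgt}" for src, tgt in glossary.items()]
    lines += ["", "Text:", text]
    return "\n".join(lines)

prompt = build_prompt(
    "Open the dashboard to see your reports.",
    "French",
    {"dashboard": "tableau de bord"},  # hypothetical glossary entry
)
print(prompt)
```

Because nothing is retrained, this approach is cheap to try first and gives you a baseline against which to judge whether heavier customization is worth it.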

We can help you identify the best approach based on your needs and resources.

4. What are the true costs?

Don’t let spiraling customization costs outweigh the savings of AI translation. The initial cost of training your model is only part of the picture. Whether you’re doing the work in house or outsourcing it (more on this decision later), you’ll need to factor in costs for ongoing usage, maintenance, evaluation, continuous improvement, and staff training.
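A simple break-even calculation can help frame the cost question. The sketch below is a simplification, and every figure in it is invented for illustration: it assumes a one-off upfront cost, a flat monthly running cost, and a flat monthly saving, which real projects rarely have.

```python
import math

def months_to_break_even(upfront_cost, monthly_running_cost, monthly_saving):
    """Return the number of whole months until cumulative savings cover all
    costs, or None if the monthly saving never exceeds the running cost."""
    net_monthly = monthly_saving - monthly_running_cost
    if net_monthly <= 0:
        return None
    return math.ceil(upfront_cost / net_monthly)

# Hypothetical figures: 60,000 to train, 2,000 a month to run,
# 7,000 a month saved on post-editing.
print(months_to_break_even(60_000, 2_000, 7_000))  # 12
```

If the function returns None, the running costs alone cancel out the savings, which is a strong signal to look at a lighter form of customization.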

5. How will you evaluate quality, measure success, and drive improvement?

Will your AI translations be subject to full human review? Do you have the language skills and capacity to do that in house? Or will you rely on automated checking tools and spot checks?

It’s important to define from the outset what constitutes “acceptable” quality: a consistent, objective scoring system limits the influence of bias and personal preference. Work out the process you’ll follow to evaluate quality and implement improvements.
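One common way to make scoring consistent is to count reviewer-flagged errors by severity and convert them into a single number. The sketch below shows the general idea; the severity weights, the 0 to 100 scale, and the pass threshold are hypothetical choices you would need to agree on for your own content.

```python
# Hypothetical penalty points per error severity.
SEVERITY_WEIGHTS = {"minor": 1, "major": 5, "critical": 10}

def quality_score(errors, word_count):
    """Return a 0-100 score: 100 minus weighted penalty points per 100 words."""
    if word_count <= 0:
        raise ValueError("word_count must be positive")
    penalty = sum(SEVERITY_WEIGHTS[severity] for severity in errors)
    per_100_words = penalty * 100 / word_count
    return max(0.0, round(100.0 - per_100_words, 1))

ACCEPTABLE = 95.0  # hypothetical pass threshold

# A 500-word sample with five minor errors and one major error.
score = quality_score(["minor"] * 5 + ["major"], word_count=500)
print(score, score >= ACCEPTABLE)  # 98.0 True
```

Applying the same weights and threshold to every language and every review round is what makes before-and-after comparisons meaningful.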

We have the tools and expertise to help you set a quality baseline, so you can select the right LLM for your needs and demonstrate measurable improvements.

If you’ve answered these questions and feel confident that training an LLM is worth the investment for your business, Rubric is here to help. Read our follow-up blog on execution and impact to discover your next steps, or talk to us for tailored guidance and support.
