Dominic Spurling, Author at Rubric

Dominic Spurling
October 21, 2019

Traditional content management systems, which first emerged in the 1990s, are complete, vertically-integrated systems that create and publish content. They were developed to provide a way to create and maintain websites without needing to hand craft the HTML code for every page. Systems like WordPress and Drupal are still hugely popular today and have an important role to play in content management.

The basic organizational unit in a traditional CMS is a web page. While CMSs employ different strategies for managing information, presentation rules and interactivity, all traditional CMSs are capable of delivering web pages at their front end.

By contrast, a headless CMS takes no responsibility for presentation and interactivity and deals only with the content itself. The front end (head) is simply missing, hence “headless”, and needs to be complemented by separate systems which manage publication to various channels: web, app, digital display, print, etc.

For authors familiar with traditional CMSs, having to work with content devoid of aesthetic features can seem counterintuitive and a bit of a pain. So, why do it?


Benefits of Headless CMS
  • Content reuse:
    • Organizations often choose a headless CMS when reusability of content is a key consideration. This could mean delivering to multiple channels: apps, websites, print, digital signage, voice-activated devices (e.g. Alexa, Google Home) or third-party applications.
  • Localization:
    • Reusability could also mean delivering content in several languages. Authoring in a system focused on maintaining the integrity and consistency (low entropy) of content structures makes the process of creating localized versions cleaner and easier to manage.
  • Manageability and automation:
    • In a headless approach, we create content without necessarily knowing much about the channels through which it will eventually be published. Similarly, front ends see the back end as a black box and don’t have any control over the authoring process. It is important that the content includes enough metadata so that front ends can automatically make good decisions about how to present each element. Creating content with a rich semantic structure in this way makes it more machine readable, which is beneficial for SEO, analytics, personalization, indexing, caching, syndication, localization or any other application where automated processing is important.
  • Agility:
    • Another common reason for going headless is agility. Many organizations find themselves stuck with a traditional CMS that was developed years or even decades ago and this constrains them from taking advantage of new technologies, particularly at the front end.
    • New front end channels, frameworks and tools tend to emerge more quickly than developments in the world of content repositories and databases. The bar for UX nowadays is set very high by companies such as Google, Facebook and Netflix – consumers expect you to keep up. You may also find it easier to recruit developers if you can offer them the chance to get their hands on some of these new frameworks. By separating front end and back end, each can evolve at its own pace and developers can focus on their area of specialism without impacting each other too much.
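To make the content reuse and metadata points above more concrete, here is a minimal sketch of one channel-neutral content item being presented by two different front ends. The `ContentItem` schema, the `kind` values and both renderers are hypothetical and invented for illustration; a real headless CMS will define its own content model and delivery API.

```python
from dataclasses import dataclass
from html import escape


@dataclass
class ContentItem:
    """A channel-neutral content element (hypothetical schema)."""
    kind: str  # semantic role, e.g. "warning" or "step"
    text: str


def render_web(item: ContentItem) -> str:
    # The web front end maps the semantic role to a CSS class.
    return f'<p class="{escape(item.kind)}">{escape(item.text)}</p>'


def render_voice(item: ContentItem) -> str:
    # A voice front end has no styling, so it signals the role in words.
    prefix = "Warning. " if item.kind == "warning" else ""
    return prefix + item.text


item = ContentItem(kind="warning", text="Unplug the speaker before cleaning.")
web_output = render_web(item)
voice_output = render_voice(item)
```

The author only records that the element is a warning. Each front end decides for itself what a warning should look or sound like.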


Potential Pitfalls of Headless
  • Upfront costs:
    • A headless approach typically requires a larger initial investment than a traditional CMS. Since a headless CMS only provides a back end, it’s essential to build separate front end applications to present content to end users.
    • However, in the long term, development and maintenance costs are likely to be lower, particularly in a large-scale implementation with multiple channels.
  • Content preview for authors:
    • In a traditional CMS, the authoring system is tightly coupled to the publishing system and it is therefore easy to present authors with a preview of what their content will look like when published (WYSIWYG).
    • In a headless environment, authors are expected to create content without knowing exactly how it will look. This issue can be resolved to some extent by fitting the headless CMS with a “default front end” which allows authors to preview their content in at least one channel. This is sometimes referred to as a hybrid or decoupled CMS approach.
  • Handling complex presentation:
    • The more elaborate the content element, the harder it will be to separate structure from presentation. Examples of this include graphics, tables and scientific notation. There are tools available to help – for example LaTeX provides a solution for scientific documentation. In the case of graphics, vector formats like SVG are likely to be more suitable than bitmaps, where the presentation is fixed and text is embedded in a way that makes it difficult to adapt.
  • Link Management:
    • A traditional CMS deals in web pages and knows the URLs under which they will be published. Because of this, it’s easy for authors to link from one page to another by inserting these URLs into their pages.
    • By contrast, a headless CMS needs to be able to work independently of the front end(s) that it powers. There may be more than one web front end, publishing the same content under different URLs for different purposes (e.g. internal knowledgebase and external help site).
    • A similar issue may arise if you publish to two channels with very different requirements. For example, when writing for print, you might say “please refer to page #” but this doesn’t work if the same content is published to the web.
    • There will be many possible strategies for dealing with these issues and it’s likely to be something you need to consider as part of your implementation.
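One possible strategy for the link management issue above is to have authors link to content IDs rather than URLs, and let each front end resolve those IDs at publication time. The sketch below assumes a made-up `[[ref:ID]]` marker syntax and invented front end names and URLs; it is an illustration of the idea, not a feature of any particular CMS.

```python
import re

# Hypothetical per-front-end URL maps; IDs and URLs are invented for illustration.
URL_MAPS = {
    "internal_kb": {"setup-wifi": "https://kb.example.internal/articles/setup-wifi"},
    "external_help": {"setup-wifi": "https://help.example.com/wifi-setup"},
}

# Hypothetical link marker that authors would insert instead of a URL.
LINK_PATTERN = re.compile(r"\[\[ref:([a-z0-9-]+)\]\]")


def resolve_links(body: str, front_end: str) -> str:
    """Replace [[ref:ID]] markers with the URL used by the given front end."""
    url_map = URL_MAPS[front_end]

    def substitute(match: re.Match) -> str:
        content_id = match.group(1)
        # Fail loudly on unknown IDs rather than publishing a broken link.
        if content_id not in url_map:
            raise KeyError(f"No URL for {content_id!r} in {front_end!r}")
        return url_map[content_id]

    return LINK_PATTERN.sub(substitute, body)


body = "See [[ref:setup-wifi]] for details."
internal = resolve_links(body, "internal_kb")
external = resolve_links(body, "external_help")
```

The same stored content produces a different link in each front end, and a reference to an unknown ID is caught before publication.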


Footnote – Separation of Content and Presentation

A headless system forces us to design the structure of our content independently of presentational considerations, a strategy known as the separation of content and presentation.

Separation of content and presentation is a specific application of a general principle known as the separation of concerns in the field of software development and systems design.

The goal of “separation” is typically driven by the desire to make content (or IT systems) more maintainable and adaptable to different use cases. In short: efficiency.

In the case of a formatted technical document, elements which impart meaning (semantics) are generally considered to be structural (part of the content itself) and elements which apply aesthetic styling to published output are considered presentational.

The separation of these two concepts is an ideal or aspiration rather than a strict set of rules and this can lead to confusion and disagreement when trying to come up with the “right” answer to a particular implementation challenge. There is rarely a completely neat way to decide what is content and what is presentation.

A classic example of this is a page break in a DITA document that is to be presented both in print and online. On one hand it is structural (it denotes an endpoint), but it only becomes meaningful when the content is presented as a set of printable pages. Websites usually dispense with the notion of page breaks, in favor of continuously scrolling text.

The apparent contradiction can be resolved if we apply a bit of common sense. We need to put some kind of signifier into the content structure (let’s call it a section break) which tells the print publication system to “put a page break here”. A web publishing system might interpret this as a good place to insert an ad.

By calling it a section break, rather than page break, we have managed to keep our content structure free from “presentational elements”, while keeping control over pagination of printed outputs.
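The section break idea can be sketched in a few lines. The example below assumes a very simple content model, a list of `(kind, text)` pairs, and two hypothetical renderers; the element names and the ad slot markup are invented for illustration.

```python
# A content structure as a list of (kind, text) elements; "section_break"
# is the channel-neutral signifier discussed above.
document = [
    ("paragraph", "Connect the speaker to power."),
    ("section_break", ""),
    ("paragraph", "Open the app and follow the prompts."),
]


def render_print(elements):
    # Print output: a section break becomes a form feed (page break) character.
    parts = []
    for kind, text in elements:
        parts.append("\f" if kind == "section_break" else text)
    return "\n".join(parts)


def render_web(elements):
    # Web output: there are no pages, so a section break becomes an ad slot.
    parts = []
    for kind, text in elements:
        if kind == "section_break":
            parts.append('<div class="ad-slot"></div>')
        else:
            parts.append(f"<p>{text}</p>")
    return "\n".join(parts)


print_output = render_print(document)
web_output = render_web(document)
```

The content itself never mentions pages or ads. Each publishing system applies its own interpretation of the same structural marker.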



Given the pros and cons, when is a headless CMS the right solution? If you are implementing a small-scale website whose content changes slowly over time, a simple “brochureware” site for example, then a traditional CMS is likely to be the best fit. Authors have maximum control over the look and feel and can tailor the user experience to the intended audience.

If you are working at scale with content that will be valuable in a variety of channels, then a headless CMS could be the way to go. It will help you to establish and maintain good practices for creating well-structured content that is easy to repurpose and lends itself to automated processing. The benefits of reduced maintenance costs and increased content leveraging (and resulting reductions in localization cost) in the long term should offset the higher up-front costs.


Dominic Spurling
April 24, 2019

From a software engineering perspective, the localization process can be an entropy-increasing stage in your DevOps pipeline.

Localization tools need to extract a snapshot of the user experience, usually from resource files, and generate translated equivalents without adversely affecting the integrity of the application. User interface strings must be unpicked from (sometimes deeply nested) mark-up and presented to translators, who prepare target language strings, which must be ready to nest back into place within identically structured mark-up.

This is entropy in UI projects: the tendency for small inconsistencies in the source to become large ones in target language files, and for non-breaking anomalies to become breaking ones.

At Rubric, we use a mix of automated tests and manual checks by both linguists and engineers to help minimize this effect. Below I’ll work through a typical example to show how you can help your global content partner by minimizing entropy at the start of the process. (Look out for the inconsistencies in the original source.)

An example resource file

The following XML is based on a typical resource file for an Android app:

		<![CDATA[Check your mobile device’s Wi-Fi settings and make sure your mobile device is connected to your home network##REPLACE_WITH_HOME_NETWORK##.<br /><br />Or, if you still can't connect, click START OVER.]]>
		<![CDATA[We&rsquo;re here to help]]>
		<![CDATA[How would you like to connect your speaker to your network?]]>


Step 1 – Identify content type and unwrap nested formats

The file is first put through an Android Strings XML parser to extract the value of each key. Content type within CDATA sections (HTML) is identified and handed off to a secondary parser.
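As a rough illustration of this step, the sketch below uses Python’s standard `xml.etree.ElementTree` module to extract each value and a simple regular expression to guess whether it contains HTML. The key names (`help_title`, `connect_prompt`) and the detection heuristic are assumptions made for the example; they do not describe Rubric’s actual tooling.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical resource file: the key names are invented for illustration.
SOURCE_XML = """<resources>
    <string name="help_title"><![CDATA[We&rsquo;re here to help]]></string>
    <string name="connect_prompt">How would you like to connect?</string>
</resources>"""

# Assumed heuristic: a value is treated as HTML if it contains a tag or an entity.
HTML_HINT = re.compile(r"<[a-zA-Z/][^>]*>|&[a-zA-Z]+;|&#\d+;")


def extract_strings(xml_text):
    """Return {key: (value, looks_like_html)} for each <string> element."""
    root = ET.fromstring(xml_text)
    result = {}
    for element in root.iter("string"):
        # CDATA content is returned as plain text, with entities left undecoded.
        value = element.text or ""
        result[element.get("name")] = (value, bool(HTML_HINT.search(value)))
    return result


strings = extract_strings(SOURCE_XML)
```

Values flagged as HTML would then be handed to the secondary parser, while plain strings can go straight to segmentation.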

  • Note: the example contains two right single quotation marks. One of them is HTML-encoded as &rsquo; but the other is a literal character (’). This is an example of an inconsistency, which could lead to problems down the line.
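An inconsistency of this kind is easy to detect automatically before a file is sent for localization. The following is a minimal sketch that only looks at the right single quotation mark; the `VARIANTS` table and the function name are hypothetical, and a real check would cover more characters.

```python
# Each character of interest mapped to the entity forms it may also appear as.
VARIANTS = {
    "\u2019": ("&rsquo;", "&#8217;", "&#x2019;"),
}


def mixed_encodings(strings):
    """Return characters that appear both literally and as an entity."""
    joined = "\n".join(strings)
    flagged = []
    for literal, entities in VARIANTS.items():
        has_literal = literal in joined
        has_entity = any(entity in joined for entity in entities)
        if has_literal and has_entity:
            flagged.append(literal)
    return flagged


flagged = mixed_encodings([
    "Check your mobile device\u2019s Wi-Fi settings",
    "We&rsquo;re here to help",
])
```

Here `flagged` contains the right single quotation mark, because the two strings encode it differently. A file that uses only one form produces an empty list.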

Step 2 – Parse HTML and protect tags and placeholders

Here the entities are decoded (second key) and HTML tags and application-specific placeholders are protected.
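A simplified version of this step might look like the sketch below. It assumes that placeholders follow the `##NAME##` pattern seen in the example and that protected items are swapped for numbered tokens such as `{1}`; the token format is an assumption for illustration, since localization tools each have their own internal representation.

```python
import html
import re

# Matches HTML tags and the ##NAME## placeholder style seen in the example.
PROTECTED = re.compile(r"<[^>]+>|##[A-Z_]+##")


def protect(segment):
    """Swap tags/placeholders for numbered tokens, then decode entities."""
    protected_items = []

    def stash(match):
        protected_items.append(match.group(0))
        return f"{{{len(protected_items)}}}"

    # Protect first so that decoding entities cannot create or alter markup.
    masked = PROTECTED.sub(stash, segment)
    return html.unescape(masked), protected_items


text, items = protect(
    "your home network##REPLACE_WITH_HOME_NETWORK##.<br /><br />We&rsquo;re here to help"
)
```

After this call, `text` contains only translatable words plus the tokens `{1}`, `{2}` and `{3}`, and `items` holds the original placeholder and tags in order.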

Step 3 – Present translatable strings to translators

Translations are pre-populated from translation memory where possible and the translator fills any gaps which remain. The placeholders cannot be altered by the translator but may be rearranged if required by the sentence structure of the target language.
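The rule that placeholders may move but must not change can be checked automatically. The sketch below assumes placeholders are represented as numbered tokens such as `{1}` (an assumption for this example, not something specified above) and compares the tokens in source and target regardless of order.

```python
import re
from collections import Counter

# Assumed token format for protected placeholders: {1}, {2}, ...
TOKEN = re.compile(r"\{\d+\}")


def placeholders_preserved(source, target):
    """True if target uses exactly the same tokens as source, in any order."""
    return Counter(TOKEN.findall(source)) == Counter(TOKEN.findall(target))


reordered_ok = placeholders_preserved(
    "Connect {1} to {2}.", "{2} : connectez-y {1}."
)
missing_token = placeholders_preserved(
    "Connect {1} to {2}.", "Connectez {1}."
)
```

In this example `reordered_ok` is true because both tokens survive in a different order, while `missing_token` is false because `{2}` has been dropped.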

Step 4 – Write out target files

This is often the most technically complex part of the process where inconsistencies in the source can become amplified. The translated segments are processed (through each of the above steps in reverse), eventually reconstituting the original format.

First, placeholders and tags are re-injected and special characters are re-encoded or escaped:

The escaped single quote will probably not do any harm if it is decoded at the right points down the line in your DevOps pipeline. However, if the source structure is internally consistent (less entropy!), this kind of ambiguity can be avoided.
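To illustrate the re-injection part of this step, here is a minimal sketch that escapes markup-significant characters in the translated text and then restores protected tags and placeholders. It assumes the numbered token format `{1}`, `{2}` used for illustration only. Whether typographic quotes should also be re-encoded as entities depends on how the source encoded them, which is exactly where an inconsistent source creates ambiguity.

```python
import html
import re

# Assumed token format for protected placeholders: {1}, {2}, ...
TOKEN = re.compile(r"\{(\d+)\}")


def restore(translated, protected_items):
    """Escape text, then put the protected tags/placeholders back."""
    # Escape markup-significant characters in the translated text only;
    # quote=False leaves apostrophes and quotation marks untouched.
    escaped = html.escape(translated, quote=False)

    def unstash(match):
        # Tokens are 1-based, list indexes are 0-based.
        return protected_items[int(match.group(1)) - 1]

    return TOKEN.sub(unstash, escaped)


restored = restore(
    "votre r\u00e9seau domestique{1}.{2}Aide & assistance",
    ["##REPLACE_WITH_HOME_NETWORK##", "<br />"],
)
```

Escaping happens before the tags are restored, so the restored markup is not escaped a second time.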

Finally, the translated strings are re-injected into the original markup:

    <![CDATA[Vérifiez les paramètres Wi-Fi de votre périphérique mobile pour vous assurer que ce dernier est connecté à votre réseau domestique##REPLACE_WITH_HOME_NETWORK##.<br /><br />Si vous ne pouvez toujours pas vous connecter, cliquez sur RECOMMENCER.]]>
    <![CDATA[Nous sommes là pour vous aider]]>
    <![CDATA[Comment souhaitez-vous connecter l&rsquo;enceinte à votre réseau?]]>


How you can help your Global Content partner

As well as providing source files which are structured in a consistent way, there are a couple of other ways in which you can help optimize the localization process and enhance the quality of the end product:

  • Provide a complete set of files with every localization request

    At Rubric, we typically run diff reports at the end of every localization project in order to review changes in the English source and compare those against changes in the target files. This helps us to pick up any unexpected changes (for example, escaped characters introduced in error). Working with a complete set of files for each revision simplifies the diff process and makes reports easier to analyze.

  • Say something when you find anomalies

    If you find that you are having to apply fixes to localized resource files, please tell your Global Content partner, as this will enable them to correct any misconfigurations.
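The diff reports mentioned in the first point above can be approximated with Python’s standard `difflib` module. This is only a sketch under simple assumptions: the `diff_report` helper and the sample file contents are invented for illustration and do not represent Rubric’s own reporting tools.

```python
import difflib


def diff_report(old_text, new_text, label):
    """Return a unified diff between two revisions of one resource file."""
    return "".join(
        difflib.unified_diff(
            old_text.splitlines(keepends=True),
            new_text.splitlines(keepends=True),
            fromfile=f"{label} (previous)",
            tofile=f"{label} (current)",
        )
    )


previous = "title=We&rsquo;re here to help\nprompt=Connect\n"
current = "title=We\\'re here to help\nprompt=Connect\n"
report = diff_report(previous, current, "strings.xml")
```

Running the same comparison on the source and on each target language makes unexpected changes stand out, such as an escaped character that appears in only one target file.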

*first image of a black hole courtesy of the Event Horizon Telescope (EHT) network.
