8 March 2009

The third style of Django multilingual data handling

Over the last month I have seen several new versions of code for multilingual/translation handling in Django. The current solutions can be grouped into two categories:

  1. Two tables -- general nontranslatable data plus translatable data in a 1:N relation.
  2. Additional per-language columns for translatable data.
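
To make the two categories concrete, here is a minimal sketch using Python's sqlite3 module rather than Django models. All table and column names (page, page_translation, page_columns, title_en, ...) are invented for illustration and are not taken from any particular Django application.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Category 1: shared data plus a 1:N table of translations.
    CREATE TABLE page (id INTEGER PRIMARY KEY, url TEXT);
    CREATE TABLE page_translation (
        page_id INTEGER REFERENCES page(id),
        lang TEXT,
        title TEXT
    );
    INSERT INTO page VALUES (1, '/contact/');
    INSERT INTO page_translation VALUES (1, 'en', 'Contact');
    INSERT INTO page_translation VALUES (1, 'de', 'Kontakt');

    -- Category 2: one extra column per language.
    CREATE TABLE page_columns (
        id INTEGER PRIMARY KEY, url TEXT, title_en TEXT, title_de TEXT
    );
    INSERT INTO page_columns VALUES (1, '/contact/', 'Contact', 'Kontakt');
""")


def title_two_tables(lang):
    # Every read needs a join to reach the translated title.
    row = conn.execute(
        "SELECT t.title FROM page p "
        "JOIN page_translation t ON t.page_id = p.id "
        "WHERE p.url = ? AND t.lang = ?", ("/contact/", lang)).fetchone()
    return row[0] if row else None


def title_columns(lang):
    # The column name depends on the language, so adding a language
    # means altering the table.
    if lang not in ("en", "de"):
        raise ValueError("no column for language %r" % lang)
    row = conn.execute(
        "SELECT title_%s FROM page_columns WHERE url = ?" % lang,
        ("/contact/",)).fetchone()
    return row[0]
```

The first layout pays with a join on every read. The second reads a single row but always carries the columns of every language.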

Both solutions work well in many situations and have their own pluses and minuses, but there is another way of doing it that seems less common even though it has existed in Django from the beginning: use the sites M:N relation as the main mechanism for multilingual handling.
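
As a rough illustration of the idea, the sketch below again uses plain sqlite3 instead of Django. Each language version is a full row, and an M:N link table decides on which sites the row is visible. The schema, the domains and the pages_for_site helper are assumptions made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE site (id INTEGER PRIMARY KEY, domain TEXT);
    -- One full row per language version; source_id points at the row
    -- it was translated from (NULL for an independent entry).
    CREATE TABLE page (
        id INTEGER PRIMARY KEY, url TEXT, lang TEXT, title TEXT,
        source_id INTEGER REFERENCES page(id)
    );
    -- The M:N link: which rows are visible on which sites.
    CREATE TABLE page_sites (page_id INTEGER, site_id INTEGER);

    INSERT INTO site VALUES (1, 'example.com');
    INSERT INTO site VALUES (2, 'example.de');
    INSERT INTO page VALUES (1, '/news/', 'en', 'News', NULL);
    INSERT INTO page VALUES (2, '/contact/', 'en', 'Contact', NULL);
    INSERT INTO page VALUES (3, '/contact/', 'de', 'Kontakt', 2);
    INSERT INTO page_sites VALUES (1, 1);
    INSERT INTO page_sites VALUES (2, 1);
    INSERT INTO page_sites VALUES (3, 2);
""")


def pages_for_site(site_id):
    # Only rows linked to the given site are returned; there is no
    # separate translation table to join against.
    return conn.execute(
        "SELECT p.url, p.lang, p.title FROM page p "
        "JOIN page_sites ps ON ps.page_id = p.id "
        "WHERE ps.site_id = ? ORDER BY p.url", (site_id,)).fetchall()
```

In this sketch the untranslated /news/ page simply does not appear on the de site. Linking the en row to that site with one more page_sites entry would make it visible there.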

When may the sites solution be a good fit?

The sites solution is a better option than the other two when you want to address some real-world requirements that sometimes emerge:

  • You can hide some data from the site version in another language (instead of falling back to the default language when there is no translation for the current one). Suppose we have en and de versions of the site, but on the de site an entry should not be visible until it has been translated (the en version should not take its place). This is completely optional -- you can still link the en version to the de site if required!

  • You can choose what to show on each localized portal: the en version may have additional events/promotions/news that are not relevant for the de version.

  • You want full control over which fields are translatable, which should always be the same (global), and which are nontranslatable yet must differ for reasons that the translator may not be aware of (domain specific).

  • There is no single primary language from which all other translations are created. Even if most of the time you translate from en to de, there may be times when you need to do the opposite and later update the en version when the de one changes.

  • You need to handle a territory language structure, e.g. en -> de -> de-at (Austrian German), and remember that structure when changes are needed.

  • Easy addition/removal of languages (no table alterations as in option 2).

  • If you care about read speed, it lets you drop the translation joins (the drawback of option 1) and read only the data you need (the drawback of option 2).
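
The territory structure from the list above can be pictured as a chain of entries linked through a source reference, which is how the reference implementation further down records it. The plain-Python sketch below is only an illustration; the Entry class and the translation_path helper are hypothetical names.

```python
class Entry(object):
    """A language version that remembers which entry it was translated from."""

    def __init__(self, lang, source=None):
        self.lang = lang
        self.source = source


def translation_path(entry):
    """Return the languages from the root entry down to `entry`."""
    path = []
    while entry is not None:
        path.append(entry.lang)
        entry = entry.source
    path.reverse()
    return path


en = Entry("en")
de = Entry("de", source=en)
de_at = Entry("de-at", source=de)
```

Here translation_path(de_at) walks de-at -> de -> en and returns the chain root first, so the structure can be recovered whenever a change has to be propagated.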

Are there any side effects?

Yes, there are several tradeoffs, especially:

  • More work on writes (row synchronization).

  • More space needed (global fields are duplicated).

  • More work and logic are needed in the CMS/admin part to show the structure in a clear way.

I'm not saying that the sites-based solution is good for everything and every case, but I have worked on two multilingual systems (several languages, several sites), and in both it was the only way to avoid the 2xJOINs-everywhere mess and have the flexibility that management wanted without sacrificing the general structure.

The reference implementation

Below is one possible implementation of sites-based translations. It is an abstract class that you can use to create translatable models. The system is efficient on reads, while writes are costly. In addition, the current solution does not copy M:N global fields. Maybe in Django 1.1 there will be room for handling the copy procedure better using update() and F() objects.


from django.db import models
from django.db.models.base import ModelBase
from django.contrib.sites.models import Site
from django.utils.translation import ugettext_lazy as _
from django.conf import settings

class TranslationModelManager(models.Manager):
    def get_query_set(self):
        # Limit every query to entries linked to the current site.
        return super(TranslationModelManager, self).get_query_set().filter(sites__id__exact=settings.SITE_ID)

class TranslationModelBase(ModelBase):
    """
    Metaclass for all translation models.
    Used to properly set TranslationModelManager and to decode Translation class.
    """
    def __new__(cls, name, bases, attrs):
        trans_class = super(TranslationModelBase, cls).__new__(cls, name, bases, attrs)

        # Decode and copy Translation class.
        attr_translation = attrs.pop('Translation', None)
        if not attr_translation:
            translation = getattr(trans_class, 'Translation', None)
        else:
            translation = attr_translation
        trans_class.add_to_class('_translation', { 'fields': [], 'excludes': [] })

        if getattr(translation, 'fields', None) is not None:
            trans_class._translation['fields'] = translation.fields
        if getattr(translation, 'excludes', None) is not None:
            trans_class._translation['excludes'] = translation.excludes

        models.Manager().contribute_to_class(trans_class, 'objects')
        TranslationModelManager().contribute_to_class(trans_class, 'site_objects')
        return trans_class

class TranslationModel(models.Model):
    """
    Model for providing per site translations.

    Usage:
    Subclass it and use class Translation to enumerate translatable fields.
    By default all nontranslatable fields are copied from parent to children
    and vice versa, but you can use excludes feature to disable this behaviour
    for some fields. Below is an example of Translation inner class::

        class Translation:
            fields = ['title', 'content']
            excludes = ['edited_by']
    """
    __metaclass__ = TranslationModelBase

    sites = models.ManyToManyField(Site)
    lang = models.CharField(_('Language'), max_length=5, default=u'en')
    source = models.ForeignKey('self', related_name='translations',
        null=True, blank=True,
        verbose_name=_("Source of translation"),
        help_text=_("If not set, this is an independent entry.")
    )

    class Meta:
        abstract = True

    def __init__(self, *args, **kwargs):
        super(TranslationModel, self).__init__(*args, **kwargs)
        # If source is set, copy nontranslatable fields from it also when initializing.
        if self.pk is None:
            self.copy_nontranslatable_from_source()

    def save(self, force_insert=False, force_update=False, pass_going_up=False):
        super(TranslationModel, self).save(force_insert=force_insert, force_update=force_update)

        # Go up if it wasn't done.
        if not pass_going_up:
            tip = self
            while tip.source is not None:
                tip = tip.source
            # Cheat source to do the copying.
            if tip != self:
                tip.source = self
                tip.copy_nontranslatable_from_source()
                tip.source = None
                # Saving the root propagates the change down to all children.
                tip.save(pass_going_up=True)
                return

        # Propagate to children.
        slaves = self._default_manager.filter(source=self.pk)
        for slave in slaves:
            slave.copy_nontranslatable_from_source()
            slave.save(pass_going_up=True)

    def copy_nontranslatable_from_source(self, exclude=None):
        """Copy all nontranslatable fields from source to child entry, overriding old data."""
        if self.source is not None:
            original = self.source
            # exclude defaults to None instead of a mutable [] default.
            all_excludes = ['sites', 'source', 'lang'] + \
                self._translation['fields'] + self._translation['excludes'] + \
                list(exclude or [])

            for field in self._meta.fields:
                field_name = field.name
                if field_name in all_excludes or field.primary_key:
                    continue
                setattr(self, field_name, getattr(original, field_name))

Below is an example static page model built on the translation model:


class TranslatableStaticPage(TranslationModel):
    url = models.CharField(max_length=100, db_index=True)
    title = models.CharField(max_length=200)
    content = models.TextField(blank=True)
    template_name = models.CharField(max_length=70, blank=True)

    class Meta:
        ordering = ('url',)

    class Translation:
        fields = ['title', 'content']
        excludes = ['template_name']
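
To show how the synchronization behaves for such a page, here is a simplified in-memory sketch that needs neither Django nor a database. The Page class, its children list and the copy_shared_from method are stand-ins invented for this example. They only mimic the logic of save() and copy_nontranslatable_from_source() and are not part of the implementation above.

```python
class Page(object):
    field_names = ["url", "title", "content", "template_name"]
    translatable = ["title", "content"]   # mirrors Translation.fields
    excludes = ["template_name"]          # mirrors Translation.excludes

    def __init__(self, lang, source=None, **values):
        self.lang = lang
        self.source = source
        self.children = []
        if source is not None:
            source.children.append(self)
        for name in self.field_names:
            setattr(self, name, values.get(name, ""))
        # A new entry starts with the shared fields of its source.
        self.copy_shared_from(source)

    def copy_shared_from(self, other):
        """Copy every field that is neither translatable nor excluded."""
        if other is None:
            return
        for name in self.field_names:
            if name not in self.translatable and name not in self.excludes:
                setattr(self, name, getattr(other, name))

    def save(self, going_up=True):
        if going_up:
            # Find the root of the translation chain.
            root = self
            while root.source is not None:
                root = root.source
            if root is not self:
                # Push the change to the root, then let it flow down.
                root.copy_shared_from(self)
                root.save(going_up=False)
                return
        for child in self.children:
            child.copy_shared_from(self)
            child.save(going_up=False)
```

With a chain en -> de -> de-at, changing url on the de-at entry and saving it updates url on all three entries, while each entry keeps its own title, content and template_name.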

Conclusions

The first two models have their own use cases, pros and cons. Here I just wanted to show one more way to do the job, one that in some situations is more flexible than the other solutions but also has its own costs (writes and space especially).
