fallbackcollator.txt

Fallback Collator

The zope.i18n.interfaces.locales.ICollator interface defines an API for collating text. Why is this important? Simply sorting Unicode strings doesn't provide an ordering that users in a given locale will find useful. Various languages have text sorting conventions that don't agree with the ordering of Unicode code points. (This is even true for English. :)
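
To make the point about English concrete, here is a small illustration that uses only the Python built-ins and none of the zope.i18n API. It is written for Python 3; the list of names is simply the one used in the examples in this file.

```python
names = ["Sam", "sally", "Abe", "alice", "Terry", "tim"]

# Plain sorting compares code points, so every upper-case ASCII letter
# sorts before every lower-case one, splitting the list by case.
by_code_point = sorted(names)
print(by_code_point)
```

All capitalized names come first, which is rarely the order a reader expects.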

Text collation is a fairly involved process. Systems that need it will likely use something like ICU (http://www-306.ibm.com/software/globalization/icu, http://pyicu.osafoundation.org/). We don't want to introduce a dependency on ICU at this time, so we are providing a simple fallback collator.

Application code should obtain a collator by adapting a locale to ICollator. Here we just call the collator factory with None. The fallback collator doesn't actually use the locale, although application code should certainly not count on this.

>>> import zope.i18n.locales.fallbackcollator
>>> collator = zope.i18n.locales.fallbackcollator.FallbackCollator(None)

Now, we can pass the collator's key method to sort functions to sort strings in a slightly friendly way:

>>> sorted([u'Sam', u'sally', u'Abe', u'alice', u'Terry', u'tim'],
...        key=collator.key)
[u'Abe', u'alice', u'sally', u'Sam', u'Terry', u'tim']

The collator uses a very simple algorithm. It normalizes the string and then returns a tuple containing the lower-cased normalized string followed by the normalized string itself. We can see this by calling the key method, which converts Unicode strings to collation keys:

>>> collator.key(u'Sam')
(u'sam', u'Sam')
>>> collator.key(u'\xc6\xf8a\u030a')
(u'\xe6\xf8\xe5', u'\xc6\xf8\xe5')
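
The key computation described above can be sketched in a few lines. This is only an approximation under stated assumptions, not the package's code: the name sketch_key is hypothetical, and the choice of the NFKC normalization form is an assumption (the text above only says that strings are normalized). The sketch is written for Python 3, so the results print without the u prefix.

```python
import unicodedata


def sketch_key(s):
    """Hypothetical stand-in for the collator's key method."""
    # Assumption: NFKC normalization; the text only says "normalizes".
    normalized = unicodedata.normalize("NFKC", s)
    # Lower-cased form first, so ordering is case-insensitive;
    # the normalized string second, as a tie-breaker.
    return (normalized.lower(), normalized)


names = ["Sam", "sally", "Abe", "alice", "Terry", "tim"]
print(sorted(names, key=sketch_key))
print(sketch_key("\xc6\xf8a\u030a"))
```

Because the lower-cased string is compared first, case matters only when two strings are otherwise equal.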

There is also a cmp method for comparing two strings. It returns -1, 0, or 1:

>>> collator.cmp(u'Terry', u'sally')
1
>>> collator.cmp(u'sally', u'Terry')
-1
>>> collator.cmp(u'terry', u'Terry')
1
>>> collator.cmp(u'terry', u'terry')
0
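
The results above are consistent with comparing the two collation keys. A minimal sketch of that idea follows, assuming that cmp is defined in terms of key (the source does not say so). The names sketch_key and sketch_cmp are hypothetical, NFKC is an assumed normalization form, and the code targets Python 3, which has no built-in cmp.

```python
import functools
import unicodedata


def sketch_key(s):
    """Hypothetical key: (lower-cased, original) after normalization."""
    normalized = unicodedata.normalize("NFKC", s)  # assumed form
    return (normalized.lower(), normalized)


def sketch_cmp(s1, s2):
    """Return -1, 0, or 1 by comparing the two collation keys."""
    k1, k2 = sketch_key(s1), sketch_key(s2)
    # Python 3 replacement for the old built-in cmp().
    return (k1 > k2) - (k1 < k2)


# A comparison function can still drive sorting via cmp_to_key.
names = ["Terry", "sally", "terry"]
ordered = sorted(names, key=functools.cmp_to_key(sketch_cmp))
print(ordered)
```

Passing the key function directly to sorted is usually simpler; the comparison form is mainly useful for code that expects a two-argument comparator.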