Perl Unicode Cookbook: Case- and Accent-insensitive Sorting

℞ 36: Case- and accent-insensitive Unicode sort

The Unicode Collation Algorithm defines several levels of collation strength by which you can specify certain character properties as relevant or irrelevant to the collation ordering. In simple terms, you can use collation strength to tell a UCA-aware sort to ignore case or diacritics.

In Perl, use the Unicode::Collate module to perform your sorting. To sort Unicode strings while ignoring case and diacritics (that is, to compare only the base characters), use a collation strength of level 1:

 use Unicode::Collate;
 my $col = Unicode::Collate->new(level => 1);
 my @list = $col->sort(@old_list);

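Each level includes the comparisons of the levels below it. Level 2 adds diacritic comparisons to the ordering algorithm. Level 3 adds case ordering. Level 4 adds a tie-breaking comparison that matters mainly for characters the lower levels ignore, such as punctuation and spaces under the module's default settings. Level 4 is the default.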
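To see what each level considers significant, it can help to compare a few strings directly rather than sort them. The following is a minimal sketch, not part of the original recipe: the helper same_at_level and the sample words are made up for illustration, and it assumes the collator's eq method with the module's default collation table.

```perl
use strict;
use warnings;
use utf8;                 # the string literals below contain non-ASCII characters
use Unicode::Collate;

# Hypothetical helper (not part of Unicode::Collate): returns 1 when the two
# strings collate as equal at the given strength level, 0 otherwise.
sub same_at_level {
    my ($level, $x, $y) = @_;
    my $col = Unicode::Collate->new(level => $level);
    return $col->eq($x, $y) ? 1 : 0;
}

my %result = (
    # Level 1 compares base characters only.
    level1_ignores_case_and_accents => same_at_level(1, "resume", "RÉSUMÉ"),
    # Level 2 adds diacritics but still ignores case.
    level2_ignores_case             => same_at_level(2, "résumé", "RÉSUMÉ"),
    level2_sees_accents             => !same_at_level(2, "resume", "résumé"),
    # Level 3 adds case.
    level3_sees_case                => !same_at_level(3, "résumé", "RÉSUMÉ"),
);

# Report each check; the keys and values printed here are plain ASCII.
printf "%-32s %s\n", $_, ($result{$_} ? "yes" : "no") for sort keys %result;
```

Strings that compare as equal at the chosen level have no defined relative order from the collator itself, so if you need a deterministic result, raise the level or add your own secondary comparison.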

Previous: ℞ 35: Unicode Collation

Series Index: The Standard Preamble

Next: ℞ 37: Unicode Locale Collation
