unicode-data

Provides standardized access to Unicode character data and properties for advanced text processing and internationalization in TeX-based systems.

Overview

Serves as a fundamental resource for handling Unicode characters and their properties in TeX environments, particularly crucial for multilingual document preparation and format file generation.

Contains official Unicode Consortium data files (Unicode 8.0.0) for comprehensive character support.
Includes loader files: load-unicode-data.tex for general character setup and load-unicode-xetex-classes.tex for XeTeX character class initialization.
Essential for developers working on internationalization, font support, and advanced text processing features.
Particularly valuable for projects requiring precise Unicode character handling, such as multilingual academic publications or technical documentation with diverse character sets.
Maintained by the LaTeX3 Project as a core resource for the TeX community.

Getting Started

The unicode-data package is a library that provides Unicode Consortium data for TeX use. It is not a LaTeX package that you include directly in your documents, but rather a component used by other packages and formats during initialization.

No special setup is required in your LaTeX documents to use the functionality provided by unicode-data, as it works behind the scenes in your TeX distribution. The data and loaders are primarily used when building format files or by other packages that need access to Unicode character properties.
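
As a rough illustration of the format-building step described above, the following sketch shows how a format source file might read the loaders. It assumes a Unicode engine (XeTeX or LuaTeX), that the unicode-data files are on the TeX search path, and that \undefined has no definition. It is not the actual code of any particular format.

```latex
% Sketch of a fragment from a format source file (assumed context: iniTeX run
% of a Unicode engine, with the unicode-data files on the search path).

% Read the general loader, which sets character properties such as
% \catcode, \lccode, and \uccode from the Unicode data files.
\input load-unicode-data.tex

% The character class loader is only meaningful in XeTeX, so test for a
% XeTeX-specific primitive before reading it.
\ifx\XeTeXcharclass\undefined
\else
  \input load-unicode-xetex-classes.tex
\fi
```

Because these lines run while the format is being built, documents compiled with that format see the resulting settings without loading anything themselves.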

Examples

A XeLaTeX document that relies on the Unicode character data already loaded into the format. Nothing from unicode-data is loaded in the preamble.

% Compile with XeLaTeX (or LuaLaTeX).
\documentclass{article}
\usepackage{amsmath}
\usepackage{amssymb}
\begin{document}
% unicode-data is not loaded here with \usepackage. Its data files are
% read when the format is built, where they set up character properties
% and, in XeTeX, character classes.

This document relies on the unicode-data package, which provides access to
Unicode Consortium data. The package is used mainly by format builders,
to set up character properties in XeTeX and LuaTeX and character classes
in XeTeX.

Some characters that benefit from proper classification:
\begin{itemize}
  \item Greek letters: $\alpha$ $\beta$ $\gamma$
  \item Mathematical symbols: $\sum$ $\int$ $\prod$
  \item Various scripts: Cyrillic (\textit{Privet}), Arabic (\textit{marhaba}), CJK (\textit{ni hao})
\end{itemize}
% The script samples above are transliterated. Typesetting the native text
% (for example 你好) requires a font that covers those scripts.

\end{document}
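
To see the effect of the format-time setup, a document can print a few of the values directly. The following is a minimal sketch that assumes a XeLaTeX format built with the unicode-data loaders. The code points were chosen only for illustration, and the numbers printed depend on the Unicode version bundled with the format.

```latex
% Compile with XeLaTeX. No package is loaded: the values printed below
% were set when the format was built.
\documentclass{article}
\begin{document}

% Category code assigned to U+03B1 (Greek small letter alpha).
Category code of U+03B1: \the\catcode"03B1

% Lowercase mapping recorded for U+0391 (Greek capital letter alpha).
Lowercase code of U+0391: \the\lccode"0391

% XeTeX character class assigned to U+4F60 (a CJK ideograph).
% \XeTeXcharclass exists only in XeTeX, so remove this line for LuaLaTeX.
XeTeX character class of U+4F60: \the\XeTeXcharclass"4F60

\end{document}
```

Only numbers are typeset, so the sketch does not need a font that covers Greek or CJK.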