UNIC: Unicode and Internationalization Crates for Rust

UNIC is a project to develop high-quality, easy-to-use Rust crates for
Unicode and Internationalization data and algorithms. In other words, it's like
ICU for Rust, written completely in Rust, mostly in safe mode, but also
benefiting from the performance gains of unsafe mode when possible.

Project Goal

The goal for UNIC is to provide access to all levels of Unicode and
Internationalization functionalities, from Unicode character properties,
through Unicode algorithms for processing text, to more advanced
(locale-based) processes based on the Unicode Common Locale Data Repository
(CLDR).

Other standards and best practices, like IETF RFCs, are also implemented, as
needed by Unicode/CLDR components or by common demand.

Project Status

At the moment UNIC is under heavy development: the API is updated frequently on
the master branch, and there will be API breakage between each 0.x release.
Please see open issues for planned changes.

We expect to have the 1.0 version released in 2018 and maintain a stable API
afterwards, with possibly one or two API updates per year for the first couple
of years.

Design Goals

The primary goal of UNIC is to provide reliable functionality by way of an
easy-to-use API. Therefore, newly added components may not be
well-optimized for performance, but will have enough tests to show
conformance to the standard, and examples to show users how they can be
used to address common needs.

The next major goal for UNIC components is performance and low binary and
memory footprints. In particular, optimizing runtime for ASCII and other common
cases will encourage adoption without fear of slowing down regular development
processes.
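
As a rough illustration of what an ASCII fast path can look like, the sketch
below lowercases a string byte-wise when the input is pure ASCII and falls back
to full Unicode case mapping otherwise. It uses only the Rust standard library;
the function name `to_lowercase_fast` is hypothetical and is not part of any
UNIC API.

```rust
/// Lowercases `input`, taking a cheaper path when the text is pure ASCII.
/// Hypothetical helper for illustration only; not a UNIC function.
fn to_lowercase_fast(input: &str) -> String {
    if input.is_ascii() {
        // Fast path: ASCII case mapping works byte by byte and needs
        // no Unicode data tables.
        input.to_ascii_lowercase()
    } else {
        // General path: full Unicode case mapping from the standard library.
        input.to_lowercase()
    }
}
```

Calling `to_lowercase_fast("Hello, World")` takes the fast path, while
`to_lowercase_fast("ÀÉÎ")` takes the general one. Both paths return the same
result for ASCII input, so the check changes only the cost, not the behavior.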

Components are guaranteed, to the extent possible, to provide consistent
data and algorithms. Cross-component tests are used to catch any
inconsistency between implementations, without slowing down development
processes.
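
One inconsistency such a test might catch is two components built from
different versions of Unicode data. The sketch below shows the idea under
stated assumptions: the modules `char_property` and `normalization`, their
`UNICODE_VERSION` constants, and the helper functions are hypothetical
stand-ins, not UNIC's actual layout or API.

```rust
// Hypothetical stand-ins for two components. Each records the Unicode
// version (major, minor, micro) its data tables were generated from.
mod char_property {
    pub const UNICODE_VERSION: (u16, u16, u16) = (10, 0, 0);
}

mod normalization {
    pub const UNICODE_VERSION: (u16, u16, u16) = (10, 0, 0);
}

/// Returns true when every listed version is identical.
/// An empty or single-element list trivially agrees.
fn versions_agree(versions: &[(u16, u16, u16)]) -> bool {
    versions.windows(2).all(|pair| pair[0] == pair[1])
}

/// The kind of check a cross-component test could assert on.
fn components_are_consistent() -> bool {
    versions_agree(&[
        char_property::UNICODE_VERSION,
        normalization::UNICODE_VERSION,
    ])
}
```

A test asserting `components_are_consistent()` would start failing as soon as
one component's data is regenerated without the other, which is cheap to run
on every change.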

Components and their Organization

UNIC components have a hierarchical organization, starting from the
unic root, which contains the major components. Each major component, in
turn, may host some minor components.

The APIs of major components are designed for the end-users of the libraries,
and are expected to be extensively documented and accompanied by code examples.

In contrast to major components, minor components act as providers of data and
algorithms for the higher-level components, and their APIs are expected to be
more performance-oriented, possibly providing multiple ways of accessing the
data.

The UNIC Super-Crate

The unic super-crate is a collection of all
UNIC (major) components, providing an easy way of access to all functionalities,
when all or many are needed, instead of importing components one-by-one. This
crate ensures all components imported are compatible in algorithms and
consistent data-wise.

Main code examples and cross-component integration tests are implemented under
this crate.

Applications

Code Organization: Combined Repository

Some of the reasons to have a combined repository for these components are:

Faster development. Implementing new Unicode/i18n components very often
depends on other (lower-level) components, which in turn may need
adjustments (exposing new API, fixing bugs, etc.) that can be developed,
tested and reviewed in fewer cycles and less time.

Implementation integrity. Multiple dependencies on other components
mean that the components need to agree with each other to some level.
Many Unicode algorithms, composed from smaller ones, assume that all parts
of the algorithm are using the same version of Unicode data. Violation of
this assumption can cause inconsistencies and hard-to-catch bugs. In a
combined repository, it's possible to reach better integrity during
development, as well as with cross-component (integration) tests.

Pay for what you need. Small components (basic crates), which
cross-depend only on what they need, allow users to only bring in what they
consume in their project.

Shared bootstrapping. A considerable amount of the work in extending
Unicode/i18n functionalities depends on converting source Unicode/locale data
into structured formats for the destination programming language. In a combined
repository, it's easier to maintain these bootstrapping tools, expand
coverage, and use better data structures for more efficiency.

License

Contribution

Unless you explicitly state otherwise, any contribution intentionally submitted
for inclusion in the work by you, as defined in the Apache-2.0 license, shall be
dual licensed as above, without any additional terms or conditions.