(Incomplete) Unicode in C++
A Crash Course in Unicode for C++ Developers - Steve Downey - [CppNow 2021]
The best text handling is not to handle text at all.
std::u8string is not Python's str: it is a sequence of UTF-8 code units, not of characters.
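A minimal sketch of the difference, assuming the input is valid UTF-8. The helper name count_code_points is hypothetical, not a standard library function. It takes plain bytes as a std::string_view so it compiles before C++20; a std::u8string stores the same code units as char8_t.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper: counts code points in a UTF-8 byte sequence by
// skipping continuation bytes (bit pattern 10xxxxxx).
// Assumes valid UTF-8; no error handling.
std::size_t count_code_points(std::string_view utf8) {
    std::size_t count = 0;
    for (unsigned char byte : utf8) {
        if ((byte & 0xC0) != 0x80) {
            ++count;  // lead byte or ASCII byte: starts a new code point
        }
    }
    return count;
}
```

For the two bytes "\xC3\x84" (UTF-8 for “Ä”), size() reports 2 while count_code_points reports 1. Python's len() on a str would report 1.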
UTF-16: originally they thought 16 bits would be enough for everyone (someone also said 640K would be enough). It is still used as the internal representation in Windows and Java.
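Because 16 bits turned out not to be enough, UTF-16 encodes code points above U+FFFF as a pair of surrogate code units. A rough sketch of that encoding step follows. The function name encode_utf16 is hypothetical, and the sketch assumes a valid scalar value: it does not reject lone surrogates or values above U+10FFFF.

```cpp
#include <string>

// Hypothetical helper: encodes one code point as UTF-16.
// Assumes cp is a valid Unicode scalar value (not validated here).
std::u16string encode_utf16(char32_t cp) {
    std::u16string out;
    if (cp < 0x10000) {
        // Fits in a single 16-bit code unit.
        out.push_back(static_cast<char16_t>(cp));
    } else {
        // Split the remaining 20 bits across a high and a low surrogate.
        cp -= 0x10000;
        out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
        out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
    }
    return out;
}
```

With this sketch, U+00C4 (“Ä”) yields one code unit, while U+1F600 yields the two code units 0xD83D 0xDE00.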
UTF-32: rarely used for storage or interchange, but often used internally to represent code points.
Normalization: comparing two texts that are the same but stored under different representations.
Bad: there is a single code point for “Ä”, but it can also be represented by “A” followed by a combining “modifier” that puts two dots over the preceding character.
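The “Ä” case can be sketched with code points directly. The C++ standard library has no normalization facility, so compose_toy below is a hypothetical toy that handles only A/a followed by U+0308 (COMBINING DIAERESIS). Real normalization (for example NFC) needs the full Unicode data tables, typically from a library such as ICU.

```cpp
#include <string>

// Toy composition step, for illustration only: merges 'A' or 'a' followed
// by U+0308 (COMBINING DIAERESIS) into the precomposed U+00C4 / U+00E4.
// All other sequences are copied through unchanged.
std::u32string compose_toy(const std::u32string& in) {
    constexpr char32_t kCombiningDiaeresis = 0x0308;
    std::u32string out;
    for (char32_t cp : in) {
        if (cp == kCombiningDiaeresis && !out.empty()) {
            if (out.back() == U'A') { out.back() = 0x00C4; continue; }
            if (out.back() == U'a') { out.back() = 0x00E4; continue; }
        }
        out.push_back(cp);
    }
    return out;
}
```

A plain == between the precomposed string (U+00C4) and the decomposed one (U+0041 U+0308) is false, even though both render as “Ä”. After passing the decomposed string through compose_toy, the two compare equal.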
Applying the Lessons of std::ranges to Unicode in the C++ Standard Library - Zach Laine - [CppNow 2023]