Use Unicode UTF-8 for worldwide language support

By | November 28, 2021

Windows NT dates back to 1993, when Unicode was still new to the industry. Microsoft chose UCS-2 (before Windows 2000) and then UTF-16 LE (since Windows 2000) for Unicode support, and introduced a separate set of Windows APIs (the -W APIs) for 2-byte wide characters, while keeping the -A APIs for compatibility with Windows 9x programs.

In contrast, Linux and other modern Unix-like systems use UTF-8 for Unicode support, so existing APIs that take 1-byte char strings can handle both legacy 8-bit ("ANSI") text and Unicode.
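To illustrate why byte-oriented APIs can carry UTF-8 but not UTF-16, here is a small Python sketch. The sample string is an assumption chosen for illustration, not something from a real API; it only compares the byte layout of the two encodings.

```python
# Illustrative sample: "Héllo, 世界", written with escapes so the source
# file encoding does not matter.
text = "H\u00e9llo, \u4e16\u754c"

utf8 = text.encode("utf-8")        # variable length, 1 to 4 bytes per code point
utf16 = text.encode("utf-16-le")   # 2-byte code units, the layout the -W APIs use

# ASCII characters keep their one-byte values in UTF-8.
ascii_unchanged = "Hello".encode("utf-8") == b"Hello"

# UTF-8 only produces a zero byte for the NUL character itself, so a
# NUL-terminated char string passes through byte-oriented APIs intact.
utf8_has_nul = b"\x00" in utf8

# UTF-16 LE pads every ASCII character with a zero byte, which would cut
# the string short if it were handed to an API expecting char strings.
utf16_has_nul = b"\x00" in utf16

print(len(text), len(utf8), len(utf16))
print(ascii_unchanged, utf8_has_nul, utf16_has_nul)
```

The embedded zero bytes are the practical reason UTF-16 needed its own wide-character API set, while UTF-8 can reuse the existing char-based one.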

This is quite annoying for Windows developers, who always have to call MultiByteToWideChar and WideCharToMultiByte to convert between UTF-8, UTF-16 (WCHAR) and other code pages. Now Microsoft clearly wants to end this by allowing UTF-8 as the “ANSI code page” (code page 65001), effectively supporting Unicode in the -A APIs.
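The conversion round trip described above can be sketched in Python. Note the assumptions: to_wide and from_wide are hypothetical helper names, and they use Python's codecs as stand-ins for MultiByteToWideChar and WideCharToMultiByte rather than calling the Win32 functions themselves. Windows-1252 is picked only as an example of a legacy code page.

```python
def to_wide(data: bytes, code_page: str) -> bytes:
    """Stand-in for MultiByteToWideChar: code-page bytes -> UTF-16 LE bytes."""
    return data.decode(code_page).encode("utf-16-le")


def from_wide(wide: bytes, code_page: str) -> bytes:
    """Stand-in for WideCharToMultiByte: UTF-16 LE bytes -> code-page bytes."""
    return wide.decode("utf-16-le").encode(code_page)


# "café" as stored under the Windows-1252 code page (one byte per character).
legacy = b"caf\xe9"

# Step 1: widen to UTF-16 LE, the form the -W APIs expect.
wide = to_wide(legacy, "cp1252")

# Step 2: narrow again, this time to UTF-8.
utf8 = from_wide(wide, "utf-8")

print(wide)
print(utf8)
```

Every change of code page goes through the wide form in the middle, which is the extra step that a UTF-8 ANSI code page is meant to make unnecessary.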

Until recently, Windows has emphasized “Unicode” -W variants over -A APIs. However, recent releases have used the ANSI code page and -A APIs as a means to introduce UTF-8 support to apps. If the ANSI code page is configured for UTF-8, -A APIs operate in UTF-8. This model has the benefit of supporting existing code built with -A APIs without any code changes.
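One way to see whether a process is running with UTF-8 as its ANSI code page is to query it with the Win32 GetACP function. The sketch below does so through Python's ctypes; active_code_page is a hypothetical helper name, and on non-Windows systems it simply returns None because there is no ANSI code page to ask about.

```python
import sys

CP_UTF8 = 65001  # code page identifier for UTF-8


def active_code_page():
    """Return the process ANSI code page on Windows, or None elsewhere."""
    if sys.platform != "win32":
        return None
    import ctypes  # imported here because windll exists only on Windows
    return ctypes.windll.kernel32.GetACP()


acp = active_code_page()
if acp is None:
    print("not running on Windows; no ANSI code page to query")
elif acp == CP_UTF8:
    print("ANSI code page is UTF-8; -A APIs take UTF-8 strings")
else:
    print(f"ANSI code page is {acp}; -A APIs use that legacy code page")
```

A check like this can help decide at startup whether strings still need explicit conversion before being passed to -A APIs.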

References:
https://docs.microsoft.com/en-us/windows/apps/design/globalizing/use-utf8-code-page
https://unicodebook.readthedocs.io/operating_systems.html
https://en.wikipedia.org/wiki/Unicode_in_Microsoft_Windows
https://en.wikipedia.org/wiki/Windows_code_page