When supplementary characters are involved, each supplementary character is counted as two UTF-16 code units with CODEUNITS16, or as one UTF-32 code unit with CODEUNITS32.
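The counting rule above can be illustrated with a minimal Python sketch. The helper name `count_code_units` is hypothetical and is not part of any database product. It only mimics the two counting modes by encoding the text and dividing the byte length by the code unit size.

```python
def count_code_units(text):
    """Return (UTF-16 code units, UTF-32 code units) for text.

    Hypothetical helper: approximates CODEUNITS16 / CODEUNITS32 style
    counting by measuring the encoded length of the string.
    """
    utf16_units = len(text.encode("utf-16-le")) // 2  # 2 bytes per UTF-16 code unit
    utf32_units = len(text.encode("utf-32-le")) // 4  # 4 bytes per UTF-32 code unit
    return utf16_units, utf32_units


bmp_char = "A"                 # U+0041, inside the Basic Multilingual Plane
supplementary = "\U0001D11E"   # U+1D11E MUSICAL SYMBOL G CLEF, a supplementary character

print(count_code_units(bmp_char))       # (1, 1)
print(count_code_units(supplementary))  # (2, 1)
```

The supplementary character needs a surrogate pair in UTF-16, so it counts as two code units there but as a single code unit in UTF-32.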
Unicode defines character encodings with three distinct code unit sizes (UTF-8, UTF-16, and UTF-32), while the traditional character type is 8 bits.
However, if you use UTF-16, the size of a mostly ASCII document roughly doubles compared with UTF-8, and the document takes longer to parse.
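A rough way to see the size effect is to encode the same text both ways and compare byte counts. The sample document below is an invented, ASCII-only XML fragment used purely for illustration. The sketch measures size only and does not attempt to measure parse time.

```python
# Invented ASCII-only sample document (an assumption for illustration).
sample = "<note><to>Alice</to><body>Hello</body></note>" * 100

utf8_size = len(sample.encode("utf-8"))       # 1 byte per ASCII character
utf16_size = len(sample.encode("utf-16-le"))  # 2 bytes per ASCII character
ratio = utf16_size / utf8_size

print(utf8_size, utf16_size, ratio)
```

For text that is entirely ASCII the ratio is exactly 2. Text with many non-ASCII characters narrows the gap, because those characters already take two or more bytes in UTF-8.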
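For example, if UTF-16 data is loaded as-is into a C string, the string may be truncated at the second byte of the first ASCII character, because in little-endian order that byte is 0x00 and C treats it as the string terminator.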
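The truncation can be simulated without writing C. In the sketch below, `c_string_view` is a hypothetical helper that returns the bytes a NUL-terminated C routine such as `strlen` would consider part of the string, namely everything before the first zero byte.

```python
def c_string_view(data):
    """Return the bytes a NUL-terminated C string routine would see.

    Hypothetical helper: simulates C string semantics by cutting the
    buffer at the first 0x00 byte.
    """
    return data.split(b"\x00", 1)[0]


utf16_le = "Hi".encode("utf-16-le")  # b'H\x00i\x00'
utf16_be = "Hi".encode("utf-16-be")  # b'\x00H\x00i'

print(c_string_view(utf16_le))  # b'H'  (cut at the second byte of 'H')
print(c_string_view(utf16_be))  # b''   (cut at the very first byte)
```

In little-endian order the string stops after the first character, and in big-endian order it appears empty, which is why UTF-16 data needs length-aware or wide-character handling in C.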
UTF-16LE: 16-bit UCS Transformation Format, little-endian byte order.
UTF-16BE: 16-bit UCS Transformation Format, big-endian byte order.
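The difference between the two byte orders can be seen by encoding the same characters with Python's built-in `utf-16-le` and `utf-16-be` codecs. This is a small illustrative sketch and the variable names are arbitrary.

```python
ascii_char = "A"               # U+0041
supplementary = "\U0001D11E"   # U+1D11E, encoded as the surrogate pair D834 DD1E

le_ascii = ascii_char.encode("utf-16-le").hex()  # low-order byte first
be_ascii = ascii_char.encode("utf-16-be").hex()  # high-order byte first
le_supp = supplementary.encode("utf-16-le").hex()
be_supp = supplementary.encode("utf-16-be").hex()

print(le_ascii, be_ascii)  # 4100 0041
print(le_supp, be_supp)    # 34d81edd d834dd1e
```

Byte order applies within each 16-bit code unit. The order of the two surrogates in a pair stays the same in both encodings.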