Following up on "How to make ICU stop early when meeting 65533?", I would like to use a self-defined callback to stop early when the converter meets PUA characters.
For example, the byte sequence 0xA8 0xC1 is not defined in GB2312, and ICU converts it to 57465 (U+E079), which belongs to the Private Use Area.
The code below was generated by ChatGPT, but it does not work:
    bool isPrivateUse(UChar32 ch) {
        return (ch >= 0xE000 && ch <= 0xF8FF) ||
               (ch >= 0xF0000 && ch <= 0xFFFFD) ||
               (ch >= 0x100000 && ch <= 0x10FFFD);
    }

    void U_CALLCONV stopOnPUACallBack(const void *context, UConverterToUnicodeArgs *toUArgs,
                                      const char *codeUnits, int32_t length,
                                      UConverterCallbackReason reason,
                                      UErrorCode *pErrorCode) {
        if (reason <= UCNV_IRREGULAR) {
            UChar32 ch;
            int32_t i = 0;
            U8_NEXT(codeUnits, i, length, ch);
            if (isPrivateUse(ch)) {
                *pErrorCode = U_ILLEGAL_CHAR_FOUND;
                return;
            }
        }
        UCNV_TO_U_CALLBACK_STOP(context, toUArgs, codeUnits, length, reason, pErrorCode);
    }

    ucnv_setToUCallBack(conv, stopOnPUACallBack, nullptr, nullptr, nullptr, &status);
I tried to debug the code, and I found that its reason is UCNV_CLOSE.
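For comparison, here is a minimal sketch of the same PUA check done as a pass over the converted output instead of inside a callback. It assumes the converted text is available as UTF-16 in a std::u16string. The names isPrivateUseCodePoint and findFirstPrivateUse are hypothetical helpers, not part of ICU, and this only approximates "stopping early" by reporting where the first PUA code point occurs.

```cpp
#include <cstddef>
#include <string>

// Same ranges as isPrivateUse above, but on a plain char32_t
// so the sketch does not depend on ICU headers.
static bool isPrivateUseCodePoint(char32_t cp) {
    return (cp >= 0xE000 && cp <= 0xF8FF) ||
           (cp >= 0xF0000 && cp <= 0xFFFFD) ||
           (cp >= 0x100000 && cp <= 0x10FFFD);
}

// Hypothetical helper: returns the index (in UTF-16 code units) of the
// first private-use code point, or std::u16string::npos if there is none.
static std::size_t findFirstPrivateUse(const std::u16string &s) {
    for (std::size_t i = 0; i < s.size();) {
        char32_t cp = s[i];
        std::size_t len = 1;
        // Combine a valid surrogate pair into one supplementary code point.
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < s.size()) {
            char16_t lo = s[i + 1];
            if (lo >= 0xDC00 && lo <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
                len = 2;
            }
        }
        if (isPrivateUseCodePoint(cp)) {
            return i;
        }
        i += len;
    }
    return std::u16string::npos;
}
```

With the example above, a buffer containing U+E079 would be reported at the index of that code unit, and the caller could truncate the output or treat it as an error at that point.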