I want to use ICU to validate whether a stream of bytes is encoded in GB2312:
#include <unicode/ucnv.h>
#include <iostream>

bool is_gb2312(char *data, size_t len) {
    UErrorCode status = U_ZERO_ERROR;
    UConverter* conv = ucnv_open("GB2312", &status);
    if (U_FAILURE(status)) {
        std::cerr << "ucnv_open failed" << std::endl;
        return false;
    }
    auto* output = new UChar[len * 2];
    UChar* target = output;
    char const* source = data;
    ucnv_toUnicode(conv, &target, output + len * 2, &source, data + len, nullptr, true, &status);
    // Release the buffer and the converter on every path, including failure.
    delete[] output;
    ucnv_close(conv);
    if (U_FAILURE(status)) {
        std::cerr << "ucnv_toUnicode failed" << std::endl;
        return false;
    }
    return true;
}
I found that it never reports an error even when the input contains invalid bytes (e.g., 0xA1 0xA0); it just converts them into 65533 (U+FFFD, the replacement character). How can I make it stop early and report an error as soon as it would otherwise emit 65533?