Java 21 problem with DateFormat.getDateTimeInstance().format(new Date())

This is my code

import java.util.Date;
import java.text.DateFormat;

class DateTime {
    public static void main(String[] args) {
        String dt = DateFormat.getDateTimeInstance().format(new Date());
        System.out.println(dt);
    }
}

When compiled and executed with Java 21, the call to ‘format()’ returns a UTF-16 string containing invalid bytes, represented by a question mark:

Oct 3, 2023, 7:01:17?PM

Has anyone else seen this problem? Thanks.

New feature, not a bug

The Answer by David Conrad is correct. What you are seeing is a new feature, not a bug.

New version of CLDR

The localization rules defined in the Unicode Consortium’s Common Locale Data Repository (CLDR) are continually evolving. Modern Java relies upon the CLDR as its main source of localization rules. So new versions of the CLDR bring new behaviors in Java.

This is life in the real world. Never harden your expectation of localized values. Those localizations may change in future versions of the CLDR, Java, and human cultures.

If localization behavior is critical to some logic in your code, write unit tests to verify that behavior.
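As a minimal sketch of such a check, the class below formats a fixed moment and reports which character sits before the AM/PM marker. The class and method names are hypothetical, chosen only for illustration, and plain Java is used rather than any particular test framework. It deliberately accepts either a plain space or an NNBSP, since which one you get depends on the CLDR version bundled with your JDK.

```java
import java.time.ZoneId;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.FormatStyle;
import java.util.Locale;

// Hypothetical class name, for illustration only.
class LocalizedFormatCheck {

    // Formats a fixed moment so the result does not depend on the current clock.
    static String formatFixedMoment(Locale locale) {
        ZonedDateTime zdt = ZonedDateTime.of(2023, 10, 3, 19, 1, 17, 0, ZoneId.of("UTC"));
        DateTimeFormatter f = DateTimeFormatter.ofLocalizedDateTime(FormatStyle.MEDIUM).withLocale(locale);
        return zdt.format(f);
    }

    // Returns the code point immediately before the "PM" marker, or -1 if no marker is found.
    static int separatorBeforeAmPm(String formatted) {
        int i = formatted.indexOf("PM");
        return (i > 0) ? formatted.codePointBefore(i) : -1;
    }

    public static void main(String[] args) {
        String s = formatFixedMoment(Locale.US);
        int sep = separatorBeforeAmPm(s);
        // Fail loudly if the separator is something the rest of the code was not written to handle.
        if (sep != 0x0020 && sep != 0x202F) {
            throw new AssertionError(String.format("Unexpected separator U+%04X", sep));
        }
        System.out.println(String.format("separator = U+%04X", sep));
    }
}
```

A check like this will not stop the CLDR from changing, but it tells you at build time rather than in production.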

Detecting NNBSP character

We can verify Conrad’s claim that you are indeed seeing a U+202F NARROW NO-BREAK SPACE (NNBSP). Let’s examine each character in your output.

We can inspect each character to get its number assigned by the Unicode Consortium, its code point. Our NNBSP character has a code point of 8,239 decimal, 202F hex.

String dt = DateFormat.getDateTimeInstance ( ).format ( new Date ( ) );
System.out.println ( dt );
String codePoints = dt.codePoints ( ).boxed ( ).toList ( ).toString ( );
System.out.println ( "codePoints = " + codePoints );

When run:

Oct 3, 2023, 6:02:35 PM
codePoints = [79, 99, 116, 32, 51, 44, 32, 50, 48, 50, 51, 44, 32, 54, 58, 48, 50, 58, 51, 53, 8239, 80, 77]

Sure enough, we see the 8239 of our NNBSP is third from the end, before the P and the M.

Change is good

A note about this change in the CLDR: it is a good one, and makes sense. Typographically, the AM/PM of a time-of-day should never be separated from the hours-minutes-seconds; wrapping AM/PM onto another line makes for clumsy reading. So using a non-breaking space rather than a plain breaking space makes sense. Whether that space should be “thin” is a judgement I’ll leave to the typography experts, but I gather it makes sense as well.

Solution: Fix your console

The immediate solution to your problem of a ? replacement character appearing is to 👉🏾 change the character-encoding of your console app. Whatever console app you are using (which you neglected to mention in your Question) is apparently configured for some archaic character encoding rather than a modern Unicode-savvy character encoding such as UTF-8.

Change the character encoding of your console app (see Comment). Then your errant ? should appear as the true character, a thin non-breaking space.
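To see where a ? can come from, here is a small sketch that simulates text passing through a channel limited to one character encoding. It is only an approximation of what happens between Java and your console, and the class and method names are hypothetical. It relies on the documented behavior of String.getBytes, which substitutes the charset's replacement byte (a question mark for US-ASCII) for any character the charset cannot represent.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical class name, for illustration only.
class ReplacementDemo {

    // Encodes the text in the given charset and decodes it back,
    // roughly mimicking output that passes through a console limited to that charset.
    static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        String text = "7:01:17\u202FPM"; // U+202F NARROW NO-BREAK SPACE before PM

        // US-ASCII cannot represent U+202F, so the encoder substitutes '?'.
        System.out.println(roundTrip(text, StandardCharsets.US_ASCII)); // prints 7:01:17?PM

        // UTF-8 can represent every Unicode character, so the text survives intact.
        System.out.println(roundTrip(text, StandardCharsets.UTF_8).equals(text)); // prints true
    }
}
```

So the ? is not in the String returned by format(); it appears only when the text is converted to an encoding that lacks the character.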

Avoid legacy date-time classes

You are using terribly flawed date-time classes that were supplanted years ago by the modern java.time classes defined in JSR 310. Avoid the legacy date-time classes; use java.time for date-time work instead.

Your choice of legacy classes is not a factor in the particular issue of your Question. But just FYI, let me show you the modern version of your code.

An Instant object represents a moment as seen in UTC, that is, with an offset from UTC of zero hours-minutes-seconds. You can adjust that moment into a time zone, obtaining a ZonedDateTime. Same point on the timeline, but different wall-clock time/calendar.

Instant instant = Instant.now ( ); // `java.util.Date` was years ago replaced by `java.time.Instant`.
ZoneId z = ZoneId.of ( "Asia/Tokyo" );  // Or, `ZoneId.systemDefault()`.
ZonedDateTime zdt = instant.atZone ( z );
Locale locale = Locale.US;  
DateTimeFormatter f = DateTimeFormatter.ofLocalizedDateTime ( FormatStyle.MEDIUM ).withLocale ( locale );
String output = zdt.format ( f );
System.out.println ( "output = " + output );
System.out.println ( output.codePoints ( ).boxed ( ).toList ( ).toString ( ) );

When run:

output = Oct 4, 2023, 10:21:32 AM
[79, 99, 116, 32, 52, 44, 32, 50, 48, 50, 51, 44, 32, 49, 48, 58, 50, 49, 58, 51, 50, 8239, 65, 77]

We see the same 8239 before the A and the M.

We can examine the characters by their official Unicode names.

output.codePoints ( ).mapToObj ( Character :: getName ).forEach ( System.out :: println );

When run (a separate run, so the date and time digits differ from the output above):

LATIN CAPITAL LETTER O
LATIN SMALL LETTER C
LATIN SMALL LETTER T
SPACE
DIGIT FIVE
COMMA
SPACE
DIGIT TWO
DIGIT ZERO
DIGIT TWO
DIGIT THREE
COMMA
SPACE
DIGIT ONE
DIGIT ZERO
COLON
DIGIT ZERO
DIGIT TWO
COLON
DIGIT TWO
DIGIT SIX
NARROW NO-BREAK SPACE
LATIN CAPITAL LETTER A
LATIN CAPITAL LETTER M

Notice the NARROW NO-BREAK SPACE, third from last.

And we can examine the characters by their code point in hexadecimal rather than decimal.

output.codePoints ( ).mapToObj ( ( int codePoint ) -> String.format ( "U+%04X" , codePoint ) ).forEach ( System.out :: println );

When run (again a separate run, so the time digits differ):

U+004F
U+0063
U+0074
U+0020
U+0035
U+002C
U+0020
U+0032
U+0030
U+0032
U+0033
U+002C
U+0020
U+0031
U+0030
U+003A
U+0030
U+0035
U+003A
U+0031
U+0037
U+202F
U+0041
U+004D

Notice the U+202F, third from last.

For Unicode geeks

This topic turns out to be an interesting can of worms for Unicode geeks like me.

Section 1 of the Unicode Consortium document, Proposal to synchronize the Core Specification, explains that character U+202F NARROW NO-BREAK SPACE (NNBSP) has been incorrectly described as a narrow version of U+00A0 NO-BREAK SPACE. This means the Width variation section of the Non-breaking space page on Wikipedia is incorrect. That Unicode document argues that NNBSP is actually a non-breaking version of U+2009 THIN SPACE.
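For a quick look at how Java itself classifies these related characters, here is a small sketch using only methods of java.lang.Character. The class and method names are hypothetical. Note that Character.isWhitespace is documented to exclude the non-breaking spaces, including U+202F.

```java
// Hypothetical class name, for illustration only.
class SpaceCousins {

    // Builds a one-line description of a code point:
    // hex value, official Unicode name, and two classifications from java.lang.Character.
    static String describe(int codePoint) {
        return String.format("U+%04X %s isSpaceChar=%b isWhitespace=%b",
                codePoint,
                Character.getName(codePoint),
                Character.isSpaceChar(codePoint),
                Character.isWhitespace(codePoint));
    }

    public static void main(String[] args) {
        // SPACE, NO-BREAK SPACE, THIN SPACE, NARROW NO-BREAK SPACE
        int[] codePoints = {0x0020, 0x00A0, 0x2009, 0x202F};
        for (int cp : codePoints) {
            System.out.println(describe(cp));
        }
    }
}
```

One practical consequence: code that trims or splits based on Character.isWhitespace may not treat the NNBSP as a separator, which can matter if you post-process formatted date-time text.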

Another interesting note in that document is that the NNBSP character has largely served two purposes. I quote (my bullets):

- The NNBSP can be used to represent the narrow space occurring around punctuation characters in French typography, which is called an “espace fine insécable.”
- It is used especially in Mongolian text, before certain grammatical suffixes, to provide a small gap that not only prevents word breaking and line breaking, but also triggers special shaping for those suffixes.

Apparently we can now add a third major use to this list: formatting in date-time formats defined by the CLDR.

There was a change made in JDK 20 to upgrade to CLDR data version 42 from the Unicode Common Locale Data Repository. That version changed the space before AM/PM from a plain space to a narrow non-breaking space (NNBSP), U+202F NARROW NO-BREAK SPACE.

Bug 8304925 has been filed but the workarounds listed amount to: get used to it, ask Unicode to revert the change (unlikely), or

Use the legacy locale data by designating -Djava.locale.providers=COMPAT at the launcher command line. (This option limits some newer functionalities though.)
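If you need a plain space in the output for some downstream consumer, two other approaches are sometimes used. Neither is listed in the bug report; the sketch below is my own assumption about what might suit, with hypothetical class and method names. The first hard-codes a pattern, which gives up automatic localization. The second keeps the localized format and swaps the NNBSP for a plain space afterwards.

```java
import java.time.ZoneId;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

// Hypothetical class name, for illustration only.
class PlainSpaceFormat {

    // Hard-coded pattern: the space before "a" (the AM/PM marker) is a plain U+0020
    // because it is typed literally here rather than taken from CLDR data.
    static final DateTimeFormatter FIXED =
            DateTimeFormatter.ofPattern("MMM d, uuuu, h:mm:ss a", Locale.US);

    // Replaces every U+202F NARROW NO-BREAK SPACE with a plain U+0020 SPACE.
    static String normalizeSpaces(String s) {
        return s.replace('\u202F', ' ');
    }

    public static void main(String[] args) {
        ZonedDateTime zdt = ZonedDateTime.of(2023, 10, 3, 19, 1, 17, 0, ZoneId.of("UTC"));
        System.out.println(zdt.format(FIXED));
        System.out.println(normalizeSpaces("7:01:17\u202FPM"));
    }
}
```

Either way you are opting out of what the CLDR now considers correct for the locale, so reserve this for machine-facing output rather than text shown to users.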
