When using POSIX classes like \p{Alpha} without the UNICODE_CHARACTER_CLASS flag or when using hard-coded character classes like "[a-zA-Z]", letters outside of the ASCII range, such as umlauts, accented letters or letter from non-Latin languages, won't be matched. This may cause code to incorrectly handle input containing such letters.

To correctly handle non-ASCII input, it is recommended to use Unicode classes like \p{IsAlphabetic}. When using POSIX classes, Unicode support should be enabled by either passing Pattern.UNICODE_CHARACTER_CLASS as a flag to Pattern.compile or by using (?U) inside the regex.

Noncompliant Code Example

Pattern.compile("[a-zA-Z]");
Pattern.compile("\\p{Alpha}");

Compliant Solution

Pattern.compile("\\p{IsAlphabetic}"); // matches all letters from all languages
Pattern.compile("\\p{IsLatin}"); // matches latin letters, including umlauts and other non-ASCII variations
Pattern.compile("\\p{Alpha}", Pattern.UNICODE_CHARACTER_CLASS);
Pattern.compile("(?U)\\p{Alpha}");