High Priority
NOTE: We are publishing this Red Book page before the 9.8.0 release so you can plan any work that may be needed for the Regex changes.
Beginning with version 9.8, HPCC Systems will use a new third-party library to support regular expression operations in the platform. PCRE2 will replace both Boost::regex and ICU for regular expression pattern parsing, compilation, and substring replacement. This change brings more consistency to how regex works within ECL, better support for UTF-8 data, and better overall performance.
...
The ECL functions REGEXFIND, REGEXFINDSET, and REGEXREPLACE are affected. Our regression tests have turned up only four usage patterns that need attention. The good news is that you can make the changes outlined in this document now, in your current (pre-9.8) version of the platform, so there are fewer surprises when your cluster is eventually upgraded.
If your code uses any of the usage patterns outlined in this document, you must update it. Failure to do so will result in compilation errors or failing workunits at runtime.
Questions? Concerns?
Contact Dan S. Camper
dan.camper@lexisnexisrisk.com
...
Example:
REGEXFIND('\\bBAR\\b|\\INTO\\b|\\bBAZ\\b', ' MY PINTO HORSE '); //This fails
...
Example:
REGEXFIND('\\bBAR\\b|\\bINTO\\b|\\bBAZ\\b', ' MY PINTO HORSE '); //Working example
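The working pattern above compiles, but it does not match this particular input, because \b only matches at a word boundary and INTO sits inside the word PINTO. The sketch below illustrates the difference; the attribute names (wholeWord, partialWord) are hypothetical and chosen only for this example.

```ecl
// In an ECL string literal a backslash must be doubled,
// so '\\b' reaches the regex engine as \b (word boundary).

// PINTO is a complete word in the input, so this is TRUE.
wholeWord := REGEXFIND('\\bPINTO\\b', ' MY PINTO HORSE ');

// INTO appears only inside PINTO; there is no word boundary
// between P and I, so this is FALSE.
partialWord := REGEXFIND('\\bINTO\\b', ' MY PINTO HORSE ');

OUTPUT(wholeWord, NAMED('wholeWord'));
OUTPUT(partialWord, NAMED('partialWord'));
```

Both calls use only well-formed escapes, so they should behave the same before and after the upgrade.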
...
Example:
REGEXFIND(u'\\p{Letter}', u' MY PINTO HORSE '); //This fails
...
Example:
REGEXFIND(u'\\p{L}', u' MY PINTO HORSE '); //Working example
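As a further illustration of the short property name, the following sketch extracts runs of letters from the same Unicode input. The attribute names (firstWord, allWords) are hypothetical and exist only for this example.

```ecl
// \p{L} is the short form of the Unicode "Letter" category.
// Passing 0 as the third argument returns the text of the first full match.
firstWord := REGEXFIND(u'\\p{L}+', u' MY PINTO HORSE ', 0);

// REGEXFINDSET returns every non-overlapping match as a set.
allWords := REGEXFINDSET(u'\\p{L}+', u' MY PINTO HORSE ');

OUTPUT(firstWord, NAMED('firstWord'));   // u'MY'
OUTPUT(allWords, NAMED('allWords'));     // three entries: MY, PINTO, HORSE
```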
...
This affects the search pattern argument (the first argument) within REGEXFIND, REGEXFINDSET, and REGEXREPLACE. Only Unicode and UTF-8 searches are affected.
Example:
REGEXFIND(u'[[:Lu:]]', u' MY PINTO HORSE '); //This fails
...
Solution: Use the [\p{N}] syntax instead of the [[:N:]] syntax in your Unicode search arguments, where N is the property name (for example, Lu).
Example:
REGEXFIND(u'[\\p{Lu}]', u' MY PINTO HORSE '); //Working example
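A slightly larger sketch of the same substitution, using mixed-case input so the effect of the uppercase-letter property is visible. The attribute name upperRuns and the sample input are assumptions made for this illustration.

```ecl
// [\p{Lu}] matches a single uppercase letter; the + collects consecutive ones.
upperRuns := REGEXFINDSET(u'[\\p{Lu}]+', u'My Pinto HORSE');

OUTPUT(upperRuns, NAMED('upperRuns'));   // three entries: M, P, HORSE
```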
...
REGEXREPLACE('(\\d{3})-\\1-(\\d{4})', '512-512-5555', 'XXX-XXX-\\2'); //This fails
Prior to version 9.8, this code would produce 'XXX-XXX-5555' as a result.
...
REGEXREPLACE('(\\d{3})-\\1-(\\d{4})', '512-512-5555', 'XXX-XXX-$2'); //Working example
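The same $N convention applies when a replacement refers to more than one captured group. The sketch below reorders the parts of a date; the attribute names (isoDate, usDate) and the sample value are hypothetical and not taken from the original examples.

```ecl
isoDate := '2024-03-15';

// Three capture groups: year, month, day.
// In the replacement string, refer to them as $1, $2 and $3 rather than \\1, \\2, \\3.
usDate := REGEXREPLACE('(\\d{4})-(\\d{2})-(\\d{2})', isoDate, '$2/$3/$1');

OUTPUT(usDate, NAMED('usDate'));   // '03/15/2024'
```

Backreferences inside the search pattern itself (such as \\1 in the example above) are a separate matter from references in the replacement string, which is the argument that changes here.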