[x3d-public] X3D regular expression (regex) improvements
brutzman at nps.edu
Sun Jul 29 23:55:30 PDT 2018
Am happy to report that improved regex updates are available for SF/MFBool, SF/MFInt32, SF/MFFloat, SF/MFDouble, SF/MFTime.
X3D Regular Expressions (regexes): X3D Pattern
To allow commas as whitespace characters, native XML Schema datatypes typically cannot be used directly, so regex patterns must be used to ensure strict validation. Of further note is that regex patterns only apply to base type xs:string.
General regex design considerations for X3D XML Schema include the following.
For numeric types, leading sign characters (+ or -) are optionally present.
For numeric types, leading zeroes are not allowed, except for an optional leading zero preceding the decimal point when the significand is only fractional.
Regex \s accepts a number of characters as whitespace, so ( \t\n\r)* is used to strictly honor whitespace characters defined in ClassicVRML grammar.
Intermediate commas are treated as whitespace, but only allowed between each singleton value. For example, SFVec3f 3-tuple values within an MFVec3f array do not contain comma characters (but may be separated by commas and whitespace). Experience has shown that misplaced commas are a crucial indicator of malformed tuple values in large float arrays.
A required mantissa (integer or floating point) is represented as 0|[1-9][0-9]* (meaning either a single 0 or else one or more digits with no leading zeroes).
The fractional part (to the right of the decimal point) can be represented as [0-9]*
Scientific notation starts with upper or lower-case E, followed by an optional sign and one or more digits: ((E|e)(\+|\-)?[0-9]+)? It can be added to integer or float values.
TODO. Originally these regexes assumed that leading/trailing whitespace had been removed. Now prepending/appending regex constructs such as (\s)* to consume outer whitespace.
TODO. Allow multiple inner whitespace characters, optionally including a comma between individual MF array values.
Regex anchors ^ (line start) and $ (line end) are implicit and not included in XML Schema and X3DUOM regexes. Note that these regexes strictly consume all value characters. The anchor characters are necessary for regex101 engine unit tests, otherwise illegal values (for MF types) are not rejected.
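As a rough illustration of how the pieces listed above can fit together, here is a minimal Python sketch that assembles an SFFloat-like pattern and shows why anchoring matters. The names SF_FLOAT_SKETCH, is_sf_float, matches_unanchored and matches_anchored are hypothetical, the handling of the decimal point is an assumption, and the pattern is not the regex published in the X3D XML Schema.

```python
import re

# Building blocks quoted from the design notes above.
SIGN = r"(\+|\-)?"                     # optional leading sign
INTEGER_PART = r"(0|[1-9][0-9]*)"      # a single 0, or digits with no leading zero
FRACTION = r"(\.[0-9]*)?"              # assumption: decimal point, then optional digits
EXPONENT = r"((E|e)(\+|\-)?[0-9]+)?"   # optional scientific notation

# Hypothetical SFFloat-like pattern; the published schema regex may differ.
# The second alternative covers fraction-only values such as ".5".
SF_FLOAT_SKETCH = SIGN + r"(" + INTEGER_PART + FRACTION + r"|\.[0-9]+)" + EXPONENT


def is_sf_float(text):
    """Whole-string match, like the implicit anchoring of XML Schema patterns."""
    return re.fullmatch(SF_FLOAT_SKETCH, text) is not None


def matches_unanchored(text):
    """Substring search, as an engine does when no anchors are written."""
    return re.search(SF_FLOAT_SKETCH, text) is not None


def matches_anchored(text):
    """Substring search with explicit ^ and $ added around the pattern."""
    return re.search(r"^(?:" + SF_FLOAT_SKETCH + r")$", text) is not None
```

With this sketch, "007" is rejected by is_sf_float and matches_anchored, but matches_unanchored accepts it because the search is satisfied by the leading "0" alone. That is the kind of false acceptance that explicit anchors prevent in engines that search rather than match the whole value.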
These regex improvements are published as part of X3Dv4 XML Schema for evaluation.
Regex testing for SF/MFBool, SF/MFInt32, SF/MFFloat, SF/MFDouble, SF/MFTime passes using XMLSpy, regex101, X3DJSAIL/X3DUOM, XML Schema validation.
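For readers who want to try similar accept/reject cases locally, here is a small Python sketch of an MFInt32-like array pattern using the strict whitespace set and comma-as-separator rule described earlier. The names WS, SEP, MF_INT32_SKETCH and is_mf_int32 are hypothetical, and the pattern is an approximation for experimentation rather than the schema regex itself.

```python
import re

WS = r"[ \t\n\r]*"                     # strict whitespace set, zero or more
# Separator between values: whitespace only, or one comma with optional
# whitespace on either side. Two commas in a row are not allowed.
SEP = r"(?:[ \t\n\r]+|[ \t\n\r]*,[ \t\n\r]*)"
INT = r"(\+|\-)?(0|[1-9][0-9]*)"       # optional sign, no leading zeroes

# Hypothetical MFInt32-like pattern: empty array, or values joined by SEP,
# with optional outer whitespace.
MF_INT32_SKETCH = WS + r"(?:" + INT + r"(?:" + SEP + INT + r")*)?" + WS


def is_mf_int32(text):
    """Whole-string match against the sketch pattern."""
    return re.fullmatch(MF_INT32_SKETCH, text) is not None


# A few accept/reject cases in the spirit of the unit tests mentioned above.
accepted = ["", "1 2 3", "1, 2,3", "1 , 2", "  -4\t+5\n"]
rejected = ["1,,2", "1 02", "1\f2"]

for value in accepted:
    print(repr(value), is_mf_int32(value))
for value in rejected:
    print(repr(value), is_mf_int32(value))
```

The form feed case shows the difference between the two whitespace definitions in Python's re module: \s matches "\f", while the explicit set [ \t\n\r] does not, so "1\f2" is rejected by this sketch.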
See links (especially regex101 unit tests!) via
Even more detailed unit tests are now bundled as part of X3DJSAIL testing in
Thanks for all of the dialog to date. It really helped in this tricky business (especially for anchors). Will work on a few Color and SFVec/MFVec types next.
Feedback always welcome. Have fun validating correct content with X3D Regex!
all the best, Don
Don Brutzman Naval Postgraduate School, Code USW/Br brutzman at nps.edu
Watkins 270, MOVES Institute, Monterey CA 93943-5000 USA +1.831.656.2149
X3D graphics, virtual worlds, navy robotics http://faculty.nps.edu/brutzman