[x3d-public] X3D regular expression (regex) improvements

Don Brutzman brutzman at nps.edu
Sun Aug 19 22:26:26 PDT 2018


On 8/17/2018 8:06 PM, Andreas Plesch wrote:
> What about expanding the validation to multiple regexes which all need to match to check for illegal values where necessary ?
> 
> first: !((0|0.0*|.0+)\s+){3})*
> 
> second: actual SFRotation pattern

The relevant page section is now bookmarked, and that possibility has been added as a TODO item.

	X3D Regexes: Negative lookahead and disallowed values
	http://www.web3d.org/specifications/X3dRegularExpressions.html#NegativeLookahead

Modifying the primary regexes is pretty simple: just insert the given "negative lookahead" block at the beginning.
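
As a rough illustration of both ideas (one regex with the lookahead inserted at the beginning, versus two regexes checked in sequence), here is a minimal Java sketch.  It is not X3DJSAIL code: the class name, the method names and the simplified NUMBER and ROTATION patterns are assumptions made for the example, and the zero-axis pattern is a variant of the one on the web page with the dots escaped and leading whitespace skipped.

```java
import java.util.regex.Pattern;

public class NegativeLookaheadSketch {
    // Simplified stand-in for one floating-point value; NOT the official X3D regex.
    static final String NUMBER =
        "[+-]?(?:[0-9]+(?:\\.[0-9]*)?|\\.[0-9]+)(?:[Ee][+-]?[0-9]+)?";

    // Simplified stand-in for SFRotation: four whitespace-separated numbers.
    static final String ROTATION = "\\s*(?:" + NUMBER + "\\s+){3}" + NUMBER + "\\s*";

    // Three leading zero values (0, 0.0, .0 and so on), meaning an illegal zero axis.
    // Simplification: zeros written with an exponent (0e5) are not detected.
    static final String ZERO_AXIS = "\\s*(?:[+-]?(?:0+(?:\\.0*)?|\\.0+)\\s+){3}";

    // Option 1: a single regex with the negative lookahead inserted at the beginning.
    static final Pattern COMBINED = Pattern.compile("(?!" + ZERO_AXIS + ")" + ROTATION);

    // Option 2: two regexes checked in sequence, no lookahead construct required.
    static final Pattern REJECT = Pattern.compile(ZERO_AXIS);
    static final Pattern ACCEPT = Pattern.compile(ROTATION);

    public static boolean matchesCombined(String value) {
        return COMBINED.matcher(value).matches();
    }

    public static boolean matchesTwoStep(String value) {
        // lookingAt() tests only the start of the input, as the lookahead does
        return !REJECT.matcher(value).lookingAt() && ACCEPT.matcher(value).matches();
    }

    public static void main(String[] args) {
        for (String value : new String[] { "0 0 1 0", "0 0 0 1.57", "0.0 .0 0 0", "0 0 0.5 1" }) {
            System.out.println("\"" + value + "\" combined=" + matchesCombined(value)
                             + " twoStep=" + matchesTwoStep(value));
        }
    }
}
```

The two-step form might be the more portable of the two, since it works in tools that have no lookahead support.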

> Can there be multiple regexes in the XML Schema for a type ?

Multiple special "simple types" are listed in X3D XML Schema and each can have a regex if we want.

	X3D XML Schema x3d-4.0.xsd documentation	
	http://www.web3d.org/specifications/X3dSchemaDocumentation4.0/x3d-4.0.html

As it so happens, this weekend I cleaned up the expression for bboxSize type.  Also added it to X3DUOM and X3DJSAIL.  The regex can be found as part of the base type SFVec3fObject.

	X3D Regular Expressions (regexes): bboxSize
	http://www.web3d.org/specifications/X3dRegularExpressions.html#bboxSize

Online unit tests:
	https://regex101.com/r/sjaPZq/1

Modified X3DUOM to include regexes in simple types, excerpted result:

    <SimpleType name="bboxSizeType"
	baseType="SFVec3f"
	defaultValue="-1 -1 -1"
	regex="\s*((([+]?(((0|[1-9][0-9]*)(\.[0-9]*)?|\.[0-9]+)([Ee][+-]?[0-9]+)?)\s+){2}([+]?((0|[1-9][0-9]*)(\.[0-9]*)?|\.[0-9]+)([Ee][+-]?[0-9]+)?)\s*)|(\-1(\.(0)*)?\s+\-1(\.(0)*)?\s+\-1(\.(0)*)?)\s*)?"
	appinfo="bboxSizeType dimensions are non-negative values, default value (-1 -1 -1) indicates that no bounding box size has been computed."
	documentation="http://www.web3d.org/documents/specifications/19775-1/V3.3/Part01/components/group.html#Boundingboxes"/>
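
For anyone wanting to try that regex outside of X3DJSAIL, the following standalone Java sketch may help.  The pattern string is transcribed from the regex attribute above with backslashes doubled for a Java literal; the class and method names are made up for this example, and whole-string matching via Matcher.matches() is an assumption chosen because the pattern carries no ^ or $ anchors.

```java
import java.util.regex.Pattern;

public class BboxSizeRegexSketch {
    // Transcribed from the bboxSizeType regex attribute; backslashes doubled for Java.
    static final String REGEX_BBOXSIZETYPE =
          "\\s*((([+]?(((0|[1-9][0-9]*)(\\.[0-9]*)?|\\.[0-9]+)([Ee][+-]?[0-9]+)?)\\s+){2}"
        + "([+]?((0|[1-9][0-9]*)(\\.[0-9]*)?|\\.[0-9]+)([Ee][+-]?[0-9]+)?)\\s*)"
        + "|(\\-1(\\.(0)*)?\\s+\\-1(\\.(0)*)?\\s+\\-1(\\.(0)*)?)\\s*)?";

    static final Pattern PATTERN = Pattern.compile(REGEX_BBOXSIZETYPE);

    // Hypothetical helper: true when the whole string is a legal bboxSize value.
    public static boolean matchesBboxSize(String value) {
        return PATTERN.matcher(value).matches();
    }

    public static void main(String[] args) {
        String[] samples = { "", "-1 -1 -1", "-1.0 -1.0 -1.0", " 2.0  2.0  2.0",
                             " 0.0  0.0  0.0", "-2.0 -2.0 -2.0",
                             " 0.0  0.0  0.0  0.0", " 0.0  0.0" };
        for (String sample : samples) {
            System.out.println("\"" + sample + "\" -> " + matchesBboxSize(sample));
        }
    }
}
```

The first five samples are expected to match (empty string, the -1 sentinel in two spellings, and non-negative triples), while the last three should not (negative sizes other than -1, too many values, too few values).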

Also:
	X3DJSAIL: Utility Methods and Functionality, regexes
	http://www.web3d.org/specifications/java/X3DJSAIL.html#regex

* Complete.    X3D Regular Expression (regex) support in field types, for example SFVec3fObject REGEX string, pattern, validate(), matches() and matches(String value).
* Progressing. X3D Regular Expression (regex) support for special types, for example SFVec3fObject.matchesBboxSizeType(String value). TODO: add Matrix types.

	http://www.web3d.org/specifications/java/javadoc/org/web3d/x3d/jsail/fields/SFVec3fObject.html#matchesBboxSizeType-java.lang.String-

Here is the current unit-testing code in FieldObjectTests.java

     @Test
     @DisplayName("Test SFVec3fObject bboxSizeType checks on single-field 3-tuple single-precision floating-point array")
     void SFVec3fBboxSizeObjectTests()
     {
         System.out.println ("SFVec3fBboxSizeObjectTests for bounding box (bbox) constraints...");
         float[] defaultBboxSizeArray = { -1.0f, -1.0f, -1.0f };
         assertTrue  (Arrays.equals(SFVec3fObject.DEFAULT_VALUE_BBOXSIZETYPE, new SFVec3fObject().setValueByString(SFVec3fObject.DEFAULT_VALUE_BBOXSIZETYPE_STRING).getPrimitiveValue()),
                                                         "test DEFAULT_VALUE_BBOXSIZETYPE matches DEFAULT_VALUE_BBOXSIZETYPE_STRING for this field object");
         SFVec3fObject testSFVec3fBboxSizeObject = new SFVec3fObject(SFVec3fObject.DEFAULT_VALUE_BBOXSIZETYPE); // static initializer is tested, might throw exception
         assertTrue  (testSFVec3fBboxSizeObject.matches(),       "testSFVec3fBboxSizeObject.matches() tests object initialization correctly matches regex");
         assertTrue  (Arrays.equals(defaultBboxSizeArray, SFVec3fObject.DEFAULT_VALUE_BBOXSIZETYPE), "test correct default value for this field object");
         assertTrue  (SFVec3fObject.matches(SFVec3fObject.DEFAULT_VALUE_BBOXSIZETYPE_STRING),
                                                         "SFVec3fObject.matches(SFVec3fObject.DEFAULT_VALUE_BBOXSIZETYPE_STRING) tests object initialization correctly matches regex");
         assertFalse (testSFVec3fBboxSizeObject.isDefaultValue(),"test initialized field object isDefaultValue() returns false");
         assertTrue  (!SFVec3fObject.REGEX_BBOXSIZETYPE.contains("^") && !SFVec3fObject.REGEX_BBOXSIZETYPE.contains("$"), "test SFVec3fObject.REGEX_BBOXSIZETYPE does not contain anchor characters ^ or $");
         // avoid unexpected equivalent regexes
         assertFalse (SFVec3fObject.REGEX.equals(SFVec3fObject.REGEX_BBOXSIZETYPE), "test SFVec3fObject.REGEX.equals(SFVec3fObject.REGEX_BBOXSIZETYPE) returns false");

         testSFVec3fBboxSizeObject.setValue(-1.0f, -1.0f, -1.0f);
         assertTrue  (Arrays.equals(defaultBboxSizeArray,  testSFVec3fBboxSizeObject.getPrimitiveValue()), "tests setting object value to -1.0f -1.0f -1.0f results in array with same value");
         
         testSFVec3fBboxSizeObject.setValue(defaultBboxSizeArray); // returns void because it matches (overrides) Java SAI specification interface
         assertEquals(defaultBboxSizeArray,testSFVec3fBboxSizeObject.getPrimitiveValue(),   "tests setting object value to default-value array results in equivalent getPrimitiveValue()");

         assertFalse  (SFVec3fObject.matches            (""), "tests empty string \"\" fails SFVec3fObject.matches(value), illegal value");
         assertTrue   (SFVec3fObject.matchesBboxSizeType(""), "tests empty string \"\" passes SFVec3fObject.matchesBboxSizeType(value), legal value");
         
         assertFalse  (testSFVec3fBboxSizeObject.setValue        ( -2.0f, -2.0f, -2.0f ).matchesBboxSizeType(), "tests setting object value to -2.0f -2.0f -2.0f fails");
         assertFalse  (testSFVec3fBboxSizeObject.setValueByString("-2.0   -2.0   -2.0" ).matchesBboxSizeType(), "tests setting object value to \"-2.0   -2.0   -2.0\" fails");
         assertFalse  (testSFVec3fBboxSizeObject.setValue        ( -2.0f, -2.0f, -2.0f ).matchesBboxSizeType(), "tests setting object value to -2.0f -2.0f -2.0f fails");
         assertTrue   (SFVec3fObject.matches            ("-2.0 -2.0 -2.0"), "tests \"-2.0 -2.0 -2.0\" passes SFVec3fObject.matches(value)");
         assertFalse  (SFVec3fObject.matchesBboxSizeType("-2.0 -2.0 -2.0"), "tests \"-2.0 -2.0 -2.0\" fails  SFVec3fObject.matchesBboxSizeType(value)");
         assertTrue   (SFVec3fObject.matches            (" 2.0  2.0  2.0"), "tests \" 2.0  2.0  2.0\" passes SFVec3fObject.matches(value)");
         assertTrue   (SFVec3fObject.matchesBboxSizeType(" 2.0  2.0  2.0"), "tests \" 2.0  2.0  2.0\" passes SFVec3fObject.matchesBboxSizeType(value)");
         assertTrue   (SFVec3fObject.matchesBboxSizeType(" 0.0  0.0  0.0"), "tests \" 0.0  0.0  0.0\" passes SFVec3fObject.matchesBboxSizeType(value)");
         assertFalse  (SFVec3fObject.matchesBboxSizeType(" 0.0  0.0  0.0  0.0"), "tests \" 0.0  0.0  0.0  0.0\" fails SFVec3fObject.matchesBboxSizeType(value), too many values");
         assertFalse  (SFVec3fObject.matchesBboxSizeType(" 0.0  0.0"),           "tests \" 0.0  0.0\" fails SFVec3fObject.matchesBboxSizeType(value), insufficient values");
     }

All working.  8)

So yes, we can add regexes for various types, and perhaps even an alternate pattern for a base type if it makes sense.  This shows one way.

Thanks for SFImage considerations, will work on these another day.

> SFImage regex considerations:
> 
> Are hex values allowed for width etc. ? The spec. only says integer but perhaps decimal is implied.
> 
> Can hex. values use any case for letters ? The descriptions only use capitals as 0xFF. Is 0Xff also valid ?
> 
> 0 0 0 is the default value. But should 0 be allowed for components in any other case ?
> 
> No leading 0s for integer values.
> 
> But 0x0000FF is OK.
> 
> No negative integers for width etc.
> 
> Perhaps allow negative values for color: 255 = -1 for single component, for example.
> 
> 
> 
> -- AP on the road
> 
> 
> On Fri, Aug 17, 2018, 3:15 PM Don Brutzman <brutzman at nps.edu> wrote:
> 
>     details follow on current regex progress.
> 
>     On 8/7/2018 8:21 AM, Don Brutzman wrote:
>      > [...]
>      > http://www.web3d.org/specifications/X3dRegularExpressions.html#Design
>      > http://www.web3d.org/specifications/X3dRegularExpressions.png
>      >
>      > Continuing:
>      >
>      > On 8/6/2018 11:52 AM, Andreas Plesch wrote:
>      >> Two quick points:
>      >>
>      >> The hexadecimal pattern got corrupted. SFImage uses this pattern:
>      >>
>      >> 0x([a-f]|[A-F]|\d]){1,8}
>      >>
>      >> where \d is the digit character class.
>      >
>      > Thanks, will work on that pattern next.
> 
>     Currently using 0x[0-9a-fA-F]{1,8} for hexadecimal value, from Regular Expressions Cookbook.
> 
>     Preliminary pattern for SFImage is there, but also need to allow integers as alternatives for hex values.
> 
>      >> There are two equivalent +- patterns
>      >> (\+|\-)?
>      >> [+|-]?
>      >> Probably only one should be recommended and used subsequently.
>      >
>      > Good catch, thank you.  Will scrutinize and normalize.
> 
>     [+-]? for sign and [Ee] for scientific-notation exponent.
> 
>      >> I have no good idea how to detect a leading 0 0 0 as non-matching. My
>      >> not so good idea would be to explicitly allow
>      >> non-0 followed by x x or
>      >> x followed by non-0 followed by x or
>      >> x x followed by non-0
>      >>
>      >> where non-0 is something [+-]?0?\.?[0-9]*[1-9]+[0-9]*([e|E][+-]?[0-9]+)*
>      >> and x is a floating point number
>      >>
>      >> -Andreas
>      >
>      > Possibly, but am thinking it adds serious complexity that is hard to maintain.  So probably not but will think about it.
> 
>     It is great when we find those patterns, am trying to keep things maintainable.  The color-coding on web page definitely helps.
> 
>     Negative lookahead is good for avoiding illegal values, but not allowed in XML Schema (probably to avoid computational denial of service attacks).  Have listed a pattern nevertheless, other languages/tools might want to use it.
> 
>     ==================
>     Disallowed values:
> 
>     * Negative lookahead filters can disqualify attributes that contain illegal values.
>     * W3C Recommendation for XML Schema (XSD) unfortunately does not support this construct.
>     * (?!((0|0.0*|.0+)\s+){3}) prohibits 0 0 0 since zero vector is illegal as initial axis triplet of an SFRotation.
>     * (?!\s*,\s*,) prohibits multiple adjacent commas in intermediate whitespace, for example 0 0 0, ,0 0 0 is an illegal set of MFColor values.
>     ==================
> 
>      > Thanks for the continuing review!  Very helpful.
>     Steadily better and better... need to check each regex101 unit test to make sure properly updated.  SF/MFImage are next, then matrix types.
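
Regarding the hexadecimal pattern quoted above and the need to also allow integers: a small Java sketch of one possible combination follows.  Only the 0x[0-9a-fA-F]{1,8} part comes from the thread; the integer alternative (zero, or digits without leading zeros) and all class and method names are assumptions for illustration, not the pattern on the web page.

```java
import java.util.regex.Pattern;

public class HexValueSketch {
    // Hexadecimal value pattern as quoted in the thread.
    static final Pattern HEX = Pattern.compile("0x[0-9a-fA-F]{1,8}");

    // Assumed alternative: hex value OR non-negative decimal integer without leading zeros.
    static final Pattern PIXEL = Pattern.compile("0x[0-9a-fA-F]{1,8}|0|[1-9][0-9]*");

    public static boolean isHex(String value) {
        return HEX.matcher(value).matches();
    }

    public static boolean isPixelValue(String value) {
        return PIXEL.matcher(value).matches();
    }

    public static void main(String[] args) {
        for (String value : new String[] { "0xFF", "0x0000FF", "0Xff", "0x123456789", "255", "0255", "-1" }) {
            System.out.println(value + " hex=" + isHex(value) + " pixel=" + isPixelValue(value));
        }
    }
}
```

Note that with this pattern the 0x prefix must be lowercase, so 0Xff is rejected while 0xff and 0xFF are accepted; whether that is the desired answer to the upper/lower case question is still open.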



all the best, Don
-- 
Don Brutzman  Naval Postgraduate School, Code USW/Br       brutzman at nps.edu
Watkins 270,  MOVES Institute, Monterey CA 93943-5000 USA   +1.831.656.2149
X3D graphics, virtual worlds, navy robotics http://faculty.nps.edu/brutzman



