[x3d-public] X3D regular expression (regex) improvements

Andreas Plesch andreasplesch at gmail.com
Mon Aug 20 07:43:42 PDT 2018


Hi Don,

What I was suggesting was multiple patterns for a single simple type;
more below.

On Mon, Aug 20, 2018 at 1:29 AM Don Brutzman <brutzman at nps.edu> wrote:
>
> On 8/17/2018 8:06 PM, Andreas Plesch wrote:
> > What about expanding the validation to multiple regexes which all need to match to check for illegal values where necessary ?
> >
> > first: !((0|0.0*|.0+)\s+){3})*
> >
> > second: actual SFRotation pattern
>
> Relevant page section is now bookmarked, that possibility is also added as a TODO item.
>
>         X3D Regexes: Negative lookahead and disallowed values
>         http://www.web3d.org/specifications/X3dRegularExpressions.html#NegativeLookahead
>
> Modifying the primary regexes is pretty simple, just insert the given "negative lookahead" block at the beginning.
>
> > Can there be multiple regexes in the XML Schema for a type ?
>
> Multiple special "simple types" are listed in X3D XML Schema and each can have a regex if we want.
>
>         X3D XML Schema x3d-4.0.xsd documentation
>         http://www.web3d.org/specifications/X3dSchemaDocumentation4.0/x3d-4.0.html

http://www.web3d.org/specifications/X3dSchemaDocumentation4.0/x3d-4.0_SFRotation.html

has a single pattern restriction as a facet.

It is possible to have multiple patterns in the pattern restriction facet:

https://www.w3.org/TR/xmlschema11-2/#rf-pattern

But patterns listed together in one restriction are combined with OR
logic during validation, while we would need AND logic to account for
the 0 0 0 checking.
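
To make the OR versus AND distinction concrete, here is a minimal Java
sketch that uses java.util.regex in place of an XML Schema validator.
The class name, the simplified float token, and the lookahead with
escaped dots and a leading \s* are assumptions for illustration, not the
published X3D regexes. Java is used only because it supports lookahead,
which XML Schema patterns do not.

```java
import java.util.List;
import java.util.regex.Pattern;

class PatternCombinationSketch {
    // Simplified float token (an assumption, not the published X3D regex).
    static final String FLOAT =
        "[+-]?([0-9]+(\\.[0-9]*)?|\\.[0-9]+)([Ee][+-]?[0-9]+)?";

    // "second" pattern: four whitespace-separated floats.
    static final Pattern ROTATION =
        Pattern.compile("\\s*(" + FLOAT + "\\s+){3}" + FLOAT + "\\s*");

    // "first" pattern: negative lookahead rejecting a leading 0 0 0 axis.
    // Escaped dots and whitespace inside the lookahead are assumptions.
    static final Pattern NOT_ZERO_AXIS =
        Pattern.compile("(?!\\s*((0|0\\.0*|\\.0+)\\s+){3}).*");

    static final List<Pattern> PATTERNS = List.of(NOT_ZERO_AXIS, ROTATION);

    // OR logic: it is enough that one pattern matches the whole value.
    static boolean matchesAny(String value) {
        return PATTERNS.stream().anyMatch(p -> p.matcher(value).matches());
    }

    // AND logic: every pattern has to match the whole value.
    static boolean matchesAll(String value) {
        return PATTERNS.stream().allMatch(p -> p.matcher(value).matches());
    }

    public static void main(String[] args) {
        for (String value : new String[] {"0 0 1 0", "0 0 0 1", "hello"}) {
            System.out.println("\"" + value + "\" OR=" + matchesAny(value)
                + " AND=" + matchesAll(value));
        }
    }
}
```

With OR logic both "0 0 0 1" and "hello" get through, because each
satisfies one of the two patterns; only AND logic rejects them.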

From what I can understand, AND combining of multiple
patterns is only possible across multiple steps in the type derivation,
which I think would require an intermediate base type. Perhaps an
SFRotationBase type could be added which does one step of the
validation (the SFRotation pattern without negative lookahead), with
SFRotation then doing the final step (the non 0 0 0 pattern).

While possible, it may be too much effort for the gain.
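
The two-step idea could be sketched as follows, in plain Java rather
than XSD. SFRotationBase is the hypothetical intermediate type named
above; both patterns and the class name are illustrative assumptions,
not schema content. The second pattern avoids lookahead by requiring
that one of the first three tokens has a non-zero digit in its mantissa.

```java
import java.util.regex.Pattern;

class TwoStepRotationSketch {
    // Simplified float token (an assumption, not the published X3D regex).
    static final String X =
        "[+-]?([0-9]+(\\.[0-9]*)?|\\.[0-9]+)([Ee][+-]?[0-9]+)?";

    // Float with at least one non-zero mantissa digit, written without lookahead.
    static final String NZ =
        "[+-]?([0-9]*[1-9][0-9]*(\\.[0-9]*)?|[0-9]*\\.[0-9]*[1-9][0-9]*)"
      + "([Ee][+-]?[0-9]+)?";

    // Step 1 (hypothetical SFRotationBase): exactly four floats.
    static final Pattern BASE =
        Pattern.compile("\\s*(" + X + "\\s+){3}" + X + "\\s*");

    // Step 2 (hypothetical SFRotation restriction): the first, second,
    // or third token is a non-zero float; the rest is left to step 1.
    static final Pattern NON_ZERO_AXIS = Pattern.compile(
        "\\s*(" + NZ + "\\s.*"
        + "|\\S+\\s+" + NZ + "\\s.*"
        + "|\\S+\\s+\\S+\\s+" + NZ + "\\s.*)");

    // Both derivation steps have to hold, which gives the AND logic.
    static boolean isValid(String value) {
        return BASE.matcher(value).matches()
            && NON_ZERO_AXIS.matcher(value).matches();
    }

    public static void main(String[] args) {
        for (String value : new String[] {"0 0 1 0", "0 0 0 1", "1 0 0"}) {
            System.out.println("\"" + value + "\" valid=" + isValid(value));
        }
    }
}
```

A value is accepted only if both steps match, which mirrors how the
facets of successive derivation steps all have to hold. "1 0 0" passes
the second step alone but fails the base step for lack of an angle.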

>
> As it so happens, this weekend I cleaned up the expression for bboxSize type.  Also added it to X3DUOM and X3DJSAIL.  The regex can be found as part of the base type SFVec3fObject.
>
>         X3D Regular Expressions (regexes): bboxSize
>         http://www.web3d.org/specifications/X3dRegularExpressions.html#bboxSize
>
> Online unit tests:
>         https://regex101.com/r/sjaPZq/1

Should exponent spellings of -1, such as -10e-1 -1 -1, or

-1e0 -1 -1 or

-0.1e1 -1 -1

be allowed as well? Currently the regex does not match these. It is
unlikely that anybody would want to write them, but they may be legal.
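
A quick way to check this is to run the bboxSizeType regex from the
X3DUOM excerpt quoted below through java.util.regex. The sketch copies
that regex with backslashes doubled for Java; the class name and the
numeric helper isNumericallyLegal are hypothetical and only show one
possible alternative to a regex-only check.

```java
import java.util.regex.Pattern;

class BboxSizeExponentSketch {
    // bboxSizeType regex as quoted from X3DUOM, split over three lines.
    static final Pattern BBOX_SIZE = Pattern.compile(
        "\\s*((([+]?(((0|[1-9][0-9]*)(\\.[0-9]*)?|\\.[0-9]+)([Ee][+-]?[0-9]+)?)\\s+){2}"
      + "([+]?((0|[1-9][0-9]*)(\\.[0-9]*)?|\\.[0-9]+)([Ee][+-]?[0-9]+)?)\\s*)"
      + "|(\\-1(\\.(0)*)?\\s+\\-1(\\.(0)*)?\\s+\\-1(\\.(0)*)?)\\s*)?");

    static boolean matchesRegex(String value) {
        return BBOX_SIZE.matcher(value).matches();
    }

    // Hypothetical numeric check: three floats that are either all -1 or
    // all non-negative. Float.parseFloat is more lenient than the X3D
    // grammar, so this is not a drop-in replacement for the regex.
    static boolean isNumericallyLegal(String value) {
        String[] tokens = value.trim().split("\\s+");
        if (tokens.length != 3) {
            return false;
        }
        float[] f = new float[3];
        try {
            for (int i = 0; i < 3; i++) {
                f[i] = Float.parseFloat(tokens[i]);
            }
        } catch (NumberFormatException e) {
            return false;
        }
        boolean allMinusOne = f[0] == -1f && f[1] == -1f && f[2] == -1f;
        boolean allNonNegative = f[0] >= 0f && f[1] >= 0f && f[2] >= 0f;
        return allMinusOne || allNonNegative;
    }

    public static void main(String[] args) {
        for (String value : new String[] {"-1 -1 -1", "-1e0 -1 -1", "-0.1e1 -1 -1"}) {
            System.out.println("\"" + value + "\" regex=" + matchesRegex(value)
                + " numeric=" + isNumericallyLegal(value));
        }
    }
}
```

The regex accepts only the plain and -1.0 style spellings of the
sentinel, while a numeric comparison treats every spelling of -1 alike.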

-Andreas

>
> Modified X3DUOM to include regexes in simple types, excerpted result:
>
>     <SimpleType name="bboxSizeType"
>         baseType="SFVec3f"
>         defaultValue="-1 -1 -1"
>         regex="\s*((([+]?(((0|[1-9][0-9]*)(\.[0-9]*)?|\.[0-9]+)([Ee][+-]?[0-9]+)?)\s+){2}([+]?((0|[1-9][0-9]*)(\.[0-9]*)?|\.[0-9]+)([Ee][+-]?[0-9]+)?)\s*)|(\-1(\.(0)*)?\s+\-1(\.(0)*)?\s+\-1(\.(0)*)?)\s*)?"
>         appinfo="bboxSizeType dimensions are non-negative values, default value (-1 -1 -1) indicates that no bounding box size has been computed."
>         documentation="http://www.web3d.org/documents/specifications/19775-1/V3.3/Part01/components/group.html#Boundingboxes"/>
>
> also
>         X3DJSAIL: Utility Methods and Functionality, regexes
>         http://www.web3d.org/specifications/java/X3DJSAIL.html#regex
>
> * Complete.    X3D Regular Expression (regex) support in field types, for example SFVec3fObject REGEX string, pattern, validate(), matches() and matches(String value).
> * Progressing. X3D Regular Expression (regex) support for special types, for example SFVec3fObject.matchesBboxSizeType(String value). TODO: add Matrix types.
>
>         http://www.web3d.org/specifications/java/javadoc/org/web3d/x3d/jsail/fields/SFVec3fObject.html#matchesBboxSizeType-java.lang.String-
>
> Here is the current unit-testing code in FieldObjectTests.java
>
>      @Test
>      @DisplayName("Test SFVec3fObject bboxSizeType checks on single-field 3-tuple single-precision floating-point array")
>      void SFVec3fBboxSizeObjectTests()
>      {
>          System.out.println ("SFVec3fBboxSizeObjectTests for bounding box (bbox) constraints...");
>          float[] defaultBboxSizeArray = { -1.0f, -1.0f, -1.0f };
>          assertTrue  (Arrays.equals(SFVec3fObject.DEFAULT_VALUE_BBOXSIZETYPE, new SFVec3fObject().setValueByString(SFVec3fObject.DEFAULT_VALUE_BBOXSIZETYPE_STRING).getPrimitiveValue()),
>                                                          "test DEFAULT_VALUE_BBOXSIZETYPE matches DEFAULT_VALUE_BBOXSIZETYPE_STRING for this field object");
>          SFVec3fObject testSFVec3fBboxSizeObject = new SFVec3fObject(SFVec3fObject.DEFAULT_VALUE_BBOXSIZETYPE); // static initializer is tested, might throw exception
>          assertTrue  (testSFVec3fBboxSizeObject.matches(),       "testSFVec3fBboxSizeObject.matches() tests object initialization correctly matches regex");
>          assertTrue  (Arrays.equals(defaultBboxSizeArray, SFVec3fObject.DEFAULT_VALUE_BBOXSIZETYPE), "test correct default value for this field object");
>          assertTrue  (SFVec3fObject.matches(SFVec3fObject.DEFAULT_VALUE_BBOXSIZETYPE_STRING),
>                                                          "SFVec3fObject.matches(SFVec3fObject.DEFAULT_VALUE_BBOXSIZETYPE_STRING) tests object initialization correctly matches regex");
>          assertFalse (testSFVec3fBboxSizeObject.isDefaultValue(),"test field object initialized to bboxSize default isDefaultValue() returns false");
>          assertTrue  (!SFVec3fObject.REGEX_BBOXSIZETYPE.contains("^") && !SFVec3fObject.REGEX_BBOXSIZETYPE.contains("$"), "test SFVec3fObject.REGEX does not contain anchor characters ^ or $");
>          // avoid unexpected equivalent regexes
>          assertFalse (SFVec3fObject.REGEX.equals(SFVec3fObject.REGEX_BBOXSIZETYPE), "test SFVec3fObject.REGEX.equals(SFVec3fObject.REGEX_BBOXSIZETYPE) returns false");
>
>          testSFVec3fBboxSizeObject.setValue(-1.0f, -1.0f, -1.0f);
>          assertTrue  (Arrays.equals(defaultBboxSizeArray,  testSFVec3fBboxSizeObject.getPrimitiveValue()), "tests setting object value to -1.0f -1.0f -1.0f results in array with same value");
>
>          testSFVec3fBboxSizeObject.setValue(defaultBboxSizeArray); // returns void because it matches (overrides) Java SAI specification interface
>          assertEquals(defaultBboxSizeArray,testSFVec3fBboxSizeObject.getPrimitiveValue(),   "tests setting object value to default-value array results in equivalent getPrimitiveValue()");
>
>          assertFalse  (SFVec3fObject.matches            (""), "tests empty string \"\" fails SFVec3fObject.matches(value), illegal value");
>          assertTrue   (SFVec3fObject.matchesBboxSizeType(""), "tests empty string \"\" passes SFVec3fObject.matchesBboxSizeType(value), legal value");
>
>          assertFalse  (testSFVec3fBboxSizeObject.setValue        ( -2.0f, -2.0f, -2.0f ).matchesBboxSizeType(), "tests setting object value to -2.0f -2.0f -2.0f fails");
>          assertFalse  (testSFVec3fBboxSizeObject.setValueByString("-2.0   -2.0   -2.0" ).matchesBboxSizeType(), "tests setting object value to \"-2.0   -2.0   -2.0\" fails");
>          assertFalse  (testSFVec3fBboxSizeObject.setValue        ( -2.0f, -2.0f, -2.0f ).matchesBboxSizeType(), "tests setting object value to -2.0f -2.0f -2.0f fails");
>          assertTrue   (SFVec3fObject.matches            ("-2.0 -2.0 -2.0"), "tests \"-2.0 -2.0 -2.0\" passes SFVec3fObject.matches(value)");
>          assertFalse  (SFVec3fObject.matchesBboxSizeType("-2.0 -2.0 -2.0"), "tests \"-2.0 -2.0 -2.0\" fails  SFVec3fObject.matchesBboxSizeType(value)");
>          assertTrue   (SFVec3fObject.matches            (" 2.0  2.0  2.0"), "tests \" 2.0  2.0  2.0\" passes SFVec3fObject.matches(value)");
>          assertTrue   (SFVec3fObject.matchesBboxSizeType(" 2.0  2.0  2.0"), "tests \" 2.0  2.0  2.0\" passes SFVec3fObject.matchesBboxSizeType(value)");
>          assertTrue   (SFVec3fObject.matchesBboxSizeType(" 0.0  0.0  0.0"), "tests \" 0.0  0.0  0.0\" passes SFVec3fObject.matchesBboxSizeType(value)");
>          assertFalse  (SFVec3fObject.matchesBboxSizeType(" 0.0  0.0  0.0  0.0"), "tests \" 0.0  0.0  0.0  0.0\" fails SFVec3fObject.matchesBboxSizeType(value), too many values");
>          assertFalse  (SFVec3fObject.matchesBboxSizeType(" 0.0  0.0"),           "tests \" 0.0  0.0\" fails SFVec3fObject.matchesBboxSizeType(value), insufficient values");
>      }
>
> All working.  8)
>
> So yes, we can add regexes for various types, and perhaps even an alternate pattern for a base type if it makes sense.  This shows one way.
>
> Thanks for SFImage considerations, will work on these another day.
>
> > SFImage regex considerations:
> >
> > Are hex values allowed for width etc. ? The spec. only says integer but perhaps decimal is implied.
> >
> > Can hex. values use any case for letters ? The descriptions only use capitals as 0xFF. Is 0Xff also valid ?
> >
> > 0 0 0 is the default value. But should 0 be allowed for components in any other case ?
> >
> > No leading 0s for integer values.
> >
> > But 0x0000FF is OK.
> >
> > No negative integers for width etc.
> >
> > Perhaps allow negative values for color: 255 = -1 for single component, for example.
> >
> >
> >
> > -- AP on the road
> >
> >
> > On Fri, Aug 17, 2018, 3:15 PM Don Brutzman <brutzman at nps.edu> wrote:
> >
> >     details follow on current regex progress.
> >
> >     On 8/7/2018 8:21 AM, Don Brutzman wrote:
> >      > [...]
> >      > http://www.web3d.org/specifications/X3dRegularExpressions.html#Design
> >      > http://www.web3d.org/specifications/X3dRegularExpressions.png
> >      >
> >      > Continuing:
> >      >
> >      > On 8/6/2018 11:52 AM, Andreas Plesch wrote:
> >      >> Two quick points:
> >      >>
> >      >> The hexadecimal pattern got corrupted. SFImage uses this pattern:
> >      >>
> >      >> 0x([a-f]|[A-F]|\d){1,8}
> >      >>
> >      >> where \d is the digit character class.
> >      >
> >      > Thanks, will work on that pattern next.
> >
> >     Currently using 0x[0-9a-fA-F]{1,8} for hexadecimal value, from Regular Expressions Cookbook.
> >
> >     Preliminary pattern for SFImage is there, but also need to allow integers as alternatives for hex values.
> >
> >      >> There are two equivalent +- patterns
> >      >> (\+|\-)?
> >      >> [+|-]?
> >      >> Probably only one should be recommended and used subsequently.
> >      >
> >      > Good catch, thank you.  Will scrutinize and normalize.
> >
> >     [+-]? for sign and [Ee] for scientific-notation exponent.
> >
> >      >> I have no good idea how to detect a leading 0 0 0 as non-matching. My
> >      >> not so good idea would be to explicitly allow
> >      >> non-0 followed by x x or
> >      >> x followed by non-0 followed by x or
> >      >> x x followed by non-0
> >      >>
> >      >> where non-0 is something [+-]?0?\.?[0-9]*[1-9]+[0-9]*([e|E][+-]?[0-9]+)*
> >      >> and x is a floating point number
> >      >>
> >      >> -Andreas
> >      >
> >      > Possibly, but am thinking it adds serious complexity that is hard to maintain.  So probably not but will think about it.
> >
> >     It is great when we find those patterns, am trying to keep things maintainable.  The color-coding on web page definitely helps.
> >
> >     Negative lookahead is good for avoiding illegal values, but not allowed in XML Schema (probably to avoid computational denial of service attacks).  Have listed a pattern nevertheless; other languages/tools might want to use it.
> >
> >     ==================
> >     Disallowed values:
> >
> >     * Negative lookahead filters can disqualify attributes that contain illegal values.
> >     * W3C Recommendation for XML Schema (XSD) unfortunately does not support this construct.
> >     * (?!((0|0\.0*|\.0+)\s+){3}) prohibits 0 0 0 since zero vector is illegal as initial axis triplet of an SFRotation.
> >     * (?!\s*,\s*,) prohibits multiple adjacent commas in intermediate whitespace, for example 0 0 0, ,0 0 0 is an illegal set of MFColor values.
> >     ==================
> >
> >      > Thanks for the continuing review!  Very helpful.
> >     Steadily better and better... need to check each regex101 unit test to make sure properly updated.  SF/MFImage are next, then matrix types.
>
>
>
> all the best, Don
> --
> Don Brutzman  Naval Postgraduate School, Code USW/Br       brutzman at nps.edu
> Watkins 270,  MOVES Institute, Monterey CA 93943-5000 USA   +1.831.656.2149
> X3D graphics, virtual worlds, navy robotics http://faculty.nps.edu/brutzman
>


-- 
Andreas Plesch
Waltham, MA 02453


