[x3d-public] X3D regular expression (regex) improvements
Andreas Plesch
andreasplesch at gmail.com
Mon Aug 20 07:43:42 PDT 2018
Hi Don,
What I was suggesting was multiple patterns for a single simple type;
more below.
On Mon, Aug 20, 2018 at 1:29 AM Don Brutzman <brutzman at nps.edu> wrote:
>
> On 8/17/2018 8:06 PM, Andreas Plesch wrote:
> > What about expanding the validation to multiple regexes which all need to match to check for illegal values where necessary ?
> >
> > first: !((0|0.0*|.0+)\s+){3})*
> >
> > second: actual SFRotation pattern
>
> Relevant page section is now bookmarked, that possibility is also added as a TODO item.
>
> X3D Regexes: Negative lookahead and disallowed values
> http://www.web3d.org/specifications/X3dRegularExpressions.html#NegativeLookahead
>
> Modifying the primary regexes is pretty simple, just insert the given "negative lookahead" block at the beginning.
>
> > Can there be multiple regexes in the XML Schema for a type ?
>
> Multiple special "simple types" are listed in X3D XML Schema and each can have a regex if we want.
>
> X3D XML Schema x3d-4.0.xsd documentation
> http://www.web3d.org/specifications/X3dSchemaDocumentation4.0/x3d-4.0.html
http://www.web3d.org/specifications/X3dSchemaDocumentation4.0/x3d-4.0_SFRotation.html
has a single pattern restriction as a facet.
It is possible to have multiple patterns in the pattern restriction facet:
https://www.w3.org/TR/xmlschema11-2/#rf-pattern
But these are combined with OR logic during validation, whereas we
would need AND logic to account for the 0 0 0 check.
From what I can understand it looks like AND combining of multiple
patterns is only possible with multiple steps in the type derivation
which I think would require an intermediate base type. Perhaps an
SFRotationBase type could be added which does one step of the
validation (the SFRotation pattern without negative lookahead) with
SFRotation then doing the final step (the non 0 0 0 pattern).
While possible, it may be too much effort for the gain.
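
To illustrate the OR versus AND difference outside of XML Schema, here is
a minimal Java sketch. It is not the X3D schema or X3DJSAIL code: the
class name RotationPatternCheck, the simplified FLOAT pattern, and the
method names are assumptions for illustration only. The zero-axis check
reuses the negative lookahead from this thread, with the dots escaped and
optional leading whitespace added.

```java
import java.util.regex.Pattern;

class RotationPatternCheck {
    // Assumption: simplified floating-point pattern, not the official X3D regex.
    static final String FLOAT =
        "[+-]?((0|[1-9][0-9]*)(\\.[0-9]*)?|\\.[0-9]+)([Ee][+-]?[0-9]+)?";

    // Step 1 (hypothetical SFRotationBase): exactly four whitespace-separated floats.
    static final Pattern BASE =
        Pattern.compile("\\s*(" + FLOAT + "\\s+){3}" + FLOAT + "\\s*");

    // Step 2: reject a leading 0 0 0 axis with a negative lookahead.
    // Java supports lookahead; XML Schema patterns do not.
    static final Pattern NON_ZERO_AXIS =
        Pattern.compile("(?s)(?!\\s*((0|0\\.0*|\\.0+)\\s+){3}).*");

    // OR logic: how several pattern facets in one derivation step are combined.
    static boolean matchesEither(String value) {
        return BASE.matcher(value).matches() || NON_ZERO_AXIS.matcher(value).matches();
    }

    // AND logic: what successive derivation steps would give.
    static boolean matchesBoth(String value) {
        return BASE.matcher(value).matches() && NON_ZERO_AXIS.matcher(value).matches();
    }

    public static void main(String[] args) {
        String[] samples = { "0 0 1 0", "0 0 0 1.57", "0.0 0.0 0.0 1", "1 0 0" };
        for (String value : samples) {
            System.out.println("\"" + value + "\" OR=" + matchesEither(value)
                + " AND=" + matchesBoth(value));
        }
    }
}
```

With OR logic a value only has to satisfy one of the two patterns, so a
zero axis still gets through whenever it has four well-formed numbers.
Zeros written with a sign or an exponent (for example -0 or 0e0) slip
past this particular lookahead, so a real pattern would need more cases.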
>
> As it so happens, this weekend I cleaned up the expression for bboxSize type. Also added it to X3DUOM and X3DJSAIL. The regex can be found as part of the base type SFVec3fObject.
>
> X3D Regular Expressions (regexes): bboxSize
> http://www.web3d.org/specifications/X3dRegularExpressions.html#bboxSize
>
> Online unit tests:
> https://regex101.com/r/sjaPZq/1
Should equivalent spellings of the -1 default in scientific notation, such as
-1e0 -1 -1 or
-0.1e1 -1 -1 or
-10e-1 -1 -1
be allowed as well? Currently the regex does not match these. It is
unlikely that anybody would want to write them, but they may be legal.
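
A quick way to check this outside regex101 is a small Java sketch like
the following. The regex string is copied from the bboxSizeType excerpt
quoted in this message; the class name BboxSizeRegexCheck and the helper
method are made up for illustration and are not part of X3DJSAIL.

```java
import java.util.regex.Pattern;

class BboxSizeRegexCheck {
    // Regex copied from the X3DUOM bboxSizeType excerpt in this thread.
    static final Pattern BBOX_SIZE = Pattern.compile(
        "\\s*((([+]?(((0|[1-9][0-9]*)(\\.[0-9]*)?|\\.[0-9]+)([Ee][+-]?[0-9]+)?)\\s+){2}"
        + "([+]?((0|[1-9][0-9]*)(\\.[0-9]*)?|\\.[0-9]+)([Ee][+-]?[0-9]+)?)\\s*)"
        + "|(\\-1(\\.(0)*)?\\s+\\-1(\\.(0)*)?\\s+\\-1(\\.(0)*)?)\\s*)?");

    // Hypothetical helper: true only when the whole string matches the pattern.
    static boolean matchesBboxSize(String value) {
        return BBOX_SIZE.matcher(value).matches();
    }

    public static void main(String[] args) {
        String[] samples = { "-1 -1 -1", "-1.0 -1.0 -1.0", "-1e0 -1 -1", "2.0 2.0 2.0" };
        for (String s : samples) {
            // Float.parseFloat shows the numeric value of the first component.
            float first = Float.parseFloat(s.trim().split("\\s+")[0]);
            System.out.println("\"" + s + "\" first=" + first
                + " matches=" + matchesBboxSize(s));
        }
    }
}
```

The point of printing the parsed value next to the match result is that
-1e0 parses to the same float as -1, yet the pattern only accepts the
sentinel when it is written as -1 with an optional run of decimal zeros.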
-Andreas
>
> Modified X3DUOM to include regexes in simple types, excerpted result:
>
> <SimpleType name="bboxSizeType"
> baseType="SFVec3f"
> defaultValue="-1 -1 -1"
> regex="\s*((([+]?(((0|[1-9][0-9]*)(\.[0-9]*)?|\.[0-9]+)([Ee][+-]?[0-9]+)?)\s+){2}([+]?((0|[1-9][0-9]*)(\.[0-9]*)?|\.[0-9]+)([Ee][+-]?[0-9]+)?)\s*)|(\-1(\.(0)*)?\s+\-1(\.(0)*)?\s+\-1(\.(0)*)?)\s*)?"
> appinfo="bboxSizeType dimensions are non-negative values, default value (-1 -1 -1) indicates that no bounding box size has been computed."
> documentation="http://www.web3d.org/documents/specifications/19775-1/V3.3/Part01/components/group.html#Boundingboxes"/>
>
> also
> X3DJSAIL: Utility Methods and Functionality, regexes
> http://www.web3d.org/specifications/java/X3DJSAIL.html#regex
>
> * Complete. X3D Regular Expression (regex) support in field types, for example SFVec3fObject REGEX string, pattern, validate(), matches() and matches(String value).
> * Progressing. X3D Regular Expression (regex) support for special types, for example SFVec3fObject.matchesBboxSizeType(String value). TODO: add Matrix types.
>
> http://www.web3d.org/specifications/java/javadoc/org/web3d/x3d/jsail/fields/SFVec3fObject.html#matchesBboxSizeType-java.lang.String-
>
> Here is the current unit-testing code in FieldObjectTests.java
>
> @Test
> @DisplayName("Test SFVec3fObject bboxSizeType checks on single-field 3-tuple single-precision floating-point array")
> void SFVec3fBboxSizeObjectTests()
> {
> System.out.println ("SFVec3fBboxSizeObjectTests for bounding box (bbox) constraints...");
> float[] defaultBboxSizeArray = { -1.0f, -1.0f, -1.0f };
> assertTrue (Arrays.equals(SFVec3fObject.DEFAULT_VALUE_BBOXSIZETYPE, new SFVec3fObject().setValueByString(SFVec3fObject.DEFAULT_VALUE_BBOXSIZETYPE_STRING).getPrimitiveValue()),
> "test DEFAULT_VALUE_BBOXSIZETYPE matches DEFAULT_VALUE_BBOXSIZETYPE_STRING for this field object");
> SFVec3fObject testSFVec3fBboxSizeObject = new SFVec3fObject(SFVec3fObject.DEFAULT_VALUE_BBOXSIZETYPE); // static initializer is tested, might throw exception
> assertTrue (testSFVec3fBboxSizeObject.matches(), "testSFVec3fBboxSizeObject.matches() tests object initialization correctly matches regex");
> assertTrue (Arrays.equals(defaultBboxSizeArray, SFVec3fObject.DEFAULT_VALUE_BBOXSIZETYPE), "test correct default value for this field object");
> assertTrue (SFVec3fObject.matches(SFVec3fObject.DEFAULT_VALUE_BBOXSIZETYPE_STRING),
> "SFVec3fObject.matches(SFVec3fObject.DEFAULT_VALUE_BBOXSIZETYPE_STRING) tests object initialization correctly matches regex");
> assertFalse (testSFVec3fBboxSizeObject.isDefaultValue(),"test initialized field object isDefaultValue() returns false");
> assertTrue (!SFVec3fObject.REGEX_BBOXSIZETYPE.contains("^") && !SFVec3fObject.REGEX_BBOXSIZETYPE.contains("$"), "test SFVec3fObject.REGEX does not contain anchor characters ^ or $");
> // avoid unexpected equivalent regexes
> assertFalse (SFVec3fObject.REGEX.equals(SFVec3fObject.REGEX_BBOXSIZETYPE), "test SFVec3fObject.REGEX.equals(SFVec3fObject.REGEX_BBOXSIZETYPE) returns false");
>
> testSFVec3fBboxSizeObject.setValue(-1.0f, -1.0f, -1.0f);
> assertTrue (Arrays.equals(defaultBboxSizeArray, testSFVec3fBboxSizeObject.getPrimitiveValue()), "tests setting object value to -1.0f -1.0f -1.0f results in singleton array with same value");
>
> testSFVec3fBboxSizeObject.setValue(defaultBboxSizeArray); // returns void because it matches (overrides) Java SAI specification interface
> assertEquals(defaultBboxSizeArray,testSFVec3fBboxSizeObject.getPrimitiveValue(), "tests setting object value to default-value array results in equivalent getPrimitiveValue()");
>
> assertFalse (SFVec3fObject.matches (""), "tests empty string \"\" fails SFVec3fObject.matches(value), illegal value");
> assertTrue (SFVec3fObject.matchesBboxSizeType(""), "tests empty string \"\" passes SFVec3fObject.matchesBboxSizeType(value), legal value");
>
> assertFalse (testSFVec3fBboxSizeObject.setValue ( -2.0f, -2.0f, -2.0f ).matchesBboxSizeType(), "tests setting object value to -2.0f -2.0f -2.0f fails");
> assertFalse (testSFVec3fBboxSizeObject.setValueByString("-2.0 -2.0 -2.0" ).matchesBboxSizeType(), "tests setting object value to \"-2.0 -2.0 -2.0\" fails");
> assertFalse (testSFVec3fBboxSizeObject.setValue ( -2.0f, -2.0f, -2.0f ).matchesBboxSizeType(), "tests setting object value to -2.0f -2.0f -2.0f fails");
> assertTrue (SFVec3fObject.matches ("-2.0 -2.0 -2.0"), "tests \"-2.0 -2.0 -2.0\" passes SFVec3fObject.matches(value)");
> assertFalse (SFVec3fObject.matchesBboxSizeType("-2.0 -2.0 -2.0"), "tests \"-2.0 -2.0 -2.0\" fails SFVec3fObject.matchesBboxSizeType(value)");
> assertTrue (SFVec3fObject.matches (" 2.0 2.0 2.0"), "tests \" 2.0 2.0 2.0\" passes SFVec3fObject.matches(value)");
> assertTrue (SFVec3fObject.matchesBboxSizeType(" 2.0 2.0 2.0"), "tests \" 2.0 2.0 2.0\" passes SFVec3fObject.matchesBboxSizeType(value)");
> assertTrue (SFVec3fObject.matchesBboxSizeType(" 0.0 0.0 0.0"), "tests \" 0.0 0.0 0.0\" passes SFVec3fObject.matchesBboxSizeType(value)");
> assertFalse (SFVec3fObject.matchesBboxSizeType(" 0.0 0.0 0.0 0.0"), "tests \" 0.0 0.0 0.0 0.0\" fails SFVec3fObject.matchesBboxSizeType(value), too many values");
> assertFalse (SFVec3fObject.matchesBboxSizeType(" 0.0 0.0"), "tests \" 0.0 0.0\" fails SFVec3fObject.matchesBboxSizeType(value), insufficient values");
> }
>
> All working. 8)
>
> So yes, we can add regexes for various types, and perhaps even an alternate pattern for a base type if it makes sense. This shows one way.
>
> Thanks for SFImage considerations, will work on these another day.
>
> > SFImage regex considerations:
> >
> > Are hex values allowed for width etc. ? The spec. only says integer but perhaps decimal is implied.
> >
> > Can hex. values use any case for letters ? The descriptions only use capitals as 0xFF. Is 0Xff also valid ?
> >
> > 0 0 0 is the default value. But should 0 be allowed for components in any other case ?
> >
> > No leading 0s for integer values.
> >
> > But 0x0000FF is OK.
> >
> > No negative integers for width etc.
> >
> > Perhaps allow negative values for color: 255 = -1 for single component, for example.
> >
> >
> >
> > -- AP on the road
> >
> >
> > On Fri, Aug 17, 2018, 3:15 PM Don Brutzman <brutzman at nps.edu> wrote:
> >
> > details follow on current regex progress.
> >
> > On 8/7/2018 8:21 AM, Don Brutzman wrote:
> > > [...]
> > > http://www.web3d.org/specifications/X3dRegularExpressions.html#Design
> > > http://www.web3d.org/specifications/X3dRegularExpressions.png
> > >
> > > Continuing:
> > >
> > > On 8/6/2018 11:52 AM, Andreas Plesch wrote:
> > >> Two quick points:
> > >>
> > >> The hexadecimal pattern got corrupted. SFImage uses this pattern:
> > >>
> > >> 0x([a-f]|[A-F]|\d]){1,8}
> > >>
> > >> where \d is the digit character class.
> > >
> > > Thanks, will work on that pattern next.
> >
> > Currently using 0x[0-9a-fA-F]{1,8} for hexadecimal value, from Regular Expressions Cookbook.
> >
> > Preliminary pattern for SFImage is there, but also need to allow integers as alternatives for hex values.
> >
> > >> There are two equivalent +- patterns
> > >> (\+|\-)?
> > >> [+|-]?
> > >> Probably only one should be recommended and used subsequently.
> > >
> > > Good catch, thank you. Will scrutinize and normalize.
> >
> > [+-]? for sign and [Ee] for scientific-notation exponent.
> >
> > >> I have no good idea how to detect a leading 0 0 0 as non-matching. My
> > >> not so good idea would be to explicitly allow
> > >> non-0 followed by x x or
> > >> x followed by non-0 followed by x or
> > >> x x followed by non-0
> > >>
> > >> where non-0 is something [+-]?0?\.?[0-9]*[1-9]+[0-9]*([e|E][+-]?[0-9]+)*
> > >> and x is a floating point number
> > >>
> > >> -Andreas
> > >
> > > Possibly, but am thinking it adds serious complexity that is hard to maintain. So probably not but will think about it.
> >
> > It is great when we find those patterns, am trying to keep things maintainable. The color-coding on web page definitely helps.
> >
> > Negative lookahead is good for avoiding illegal values, but not allowed in XML Schema (probably to avoid computational denial of service attacks). Have listed a pattern nevertheless, since other languages/tools might want to use it.
> >
> > ==================
> > Disallowed values:
> >
> > * Negative lookahead filters can disqualify attributes that contain illegal values.
> > * W3C Recommendation for XML Schema (XSD) unfortunately does not support this construct.
> > * (?!((0|0.0*|.0+)\s+){3}) prohibits 0 0 0 since zero vector is illegal as initial axis triplet of an SFRotation.
> > * (?!\s*,\s*,) prohibits multiple adjacent commas in intermediate whitespace, for example 0 0 0, ,0 0 0 is an illegal set of MFColor values.
> > ==================
> >
> > > Thanks for the continuing review! Very helpful.
> > Steadily better and better... need to check each regex101 unit test to make sure properly updated. SF/MFImage are next, then matrix types.
>
>
>
> all the best, Don
> --
> Don Brutzman Naval Postgraduate School, Code USW/Br brutzman at nps.edu
> Watkins 270, MOVES Institute, Monterey CA 93943-5000 USA +1.831.656.2149
> X3D graphics, virtual worlds, navy robotics http://faculty.nps.edu/brutzman
>
--
Andreas Plesch
Waltham, MA 02453