[x3d-public] SFImage regex

Andreas Plesch andreasplesch at gmail.com
Sat Jun 9 08:37:53 PDT 2018


On Fri, Jun 8, 2018 at 6:13 PM, Don Brutzman <brutzman at nps.edu> wrote:
> onward...
>
..
>>     however the default value of '0 0 0' fails according to XML Spy, so
>> there is still a problem here...
>>
>> I could confirm that this is due to XML regex expressions not considering
>> anchors, in particular the $ end of string anchor.
>
> OK... haven't used them elsewhere in the regexes.  Further info:
>
>         Start of String and End of String Anchors
>         https://www.regular-expressions.info/anchors.html
>
> parenthetically OBTW also found this reference, constructing a
> nondeterministic finite automaton (NFA).  Whoa.
>
>         Ken Thompson's construction algorithm
>         https://en.wikipedia.org/wiki/Thompson%27s_construction_algorithm
>

This is presumably what regex implementations do.

>> Without anchors the regex becomes longer and a bit more complicated, still
>> not allowing leading zeroes or commas:
>>
>>
>> ^\s*(\d|[1-9]\d+)(\s+(\d|[1-9]\d+)){2}(\s+((0x([a-f]|[A-F]|\d){1,8})|[1-9]\d+|\d))*\s*$
>> (allowing for leading and trailing white space, and anchoring)
>>
>>
>> (\d|[1-9]\d+)(\s+(\d|[1-9]\d+)){2}(\s+((0x([a-f]|[A-F]|\d){1,8})|[1-9]\d+|\d))*
>> (untested XML version which is implicitly anchored, assuming the whitespace
>> collapsing restriction)
>
>
> Am using this second form as our next version, consistent with other
> approaches provided.
>
> for MFImage, adapted it with a prepended ( and then appended
> (\s)*(,)?(\s)*)* in order to consume intermediate whitespace plus single
> optional comma.

Yes, something like that:

SFIMAGE((,|\s+)SFIMAGE)*

where SFIMAGE is the XML regex above. That is how I would start.

That is, an SFIMAGE followed zero or more times by a separator (a single
comma or at least one whitespace character) and another SFIMAGE.

There may be a shorter pattern.

Hm, actually this does not quite work for SFImages separated by
whitespace such as:

1 1 1 0xff 1 1 1 0xaa

since the whole string would be matched as a single-image MFImage, which means that:

1 1 1 0xff 1 0x1 1 0xaa

would also be matched, but should not be, since a hex value is used for
the height of the second image.

So this would only work for comma separated MFImages:

1 1 1 0xff , 1 0x1 1 0xaa

would now correctly not match.

Perhaps that is good enough since sane MFImage values would probably
use commas anyways.
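
To make the whitespace versus comma behavior concrete, here is a minimal
Java sketch. It relies on Matcher.matches(), which has to match the whole
string, much like an XML Schema pattern. The class and method names are
made up for illustration, the MFImage wrapper follows Don's prepended "("
and appended "(\s)*(,)?(\s)*)*" form, and I wrote the hex character class
as [A-F], which is an assumption about what was intended.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of X3DJSAIL or the X3D schema.
final class MFImageRegexSketch {
    // SFImage pattern from this thread (XML version, no ^ or $ anchors).
    static final String SFIMAGE =
        "(\\d|[1-9]\\d+)(\\s+(\\d|[1-9]\\d+)){2}"
        + "(\\s+((0x([a-f]|[A-F]|\\d){1,8})|[1-9]\\d+|\\d))*";

    // Zero or more SFImage values, each optionally followed by whitespace
    // and a single comma.
    static final Pattern MFIMAGE =
        Pattern.compile("(" + SFIMAGE + "\\s*,?\\s*)*");

    static boolean isMFImage(String value) {
        return MFIMAGE.matcher(value).matches(); // whole-string match
    }

    public static void main(String[] args) {
        String[] samples = {
            "1 1 1 0xff 1 1 1 0xaa",      // accepted (read as one image)
            "1 1 1 0xff 1 0x1 1 0xaa",    // accepted, although it should not be
            "1 1 1 0xff , 1 1 1 0xaa",    // accepted (two images)
            "1 1 1 0xff , 1 0x1 1 0xaa"   // rejected (hex height after the comma)
        };
        for (String s : samples) {
            System.out.println(isMFImage(s) + "  " + s);
        }
    }
}
```

The second sample slips through because the regex cannot count pixel
values, so everything after the first three integers reads as pixels of
one image.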


Not sure which fields actually use MFImage.

>> https://regex101.com/r/CJ1h80/4/tests
>
> wow that saved regex pattern + rationale is utterly awesome.  screenshot
> attached and online at

The ability to save unit test strings along with the pattern is really
nice. The only caveat is that the site may not be around forever.
However, the full site source is on GitHub, I believe.

> http://www.web3d.org/specifications/images/Regex101SFImageTestingScreenshot.png
> Your customized regex page also answers how we might connect regex online
> testing directly to each type on the regex page... link now added at
>
> http://www.web3d.org/specifications/X3dRegularExpressions.html#SFImage
>
>> lets everybody test and improve this regex with some of the more
>> challenging cases as unit tests, including the default 0 0 0 case.
>
> all in!  as you will see, the site allowed me to add a test corresponding to
> the PixelTexture.x3d and PixelTextureBW.x3d models from X3D for Web Authors
> (X3D4WA) archive.

Yes, nice. If you click update regex after adding tests or changing
something, everything is saved in a new version:
https://regex101.com/r/CJ1h80/5/tests

>> It is more important to come up with examples which almost match but
>> actually should not match than testing conforming examples such as the web3d
>> example library. Of course there should not be any false positives, eg. not
>> matching examples which are in fact fine.
>
> well OK but not a worry.  relative priorities are all good when we "divide +
> conquer" like this.

Yes, all good. Just saying that it is relatively straightforward to
find a regex which successfully matches all examples (.* in the
extreme), but hard to find one which also rejects examples that almost
match (and still matches all good examples).
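
A small Java sketch of that point, under the assumption that a handful of
made-up strings stand in for the real unit tests (they are not copied
from the regex101 page). The class name and sample lists are
hypothetical. It counts how many strings each pattern accepts:

```java
import java.util.regex.Pattern;

// Hypothetical harness; sample strings are illustrative only.
final class SFImageNearMissSketch {
    // SFImage pattern from this thread, hex class written as [A-F].
    static final Pattern STRICT = Pattern.compile(
        "(\\d|[1-9]\\d+)(\\s+(\\d|[1-9]\\d+)){2}"
        + "(\\s+((0x([a-f]|[A-F]|\\d){1,8})|[1-9]\\d+|\\d))*");

    // The "matches everything" extreme.
    static final Pattern PERMISSIVE = Pattern.compile(".*");

    static final String[] GOOD = {
        "0 0 0", "1 1 1 0xff", "2 1 3 0xFF0000 255"
    };

    // Strings that look close to valid but should be rejected.
    static final String[] NEAR_MISS = {
        "0 0",                 // only two integers
        "01 1 1 0xff",         // leading zero
        "1 1 1 0x",            // hex prefix without digits
        "1 1 1 0x123456789",   // nine hex digits
        "1, 1, 1, 0xff",       // commas inside a single image
        "1 1 1 0xfg",          // not a hex digit
        "-1 1 1"               // negative width
    };

    // Number of samples the pattern accepts as a whole-string match.
    static int countMatches(Pattern p, String[] samples) {
        int n = 0;
        for (String s : samples) {
            if (p.matcher(s).matches()) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println("strict:     good " + countMatches(STRICT, GOOD)
            + ", near-miss " + countMatches(STRICT, NEAR_MISS));
        System.out.println("permissive: good " + countMatches(PERMISSIVE, GOOD)
            + ", near-miss " + countMatches(PERMISSIVE, NEAR_MISS));
    }
}
```

Both patterns accept all good strings, so only the near-miss list tells
them apart.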

> testing all archive scenes helps reveal unintended side effects from changes
> as well.

Yes, we do not want any false alarms.

> ... also OBE. building on last night's progress, I have added
> =========================================================================================
> http://www.web3d.org/specifications/java/javadoc/org/web3d/x3d/jsail/fields/SFVec3fObject.html#validate--
>
> validate() methods to each of the simple types that provide an error message
> when a value fails.  excerpt:
>
> public SFVec3fObject setValueByString (String newValue) throws InvalidFieldValueException
> {
>         if (newValue == null)
>                 newValue = new String(); // Principle of Least Astonishment (POLA)
>                 // https://en.wikipedia.org/wiki/Principle_of_least_astonishment
>
>         if (!SFVec3fObject.matches(newValue)) // regex test
>         {
>                 String errorNotice = "*** Regular expression (regex) failure, new SFVec3fObject(" + newValue + ")";
>                 validationResult.append(errorNotice).append("\n");
>         }
> =========================================================================================
> http://www.web3d.org/specifications/java/javadoc/org/web3d/x3d/jsail/Grouping/TransformObject.html#validate--
>
> validate() methods for each field to the comprehensive validate() checks in
> each node object.
>
> excerpt from TransformObject validate() method:
>
>         setScaleOrientation(getScaleOrientation()); // exercise field checks, simple types
>         validationResult.append((new SFRotationObject(getScaleOrientation())).validate());  // regex check of corresponding String value
>
>         setTranslation(getTranslation()); // exercise field checks, simple types
>         validationResult.append((new SFVec3fObject(getTranslation())).validate());  // regex check of corresponding String value
> [...]
> =========================================================================================
>
> just ran some initial tests, it revealed a copy/paste error on my part that
> was fixed.  update uploads in progress.
>
> as before, am only experimenting on the x3d-4.0 schema and X3DUOM object
> model with these regex tests.  will propagate across prior versions once
> stable.
>
> full regression tests on example archives are now poised to run, will follow
> up at a later time.  of course those kinds of tests can also find errors in
> X3D scene content as well as the regexes themselves.
>
>> The new regex had to split the matching of the 3 initial integers into two
>> parts. First matching the first integer, then matching at least one
>> whitespace followed by an integer, twice.
>> For optional value matching, the order is now at least one whitespace
>> followed by a number rather than number then whitespace, to avoid requiring
>> whitespace at the very end. This in turn required an ordering of the or
>> number branches to hex, then multidigits, then single digits for reasons I
>> hesitate to fully explore but probably having to do with greediness.
>> The new regex also uses the \d and \s classes for conciseness, as well as
>> Leonard's idea to limit hex values to at most 8 characters.
>
> interesting certainly.  i got as far as i could today.

That was just a rather too brief attempt at explaining the changes in
the new regex.
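
Perhaps building the pattern from named parts explains it better. This is
only a sketch in Java: the constant and method names are hypothetical,
and the hex character class is written as [A-F], which is my assumption
about what was intended. The last two lines of main() show one plausible
reason why branch order can matter when the match is not forced to cover
the whole string.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only; names are hypothetical and not part of any X3D library.
final class SFImagePartsSketch {
    // Decimal integer without leading zeroes: one digit, or 1-9 plus more digits.
    static final String INT = "(\\d|[1-9]\\d+)";
    // 0x followed by one to eight hex digits.
    static final String HEX = "(0x([a-f]|[A-F]|\\d){1,8})";
    // Pixel branches in the order described: hex, multi-digit, single digit.
    static final String PIXEL = "(" + HEX + "|[1-9]\\d+|\\d)";
    // First integer, then (whitespace + integer) twice,
    // then zero or more (whitespace + pixel value).
    static final String SFIMAGE =
        INT + "(\\s+" + INT + "){2}(\\s+" + PIXEL + ")*";

    static boolean matches(String value) {
        return Pattern.matches(SFIMAGE, value); // whole-string match
    }

    // Length of the prefix matched at the start of the input, or -1 if none.
    static int prefixLength(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.lookingAt() ? m.end() : -1;
    }

    public static void main(String[] args) {
        System.out.println(SFIMAGE);
        System.out.println(matches("0 0 0"));   // true: default value
        System.out.println(matches("0 0 0 "));  // false: whitespace must precede a value
        System.out.println(prefixLength("\\d|" + HEX, "0xff")); // 1: digit branch wins
        System.out.println(prefixLength(HEX + "|\\d", "0xff")); // 4: hex branch wins
    }
}
```

Putting the whitespace before each value rather than after it is what
lets '0 0 0' match without a trailing space.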

> improved guidance:
> ==================
> http://www.web3d.org/specifications/X3dRegularExpressions.html#X3dPatterns

Would we allow a plus sign (+) for SFImage pixel values? There is no
reason anybody would want that, but it may not be wrong.
The SFInt32 pattern does not allow for hex values.
I think it would be helpful to note that the ^ and $ metacharacters
for start and end of string matching are not used due to XML regex
compatibility.
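
For anyone adapting the patterns to other environments, here is a rough
Java illustration of the difference. As far as I understand it, an XML
Schema pattern always has to cover the whole value, which corresponds to
Matcher.matches() in Java. A search with find() needs explicit ^ and $ to
behave the same way. The class and method names are hypothetical, and the
hex character class is written as [A-F] by assumption.

```java
import java.util.regex.Pattern;

// Illustrative sketch; not taken from X3DJSAIL.
final class AnchorSketch {
    // SFImage pattern from this thread, without anchors.
    static final String SFIMAGE =
        "(\\d|[1-9]\\d+)(\\s+(\\d|[1-9]\\d+)){2}"
        + "(\\s+((0x([a-f]|[A-F]|\\d){1,8})|[1-9]\\d+|\\d))*";

    static final Pattern UNANCHORED = Pattern.compile(SFIMAGE);
    static final Pattern ANCHORED = Pattern.compile("^" + SFIMAGE + "$");

    // Whole-string match, comparable to implicit anchoring in XML Schema.
    static boolean wholeString(String v) {
        return UNANCHORED.matcher(v).matches();
    }

    // Substring search without anchors: too permissive for validation.
    static boolean anywhere(String v) {
        return UNANCHORED.matcher(v).find();
    }

    // Substring search with explicit anchors.
    static boolean anchoredFind(String v) {
        return ANCHORED.matcher(v).find();
    }

    public static void main(String[] args) {
        String bad = "x 0 0 0 y";
        System.out.println(wholeString("0 0 0")); // true
        System.out.println(wholeString(bad));     // false
        System.out.println(anywhere(bad));        // true: finds "0 0 0" inside
        System.out.println(anchoredFind(bad));    // false
    }
}
```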

> General regex design considerations for X3D XML Schema include the
> following.
> * For numeric types, leading sign characters (+ or -) are optionally
> present.
> *  For numeric types, leading zeroes are not allowed, except for an optional
> leading zero preceding the decimal point when the significand is only
> fractional.
> *  Intermediate commas are treated as whitespace, but only allowed between
> each singleton value. For example, SFVec3f 3-tuple values within an MFVec3f
> array do not contain comma characters.
> *  Careful design allows use of regexes that can also be adapted to
> JavaScript/JSON, Java and other language environments.
> *  These regexes all assume that leading/trailing whitespace has been
> removed. It is possible to prepend/append regex constructs such as (\s)+ to
> consume outer whitespace.
> ==================
>
>> TODO: clarify commas, spec. says commas in SF fields are whitespace, so
>> should be allowed. But allowing in SF fields and then not allowing them in
>> the corresponding MF field for other than set separation is quite confusing.
>
> I have been watching for years and have yet to see (or imagine) any scene
> with pathological commas that deserved to be ignored.  sometimes whitespace
> normalization was worthwhile; most often data errors had occurred which were
> otherwise hidden and mysterious failures.
>
> The worst error is the one you cannot find.  Strictness of commas
> (disallowed within n-tuples, allowed between n-tuples) is A Good Thing that
> leads to quality content and is not a blocker for any other content.

I generally agree. The question is only the distinction between high
quality and legality. I think validation often means more strictness
than legally necessary since it is optional anyways. It took me a
while to understand this, and some may not appreciate that at first.

A related question is whether commas should actually be required for MF
types like MFImage to separate singletons. For MFImage, without a
comma it becomes necessary to rely on the width x height count to
determine where the next singleton begins.
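
Roughly, a comma-free MFImage would have to be split by counting, which a
regex cannot do. A minimal Java sketch of that idea follows. The class
and method names are hypothetical, and it assumes that each image is
three header integers followed by exactly width x height pixel values.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch; not how any existing X3D parser is known to work.
final class MFImageSplitSketch {
    // Splits a whitespace-separated MFImage string (no commas) into
    // one token list per image, using width * height to find boundaries.
    static List<List<String>> split(String value) {
        List<List<String>> images = new ArrayList<>();
        String trimmed = value.trim();
        if (trimmed.isEmpty()) {
            return images;
        }
        String[] tokens = trimmed.split("\\s+");
        int i = 0;
        while (i < tokens.length) {
            if (i + 3 > tokens.length) {
                throw new IllegalArgumentException("incomplete header at token " + i);
            }
            // Width and height must be decimal; parseInt rejects values like 0x1.
            int width = Integer.parseInt(tokens[i]);
            int height = Integer.parseInt(tokens[i + 1]);
            if (width < 0 || height < 0) {
                throw new IllegalArgumentException("negative size at token " + i);
            }
            // Header (3 tokens) plus one value per pixel.
            long end = i + 3 + (long) width * height;
            if (end > tokens.length) {
                throw new IllegalArgumentException("too few pixel values at token " + i);
            }
            images.add(new ArrayList<>(Arrays.asList(tokens).subList(i, (int) end)));
            i = (int) end;
        }
        return images;
    }

    public static void main(String[] args) {
        System.out.println(split("1 1 1 0xff 1 1 1 0xaa"));
        // The next call throws NumberFormatException at the token 0x1.
        try {
            split("1 1 1 0xff 1 0x1 1 0xaa");
        } catch (NumberFormatException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

With counting, the second example is rejected even without commas,
because 0x1 lands in the height position of the second image.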

-Andreas

>>     in any case, great start! have added it to X3D v4.0 schema for further
>> experimentation, in this case under the annotations until working better.
>>
>>     deployed online.  however can't get svn to work here so checkins will
>> be another time.
>>
>>
>> http://www.web3d.org/specifications/X3dSchemaDocumentation4.0/x3d-4.0_SFImage.html
>>
>>
>>     On 6/7/2018 11:11 AM, Andreas Plesch wrote:
>>      > Ok, let's give it a try:
>>      >
>>      > [ \t]*(([0-9]|[1-9][0-9]+)([ ]+|$)){3}(([0-9]|([1-9][0-9]+)|(0x([0-9]|[a-f]|[A-F])+))([ ]+|$))*
>>      >
>>      > In words:
>>      >
>>      > match any leading white space followed by
>>      > exactly three times the following
>>      >   - one of the following
>>      >     - either a single digit (including 0) or
>>      >     - a two or more digit number starting with a 1 to 9 digit
>>      >   - followed by either
>>      >    - one or more spaces
>>      >    - or the end of the string (accommodating the default '0 0 0' case)
>>      > then optionally followed zero or more times by
>>      >    - one of the following
>>      >      - either a single digit (including 0) or
>>      >      - a two or more digit number starting with a 1 to 9 digit
>>      >      - 0x followed by at least one of
>>      >         - a single digit or
>>      >         - a to f letter or
>>      >         - A to F letter
>>      >    - followed by either
>>      >      - one or more spaces
>>      >      - or the end of the string
>
>
> all the best, Don
> --
> Don Brutzman  Naval Postgraduate School, Code USW/Br       brutzman at nps.edu
> Watkins 270,  MOVES Institute, Monterey CA 93943-5000 USA   +1.831.656.2149
> X3D graphics, virtual worlds, navy robotics http://faculty.nps.edu/brutzman
>
-- 
Andreas Plesch
Waltham, MA 02453


