[x3d-public] SFImage regex

Andreas Plesch andreasplesch at gmail.com
Sat Jun 9 09:11:09 PDT 2018


On Sat, Jun 9, 2018 at 11:37 AM, Andreas Plesch <andreasplesch at gmail.com> wrote:
> On Fri, Jun 8, 2018 at 6:13 PM, Don Brutzman <brutzman at nps.edu> wrote:
>> onward...
>>
> ..
>>>     however the default value of '0 0 0' fails according to XML Spy, so
>>> there is still a problem here...
>>>
>>> I could confirm that this is due to XML regex expressions not considering
>>> anchors, in particular the $ end of string anchor.
>>
>> OK... haven't used them elsewhere in the regexes.  Further info:
>>
>>         Start of String and End of String Anchors
>>         https://www.regular-expressions.info/anchors.html
>>
>> parenthetically OBTW also found this reference, constructing a
>> nondeterministic finite automaton (NFA).  Whoa.
>>
>>         Ken Thompson's construction algorithm
>>         https://en.wikipedia.org/wiki/Thompson%27s_construction_algorithm
>>
>
> This is presumably what regex implementations do.
>
>>> Without anchors the regex becomes longer and a bit more complicated, still
>>> not allowing leading zeroes or commas:
>>>
>>>
>>> ^\s*(\d|[1-9]\d+)(\s+(\d|[1-9]\d+)){2}(\s+((0x([a-f]|[A-F]|\d){1,8})|[1-9]\d+|\d))*\s*$
>>> (allowing for leading and trailing white space, and anchoring)
>>>
>>>
>>> (\d|[1-9]\d+)(\s+(\d|[1-9]\d+)){2}(\s+((0x([a-f]|[A-F]|\d){1,8})|[1-9]\d+|\d))*
>>> (untested XML version which is implicitly anchored, assuming the whitespace
>>> collapsing restriction)
>>
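
For illustration, a minimal Java sketch of the implicitly anchored form. It
assumes the hex alternative is meant to be 1 to 8 characters from
[0-9a-fA-F] and that whitespace has already been collapsed. The class and
method names are hypothetical and not part of X3DJSAIL. Java's
Matcher.matches() has to consume the whole input, much like an XML Schema
pattern, while find() accepts a match anywhere in the input.

```java
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only.
class SFImageRegexSketch {
    // Non-negative integer without leading zeroes.
    static final String INT = "(\\d|[1-9]\\d+)";
    // Pixel value: hex first, then multi-digit, then single digit.
    static final String PIXEL = "(0x[0-9a-fA-F]{1,8}|[1-9]\\d+|\\d)";
    // width height components, then zero or more pixel values.
    static final String SFIMAGE =
        INT + "(\\s+" + INT + "){2}(\\s+" + PIXEL + ")*";

    private static final Pattern COMPILED = Pattern.compile(SFIMAGE);

    // Whole-string match, so no ^ or $ is needed in the pattern.
    static boolean isValid(String value) {
        return COMPILED.matcher(value).matches();
    }

    // Substring match, which is what an unanchored online tester reports.
    static boolean findsSomewhere(String value) {
        return COMPILED.matcher(value).find();
    }

    public static void main(String[] args) {
        String[] samples = { "0 0 0", "1 1 1 0xff", "01 1 1", "1 0x1 1 0xaa" };
        for (String s : samples) {
            System.out.println("'" + s + "' valid: " + isValid(s));
        }
    }
}
```

With find() instead of matches(), a value such as "01 1 1" is reported as
containing a match (starting at the second character), which is the kind of
partial match the anchors were guarding against.
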
>>
>> Am using this second form as our next version, consistent with other
>> approaches provided.
>>
>> for MFImage, adapted it with a prepended ( and then appended
>> (\s)*(,)?(\s)*)* in order to consume intermediate whitespace plus single
>> optional comma.
>
> Yes, something like that:
>
> SFIMAGE((,|\s+)SFIMAGE)*
>
> where SFIMAGE is the XML regex above. That is how I would start.
>
> That is, SFIMAGE followed zero or more times by a separator (either a
> single comma or at least one whitespace character) and another SFIMAGE.
>
> There may be a shorter pattern.
>
> Hm, actually this does not quite work for SFImages separated by
> whitespace such as:
>
> 1 1 1 0xff 1 1 1 0xaa
>
> since this string would be matched as a single-image MFImage, which means that:
>
> 1 1 1 0xff 1 0x1 1 0xaa
>
> would also be matched but should not since a hex value is used for
> height in the second image.
>
> So this would only work for comma separated MFImages:
>
> 1 1 1 0xff , 1 0x1 1 0xaa
>
> would now correctly not match.
>
> Perhaps that is good enough since sane MFImage values would probably
> use commas anyways.

https://regex101.com/r/Q5Csjc/2/tests

is a regex along these lines which actually requires commas for
checking of more than one image. Without a comma, it just assumes that
it is a single image MFImage. It is so long because it repeats the
SFImage pattern.

I added some tests as well.
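
A rough Java sketch of the same idea, for anyone who wants to try it outside
regex101. It is not the saved pattern itself: the class name is hypothetical,
the hex class is assumed to be [0-9a-fA-F], and optional whitespace around
the comma is an assumption taken from the example strings above.

```java
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only.
class MFImageRegexSketch {
    static final String INT = "(\\d|[1-9]\\d+)";
    static final String PIXEL = "(0x[0-9a-fA-F]{1,8}|[1-9]\\d+|\\d)";
    static final String SFIMAGE =
        INT + "(\\s+" + INT + "){2}(\\s+" + PIXEL + ")*";

    // One SFImage, then zero or more times: a single comma (with optional
    // surrounding whitespace) followed by another SFImage.
    static final Pattern MFIMAGE =
        Pattern.compile(SFIMAGE + "(\\s*,\\s*" + SFIMAGE + ")*");

    static boolean isValid(String value) {
        return MFIMAGE.matcher(value).matches();
    }

    public static void main(String[] args) {
        String[] samples = {
            "1 1 1 0xff , 1 1 1 0xaa",
            "1 1 1 0xff , 1 0x1 1 0xaa",
            "1 1 1 0xff 1 0x1 1 0xaa"
        };
        for (String s : samples) {
            System.out.println("'" + s + "' valid: " + isValid(s));
        }
    }
}
```

The last sample still passes because, without a comma, everything after the
first three integers is read as pixel values of one image. The pattern does
not check that the number of pixel values equals width x height.
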

-Andreas

> Not sure which fields actually use MFImage.
>
>>> https://regex101.com/r/CJ1h80/4/tests
>>
>> wow that saved regex pattern + rationale is utterly awesome.  screenshot
>> attached and online at
>
> The ability to save unit test strings along with the pattern is really
> nice. The only caveat is that the site may not be around forever.
> However, the full site source is on GitHub, I believe.
>
>> http://www.web3d.org/specifications/images/Regex101SFImageTestingScreenshot.png
>> Your customized regex page also answers how we might connect regex online
>> testing directly to each type on the regex page... link now added at
>>
>> http://www.web3d.org/specifications/X3dRegularExpressions.html#SFImage
>>
>>> lets everybody test and improve this regex with some of the more
>>> challenging cases as unit tests, including the default 0 0 0 case.
>>
>> all in!  as you will see, the site allowed me to add a test corresponding to
>> the PixelTexture.x3d and PixelTextureBW.x3d models from X3D for Web Authors
>> (X3D4WA) archive.
>
> Yes, nice. If you click update regex after adding tests or changing
> something, everything is saved in a new version:
> https://regex101.com/r/CJ1h80/5/tests
>
>>> It is more important to come up with examples which almost match but
>>> actually should not match than testing conforming examples such as the web3d
>>> example library. Of course there should not be any false positives, e.g. not
>>> matching examples which are in fact fine.
>>
>> well OK but not a worry.  relative priorities are all good when we "divide +
>> conquer" like this.
>
> Yes, all good. Just saying that it is relatively straightforward to
> find a regex which successfully matches all examples (.* in the
> extreme) but hard to find one which rejects almost-matching examples
> (and still matches all good examples).
>
>> testing all archive scenes helps reveal unintended side effects from changes
>> as well.
>
> Yes, we do not want any false alarms.
>
>> ... also OBE. building on last night's progress, I have added
>> =========================================================================================
>> http://www.web3d.org/specifications/java/javadoc/org/web3d/x3d/jsail/fields/SFVec3fObject.html#validate--
>>
>> validate() methods to each of the simple types that provide an error message
>> when a value fails.  excerpt:
>>
>> public SFVec3fObject setValueByString (String newValue) throws InvalidFieldValueException
>> {
>>         if (newValue == null)
>>                 newValue = new String(); // Principle of Least Astonishment (POLA)
>>                 // https://en.wikipedia.org/wiki/Principle_of_least_astonishment
>>
>>         if (!SFVec3fObject.matches(newValue)) // regex test
>>         {
>>                 String errorNotice = "*** Regular expression (regex) failure, new SFVec3fObject(" + newValue + ")";
>>                 validationResult.append(errorNotice).append("\n");
>>         }
>> =========================================================================================
>> http://www.web3d.org/specifications/java/javadoc/org/web3d/x3d/jsail/Grouping/TransformObject.html#validate--
>>
>> validate() methods for each field to the comprehensive validate() checks in
>> each node object.
>>
>> excerpt from TransformObject validate() method:
>>
>>         setScaleOrientation(getScaleOrientation()); // exercise field checks, simple types
>>         validationResult.append((new SFRotationObject(getScaleOrientation())).validate());  // regex check of corresponding String value
>>
>>         setTranslation(getTranslation()); // exercise field checks, simple types
>>         validationResult.append((new SFVec3fObject(getTranslation())).validate());  // regex check of corresponding String value
>> [...]
>> =========================================================================================
>>
>> just ran some initial tests, it revealed a copy/paste error on my part that
>> was fixed.  update uploads in progress.
>>
>> as before, am only experimenting on the x3d-4.0 schema and X3DUOM object
>> model with these regex tests.  will propagate across prior versions once
>> stable.
>>
>> full regression tests on example archives are now poised to run, will follow
>> up at a later time.  of course those kinds of tests can also find errors in
>> X3D scene content as well as the regexes themselves.
>>
>>> The new regex had to split the matching of the 3 initial integers into two
>>> parts. First matching the first integer, then matching at least one
>>> whitespace followed by an integer, twice.
>>> For optional value matching, the order is now at least one whitespace
>>> followed by a number rather than number then whitespace, to avoid requiring
>>> whitespace at the very end. This in turn required an ordering of the or
>>> number branches to hex, then multidigits, then single digits for reasons I
>>> hesitate to fully explore but probably having to do with greediness.
>>> The new regex also uses the \d and \s classes for conciseness, as well as
>>> Leonard's idea to limit hex values to at most 8 characters.
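
One plausible reading of the ordering effect, sketched in Java (whose regex
engine, like PCRE, tries alternatives left to right). This is an
illustration under that assumption, not a diagnosis of the regex101 pattern;
the class and method names are hypothetical. When a match does not have to
reach the end of the input, the first alternative that succeeds wins, so a
leading single-digit branch can stop after one character.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only.
class AlternationOrderSketch {
    // Single digit tried first.
    static final Pattern DIGIT_FIRST =
        Pattern.compile("\\d|[1-9]\\d+|0x[0-9a-fA-F]{1,8}");
    // Hex first, then multi-digit, then single digit.
    static final Pattern HEX_FIRST =
        Pattern.compile("0x[0-9a-fA-F]{1,8}|[1-9]\\d+|\\d");

    // Returns the prefix of input that the pattern matches, or null if
    // no prefix matches. lookingAt() anchors at the start only.
    static String matchedPrefix(Pattern pattern, String input) {
        Matcher m = pattern.matcher(input);
        return m.lookingAt() ? m.group() : null;
    }

    public static void main(String[] args) {
        for (String s : new String[] { "0xff", "255" }) {
            System.out.println(s + " digit-first: " + matchedPrefix(DIGIT_FIRST, s));
            System.out.println(s + " hex-first:   " + matchedPrefix(HEX_FIRST, s));
        }
    }
}
```

When the whole string must match, as with Matcher.matches() or an XML
Schema pattern, the engine backtracks into the later alternatives and both
orderings accept the same values.
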
>>
>> interesting certainly.  i got as far as i could today.
>
> This was just a rather too brief attempt at explaining the changes in
> the new regex.
>
>> improved guidance:
>> ==================
>> http://www.web3d.org/specifications/X3dRegularExpressions.html#X3dPatterns
>
> Would we allow a plus sign (+) for SFImage pixel values? No reason
> anybody would want that but it may not be wrong.
> The SFInt32 pattern does not allow for hex values.
> I think it would be helpful to note that the ^ and $ metacharacters
> for start and end of string matching are not used due to XML regex
> compatibility.
>
>> General regex design considerations for X3D XML Schema include the
>> following.
>> * For numeric types, leading sign characters (+ or -) are optionally
>> present.
>> *  For numeric types, leading zeroes are not allowed, except for an optional
>> leading zero preceding the decimal point when the significand is only
>> fractional.
>> *  Intermediate commas are treated as whitespace, but only allowed between
>> each singleton value. For example, SFVec3f 3-tuple values within an MFVec3f
>> array do not contain comma characters.
>> *  Careful design allows use of regexes that can also be adapted to
>> JavaScript/JSON, Java and other language environments.
>> *  These regexes all assume that leading/trailing whitespace has been
>> removed. It is possible to prepend/append regex constructs such as (\s)+ to
>> consume outer whitespace.
>> ==================
>>
>>> TODO: clarify commas, spec. says commas in SF fields are whitespace, so
>>> should be allowed. But allowing in SF fields and then not allowing them in
>>> the corresponding MF field for other than set separation is quite confusing.
>>
>> I have been watching for years and have yet to see (or imagine) any scene
>> with pathological commas that deserved to be ignored.  sometimes whitespace
>> normalization was worthwhile; most often data errors had occurred which were
>> otherwise hidden and mysterious failures.
>>
>> The worst error is the one you cannot find.  Strictness of commas
>> (disallowed within n-tuples, allowed between n-tuples) is A Good Thing that
>> leads to quality content and is not a blocker for any other content.
>
> I generally agree. The question is only the distinction between high
> quality and legality. I think validation often means more strictness
> than legally necessary since it is optional anyways. It took me a
> while to understand this, and some may not appreciate that at first.
>
> A related question is if commas should actually be required for MF
> types like MFImage to separate singletons. For MFImage, without a
> comma it becomes necessary to rely on the width x height count to
> determine where the next singleton begins.
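
A minimal Java sketch of that width x height approach, assuming a
whitespace-only MFImage string and decimal width, height and component
count. The class and method names are hypothetical; a regex alone cannot do
this counting.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, for illustration only.
class MFImageSplitSketch {
    // Splits a whitespace-separated MFImage value into SFImage strings by
    // reading width and height and skipping width * height pixel values.
    static List<String> split(String value) {
        List<String> images = new ArrayList<>();
        String trimmed = value.trim();
        if (trimmed.isEmpty()) {
            return images;
        }
        String[] tokens = trimmed.split("\\s+");
        int i = 0;
        while (i < tokens.length) {
            if (i + 3 > tokens.length) {
                throw new IllegalArgumentException("incomplete image header at token " + i);
            }
            // parseInt rejects hex such as 0x1 with a NumberFormatException,
            // which is a subclass of IllegalArgumentException.
            int width = Integer.parseInt(tokens[i]);
            int height = Integer.parseInt(tokens[i + 1]);
            Integer.parseInt(tokens[i + 2]); // component count, decimal only
            if (width < 0 || height < 0) {
                throw new IllegalArgumentException("negative size at token " + i);
            }
            long end = i + 3L + (long) width * height;
            if (end > tokens.length) {
                throw new IllegalArgumentException("too few pixel values for image at token " + i);
            }
            images.add(String.join(" ", Arrays.copyOfRange(tokens, i, (int) end)));
            i = (int) end;
        }
        return images;
    }

    public static void main(String[] args) {
        System.out.println(split("1 1 1 0xff 1 1 1 0xaa"));
    }
}
```

With this kind of counting, "1 1 1 0xff 1 0x1 1 0xaa" is rejected because
the second image's height is not a decimal integer, which the comma-free
regex cannot detect.
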
>
> -Andreas
>
>>>     in any case, great start! have added it to X3D v4.0 schema for further
>>> experimentation, in this case under the annotations until working better.
>>>
>>>     deployed online.  however can't get svn to work here so checkins will
>>> be another time.
>>>
>>>
>>> http://www.web3d.org/specifications/X3dSchemaDocumentation4.0/x3d-4.0_SFImage.html
>>>
>>>
>>>     On 6/7/2018 11:11 AM, Andreas Plesch wrote:
>>>      > Ok, let's give it a try:
>>>      >
>>>      > [ \t]*(([0-9]|[1-9][0-9]+)([ ]+|$)){3}(([0-9]|([1-9][0-9]+)|(0x([0-9]|[a-f]|[A-F])+))([ ]+|$))*
>>>      >
>>>      > In words:
>>>      >
>>>      > match any leading white space followed by
>>>      > exactly three times the following
>>>      >   - one of the following
>>>      >     - either a single digit (including 0) or
>>>      >     - a two or more digit number starting with a 1 to 9 digit
>>>      >   - followed by either
>>>      >    - one or more spaces
>>>      >    - or the end of the string (accommodating the default '0 0 0' case)
>>>      > then optionally followed zero or more times by
>>>      >    - one of the following
>>>      >      - either a single digit (including 0) or
>>>      >      - a two or more digit number starting with a 1 to 9 digit
>>>      >      - 0x followed by at least one of
>>>      >         - a single digit or
>>>      >         - a to f letter or
>>>      >         - A to F letter
>>>      >    - followed by either
>>>      >      - one or more spaces
>>>      >      - or the end of the string
>>
>>
>> all the best, Don
>> --
>> Don Brutzman  Naval Postgraduate School, Code USW/Br       brutzman at nps.edu
>> Watkins 270,  MOVES Institute, Monterey CA 93943-5000 USA   +1.831.656.2149
>> X3D graphics, virtual worlds, navy robotics http://faculty.nps.edu/brutzman
>>
> --
> Andreas Plesch
> Waltham, MA 02453



-- 
Andreas Plesch
Waltham, MA 02453


