[x3d-public] SFImage regex
Andreas Plesch
andreasplesch at gmail.com
Sat Jun 9 09:11:09 PDT 2018
On Sat, Jun 9, 2018 at 11:37 AM, Andreas Plesch <andreasplesch at gmail.com> wrote:
> On Fri, Jun 8, 2018 at 6:13 PM, Don Brutzman <brutzman at nps.edu> wrote:
>> onward...
> ..
>>> however the default value of '0 0 0' fails according to XML Spy, so
>>> there is still a problem here...
>>> I could confirm that this is due to XML Schema regular expressions not
>>> supporting anchors, in particular the $ end-of-string anchor.
>> OK... haven't used them elsewhere in the regexes. Further info:
>> Start of String and End of String Anchors
>> https://www.regular-expressions.info/anchors.html
>> parenthetically OBTW also found this reference, constructing a
>> nondeterministic finite automaton (NFA). Whoa.
>> Ken Thompson's construction algorithm
>> https://en.wikipedia.org/wiki/Thompson%27s_construction_algorithm
> This is presumably what at least some regex implementations do internally.
>>> Without relying on the $ anchor inside the repeated groups, the regex
>>> becomes longer and a bit more complicated, while still not allowing
>>> leading zeroes or commas:
>>> ^\s*(\d|[1-9]\d+)(\s+(\d|[1-9]\d+)){2}(\s+((0x([a-f]|[A-F]|\d){1,8})|[1-9]\d+|\d))*\s*$
>>> (allowing for leading and trailing white space, and anchoring)
>>> (\d|[1-9]\d+)(\s+(\d|[1-9]\d+)){2}(\s+((0x([a-f]|[A-F]|\d){1,8})|[1-9]\d+|\d))*
>>> (untested XML version which is implicitly anchored, assuming the whitespace
>>> collapsing restriction)
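A minimal Java sketch of the XML-form SFImage pattern above, assuming Java's `Matcher.matches()` as a stand-in for an XSD pattern facet: both must consume the entire string, so no `^` or `$` is needed. The hex digits are written as `([a-f]|[A-F]|\d)`, so stray characters such as `}` or `|` after `0x` are rejected. The class and method names are hypothetical, and Java's `\d` is ASCII-only by default, whereas XSD's `\d` covers all Unicode decimal digits.

```java
import java.util.regex.Pattern;

// Hypothetical helper class; the pattern is the XML-form SFImage regex from this thread.
class SfImageRegexDemo {
    // No ^ or $: Matcher.matches() has to consume the whole string,
    // which mimics the implicit anchoring of an XSD pattern facet.
    static final Pattern SFIMAGE = Pattern.compile(
            "(\\d|[1-9]\\d+)(\\s+(\\d|[1-9]\\d+)){2}"
            + "(\\s+((0x([a-f]|[A-F]|\\d){1,8})|[1-9]\\d+|\\d))*");

    static boolean isValid(String value) {
        return SFIMAGE.matcher(value).matches();
    }

    public static void main(String[] args) {
        String[] samples = {
            "0 0 0",             // default value
            "1 2 1 0xFF 0xA0",   // 1x2 image, one component, two hex pixels
            "01 1 1",            // leading zero in width
            "1 1 1 0x123456789", // hex value longer than 8 digits
            "1 1 1 0x}",         // stray character after 0x
            "1, 1, 1"            // commas are not accepted
        };
        for (String s : samples) {
            System.out.println("'" + s + "' -> " + isValid(s));
        }
    }
}
```

Running `main` should report `true` for the first two samples and `false` for the other four.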
>> Am using this second form as our next version, consistent with other
>> approaches provided.
>> for MFImage, adapted it with a prepended ( and then appended
>> (\s)*(,)?(\s)*)* in order to consume intermediate whitespace plus a single
>> optional comma.
> Yes, something like that:
> where SFIMAGE is the XML regex above. That is how I would start.
> SFIMAGE followed zero or more times by either a single comma or at
> least one whitespace character, followed by another SFIMAGE.
> There may be a shorter pattern.
> Hm, actually this does not quite work for SFImages separated by
> whitespace such as:
> 1 1 1 0xff 1 1 1 0xaa
> since this string would be matched as a single-image MFImage, which means that:
> 1 1 1 0xff 1 0x1 1 0xaa
> would also be matched but should not since a hex value is used for
> height in the second image.
> So this would only work for comma separated MFImages:
> 1 1 1 0xff , 1 0x1 1 0xaa
> would now correctly not match.
> Perhaps that is good enough, since sane MFImage values would probably
> use commas anyway.
I put together a regex along these lines which actually requires commas
when checking more than one image. Without a comma, it just assumes that
the value is a single-image MFImage. It is so long because it repeats the
SFImage pattern.
I added some tests as well.
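The exact comma-requiring regex is not reproduced in this archive, so the Java sketch below spells out one hypothetical version: the SFImage pattern, then zero or more repetitions of a comma (with optional surrounding whitespace) followed by another SFImage. It reproduces the behaviour described above: whitespace-only input is accepted as one large image, so a hex height in what was meant to be a second image slips through, while the comma-separated form is rejected. Class and constant names are assumptions.

```java
import java.util.regex.Pattern;

// Hypothetical names; SFIMAGE is the XML-form SFImage regex from this thread.
class MfImageRegexDemo {
    static final String SFIMAGE =
            "(\\d|[1-9]\\d+)(\\s+(\\d|[1-9]\\d+)){2}"
            + "(\\s+((0x([a-f]|[A-F]|\\d){1,8})|[1-9]\\d+|\\d))*";

    // SFIMAGE, then zero or more times: a comma (optionally padded) and another SFIMAGE.
    static final Pattern MFIMAGE =
            Pattern.compile(SFIMAGE + "(\\s*,\\s*" + SFIMAGE + ")*");

    static boolean isValid(String value) {
        return MFIMAGE.matcher(value).matches();
    }

    public static void main(String[] args) {
        String[] samples = {
            "1 1 1 0xff , 1 1 1 0xaa",  // two images, comma separated: accepted
            "1 1 1 0xff 1 1 1 0xaa",    // no comma: read as ONE image with 5 pixel values
            "1 1 1 0xff 1 0x1 1 0xaa",  // also accepted as one image (the caveat)
            "1 1 1 0xff , 1 0x1 1 0xaa" // hex height after the comma: rejected
        };
        for (String s : samples) {
            System.out.println("'" + s + "' -> " + isValid(s));
        }
    }
}
```

The regex alone cannot count pixels, which is why the third sample passes.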
> Not sure which fields actually use MFImage.
>>> https://regex101.com/r/CJ1h80/4/tests
>> wow that saved regex pattern + rationale is utterly awesome. screenshot
>> attached and online at
>> http://www.web3d.org/specifications/images/Regex101SFImageTestingScreenshot.png
> The ability to save unit test strings along with the pattern is really
> nice. The only caveat is that the site may not be around forever.
> However, the full site source is on GitHub, I believe.
>> Your customized regex page also answers how we might connect regex online
>> testing directly to each type on the regex page... link now added at
>> http://www.web3d.org/specifications/X3dRegularExpressions.html#SFImage
>>> lets everybody test and improve this regex with some of the more
>>> challenging cases as unit tests, including the default 0 0 0 case.
>> all in! as you will see, the site allowed me to add a test corresponding to
>> the PixelTexture.x3d and PixelTextureBW.x3d models from X3D for Web Authors
>> (X3D4WA) archive.
> Yes, nice. If you click update regex after adding tests or changing
> something, everything is saved in a new version:
> https://regex101.com/r/CJ1h80/5/tests
>>> It is more important to come up with examples which almost match but
>>> actually should not match than testing conforming examples such as the web3d
>>> example library. Of course there should not be any false positives either,
>>> e.g. not matching examples which are in fact fine.
>> well OK but not a worry. relative priorities are all good when we "divide +
>> conquer" like this.
> Yes, all good. Just saying that it is relatively straightforward to
> find a regex which successfully matches all examples (.* in the
> extreme) but hard to find one which does not match almost-matching
> examples (and still matches all good examples).
>> testing all archive scenes helps reveal unintended side effects from changes
>> as well.
> Yes, we do not want any false alarms.
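The point about near misses can be made concrete with a small test harness, sketched here in Java under stated assumptions: the class name, both sample lists, and the report wording are hypothetical, and the lists are far shorter than the regex101 test set. Every good value must match (otherwise it is a false alarm) and every near miss must be rejected; `.*` passes the first check trivially while failing all of the second.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical test harness: good values must match, near misses must not.
class NearMissHarness {
    static final Pattern SFIMAGE = Pattern.compile(
            "(\\d|[1-9]\\d+)(\\s+(\\d|[1-9]\\d+)){2}"
            + "(\\s+((0x([a-f]|[A-F]|\\d){1,8})|[1-9]\\d+|\\d))*");

    // Conforming values: rejecting any of these would be a false alarm.
    static final String[] MUST_MATCH = {"0 0 0", "2 1 1 0xFF 0x00", "1 1 3 0xFF0000"};

    // Near misses: they look almost right but must be rejected.
    static final String[] MUST_NOT_MATCH = {"0 0", "01 1 1", "1 1 1 0xG0", "1 1 1 0x", "1, 1, 1", ""};

    static List<String> failures(Pattern p) {
        List<String> out = new ArrayList<>();
        for (String s : MUST_MATCH) {
            if (!p.matcher(s).matches()) out.add("false alarm: '" + s + "'");
        }
        for (String s : MUST_NOT_MATCH) {
            if (p.matcher(s).matches()) out.add("missed near miss: '" + s + "'");
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println("SFImage pattern: " + failures(SFIMAGE));
        // ".*" accepts every good example yet catches none of the near misses.
        System.out.println("'.*' pattern: " + failures(Pattern.compile(".*")));
    }
}
```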
>> ... also OBE. building on last night's progress, I have added
>> =========================================================================================
>> http://www.web3d.org/specifications/java/javadoc/org/web3d/x3d/jsail/fields/SFVec3fObject.html#validate--
>> validate() methods to each of the simple types that provide an error message
>> when a value fails. excerpt:
>> public SFVec3fObject setValueByString (String newValue) throws InvalidFieldValueException
>> {
>>     if (newValue == null)
>>         newValue = new String(); // Principle of Least Astonishment (POLA)
>>                                  // https://en.wikipedia.org/wiki/Principle_of_least_astonishment
>>     if (!SFVec3fObject.matches(newValue)) // regex test
>>     {
>>         String errorNotice = "*** Regular expression (regex) failure, new SFVec3fObject(" + newValue + ")";
>>         validationResult.append(errorNotice).append("\n");
>>     }
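The excerpt can be mirrored by a self-contained Java class. Everything here is hypothetical: `SfImageField` stands in for a field object such as `SFVec3fObject`, the SFImage regex from this thread stands in for the SFVec3f pattern, and the excerpt does not show whether a failing value is stored (this sketch stores it and only records the failure for `validate()`).

```java
import java.util.regex.Pattern;

// Hypothetical stand-in for a jsail-style field object.
class SfImageField {
    private static final Pattern SFIMAGE = Pattern.compile(
            "(\\d|[1-9]\\d+)(\\s+(\\d|[1-9]\\d+)){2}"
            + "(\\s+((0x([a-f]|[A-F]|\\d){1,8})|[1-9]\\d+|\\d))*");

    private final StringBuilder validationResult = new StringBuilder();
    private String value = "0 0 0";

    static boolean matches(String s) {
        return SFIMAGE.matcher(s).matches();
    }

    SfImageField setValueByString(String newValue) {
        if (newValue == null)
            newValue = ""; // Principle of Least Astonishment (POLA)
        if (!matches(newValue)) { // regex test
            validationResult.append("*** Regular expression (regex) failure, new SfImageField(")
                            .append(newValue).append(")\n");
        }
        value = newValue; // assumption: keep the value, report the problem via validate()
        return this;
    }

    String getValue() {
        return value;
    }

    // An empty string means no problems were recorded.
    String validate() {
        return validationResult.toString();
    }
}
```

For example, `new SfImageField().setValueByString("01 1 1").validate()` returns a single regex-failure line.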
>> =========================================================================================
>> http://www.web3d.org/specifications/java/javadoc/org/web3d/x3d/jsail/Grouping/TransformObject.html#validate--
>> validate() methods for each field to the comprehensive validate() checks in
>> each node object.
>> excerpt from TransformObject validate() method:
>> setScaleOrientation(getScaleOrientation()); // exercise field checks, simple types
>> validationResult.append((new SFRotationObject(getScaleOrientation())).validate()); // regex check of corresponding String value
>> setTranslation(getTranslation()); // exercise field checks, simple types
>> validationResult.append((new SFVec3fObject(getTranslation())).validate()); // regex check of corresponding String value
>> [...]
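The node-level idea (run every field's check and append the results to one report) might look like the following self-contained sketch. `PixelTextureSketch` and `checkField` are hypothetical; the `image` and `repeatS` field names follow the X3D PixelTexture node, and `true|false` is assumed here as the XML SFBool pattern.

```java
import java.util.regex.Pattern;

// Hypothetical node object aggregating per-field regex checks into one report.
class PixelTextureSketch {
    static final Pattern SFIMAGE = Pattern.compile(
            "(\\d|[1-9]\\d+)(\\s+(\\d|[1-9]\\d+)){2}"
            + "(\\s+((0x([a-f]|[A-F]|\\d){1,8})|[1-9]\\d+|\\d))*");
    static final Pattern SFBOOL = Pattern.compile("true|false"); // assumed XML encoding

    String image = "0 0 0";
    String repeatS = "true";

    // Field-level check: "" when the value matches, otherwise one error line.
    static String checkField(String fieldName, Pattern pattern, String value) {
        return pattern.matcher(value).matches()
                ? ""
                : "*** Regular expression (regex) failure, " + fieldName + "='" + value + "'\n";
    }

    // Node-level validate(): run every field check and concatenate the results.
    String validate() {
        StringBuilder validationResult = new StringBuilder();
        validationResult.append(checkField("image", SFIMAGE, image));
        validationResult.append(checkField("repeatS", SFBOOL, repeatS));
        return validationResult.toString();
    }
}
```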
>> =========================================================================================
>> just ran some initial tests, it revealed a copy/paste error on my part that
>> was fixed. update uploads in progress.
>> as before, am only experimenting on the x3d-4.0 schema and X3DUOM object
>> model with these regex tests. will propagate across prior versions once
>> stable.
>> full regression tests on example archives are now poised to run, will follow
>> up at a later time. of course those kinds of tests can also find errors in
>> X3D scene content as well as the regexes themselves.
>>> The new regex had to split the matching of the 3 initial integers into two
>>> parts. First matching the first integer, then matching at least one
>>> whitespace followed by an integer, twice.
>>> For optional value matching, the order is now at least one whitespace
>>> followed by a number rather than number then whitespace, to avoid requiring
>>> whitespace at the very end. This in turn required ordering the alternative
>>> number branches as hex, then multi-digit, then single digit, for reasons I
>>> hesitate to fully explore but probably having to do with greediness.
>>> The new regex also uses the \d and \s classes for conciseness, as well as
>>> Leonard's idea to limit hex values to at most 8 hex digits.
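One way to see where the order of alternatives can matter, offered as an assumption rather than a diagnosis of what regex101 showed: with an unanchored search (Java's `Matcher.find()`, comparable to regex101 without `^` and `$`), a backtracking engine stops at the first alternative that lets the rest of the pattern succeed, so listing the single digit first ends the match after the `0` of `0xff`. With whole-string matching (`Matcher.matches()`, comparable to an anchored or XSD pattern) backtracking tries the remaining alternatives, and both orders accept the same strings. The class name is hypothetical, and `[0-9a-fA-F]` is used as shorthand for `([a-f]|[A-F]|\d)`.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo: alternative order vs. unanchored and whole-string matching.
class AlternationOrderDemo {
    // Width, height, components, as in the thread's regex.
    static final String HEAD = "(\\d|[1-9]\\d+)(\\s+(\\d|[1-9]\\d+)){2}";
    // Pixel alternatives in the thread's order: hex, multi-digit, single digit.
    static final Pattern HEX_FIRST =
            Pattern.compile(HEAD + "(\\s+(0x[0-9a-fA-F]{1,8}|[1-9]\\d+|\\d))*");
    // The same alternatives with the single digit listed first.
    static final Pattern DIGIT_FIRST =
            Pattern.compile(HEAD + "(\\s+(\\d|[1-9]\\d+|0x[0-9a-fA-F]{1,8}))*");

    // First match found by an unanchored search, or null if there is none.
    static String firstMatch(Pattern p, String s) {
        Matcher m = p.matcher(s);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String s = "1 1 1 0xff";
        System.out.println(firstMatch(HEX_FIRST, s));    // 1 1 1 0xff
        System.out.println(firstMatch(DIGIT_FIRST, s));  // 1 1 1 0
        System.out.println(HEX_FIRST.matcher(s).matches() + " "
                + DIGIT_FIRST.matcher(s).matches());     // true true
    }
}
```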
>> interesting certainly. i got as far as i could today.
> This was just a rather too brief attempt at explaining the changes in
> the new regex.
>> improved guidance:
>> ==================
>> http://www.web3d.org/specifications/X3dRegularExpressions.html#X3dPatterns
> Would we allow a plus sign (+) for SFImage pixel values? No reason
> anybody would want that but it may not be wrong.
> The SFInt32 pattern does not allow for hex values.
> I think it would be helpful to note that the ^ and $ metacharacters
> for start and end of string matching are not used due to XML regex
> compatibility.
>> General regex design considerations for X3D XML Schema include the
>> following.
>> * For numeric types, leading sign characters (+ or -) are optionally
>> present.
>> * For numeric types, leading zeroes are not allowed, except for an optional
>> leading zero preceding the decimal point when the significand is only
>> fractional.
>> * Intermediate commas are treated as whitespace, but only allowed between
>> each singleton value. For example, SFVec3f 3-tuple values within an MFVec3f
>> array do not contain comma characters.
>> * Careful design allows use of regexes that can also be adapted to
>> JavaScript/JSON, Java and other language environments.
>> * These regexes all assume that leading/trailing whitespace has been
>> removed. It is possible to prepend/append regex constructs such as (\s)* to
>> consume outer whitespace.
>> ==================
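The whitespace assumption in the last bullet can be checked with a short Java sketch. The names are hypothetical, and `collapse` only approximates XSD `whiteSpace="collapse"` (runs of tab, LF, CR and space become one space, then a leading or trailing space is dropped). Wrapping with `\s*` tolerates outer whitespace without requiring it, whereas `\s+` would reject a value that has none.

```java
import java.util.regex.Pattern;

// Hypothetical demo of the "leading/trailing whitespace removed" assumption.
class OuterWhitespaceDemo {
    static final String SFIMAGE =
            "(\\d|[1-9]\\d+)(\\s+(\\d|[1-9]\\d+)){2}"
            + "(\\s+((0x([a-f]|[A-F]|\\d){1,8})|[1-9]\\d+|\\d))*";

    static final Pattern CORE = Pattern.compile(SFIMAGE);
    static final Pattern STAR_WRAPPED = Pattern.compile("\\s*(" + SFIMAGE + ")\\s*");
    static final Pattern PLUS_WRAPPED = Pattern.compile("\\s+(" + SFIMAGE + ")\\s+");

    // Approximates XSD whiteSpace="collapse": runs of tab/LF/CR/space become one
    // space, then a leading or trailing space is dropped.
    static String collapse(String value) {
        return value.replaceAll("[\\t\\n\\r ]+", " ").replaceAll("^ | $", "");
    }

    public static void main(String[] args) {
        String padded = "  1 1 1\t0xff \n";
        String bare = "1 1 1 0xff";
        System.out.println(CORE.matcher(padded).matches());           // false
        System.out.println(CORE.matcher(collapse(padded)).matches()); // true
        System.out.println(STAR_WRAPPED.matcher(padded).matches());   // true
        System.out.println(STAR_WRAPPED.matcher(bare).matches());     // true
        System.out.println(PLUS_WRAPPED.matcher(bare).matches());     // false: + demands whitespace
    }
}
```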
>>> TODO: clarify commas, spec. says commas in SF fields are whitespace, so
>>> should be allowed. But allowing in SF fields and then not allowing them in
>>> the corresponding MF field for other than set separation is quite confusing.
>> I have been watching for years and have yet to see (or imagine) any scene
>> with pathological commas that deserved to be ignored. sometimes whitespace
>> normalization was worthwhile; most often data errors had occurred which were
>> otherwise hidden and mysterious failures.
>> The worst error is the one you cannot find. Strictness of commas
>> (disallowed within n-tuples, allowed between n-tuples) is A Good Thing that
>> leads to quality content and is not a blocker for any other content.
> I generally agree. The question is only the distinction between high
> quality and legality. I think validation often means more strictness
> than legally necessary since it is optional anyways. It took me a
> while to understand this, and some may not appreciate that at first.
> A related question is whether commas should actually be required for MF
> types like MFImage to separate singletons. For MFImage, without a
> comma it becomes necessary to rely on the width x height count to
> determine where the next singleton begins.
> -Andreas
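Relying on the width x height count, rather than commas, is outside what a single regex can do, but it is simple procedurally. This Java sketch is hypothetical throughout: it treats commas as plain separators, requires plain decimal header values, and does not range-check the component count or pixel values. It catches the `1 1 1 0xff 1 0x1 1 0xaa` case that the regexes above accept.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical splitter: uses width x height from each header to find where the
// next SFImage begins, instead of relying on commas.
class MfImageSplitter {
    private static final String HEADER = "0|[1-9]\\d*";                     // plain decimal
    private static final String PIXEL = "0x[0-9a-fA-F]{1,8}|0|[1-9]\\d*";  // hex or decimal

    static List<List<String>> split(String mfImage) {
        List<List<String>> images = new ArrayList<>();
        String trimmed = mfImage.trim();
        if (trimmed.isEmpty()) return images;
        // Assumption for this sketch: commas are treated as plain separators.
        String[] tokens = trimmed.split("[\\s,]+");
        int i = 0;
        while (i < tokens.length) {
            if (i + 3 > tokens.length)
                throw new IllegalArgumentException("incomplete header at token " + i);
            int width = header(tokens[i]);
            int height = header(tokens[i + 1]);
            header(tokens[i + 2]); // number of components; range not checked here
            long end = i + 3 + (long) width * height; // long avoids int overflow
            if (end > tokens.length)
                throw new IllegalArgumentException("image at token " + i + " is missing pixel values");
            for (int k = i + 3; k < end; k++) {
                if (!tokens[k].matches(PIXEL))
                    throw new IllegalArgumentException("bad pixel value: " + tokens[k]);
            }
            images.add(Arrays.asList(Arrays.copyOfRange(tokens, i, (int) end)));
            i = (int) end;
        }
        return images;
    }

    // Header values must be plain decimal integers (no hex, no leading zeros).
    private static int header(String token) {
        if (!token.matches(HEADER))
            throw new IllegalArgumentException("bad header value: " + token);
        return Integer.parseInt(token); // NumberFormatException is an IllegalArgumentException
    }
}
```

With this approach `1 1 1 0xff 1 1 1 0xaa` splits into two images, while `1 1 1 0xff 1 0x1 1 0xaa` is rejected at the hex height.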
>>> in any case, great start! have added it to X3D v4.0 schema for further
>>> experimentation, in this case under the annotations until working better.
>>> deployed online. however can't get svn to work here so checkins will
>>> be another time.
>>> http://www.web3d.org/specifications/X3dSchemaDocumentation4.0/x3d-4.0_SFImage.html
>>> On 6/7/2018 11:11 AM, Andreas Plesch wrote:
>>> > Ok, let's give it a try:
>>> >
>>> > [ \t]*(([0-9]|[1-9][0-9]+)([ ]+|$)){3}(([0-9]|([1-9][0-9]+)|(0x([0-9]|[a-f]|[A-F])+))([ ]+|$))*
>>> >
>>> > In words:
>>> >
>>> > match any leading white space followed by
>>> > exactly three times the following
>>> >   - one of the following
>>> >     - either a single digit (including 0) or
>>> >     - a two or more digit number starting with a 1 to 9 digit
>>> >   - followed by either
>>> >     - one or more spaces
>>> >     - or the end of the string (accommodating the default '0 0 0' case)
>>> > then optionally followed zero or more times by
>>> >   - one of the following
>>> >     - either a single digit (including 0) or
>>> >     - a two or more digit number starting with a 1 to 9 digit or
>>> >     - 0x followed by at least one of
>>> >       - a single digit or
>>> >       - an a to f letter or
>>> >       - an A to F letter
>>> >   - followed by either
>>> >     - one or more spaces
>>> >     - or the end of the string
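The '0 0 0' failure reported at the top of this thread can be reproduced with this June 7 pattern in a small Java sketch. It assumes, per the XML Schema regex rules, that `$` (like `^`) is an ordinary character in XSD patterns rather than an anchor; the sketch approximates a schema validator by escaping `$` before compiling. The class name is hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical demo: the same pattern text read with Java semantics and with "$" as a literal.
class DollarAnchorDemo {
    // The June 7 pattern, unchanged; "$" is meant as an end-of-string alternative.
    static final String ORIGINAL =
            "[ \\t]*(([0-9]|[1-9][0-9]+)([ ]+|$)){3}"
            + "(([0-9]|([1-9][0-9]+)|(0x([0-9]|[a-f]|[A-F])+))([ ]+|$))*";

    // In Java, "$" is an anchor.
    static final Pattern AS_JAVA = Pattern.compile(ORIGINAL);
    // XSD regexes have no anchors: "$" is an ordinary character there.
    // Escaping it approximates how a schema validator reads the same pattern.
    static final Pattern AS_XSD = Pattern.compile(ORIGINAL.replace("$", "\\$"));

    public static void main(String[] args) {
        System.out.println(AS_JAVA.matcher("0 0 0").matches());  // true
        System.out.println(AS_XSD.matcher("0 0 0").matches());   // false: no trailing space or literal "$"
        System.out.println(AS_XSD.matcher("0 0 0 ").matches());  // true, but whitespace collapsing removes that space
    }
}
```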
>> all the best, Don
>> --
>> Don Brutzman Naval Postgraduate School, Code USW/Br brutzman at nps.edu
>> Watkins 270, MOVES Institute, Monterey CA 93943-5000 USA +1.831.656.2149
>> X3D graphics, virtual worlds, navy robotics http://faculty.nps.edu/brutzman
> --
> Andreas Plesch
> Waltham, MA 02453