[x3d-public] SFImage regex

Don Brutzman brutzman at nps.edu
Fri Jun 8 15:13:51 PDT 2018


onward...

On 6/8/2018 11:36 AM, Andreas Plesch wrote:
> On Fri, Jun 8, 2018 at 2:19 AM Don Brutzman <brutzman at nps.edu> wrote:
> 
>     i tried testing this against PixelTexture value
> 
>              2 4 3 0xff0000 0xffff00 0x007700 0xff0077 0x0000ff 0xff7700 0x00ff77 0x888888
> 
>     from scene
> 
>     http://X3dGraphics.com/examples/X3dForWebAuthors/Chapter05AppearanceMaterialTextures/PixelTexture.x3d
> 
>     using https://regex101.com and that tests satisfactorily.
> 
>     however the default value of '0 0 0' fails according to XML Spy, so there is still a problem here...
> 
> I could confirm that this is due to XML regex expressions not considering anchors, in particular the $ end of string anchor.

OK... haven't used them elsewhere in the regexes.  Further info:

	Start of String and End of String Anchors
	https://www.regular-expressions.info/anchors.html

parenthetically OBTW also found this reference, constructing a nondeterministic finite automaton (NFA).  Whoa.

	Ken Thompson's construction algorithm
	https://en.wikipedia.org/wiki/Thompson%27s_construction_algorithm

> Without anchors the regex becomes longer and a bit more complicated, still not allowing leading zeroes or commas:
> 
> ^\s*(\d|[1-9]\d+)(\s+(\d|[1-9]\d+)){2}(\s+((0x([a-f]|[A-F]|\d){1,8})|[1-9]\d+|\d))*\s*$ (allowing for leading and trailing white space, and anchoring)
> 
> (\d|[1-9]\d+)(\s+(\d|[1-9]\d+)){2}(\s+((0x([a-f]|[A-F]|\d){1,8})|[1-9]\d+|\d))* (untested XML version which is implicitly anchored, assuming the whitespace collapsing restriction)

Am using this second form as our next version, consistent with other approaches provided.
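
A minimal Java sketch of the implicit anchoring, offered as an illustration only. The class and method names (SFImageRegexSketch, isSFImage, containsSFImage) are hypothetical and not part of X3DJSAIL, and the hex alternation is condensed to [0-9a-fA-F] as an assumption of this sketch. Java's Matcher.matches() must consume the whole input, much as an XML Schema pattern does, while find() accepts any matching substring and so would need explicit ^ and $ anchors.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration, not part of X3DJSAIL.
class SFImageRegexSketch
{
    // Second (XML) form of the SFImage regex; hex digits condensed to
    // [0-9a-fA-F], which is an assumption of this sketch.
    static final String SFIMAGE_REGEX =
        "(\\d|[1-9]\\d+)(\\s+(\\d|[1-9]\\d+)){2}"
        + "(\\s+((0x[0-9a-fA-F]{1,8})|[1-9]\\d+|\\d))*";

    static final Pattern SFIMAGE = Pattern.compile(SFIMAGE_REGEX);

    // matches() must consume the entire (trimmed) input, so no ^ or $ is needed.
    static boolean isSFImage(String value)
    {
        return SFIMAGE.matcher(value.trim()).matches();
    }

    // find() succeeds on any matching substring, so it is too lenient without anchors.
    static boolean containsSFImage(String value)
    {
        return SFIMAGE.matcher(value).find();
    }

    public static void main(String[] args)
    {
        String pixelTexture = "2 4 3 0xff0000 0xffff00 0x007700 0xff0077 "
            + "0x0000ff 0xff7700 0x00ff77 0x888888";
        System.out.println(isSFImage("0 0 0"));             // default value
        System.out.println(isSFImage(pixelTexture));        // PixelTexture.x3d value
        System.out.println(isSFImage("01 1 1"));            // leading zero
        System.out.println(containsSFImage("01 1 1 oops")); // substring "1 1 1" is found
    }
}
```

The first two calls report a match and the leading-zero case does not, while the last call succeeds only because find() settles for the substring "1 1 1".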

for MFImage, adapted it by prepending ( and appending (\s)*(,)?(\s)*)* so that the whole SFImage pattern can repeat, consuming intermediate whitespace plus a single optional comma between images.
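
The MFImage adaptation can be sketched in Java as plain string concatenation around the SFImage pattern. This is an illustration under assumptions: MFImageRegexSketch and isMFImage are hypothetical names, and the hex alternation is again condensed to [0-9a-fA-F].

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration, not part of X3DJSAIL.
class MFImageRegexSketch
{
    // SFImage pattern (XML form); hex digits condensed to [0-9a-fA-F],
    // which is an assumption of this sketch.
    static final String SFIMAGE_REGEX =
        "(\\d|[1-9]\\d+)(\\s+(\\d|[1-9]\\d+)){2}"
        + "(\\s+((0x[0-9a-fA-F]{1,8})|[1-9]\\d+|\\d))*";

    // Prepend "(" and append "(\s)*(,)?(\s)*)*" so that each image may be
    // followed by whitespace and at most one comma, repeated zero or more times.
    static final String MFIMAGE_REGEX =
        "(" + SFIMAGE_REGEX + "(\\s)*(,)?(\\s)*)*";

    static final Pattern MFIMAGE = Pattern.compile(MFIMAGE_REGEX);

    static boolean isMFImage(String value)
    {
        return MFIMAGE.matcher(value.trim()).matches();
    }

    public static void main(String[] args)
    {
        System.out.println(isMFImage("0 0 0"));                       // single image
        System.out.println(isMFImage("0 0 0, 1 1 1 0xff"));           // comma between images
        System.out.println(isMFImage("1 1 1 0xff,2 1 1 0xff 0x00"));  // no space after comma
        System.out.println(isMFImage("0 0 0,, 1 1 1 0xff"));          // doubled comma
        System.out.println(isMFImage("0 0, 0"));                      // comma inside an image
    }
}
```

Because the outer group repeats zero or more times, the empty string also matches, which corresponds to an empty MFImage array.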

> https://regex101.com/r/CJ1h80/4/tests

wow that saved regex pattern + rationale is utterly awesome.  screenshot attached and online at

	http://www.web3d.org/specifications/images/Regex101SFImageTestingScreenshot.png

Your customized regex page also answers how we might connect online regex testing directly to each type on the regex page... link now added at

	http://www.web3d.org/specifications/X3dRegularExpressions.html#SFImage

> lets everybody test and improve this regex with some of the more challenging cases as unit tests, including the default 0 0 0 case.

all in!  as you will see, the site allowed me to add a test corresponding to the PixelTexture.x3d and PixelTextureBW.x3d models from X3D for Web Authors (X3D4WA) archive.

> It is more important to come up with examples which almost match but actually should not match than to test conforming examples such as the web3d example library. Of course there should not be any false positives, e.g. not matching examples which are in fact fine.

well OK but not a worry.  relative priorities are all good when we "divide + conquer" like this.

testing all archive scenes helps reveal unintended side effects from changes as well.

... also OBE. building on last night's progress, I have added

=========================================================================================
http://www.web3d.org/specifications/java/javadoc/org/web3d/x3d/jsail/fields/SFVec3fObject.html#validate--

validate() methods to each of the simple types that provide an error message when a value fails.  excerpt:

public SFVec3fObject setValueByString (String newValue) throws InvalidFieldValueException
{
	if (newValue == null)
		newValue = new String(); // Principle of Least Astonishment (POLA)
		// https://en.wikipedia.org/wiki/Principle_of_least_astonishment

	if (!SFVec3fObject.matches(newValue)) // regex test
	{
		String errorNotice = "*** Regular expression (regex) failure, new SFVec3fObject(" + newValue + ")";
		validationResult.append(errorNotice).append("\n");
	}
[...]
=========================================================================================
http://www.web3d.org/specifications/java/javadoc/org/web3d/x3d/jsail/Grouping/TransformObject.html#validate--

validate() methods for each field to the comprehensive validate() checks in each node object.

excerpt from TransformObject validate() method:

	setScaleOrientation(getScaleOrientation()); // exercise field checks, simple types
	validationResult.append((new SFRotationObject(getScaleOrientation())).validate());  // regex check of corresponding String value

	setTranslation(getTranslation()); // exercise field checks, simple types
	validationResult.append((new SFVec3fObject(getTranslation())).validate());  // regex check of corresponding String value
[...]
=========================================================================================

just ran some initial tests, it revealed a copy/paste error on my part that was fixed.  update uploads in progress.

as before, am only experimenting on the x3d-4.0 schema and X3DUOM object model with these regex tests.  will propagate across prior versions once stable.

full regression tests on example archives are now poised to run, will follow up at a later time.  of course those kinds of tests can also find errors in X3D scene content as well as the regexes themselves.

> The new regex had to split the matching of the 3 initial integers into two parts. First matching the first integer, then matching at least one whitespace followed by an integer, twice.
> For optional value matching, the order is now at least one whitespace followed by a number rather than number then whitespace, to avoid requiring whitespace at the very end. This in turn required an ordering of the or number branches to hex, then multidigits, then single digits for reasons I hesitate to fully explore but probably having to do with greediness.
> The new regex also uses the \d and \s classes for conciseness, as well as Leonard's idea to limit hex values to at most 8 characters.

interesting certainly.  i got as far as i could today.

improved guidance:
==================
http://www.web3d.org/specifications/X3dRegularExpressions.html#X3dPatterns

General regex design considerations for X3D XML Schema include the following.
* For numeric types, leading sign characters (+ or -) are optionally present.
* For numeric types, leading zeroes are not allowed, except for an optional leading zero preceding the decimal point when the significand is only fractional.
* Intermediate commas are treated as whitespace, but only allowed between each singleton value. For example, SFVec3f 3-tuple values within an MFVec3f array do not contain comma characters.
* Careful design allows use of regexes that can also be adapted to JavaScript/JSON, Java and other language environments.
* These regexes all assume that leading/trailing whitespace has been removed. It is possible to prepend/append regex constructs such as (\s)* to consume optional outer whitespace.
==================
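
As a hedged illustration of these design considerations, the following Java sketch applies them to SFVec3f and MFVec3f. The patterns and names here (Vec3fRegexSketch, FLOAT, isSFVec3f, isMFVec3f) are hypothetical, written only for this example, and are not the regexes published on the X3D Regular Expressions page. The sketch assumes an optional sign, no leading zeroes other than a single zero before the decimal point, and commas permitted only between 3-tuples.

```java
import java.util.regex.Pattern;

// Hypothetical patterns for illustration; not the published X3D regexes.
class Vec3fRegexSketch
{
    // optional sign; integer part is 0 or has no leading zero; ".5" is also allowed
    static final String FLOAT =
        "[+-]?((0|[1-9]\\d*)(\\.\\d*)?|\\.\\d+)([Ee][+-]?\\d+)?";

    // exactly three floats separated by whitespace only (no commas inside a tuple)
    static final String SFVEC3F = FLOAT + "(\\s+" + FLOAT + "){2}";

    // tuples separated by whitespace or by a single comma with optional whitespace
    static final String MFVEC3F =
        "(" + SFVEC3F + "((\\s*,\\s*|\\s+)" + SFVEC3F + ")*)?";

    static final Pattern SF = Pattern.compile(SFVEC3F);
    static final Pattern MF = Pattern.compile(MFVEC3F);

    // trim() removes outer whitespace, as the guidance above assumes
    static boolean isSFVec3f(String value)
    {
        return SF.matcher(value.trim()).matches();
    }

    static boolean isMFVec3f(String value)
    {
        return MF.matcher(value.trim()).matches();
    }

    public static void main(String[] args)
    {
        System.out.println(isSFVec3f("+1 -2.5 .5"));     // signs and fractional-only value
        System.out.println(isSFVec3f("01 2 3"));         // leading zero
        System.out.println(isMFVec3f("1 2 3, 4 5 6"));   // comma between tuples
        System.out.println(isMFVec3f("1, 2, 3"));        // commas inside a tuple
    }
}
```

Unlike the fully optional separator used for MFImage above, this sketch requires either whitespace or a comma between tuples, a design choice made here so that adjacent values cannot run together.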

> TODO: clarify commas, spec. says commas in SF fields are whitespace, so should be allowed. But allowing in SF fields and then not allowing them in the corresponding MF field for other than set separation is quite confusing.

I have been watching for years and have yet to see (or imagine) any scene with pathological commas that deserved to be ignored.  sometimes whitespace normalization was worthwhile; most often the stray commas pointed to data errors that would otherwise have stayed hidden and surfaced as mysterious failures.

The worst error is the one you cannot find.  Strictness of commas (disallowed within n-tuples, allowed between n-tuples) is A Good Thing that leads to quality content and is not a blocker for any other content.

> Give it a spin, -Andreas
> 
>     in any case, great start! have added it to X3D v4.0 schema for further experimentation, in this case under the annotations until working better.
> 
>     deployed online.  however can't get svn to work here so checkins will be another time.
> 
>     http://www.web3d.org/specifications/X3dSchemaDocumentation4.0/x3d-4.0_SFImage.html
> 
> 
>     On 6/7/2018 11:11 AM, Andreas Plesch wrote:
>      > Ok, let's give it a try:
>      >
>      > [ \t]*(([0-9]|[1-9][0-9]+)([ ]+|$)){3}(([0-9]|([1-9][0-9]+)|(0x([0-9]|[a-f]|[A-F])+))([ ]+|$))*
>      >
>      > In words:
>      >
>      > match any leading white space followed by
>      > exactly three times the following
>      >   - one of the following
>      >     - either a single digit (including 0) or
>      >     - a two or more digit number starting with a 1 to 9 digit
>      >   - followed by either
>      >    - one or more spaces
>      >    - or the end of the string (accommodating the default '0 0 0' case)
>      > then optionally followed zero or more times by
>      >    - one of the following
>      >      - either a single digit (including 0) or
>      >      - a two or more digit number starting with a 1 to 9 digit
>      >      - 0x followed by at least one of
>      >         - a single digit or
>      >         - a to f letter or
>      >         - A to F letter
>      >    - followed by either
>      >      - one or more spaces
>      >      - or the end of the string

all the best, Don
-- 
Don Brutzman  Naval Postgraduate School, Code USW/Br       brutzman at nps.edu
Watkins 270,  MOVES Institute, Monterey CA 93943-5000 USA   +1.831.656.2149
X3D graphics, virtual worlds, navy robotics http://faculty.nps.edu/brutzman

-------------- next part --------------
A non-text attachment was scrubbed...
Name: Regex101SFImageTestingScreenshot.png
Type: image/png
Size: 216136 bytes
Desc: not available
URL: <http://web3d.org/pipermail/x3d-public_web3d.org/attachments/20180608/9e036bac/attachment-0001.png>

