[x3d-public] X3D Regex

Leonard Daly Leonard.Daly at realism.com
Thu Jun 7 18:19:55 PDT 2018


It was hard to follow the entire thread and to reply in several places, 
so I am merging all of my comments into one message. I hope I've 
covered everything.

A validator _should_ reflect the spec exactly and not be lenient toward 
hand coders. This doesn't mean that X3D browsers would fail on some of 
the common expressions, but the validator should not accept non-spec 
values as valid.

In general, it is important to note that an empty string (field="") is 
not the same as the field not being specified. It may be the case for 
some or all fields that being empty and not defined both produce the 
default value.
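
A minimal sketch of that distinction, using Python's standard 
xml.etree.ElementTree parser. The choice of IndexedFaceSet/coordIndex and 
the helper name describe() are only for illustration; what a browser or 
validator does with each case is not shown here.

```python
import xml.etree.ElementTree as ET

# Two illustrative elements: one with an empty attribute, one without it.
with_empty = ET.fromstring('<IndexedFaceSet coordIndex=""/>')
without = ET.fromstring('<IndexedFaceSet/>')

def describe(elem, name):
    """Report whether an attribute is absent, empty, or has a value."""
    value = elem.get(name)  # returns None when the attribute is absent
    if value is None:
        return "not specified"
    if value == "":
        return "empty string"
    return "value: " + value

print(describe(with_empty, "coordIndex"))
print(describe(without, "coordIndex"))
```

The parser keeps the two cases apart, so a validator can treat them 
differently if the spec requires it.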

For XML encoding 
(http://www.web3d.org/documents/specifications/19776-1/V3.3/Part01/EncodingOfFields.html), 
section 5.1.2 discusses trailing white space only for MF fields. Should 
it be inferred that leading whitespace for MF and leading or trailing 
white space for SF is not allowed?

The standard decimal regex for integers is /^[+-]?\d+$/

This supports an optional leading +/- then one or more digits [0-9].
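
As a quick check of what this pattern accepts, here is a small Python 
sketch. The function name is_sfint_decimal() is made up for this example. 
Two Python-specific details are assumptions worth stating: re.ASCII 
restricts \d to [0-9], and fullmatch() is used because Python's $ also 
matches just before a trailing newline.

```python
import re

# Pattern from the text above. re.ASCII keeps \d limited to 0-9.
SFINT_DECIMAL = re.compile(r"^[+-]?\d+$", re.ASCII)

def is_sfint_decimal(text):
    """True if text is an optional sign followed by one or more digits."""
    return SFINT_DECIMAL.fullmatch(text) is not None

for sample in ["42", "+7", "-0", "007", " 5", "5 ", "", "0x1F", "+ 12"]:
    print(repr(sample), is_sfint_decimal(sample))
```

Note that "007" is accepted: this pattern says nothing about leading 
zeros, and any surrounding white space is rejected.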

Hex numbers would be something like: /^0x[0-9a-fA-F]{1,8}$/

Does the regex need to indicate the maximum number of digits? The above 
hex one does, but the decimal one does not. These also assume that white 
space is not allowed.
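
On the question of digit limits, a rough Python sketch follows. It assumes 
the decimal range to enforce is the signed 32-bit range, and the name 
in_int32_range() is invented for the example. A digit count alone cannot 
express that bound for decimal values (2147483647 and 2147483648 both have 
ten digits), so the sketch checks the range after the regex match. How an 
8-digit hex value such as 0xFFFFFFFF should be interpreted (signed or 
unsigned) is left open here.

```python
import re

SFINT_HEX = re.compile(r"0x[0-9a-fA-F]{1,8}")   # at most 32 bits of hex
SFINT_DEC = re.compile(r"[+-]?[0-9]+")          # no digit limit
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1

def in_int32_range(text):
    """True if text is a decimal integer inside the signed 32-bit range."""
    if SFINT_DEC.fullmatch(text) is None:
        return False
    return INT32_MIN <= int(text) <= INT32_MAX

print(in_int32_range("2147483647"))   # largest signed 32-bit value
print(in_int32_range("2147483648"))   # one past the limit, same digit count
print(SFINT_HEX.fullmatch("0x270F") is not None, int("0x270F", 16))
print(SFINT_HEX.fullmatch("0x123456789") is not None)  # nine hex digits
```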

As already noted, when dealing with MFInt32, the regex needs to take the 
separator character(s) into account. The spec says that the values are 
separated by white space (tabs are legal white space) and commas may be 
used as whitespace.

This gives something like: /^(SFINT[,\s]+)*$/

Where SFINT is the regex for an individual SFInt32 value. This regex does 
not allow leading white space. Each value must be followed by one or more 
separator characters (regular white space or comma), and that unit is 
repeated zero or more times. It does not check the number of elements 
(to ensure that an n-tuple has the correct number of elements). Note that 
trailing white space is allowed; as written, at least one separator after 
the last value is in fact required. A value such as '0,,,0  ' is legal 
and creates a 2-element array since commas are white space. This is not 
quite spec compliant in that it allows commas after the last value and 
the spec states "commas may only be used as whitespace between values of 
individual base datatypes".
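
To make the behaviour concrete, here is a Python sketch that builds the 
MFInt32 pattern from a decimal-only SFINT piece. The second pattern, 
MFINT_BETWEEN, is only one possible reading of "commas between values" 
and is an assumption, not something taken from the spec text.

```python
import re

SFINT = r"[+-]?[0-9]+"   # decimal only, to keep the sketch short

# The pattern as written above: every value is followed by separators.
MFINT_AS_WRITTEN = re.compile(r"(" + SFINT + r"[,\s]+)*")

# Assumed variant: separators only between values, optional trailing
# white space, and no comma after the last value.
MFINT_BETWEEN = re.compile(
    r"(" + SFINT + r"([,\s]+" + SFINT + r")*\s*)?"
)

def matches(pattern, text):
    return pattern.fullmatch(text) is not None

for sample in ["", "0,,,0  ", "0 1", "0 1,", " 0 "]:
    print(repr(sample),
          matches(MFINT_AS_WRITTEN, sample),
          matches(MFINT_BETWEEN, sample))
```

The as-written pattern rejects "0 1" because nothing follows the last 
value, while it accepts "0 1," with its trailing comma. The variant 
reverses both results. Neither pattern rejects the repeated commas in 
'0,,,0  '.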


Leonard Daly

P.S. I did not test all of the regexes, so I may be slightly off. The 
correct representation should match the intent and not be far from what 
I have written.



>
> On Thu, Jun 7, 2018 at 10:34 AM Andreas Plesch 
> <andreasplesch at gmail.com <mailto:andreasplesch at gmail.com>> wrote:
>
>     Let's say we do _not_ like leading zeroes, multiple commas, white
>     space after the sign, and 0X for a hexadecimal prefix.
>
>     ([ \t]*[-|+]?([0-9]|[1-9][0-9]+|(0x([0-9]|[a-f]|[A-F])+))([ \t]+|,))* // obsolete
>
>     is then proposed.
>
>
> Since above required a space or comma at the very end, we need to add 
> the end anchor ($) as being allowed after a number:
>
> ([ \t]*[-|+]?([0-9]|[1-9][0-9]+|(0x([0-9]|[a-f]|[A-F])+))([ \t]+|,|$))*
>
>     In words:
>
>     Match zero (to accommodate the empty array) or more times all of
>     the following:
>     optional leading white space (just space or tab), followed by
>     an optional + or minus sign, followed by
>     one of the following:
>     - a single digit (between 0 and 9, so including 0)
>     - a two or more digit number starting with a digit between 1 and 9
>      - a hexadecimal number starting with 0x and followed by at least
>     one of either
>       - a digit
>       - a letter between a and f
>       - a letter between A and F
>     which is then followed by either
>      - one or more white space
>      - or a single comma
>
>
> - or the end of the string
>
> Sorry for the oversight. Let's give this a good workout, -Andreas
>
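
In the spirit of giving it a workout, here is a short Python sketch that 
runs the revised pattern (the one with the end anchor) against a few 
inputs. It assumes the whole attribute value must match, so it uses 
fullmatch(); the helper name ok() is made up for the example.

```python
import re

# The revised proposal, copied as-is from the quoted message.
PROPOSED = re.compile(
    r"([ \t]*[-|+]?([0-9]|[1-9][0-9]+|(0x([0-9]|[a-f]|[A-F])+))([ \t]+|,|$))*"
)

def ok(text):
    """True if the entire string matches the proposed MFInt32 pattern."""
    return PROPOSED.fullmatch(text) is not None

cases = ["", "0 1 2 30", "-1,2", "0x1F 7", "1,",
         "010", "1,,2", "+ 12", "0X1F", "|5"]
for case in cases:
    print(repr(case), ok(case))
```

Two things stand out. The character class [-|+] contains a literal "|", 
so "|5" is accepted; [-+] would avoid that. A single trailing comma 
("1,") is also still accepted. Leading zeros, doubled commas, white space 
after the sign and the 0X prefix are all rejected as intended.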
>
>     Testing with online tools looks good.
>
>     Some more items of interest:
>
>     http://www.web3d.org/pipermail/x3d-public_web3d.org/2012-March/001950.html
>     is a previous discussion on floating points regex.
>
>     http://www.web3d.org/specifications/X3dRegularExpressions.html
>     claims that
>
>     The SFInt32 pattern is a native XML type:
>
>     <xs:restriction base="xs:integer"/>
>
>     https://www.w3.org/TR/xmlschema11-2/#integer is the definition.
>
>     The XML spec. says it is derived from 'decimal' so no hexadecimal
>     numbers. This probably needs to be fixed in
>     http://www.web3d.org/specifications/X3dRegularExpressions.html. There
>     does not seem to be a hexadecimal built-in XML data type, so probably
>     another regex is required. The other option is to deprecate
>     hexadecimal values for Int32. They are very rarely used anyways and
>     only provide a more compact representation once you get above 9999
>     (0x270F).
>
>     https://www.w3.org/TR/xmlschema11-2/#decimal actually also allows in
>     3.3.3.1 explicitly leading zeros indicating that the native XML types
>     may not be a good fit for X3D SFInt32 and therefore MFInt32.
>
>     -Andreas
>
>
>
>
>
>     On Thu, Jun 7, 2018 at 8:01 AM, Andreas Plesch
>     <andreasplesch at gmail.com <mailto:andreasplesch at gmail.com>> wrote:
>     > Hi Don
>     >
>     > On Thu, Jun 7, 2018, 12:17 AM Don Brutzman <brutzman at nps.edu <mailto:brutzman at nps.edu>> wrote:
>     >>
>     >> great to see the dialog and scrutiny, thanks!
>     >>
>     >> intent is to allow legal/standard content but disallow (or at least
>     >> diagnose) broken/problematic/nonstandard content.
>     >
>     >
>     > I think it may be helpful to more definitely state what the function of the
>     > regex is:
>     >
>     > - flag certainly broken content but still allow questionable content
>     > - only allow well behaved content and flag questionable but parseable and
>     > possibly legal content
>     >
>     > There could be two versions, strict and lax.
>     >
>     >>
>     >> for MFInt32,
>     >>
>     >> - legal: 0 1 2 30 -0 -1 -2 -30
>     >>
>     >> - illegal: 010 -020 (leading zeroes also an indicator that intermediate
>     >> whitespace was dropped)
>     >
>     >
>     > The official regex /((\+|\-)?(0|[1-9][0-9]*)?( )?(,)?( )?)*/ currently
>     > allows leading zeros as legal, perhaps by accident.
>     >
>     >>
>     >> - single comma OK between numbers, but multiple commas an indicator that
>     >> an intermediate number was dropped
>     >
>     >
>     > The official regex currently allows multiple commas.
>     >
>     > Other patterns to clarify as being considered problematic include:
>     >
>     > - +0
>     > - -0
>     > - + 12 (white space after sign)
>     > - tab,newline,return as white space
>     > - capital letters for hex: 0xAF
>     > - 0X in addition to 0x as prefix for hexadecimal
>     >
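
A small Python sketch for checking those observations against the official 
regex quoted above. It assumes the pattern is meant to match the entire 
attribute value, hence fullmatch(); the helper name accepted() is invented 
for the example.

```python
import re

# The official MFInt32 regex as quoted in this thread.
OFFICIAL = re.compile(r"((\+|\-)?(0|[1-9][0-9]*)?( )?(,)?( )?)*")

def accepted(text):
    """True if the entire string matches the official pattern."""
    return OFFICIAL.fullmatch(text) is not None

cases = ["0 1 2 30", "010", "1,,2", "+0", "-0", "+ 12", "1\t2", "0xAF"]
for case in cases:
    print(repr(case), accepted(case))
```

Every part inside the repeated group is optional, so "010" is read as "0" 
followed by "10", and "1,,2" and "+ 12" also pass. Tabs and hexadecimal 
values are rejected, since the pattern only knows the space character and 
decimal digits.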
>     >>
>     >> I have the O'Reilly books on this topic, they often give good/adaptable
>     >> "cookbook recipes" worth considering.  can look further next week.
>     >>
>     >> it is helpful to be very wary of any conclusions whatsoever without
>     >> testing a regex.
>     >
>     >
>     > Oh yes.
>     >
>     >>
>     >>
>     >> if alternatives are found, great - we can test with online tools, with
>     >> Netbeans/X3D-Edit, and with regression testing of the X3D Examples Archive.
>     >
>     >
>     > It sounds like the current regex does not quite match expectations.
>     > The archive may not have leading zeros anywhere ? Or some of the other
>     > problematic patterns like multiple commas ?
>     >
>     > -Andreas
>     >
>     >>
>     >> all the best, Don
>     >> --
>     >> Don Brutzman  Naval Postgraduate School, Code USW/Br
>     >> brutzman at nps.edu <mailto:brutzman at nps.edu>
>     >> Watkins 270,  MOVES Institute, Monterey CA 93943-5000 USA
>     >> +1.831.656.2149
>     >> X3D graphics, virtual worlds, navy robotics
>     >> http://faculty.nps.edu/brutzman
>     >>
>     >
>
>
>
>     --
>     Andreas Plesch
>     Waltham, MA 02453
>
>
>


-- 
*Leonard Daly*
3D Systems & Cloud Consultant
LA ACM SIGGRAPH Past Chair
President, Daly Realism - /Creating the Future/