[x3d-public] SFImage regex
Don Brutzman
brutzman at nps.edu
Thu Jun 7 23:26:24 PDT 2018
Lots of great questions. Apologies for not answering as fully as possible; I am traveling.
a. I haven't included leading or trailing whitespace, since examples all get canonicalized first.
Also, the X3D XML Schema constructs all include <xs:whiteSpace value="collapse"/> for validation.
b. We should also try to come up with a subpattern that handles one-and-only-one comma amidst other optional whitespace, while avoiding catastrophic backtracking. The X3D and VRML encodings say to treat commas as whitespace. The XML Schema honors that, but carefully: it does not let excess commas go unnoticed, since historically they are the only indicator that a long line of values has gotten corrupted.
c. The differences in regex syntax among various languages are relatively minor and are explained at length in lots of documentation, so we should be able to avoid idiosyncrasies without too much trouble (I think that is the case already).
d. I looked online and found plausible exemplars on Stack Overflow; I have added regex patterns for SF/MFBool, SFInt32, and SFFloat/Double/Time. Comments in the X3D v4 schema show links.
e. I have copied these over from the XML Schema and added regex patterns to X3DUOM for each field type, for example:
<FieldType type="SFImage"
regex="[ \t]*(([0-9]|[1-9][0-9]+)([ ]+|$)){3}(([0-9]|([1-9][0-9]+)|(0x([0-9]|[a-f]|[A-F])+))([ ]+|$))*">
<InterfaceDefinition specificationUrl="http://www.web3d.org/documents/specifications/19775-1/V3.3/Part01/fieldsDef.html#SFImageAndMFImage"
appinfo="The SFImage field specifies a single uncompressed 2-dimensional pixel image. SFImage fields contain three integers representing the width, height and number of components in the image, followed by (width x height) hexadecimal or integer values representing the pixels in the image."/>
</FieldType>
f. X3DJSAIL support: I have used this X3DUOM information further and added regex values and regex testing to field type objects. This should facilitate experimentation and everyday usage even further.
http://www.web3d.org/specifications/java/X3DJSAIL.html#regex
"Regular expression (regex) support in field types, for example SFVec3fObject REGEX string, pattern, matches() and matches(String value)."
http://www.web3d.org/specifications/java/javadoc/org/web3d/x3d/jsail/fields/SFVec3fObject.html#REGEX
http://www.web3d.org/specifications/java/javadoc/org/web3d/x3d/jsail/fields/SFVec3fObject.html#pattern
http://www.web3d.org/specifications/java/javadoc/org/web3d/x3d/jsail/fields/SFVec3fObject.html#matches--
http://www.web3d.org/specifications/java/javadoc/org/web3d/x3d/jsail/fields/SFVec3fObject.html#matches-java.lang.String-
Tested satisfactorily in HelloWorldProgram.java:
<!-- SFVec3f default=0 0 0, initial=1 2 3, setValue=4 5 6, multiply(2)=8 10 12, normalize()=0.45584232 0.5698029 0.68376344, regex matches()=true -->
<!-- regex SFVec3f().matches("1 2 3")=true, regex SFVec3f().matches("1 2 3 4")=false -->
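Items a and b could be combined in a validator that first canonicalizes whitespace the way <xs:whiteSpace value="collapse"/> does, then allows at most one comma per separator so that stray extra commas are still caught. This is a minimal Java sketch that assumes an MF-style list of non-negative integers. The class name and patterns are hypothetical and are not taken from the X3D schema or X3DUOM.

```java
import java.util.regex.Pattern;

/** Hypothetical sketch for items a and b: canonicalize whitespace, then allow at most one comma per separator. */
final class CanonicalizeThenMatch {
    // Illustrative value token: a non-negative integer without leading zeros.
    static final String UINT = "(?:0|[1-9][0-9]*)";
    // After collapsing, a separator is one space, or exactly one comma with an optional space on either side.
    // Each separator must consume at least one non-digit character, so a run of digits can only be split
    // one way. This should keep backtracking from blowing up on long, invalid lines.
    static final String SEPARATOR = "(?: ?, ?| )";
    static final Pattern UINT_LIST = Pattern.compile(UINT + "(?:" + SEPARATOR + UINT + ")*");

    /** Mimics XML Schema whiteSpace="collapse": tab/LF/CR become spaces, space runs shrink to one, ends trimmed. */
    static String collapse(String raw) {
        return raw.replaceAll("[\\t\\n\\r]", " ")
                  .replaceAll(" {2,}", " ")
                  .replaceAll("^ | $", "");
    }

    /** Canonicalize first, then require the whole string to match. */
    static boolean isValidList(String raw) {
        return UINT_LIST.matcher(collapse(raw)).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValidList("  1 ,\t2\n3  ")); // collapses to "1 , 2 3", which matches
        System.out.println(isValidList("1,,2"));          // a doubled comma is reported, not absorbed
    }
}
```

Because collapsing happens first, the separator pattern only needs to handle single spaces. That keeps it short and easy to reason about.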
Having fun with X3D Regexes!
http://www.web3d.org/specifications/X3dRegularExpressions.html
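For readers without X3DJSAIL at hand, the REGEX / pattern / matches(String value) arrangement described in item f can be approximated with plain java.util.regex. This is a hypothetical stand-in, not X3DJSAIL source. In particular, the float syntax below is an assumption and not the official SFVec3f regex.

```java
import java.util.regex.Pattern;

/** Hypothetical stand-in for a field type's REGEX string, compiled pattern, and matches(String). */
final class Vec3fRegexSketch {
    // Assumed float token: optional sign, digits with optional fraction (or a leading-dot fraction), optional exponent.
    static final String FLOAT = "[+-]?(?:[0-9]+(?:\\.[0-9]*)?|\\.[0-9]+)(?:[eE][+-]?[0-9]+)?";
    // Exactly three floats separated by single spaces, as in "1 2 3".
    static final String REGEX = FLOAT + " " + FLOAT + " " + FLOAT;
    // Compile once and reuse: Pattern instances are immutable and safe to share across threads.
    static final Pattern PATTERN = Pattern.compile(REGEX);

    /** Whole-string match, so "1 2 3 4" is rejected rather than partially matched. */
    static boolean matches(String value) {
        return PATTERN.matcher(value).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches("1 2 3"));
        System.out.println(matches("1 2 3 4"));
    }
}
```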
On 6/7/2018 2:27 PM, Andreas Plesch wrote:
> Apologies for using this thread to keep a record since this is getting technical.
>
> The XML encoding for SFImage
>
> http://www.web3d.org/documents/specifications/19776-1/V3.3/Part01/EncodingOfFields.html#SFImage
>
> mentions whitespace as separator for pixel values. So that would include any kind of whitespace, and perhaps repeated whitespace.
>
> Looking at XML, it has its own regex definition including character classes:
>
> https://www.w3.org/TR/xmlschema11-2/#cces
>
> 4.2.5 lists popular classes, including \d for decimal digits and \s for common whitespace, so it should be possible to use those, as they are widely recognized outside of XML as well.
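One caveat on \d and \s: the names are shared across engines, but the meanings are not identical. In XML Schema regexes, \d is any Unicode decimal digit (\p{Nd}) and \s is only space, tab, CR and LF. In Java's default mode, \d is ASCII [0-9] and \s also includes form feed and vertical tab. This small Java check shows the difference. Spelling classes out explicitly, as the draft SFImage regex does with [0-9], sidesteps the issue.

```java
import java.util.regex.Pattern;

final class CharClassDifferences {
    public static void main(String[] args) {
        String arabicIndicThree = "\u0663"; // ARABIC-INDIC DIGIT THREE, a Unicode Nd digit
        // Java's default \d is ASCII-only, unlike XML Schema's \d (which is \p{Nd}).
        System.out.println(Pattern.matches("\\d", arabicIndicThree));     // false
        System.out.println(Pattern.matches("\\p{Nd}", arabicIndicThree)); // true
        // Java's \s accepts form feed; XML Schema's \s does not.
        System.out.println(Pattern.matches("\\s", "\f"));                  // true
        // An explicit class behaves the same way in both engines.
        System.out.println(Pattern.matches("[0-9]", arabicIndicThree));   // false
    }
}
```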
>
> XML regexes also are anchored implicitly at the start and end, meaning there are no partial matches. Since this is unusual outside of XML, it probably should be mentioned somewhere on the x3d regex page. This is especially important if the regexes are intended to be used for other encodings such as VRML as well.
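In most non-XML engines, a pattern is unanchored unless told otherwise. This minimal Java sketch uses an illustrative three-integer fragment to compare find() (partial match), matches() (whole string, like XML Schema), and explicit anchors for tools that only offer search semantics. It uses \A and \z rather than ^ and $, because Java's $ also matches just before a final line terminator. Also note that in XML Schema regexes, ^ and $ are ordinary characters rather than anchors, so a $ inside a pattern means different things in the two settings.

```java
import java.util.regex.Pattern;

final class AnchoringSketch {
    // Unanchored, illustrative fragment: three non-negative integers separated by single spaces.
    static final String HEADER = "(?:0|[1-9][0-9]*) (?:0|[1-9][0-9]*) (?:0|[1-9][0-9]*)";
    static final Pattern UNANCHORED = Pattern.compile(HEADER);
    // For find()-style APIs: \A and \z anchor at the true start and end of the input.
    static final Pattern ANCHORED = Pattern.compile("\\A(?:" + HEADER + ")\\z");

    public static void main(String[] args) {
        String value = "1 2 3 junk";
        System.out.println(UNANCHORED.matcher(value).find());    // true: a partial match is found
        System.out.println(UNANCHORED.matcher(value).matches()); // false: the whole string must match
        System.out.println(ANCHORED.matcher(value).find());      // false: anchors forbid partial matches
    }
}
```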
>
> -Andreas
>
> On Thu, Jun 7, 2018 at 4:55 PM Andreas Plesch <andreasplesch at gmail.com> wrote:
>
> Two more observations that may be worth stating explicitly:
>
> The regexes are expected to be used only against attribute strings, not against the complete element XML or scene XML. I think that is implied by how XML native data types are referenced.
> Partial matches do not count as successful. That means there needs to be an additional check that the matched portion of the string is identical to the whole string. I think that is implied by how the existing regexes are formulated.
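The first observation can be made concrete: the regex is applied to a single attribute value as returned by an XML parser, never to the element or scene markup. Here is a minimal Java sketch using the JDK's DOM parser. The class name, the method, and the header-only pattern are hypothetical. Attribute values reach the regex after the parser's own attribute-value normalization (tab, CR and LF become spaces), which is not the same as the schema's collapse step.

```java
import java.io.StringReader;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

/** Hypothetical sketch: run a field regex against one attribute value, never against the element markup. */
final class AttributeOnlySketch {
    // Illustrative pattern: only the three SFImage header integers, no pixel values.
    static final Pattern HEADER_ONLY =
            Pattern.compile("(?:0|[1-9][0-9]*) (?:0|[1-9][0-9]*) (?:0|[1-9][0-9]*)");

    /** Parses the XML, reads one attribute of the root element, and whole-string matches it. */
    static boolean rootAttributeMatches(String xml, String attributeName) {
        try {
            Element root = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
            // getAttribute returns "" when the attribute is absent, which fails the match.
            return HEADER_ONLY.matcher(root.getAttribute(attributeName)).matches();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML: " + xml, e);
        }
    }
}
```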
>
> And two more questions:
>
> The existing regexes do not allow for leading white space. It looks like this is inspired by the XML spec. regexes (https://www.w3.org/TR/xmlschema11-2/#decimal). However, native XML decimal integers and floats allow leading white space due to the fixed whiteSpace: collapse restriction.
> Should optional leading white space therefore be added to the existing regexes? I think so; alternatively, it could be disallowed for the types that currently rely on native XML types (by not using native types).
> For SF fields that contain multiple numbers, such as SFVec or SFColor, the existing regexes require exactly one space character as the separator. What is the rationale for not allowing repeated space characters, which may help with formatting?
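To make the trade-off in these two questions concrete, here is a Java comparison of the current strict style (exactly one space, nothing leading or trailing) with a hypothetical lenient variant. The lenient variant accepts optional leading and trailing whitespace and repeated whitespace between values. Non-negative integers stand in for the value token; an SFVec3f or SFColor pattern would use a float token instead.

```java
import java.util.regex.Pattern;

/** Sketch comparing a strict single-space separator with a hypothetical lenient variant. */
final class SeparatorVariants {
    // Illustrative value token: a non-negative integer without leading zeros.
    static final String UINT = "(?:0|[1-9][0-9]*)";

    // Strict: exactly one space between values and nothing before or after.
    static final Pattern STRICT = Pattern.compile(UINT + " " + UINT + " " + UINT);

    // Lenient: optional leading/trailing whitespace and runs of whitespace between values.
    static final Pattern LENIENT =
            Pattern.compile("\\s*" + UINT + "\\s+" + UINT + "\\s+" + UINT + "\\s*");

    public static void main(String[] args) {
        String formatted = "  1   2 3 ";
        System.out.println(STRICT.matcher(formatted).matches());
        System.out.println(LENIENT.matcher(formatted).matches());
    }
}
```

Both variants still require some whitespace between values, so "1 23" is never misread as three values.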
>
> -Andreas
>
> On Thu, Jun 7, 2018 at 2:11 PM Andreas Plesch <andreasplesch at gmail.com> wrote:
>
> Since I got started with the regexes, let's look at SFImage as it is still a TODO.
>
> http://www.web3d.org/documents/specifications/19775-1/V3.3/Part01/fieldsDef.html#SFImageAndMFImage
>
> appears to be the main (only?) source of the format description.
>
> We need exactly three non-negative decimal integers, followed by zero or more non-negative decimal or hexadecimal integers.
>
> There is no mention of the separator, so let's look at some example scenes:
>
> http://www.web3d.org/x3d/content/examples/ConformanceNist/Appearance/PixelTexture/index.html
>
> The NIST examples all use space as separator.
> http://www.web3d.org/x3d/tooltips/X3dTooltips.html#PixelTexture examples all use space.
>
> http://www.web3d.org/specifications/X3dSchemaDocumentation4.0/x3d-4.0_SFImage.html has a minimum length of 5, presumably because of three single digits for width, height and components plus two separator characters.
>
> So one question is: are single commas legal separators between numbers in SFImage? http://www.web3d.org/specifications/X3dRegularExpressions.html says commas are only allowed in MF fields, so let's say the answer is no.
>
> What about leading zeroes? The general guidance in http://www.web3d.org/specifications/X3dRegularExpressions.html does not allow leading zeroes.
>
> There is a requirement to have width x height pixel values (each pixel value packs all of its components into one number), but I am not sure if this requirement can be (easily) checked by a regex.
>
> Capital letters: http://www.web3d.org/documents/specifications/19775-1/V3.3/Part01/fieldsDef.html#SFImageAndMFImage only uses capital A-F in hexadecimal example values, but let's say a-f is also allowed, since lowercase is used in almost all examples.
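Checks that are awkward in a regex can run as a follow-up step once the regex has passed. This is a hedged Java sketch. It assumes, per the SFImage definition quoted in the X3DUOM appinfo earlier in this thread, that there are width x height pixel values. It also assumes, from my reading of the spec's pixel packing rather than from this thread, that each value must fit in `components` bytes. It parses hexadecimal with either letter case, matching the a-f/A-F decision above. Class and method names are hypothetical.

```java
import java.util.regex.Pattern;

/** Hypothetical post-regex checks for an SFImage string that already passed the regex. */
final class SFImageChecks {
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    /** Parses a decimal or 0x-prefixed hexadecimal value; Long.parseLong accepts a-f and A-F. */
    static long parseValue(String token) {
        return token.startsWith("0x") ? Long.parseLong(token.substring(2), 16) : Long.parseLong(token);
    }

    /** True when there are exactly width * height pixel values and each fits in `components` bytes. */
    static boolean isConsistent(String value) {
        String[] tokens = WHITESPACE.split(value.trim());
        if (tokens.length < 3) return false;
        long width = parseValue(tokens[0]);
        long height = parseValue(tokens[1]);
        long components = parseValue(tokens[2]);
        if (components > 4) return false;
        // The pixel count depends on width and height only; components affects each value's range.
        if (tokens.length - 3 != width * height) return false;
        long maxPixel = (1L << (8 * components)) - 1; // 0xFF for 1 component, 0xFFFFFF for 3
        for (int i = 3; i < tokens.length; i++) {
            if (parseValue(tokens[i]) > maxPixel) return false;
        }
        return true;
    }
}
```

Doing this in code rather than in the regex also keeps the regex itself simple, which may help with the performance concerns for large images.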
>
> Ok, let's give it a try:
>
> [ \t]*(([0-9]|[1-9][0-9]+)([ ]+|$)){3}(([0-9]|([1-9][0-9]+)|(0x([0-9]|[a-f]|[A-F])+))([ ]+|$))*
>
> In words:
>
> match any leading spaces or tabs, followed by
> exactly three times the following
> - one of the following
> - either a single digit (including 0) or
> - a two or more digit number starting with a 1 to 9 digit
> - followed by either
> - one or more spaces
> - or the end of the string (accommodating the default '0 0 0' case)
> then optionally followed zero or more times by
> - one of the following
> - either a single digit (including 0) or
> - a two or more digit number starting with a 1 to 9 digit
> - 0x followed by at least one of
> - a single digit or
> - a to f letter or
> - A to F letter
> - followed by either
> - one or more spaces
> - or the end of the string
> The spaces are written as [ ] for clarity but could be just a space character.
>
> This allows 0 0 0, as it is the default value. It also allows 1 2 3 without any pixel values, but we do not aim at checking the number of pixel values anyway.
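The draft regex can be exercised directly in Java. This sketch transcribes it unchanged into a Java string literal and uses matches(), which requires the whole string to match and so behaves like XML Schema's implicit anchoring. Here, Java's $ matches at the end of the input. The class name and sample strings are illustrative.

```java
import java.util.regex.Pattern;

final class SFImageRegexCheck {
    // The draft SFImage regex from this thread, transcribed into a Java string literal.
    static final Pattern SFIMAGE = Pattern.compile(
            "[ \\t]*(([0-9]|[1-9][0-9]+)([ ]+|$)){3}"
            + "(([0-9]|([1-9][0-9]+)|(0x([0-9]|[a-f]|[A-F])+))([ ]+|$))*");

    // matches() requires the whole string to match, mimicking XML Schema's implicit anchoring.
    static boolean matches(String value) {
        return SFIMAGE.matcher(value).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"0 0 0", "1 2 3", "2 2 1 0xFF 0x00 0xff 0", "01 2 3", "1 2", "1,2,3"};
        for (String s : samples) {
            System.out.println("\"" + s + "\" -> " + matches(s));
        }
    }
}
```

With these samples, the results should be true, true, true, false, false, false. The default and header-only values pass, while a leading zero, a missing third integer, and commas are rejected.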
>
> Any input, in particular examples of problem cases, is welcome. Since images can be huge, it may be necessary to optimize for performance and memory, which may require more regex expertise than I can bring myself to acquire.
>
> -Andreas
>
> --
> Andreas Plesch
> Waltham, MA 02453
>
>
> _______________________________________________
> x3d-public mailing list
> x3d-public at web3d.org
> http://web3d.org/mailman/listinfo/x3d-public_web3d.org
>
all the best, Don
--
Don Brutzman Naval Postgraduate School, Code USW/Br brutzman at nps.edu
Watkins 270, MOVES Institute, Monterey CA 93943-5000 USA +1.831.656.2149
X3D graphics, virtual worlds, navy robotics http://faculty.nps.edu/brutzman