[x3d-public] SFImage regex

Andreas Plesch andreasplesch at gmail.com
Thu Jun 7 14:27:32 PDT 2018


Apologies for using this thread to keep a record since this is getting
technical.

The XML encoding for SFImage

http://www.web3d.org/documents/specifications/19776-1/V3.3/Part01/EncodingOfFields.html#SFImage

mentions whitespace as separator for pixel values. So that would include
any kind of whitespace, and perhaps repeated whitespace.

Looking at XML, it has its own regex definition including character classes:

https://www.w3.org/TR/xmlschema11-2/#cces

4.2.5 lists popular classes including \d for decimal digits and \s for
common whitespace. So it should be possible to use those as they are widely
recognized outside of XML as well.
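
As a rough illustration of what those classes would look like in practice,
here is a Python sketch. The names NUMBER, PIXEL, SFIMAGE_WS and is_sfimage
are made up for this example, and the pattern is only one guess at how the
SFImage attempt quoted below could be written with \d and \s; it is not an
agreed X3D regex. One caveat: XML Schema's \s is exactly space, tab, newline
and carriage return, while Python's \s also accepts form feed and vertical
tab even with re.ASCII, so the two are close but not identical.

```python
import re

# Hypothetical rewrite of the SFImage attempt using \d and \s.
# re.ASCII restricts \d to 0-9; Python's \s still accepts \f and \v,
# which XML Schema's \s does not.
NUMBER = r"(?:\d|[1-9]\d+)"               # no leading zeros
PIXEL = r"(?:\d|[1-9]\d+|0x[\da-fA-F]+)"  # decimal or 0x-prefixed hex
SFIMAGE_WS = re.compile(
    rf"\s*{NUMBER}(?:\s+{NUMBER}){{2}}(?:\s+{PIXEL})*\s*",
    re.ASCII,
)

def is_sfimage(text: str) -> bool:
    # fullmatch mimics the implicit anchoring of XML Schema patterns
    return SFIMAGE_WS.fullmatch(text) is not None

print(is_sfimage("0 0 0"))
print(is_sfimage("1 2 1\n\t0xFF   0x00"))
print(is_sfimage("1,2,1 0xFF 0x00"))
```

The separator is placed in front of each number here rather than after it,
which avoids needing an end-of-string alternative inside the repetition.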

XML regexes also are anchored implicitly at the start and end, meaning
there are no partial matches. Since this is unusual outside of XML, it
probably should be mentioned somewhere on the x3d regex page. This is
especially important if the regexes are intended to be used for other
encodings such as VRML as well.
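
To make the difference concrete, here is a small Python sketch. The INT
pattern is only an illustrative stand-in, not one of the published X3D
regexes. Outside of XML the pattern has to be applied with a whole-string
match, or wrapped in explicit anchors, to get the XML behavior.

```python
import re

# Illustrative pattern: a non-negative integer without leading zeros.
INT = r"0|[1-9][0-9]*"

value = "12 abc"

# re.match anchors only at the start, so a matching prefix counts as success.
prefix_ok = re.match(INT, value) is not None

# re.fullmatch requires the whole string to match, like an XML Schema pattern.
whole_ok = re.fullmatch(INT, value) is not None

# The same effect with explicit anchors; the group keeps the alternation
# inside the anchors instead of anchoring only its first and last branch.
anchored_ok = re.search(rf"\A(?:{INT})\Z", value) is not None

print(prefix_ok, whole_ok, anchored_ok)
```

Note the non-capturing group in the last variant: without it, the start
anchor would bind only to the first alternative and the end anchor only to
the last one.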

-Andreas

On Thu, Jun 7, 2018 at 4:55 PM Andreas Plesch <andreasplesch at gmail.com>
wrote:

> Two more observations which may be worth stating explicitly:
>
> The regexes are expected to be used just against attribute strings, not
> the complete element XML or scene XML. I think that is implied by how XML
> native data types are referenced.
> Partial matches do not count as successful. That means there needs to be
> an additional check whether the matched portion of the string is identical
> to the string. I think that is implied by how the existing regexes are
> formulated.
>
> And two more questions:
>
> The existing regexes do not allow for leading white space. It looks like
> this is inspired by the regexes in the XML Schema spec:
> https://www.w3.org/TR/xmlschema11-2/#decimal . However, native XML
> decimal integers and floats allow leading white space due to the fixed
> whiteSpace: collapse restriction.
> Should optional leading white space therefore be added to the existing
> regexes? I think so; alternatively, the allowance could be removed from
> the fields that use native types (by not using native types).
> For SF fields which contain multiple numbers, such as SFVec or SFColor,
> the existing regexes require exactly one space character as separator. What
> is the rationale for not allowing repeated space characters, which may help
> with formatting?
>
> -Andreas
>
> On Thu, Jun 7, 2018 at 2:11 PM Andreas Plesch <andreasplesch at gmail.com>
> wrote:
>
>> Since I got started with the regexes, let's look at SFImage as it is
>> still a TODO.
>>
>>
>> http://www.web3d.org/documents/specifications/19775-1/V3.3/Part01/fieldsDef.html#SFImageAndMFImage
>>
>> appears to be the main (only?) source of the format description.
>>
>> We need exactly three decimal non-negative integers followed by zero or
>> more non-negative decimal or hexadecimal integers.
>>
>> There is no mention of the separator, so let's look at some example
>> scenes:
>>
>>
>> http://www.web3d.org/x3d/content/examples/ConformanceNist/Appearance/PixelTexture/index.html
>>
>> The NIST examples all use space as separator.
>> http://www.web3d.org/x3d/tooltips/X3dTooltips.html#PixelTexture examples
>> all use space.
>>
>>
>> http://www.web3d.org/specifications/X3dSchemaDocumentation4.0/x3d-4.0_SFImage.html
>> has a minimum length of 5, presumably as a result of three single digits
>> for width, height and components plus two separator characters.
>>
>> So one question is: Are single commas legal to separate numbers in
>> SFImage from each other?
>> http://www.web3d.org/specifications/X3dRegularExpressions.html says
>> commas are only allowed in MF fields, so let's say the answer is no.
>>
>> What about leading zeroes? The general guidance in
>> http://www.web3d.org/specifications/X3dRegularExpressions.html does not
>> allow leading zeroes.
>>
>> There is a requirement to have width x height x component number of pixel
>> values but I am not sure if this requirement can be (easily) checked by a
>> regex.
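
A count check like this is easier to do in a few lines of code after the
regex has accepted the string. The following is a minimal Python sketch; the
function name pixel_count_ok is made up for this example. It takes the
expected count to be width x height, with one packed value per pixel, which
is an assumption about how the 19775-1 wording should be read; multiply by
the component count if the other reading is intended. It also assumes the
string has already passed the syntax regex, since int(token, 0) is more
permissive than the X3D syntax (it accepts signs and other prefixes).

```python
def pixel_count_ok(text: str) -> bool:
    """Check that an SFImage string carries width * height pixel values."""
    tokens = text.split()  # splits on any run of whitespace
    if len(tokens) < 3:
        return False
    try:
        # int(token, 0) accepts decimal and 0x-prefixed hexadecimal
        width, height, _components = (int(t, 0) for t in tokens[:3])
        pixels = [int(t, 0) for t in tokens[3:]]
    except ValueError:
        return False
    # Assumption: one value per pixel, so the count is width * height.
    return len(pixels) == width * height

print(pixel_count_ok("1 2 1 0xFF 0x00"))
print(pixel_count_ok("2 2 1 0 0 0"))
```
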
>>
>> Capital letters:
>> http://www.web3d.org/documents/specifications/19775-1/V3.3/Part01/fieldsDef.html#SFImageAndMFImage
>> only uses capital A-F in hexadecimal example values but let's say a-f is
>> also allowed since they are used in almost all examples.
>>
>> Ok, let's give it a try:
>>
>> [ \t]*(([0-9]|[1-9][0-9]+)([ ]+|$)){3}(([0-9]|([1-9][0-9]+)|(0x([0-9]|[a-f]|[A-F])+))([ ]+|$))*
>>
>> In words:
>>
>> match any leading white space followed by
>> exactly three times the following
>>  - one of the following
>>    - either a single digit (including 0) or
>>    - a two or more digit number starting with a 1 to 9 digit
>>  - followed by either
>>   - one or more spaces
>>   - or the end of the string (accommodating the default '0 0 0' case)
>> then optionally followed zero or more times by
>>   - one of the following
>>     - either a single digit (including 0) or
>>     - a two or more digit number starting with a 1 to 9 digit
>>     - 0x followed by at least one of
>>        - a single digit or
>>        - a to f letter or
>>        - A to F letter
>>   - followed by either
>>     - one or more spaces
>>     - or the end of the string
>>
>> The spaces are written as [ ] for clarity but could be just a space
>> character.
>>
>> This allows 0 0 0 as it is the default value. It also allows 1 2 3
>> without any pixel values, but we do not aim at checking the number of
>> pixel values anyway.
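
One way to try the regex against a few strings is the Python sketch below.
The names SFIMAGE and matches are made up for this example, and
re.fullmatch is used as a stand-in for whole-string matching; other regex
engines may differ in details such as how $ treats a trailing newline.

```python
import re

# The proposed SFImage regex, split over two string literals for readability.
SFIMAGE = re.compile(
    r"[ \t]*(([0-9]|[1-9][0-9]+)([ ]+|$)){3}"
    r"(([0-9]|([1-9][0-9]+)|(0x([0-9]|[a-f]|[A-F])+))([ ]+|$))*"
)

def matches(text: str) -> bool:
    # Only a match covering the whole string counts as success.
    return SFIMAGE.fullmatch(text) is not None

for candidate in [
    "0 0 0",             # default value
    "1 2 3",             # header only, no pixel values
    "1 2 1 0xFF 0x00",   # hexadecimal pixel values
    "1  2   3 ",         # repeated and trailing spaces
    "1,2,1 0xFF 0x00",   # commas as separators
    "01 2 3",            # leading zero
    "1\t2\t3",           # tabs as separators
]:
    print(repr(candidate), matches(candidate))
```

With this pattern only spaces work as separators; a tab is accepted as
leading white space but not between numbers.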
>>
>> Any input, in particular examples of problem cases, is welcome. Since
>> images can be huge it may be necessary to optimize for performance and
>> memory, which may require more regex expertise than I can bring myself to
>> acquire.
>>
>> -Andreas
>>
>> --
>> Andreas Plesch
>> Waltham, MA 02453
>>
>
>
> --
> Andreas Plesch
> Waltham, MA 02453
>


-- 
Andreas Plesch
Waltham, MA 02453