[x3d-public] Mantis 1320: relax requirement for quoted single-string value in MFString array

Michalis Kamburelis michalis.kambi at gmail.com
Sun Jul 5 13:03:01 PDT 2020


1. I have to admit that the new proposal also does not seem OK to me.
Handling a string differently based on a character that may appear in the
middle of it is non-intuitive. It would be surprising in any "normal"
programming language, so I say it should be avoided in X3D XML too.

In your proposal,

  <Text string='lorem ipsum lorem ipsum lorem ipsum lorem ipsum' />

is treated very differently than

  <Text string='lorem ipsum lorem ipsum " lorem ipsum lorem ipsum' />

Namely, the 1st would be interpreted as MFString with 1 value, while the
2nd would be an invalid MFString.

Imagine a rule in Java / C++ / Pascal / JavaScript where a character in
the middle of a string literal changes how that literal is interpreted.
This would be bad IMHO: it confuses the people who write it, because it's
easy to introduce a double-quote without realizing that you have now
"switched" to a very different syntax for strings.
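
To make the difference concrete, here is a minimal Python sketch of the
proposed rule. The names parse_mfstring_strict and parse_mfstring_proposed
are hypothetical, and the strict parser is only a simplified reading of the
current syntax (whitespace-separated, double-quoted items, with backslash
escaping \" and \\), not code taken from any X3D browser.

```python
def parse_mfstring_strict(value):
    """Simplified sketch: every item must be wrapped in double quotes."""
    items = []
    i = 0
    while i < len(value):
        if value[i].isspace():
            i += 1
            continue
        if value[i] != '"':
            raise ValueError(f"unquoted text at position {i}")
        i += 1  # skip the opening quote
        chars = []
        while True:
            if i >= len(value):
                raise ValueError("unterminated quoted item")
            ch = value[i]
            if ch == "\\" and i + 1 < len(value) and value[i + 1] in '"\\':
                chars.append(value[i + 1])  # \" or \\ yields the escaped char
                i += 2
            elif ch == '"':
                i += 1  # skip the closing quote
                break
            else:
                chars.append(ch)
                i += 1
        items.append("".join(chars))
    return items


def parse_mfstring_proposed(value):
    """Proposed relaxation: non-empty value without any double quote."""
    if value and '"' not in value:
        return [value]
    return parse_mfstring_strict(value)


first = "lorem ipsum lorem ipsum lorem ipsum lorem ipsum"
second = 'lorem ipsum lorem ipsum " lorem ipsum lorem ipsum'

print(parse_mfstring_proposed(first))  # a list with 1 item
try:
    parse_mfstring_proposed(second)
except ValueError as error:
    print("invalid MFString:", error)
```

Under these assumptions, adding a single double-quote in the middle of the
text moves the value from the "one item" branch to the strict parser, where
it is rejected.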

Some additional notes:

2. You still need to account for the fact that both &quot; and " should be
detected, since a literal " may occur inside, and in all our MFString
handling so far they have been equivalent. So, if anything, the pseudocode
should be

    if (!value.isEmpty() and !value.contains('&quot;') and !value.contains('"')) ...

  Because these two should be treated as equal:

    <Text string="something &quot; something" />
    <Text string='something " something' />

  Actually, the pseudocode can be simpler: in a typical implementation the
XML entities are already resolved at this point, so one check is enough:

  if (!value.isEmpty() and !value.contains('"')) ...
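
  As a minimal sketch of why one check suffices, the following Python
snippet uses the standard library's xml.etree.ElementTree to read both
attribute spellings. The helper name is_unquoted_single_string is
hypothetical; it only mirrors the simplified pseudocode above.

```python
import xml.etree.ElementTree as ET

# Same attribute value, written once with an entity and once literally.
with_entity = ET.fromstring('<Text string="something &quot; something" />')
with_literal = ET.fromstring("<Text string='something \" something' />")

value_a = with_entity.get("string")
value_b = with_literal.get("string")

# The XML parser has already resolved the entity, so both values are equal
# and neither contains the text "&quot;" any more.
print(value_a == value_b)
print("&quot;" in value_a)


def is_unquoted_single_string(value):
    """Mirror of the simplified check: non-empty and no double quote."""
    return bool(value) and '"' not in value


print(is_unquoted_single_string(value_a))
print(is_unquoted_single_string("something something"))
```

  Since the X3D parser only ever sees the resolved attribute value, testing
for the entity text separately would never match anything.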

3. You have addressed the issue of empty string in the new proposal (by
checking "!value.isEmpty()") -- OK, I have no problem with that anymore :)

4. In general I know it can be implemented easily in any X3D parsing
implementation. No problem with that.

    My problem is that introducing such a rule is bad for consistency of
X3D XML encoding, it is going to surprise the authors. In my eyes, we look
at a difficult case ("how do I encode MFString") and instead of making the
current specification more precise (as per the proposed
https://github.com/michaliskambi/x3d-tests/wiki/Clarify-the-usage-of-quotes-and-backslashes-for-MFString-and-SFString-in-XML-encoding
), we instead add an exception to the rule. This feels like complicating
the rules.

5. You're welcome to link my wiki to Mantis.

    I prefer to keep my proposal on a wiki page -- this way it is public.
If the developers of any future X3D browser have trouble interpreting this
part of the X3D XML spec, they can find my wiki page by searching the
Internet, and it also summarizes "what the existing X3D browsers are
doing". So I like to have this information public.

6. My clarifications on
https://github.com/michaliskambi/x3d-tests/wiki/Clarify-the-usage-of-quotes-and-backslashes-for-MFString-and-SFString-in-XML-encoding
do not say explicitly "XML entities are handled" because that is obvious
IMHO, i.e. every XML reader will do this, no choice :) Also, this was never
a source of confusion, everyone seems to understand that XML entities are
handled in an XML attribute.

7. I'm happy to defer it and/or talk about it more :)

    I understand your reasons, of course. The way MFString is encoded in
X3D XML is not perfect. Additional double-quotes *are* surprising, they
require explanation.

    But the way I see it, your proposal makes it more complicated and
"surprising" for authors. That is why I object :)

8. Dreaming:

    If we were designing X3D XML from scratch, I would go for a completely
different, brand new approach: use XML elements to specify MFString. Like
this:

    <Text>
      <string>
        <item>one line</item>
      </string>
    </Text>

    or

    <Text>
      <string>
        <item>one line</item>
        <item>second line</item>
      </string>
    </Text>

    This is very verbose -- yes. But it is also perfectly clear,
unambiguous, to a human and machine both. It would avoid any additional
discussion about the double quotes, backslashes, and so on. Because we
would use XML to express a list, instead of "forcing" a list inside a
single XML attribute (as we do now with MFString).
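
    A minimal Python sketch of how a reader could consume such an encoding,
using the standard library's xml.etree.ElementTree. The <string> / <item>
layout is only the hypothetical format from this message (it is not part of
any X3D specification), and read_string_items is an illustrative name.

```python
import xml.etree.ElementTree as ET

DOCUMENT = """
<Text>
  <string>
    <item>one line</item>
    <item>we "love" you Ma!</item>
  </string>
</Text>
"""


def read_string_items(text_element):
    """Return the list of strings held in <string><item>...</item></string>."""
    container = text_element.find("string")
    if container is None:
        return []  # no <string> child: empty list
    # An empty <item/> has text None; treat it as an empty string.
    return [item.text or "" for item in container.findall("item")]


lines = read_string_items(ET.fromstring(DOCUMENT.strip()))
print(lines)
```

    In this layout an empty list, a list holding one empty string, and text
containing double quotes are all distinct without any quoting or backslash
rules, because the XML parser delimits each item.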

    However, this is now only an impossible dream. We already have the
current MFString encoding in XML. It's easy to point out the design mistake
in hindsight, of course -- when designing it, I'm sure it was not obvious
that it's a suboptimal approach.

Best regards,
Michalis

On Sun, 5 Jul 2020 at 21:22, Don Brutzman <brutzman at nps.edu> wrote:

> Vince and Michalis: thanks for steady strain on the line!  Responses
> follow in a block.  We're having fun now...  :0
>
> a. Too complicated?  OK.  When in doubt, simplify.  If any degree of
> embedded quotation marks is troublesome, then we simply add to the XML
> encoding specification prose a statement like [1]:
>
> [1] "Non-empty MFString values that do not contain any quotation marks are
> treated as a single string element."
>
> Relevant pseudocode repeated, there are many ways to implement this in any
> language:
>
>      if (!value.isEmpty() and !value.contains('"')) // example approach #3
>      {
>          newMFStringValue = new MFString() of length 1;
>          newMFStringValue[0] = value;  // processed text from XML parser
>      }
>      else // continue parsing as before
>
> Am appreciating this ultra-simple approach even more than before since it
> can't mistakenly accept a regular MFString definition that erroneously
> omits a single quotation mark.  Let's avoid any complicating details
> whatsoever.
>
> Wondering, will we be able to live with something like that?  Persevering
> forward...
>
> ---
>
> b. Regarding pseudocode, not sure how it can be an "error" per se since it
> is just trying to illustrate an algorithm.
>
> If you mean the intended algorithm is insufficient, then OK I understand
> your reply.
>
> ---
>
> c. Problem-solving alignment: we can sharpen focus of any questions like
> "How would [JavaScript/Java/Python/JSON/VRML/C++/ObjectPascal/etc.] handle
> this example?" solely on reading and parsing XML text into an MFString
> object.
>
> Step by step parsing chain:
> c.1. an MFString value in a text file is read and processed as text by XML
> parser, producing plain text (&quot; becomes ", etc.).
> c.2. That raw-string value is then handed off to X3D parser which applies
> X3D parsing rules to create MFString object.
>
> Since XML is well defined and parsing libraries are well supported,
> high-quality implementation paths are expected to be possible for each
> programming language.
>
> Side effect benefit: our large regression suite and diverse Quality
> Assurance (QA) tools means that any problematic or erroneous XML values are
> detectable and testable (and often fixable too).
>
> ---
>
> d. Regarding your attempted addition of a second line to an unquoted
> value: that is clearly a 2-element MFString array and not relevant.  Any
> such case is a regular MFString construct of quoted single-string elements,
> for example value='"Hello" "World"'
>
> ---
>
> e. Emptiness:
>
> e.1. MFString array of length 1 holding empty single-string remains
> unchanged:
>       value=' "" ' or value='""' or value='&quot;&quot;' or
>       value="&quot;&quot;" etc.
>
> e.2. Empty MFString remains unchanged: since value='' remains the MFString
> empty array (i.e. empty list) and also the default MFString value.  Equally
> legal is complete removal of the value='' empty attribute, which can be
> optionally omitted by any XML processor, which is therefore never seen by
> X3D parser.
>
> Summary: no changes in existing MFString approach to (e.1) singleton empty
> string, or (e.2) empty list.
>
> ---
>
> f. Whitespace.
>
> We might optionally disallow whitespace for this approach (for example
> value='  ') by amending the [1] specification statement above as
>
> [2] "Non-empty MFString values that do not contain any quotation marks,
> and containing characters other than whitespace, are treated as a single
> string element."
>
> This might be more robust since it ignores fields with perhaps-inadvertent
> stray whitespace while honoring actual visible text.  But we're talking
> about rules for plain-old whitespace, which seems like overreaching...
>
> Equivalent existing (and clearly deliberate) example: value='"  "'
>
> If specification sentence [2] is considered complicated or troublesome,
> then let's stick with specification sentence [1].
>
> ---
>
> g. I hope the recent HAnimHumanoid info -> MetadataSet example helps
> illustrate the significant merits of value terseness.
>
> Getting this issue done well seems critical for future of metadata with
> X3D models.
>
> ---
>
> h. Next steps: we are making progress but am thinking we need to work
> together more effectively on this issue so that we proceed towards
> resolution, rather than circling.
>
> Mantis issue now raised, distilled issue summaries/resolutions can go
> there.  Let's link prior emails back in there too.
>
> [3] Mantis 1320: relax requirement for quoted single-string value in
> MFString array
>      https://www.web3d.org/member-only/mantis/view.php?id=1320
>
> Let's link your pages in Mantis, Michalis.  They are correct, as far as
> they go, but am still not seeing much detail on important considerations:
> XML preprocessing of characters before values are parsed as MFString.
>
> Steadily improving our mutual understanding and XML-syntax examples will
> help.
>
> ---
>
> i. Given the furious pace of current X3D4 activity, let's again defer any
> "deep dive" and have a meeting post-SIGGRAPH dedicated to reviewing this
> topic.  Entry criteria:
>
> - Mantis 1320 issues/alternatives/links updated in a distilled manner
> (e.g. your pages),
> - Suggested specification prose (such as [1] and [2] above),
> - Example XML content that we agree should either work or fail (such as
> values defined above).
>
> ---
>
> j. Motivation revisited.
>
> Whether any implementation is convinced about merits of defining a simple
> unquoted MFString value, or not, the X3D ecosystem will definitely
> encounter (likely many) such examples in actual practice.  Being robust
> means handling them appropriately whenever reasonably possible, rather than
> failing (which is unpleasant to all concerned) or processing inconsistently
> (which is dangerous).
>
> So we all share a strong incentive to find a solution here.  We appear to
> have gotten a few steps closer on this current round of scrutiny, which is
> great!
>
> Again thanks for close care and diligent examination of the issue
> alternatives.
>
> v/r Don
>
>
> On 7/5/2020 4:48 AM, Michalis Kamburelis wrote:
> > Don,
> >
> > About your implementation and samples, there are 2 errors. Which shows
> > my point -- this issue has complicated details :)
> >
> > 1. You need to account that the character may also be " (literal double
> > quote); they don't need to be expressed using HTML entities when you
> > surround the XML attribute using apostrophes. (One point of our proposed
> > text, on
> > https://github.com/michaliskambi/x3d-tests/wiki/Clarify-the-usage-of-quotes-and-backslashes-for-MFString-and-SFString-in-XML-encoding
> > , was to clarify this).
> >
> > 2. Your example <Text string='we \"love\" you Ma!'/> does not work like
> you probably think. Since now this is treated like an SFString, the
> backslashes are *not* special, and so they would be visible. Please see my
> discussion of backslashes on the bottom of
> https://github.com/michaliskambi/x3d-tests/wiki/Clarify-the-usage-of-quotes-and-backslashes-for-MFString-and-SFString-in-XML-encoding
> . This was discussed on this mailing list, with a consensus, and my
> description there matches what browsers are already doing.
> >
> > And there remains a problem Vince noted: what is an empty string now?
> Previously it was MFString with 0 items. Now it will be MFString with 1
> item equal to an empty string. This difference matters in some edge cases
> e.g. "Text" node will have a different size now, as it has 1 row.
> >
> > In general, the rationale """be liberal in what you accept""" is not
> convincing IMHO. That is not what a usual programming language is doing.
> You don't want to liberally accept everything that may make sense --
> because this leads to a language with lots of special rules and allowances,
> which leads to a language that is hard to understand / predict. Instead,
> the usual guideline is that language should apply the rules consistently.
> >
> > My practical problem is that this new rule would create a confusion for
> authors. Your own example shows this, as you were confused with how <Text
> string='we \"love\" you Ma!'/>  works.
> >
> > I imagine these scenarios:
> >
> > 1. I start with a <Text string="bla bla" /> . Cool, it just works.
> >
> >      - But now I want to add a second line. So I need to learn to
> > surround the first line in double quotes now anyway:
> > <Text string="&quot;bla bla&quot; &quot;another line&quot;" />.
> >
> > 2. I start with a <Text string='we "love" you Ma!' />. Cool, it just
> works.
> >
> >      - But now I want to add a second line. So I need to learn to
> > surround the first line in double quotes now anyway:
> > <Text string='"we "love" you Ma!" "another line"' />
> >
> >      - And now I need to learn about backslashes, because the above
> > example will fail. Both &quot; and " inside delimit string parts, so I
> > actually need
> > <Text string='"we \"love\" you Ma!" "another line"' />
> >
> > These cases show how an author would get confused. Having to do the
> correct thing (with quotes and backslashes) at the beginning, means that
> you have a harder start (you need to understand these quotes and
> backslashes), but then you have a smooth way ahead, because it's obvious
> how to add a 2nd string.
> >
> > And existing tools (both writing and reading) have learned to cope with
> this throughout the years. I doubt that these additional quotes are a
> blocker for X3D adoption -- it's trivial to explain them, "they are there
> because you may have multiple strings, and we have a general way of writing
> many strings inside one XML attribute".
> >
> > All in all, I'm still not convinced.
> >
> > Regards,
> > Michalis
> >
> > On Sun, 5 Jul 2020 at 05:01, vmarchetti at kshell.com
> > <vmarchetti at kshell.com> wrote:
> >
> >     Apologies, this fragment of a reply was sent prematurely and is
> incomplete, ignore it for now.
> >
> >     Vince
> >
> >
> >      > On Jul 4, 2020, at 10:58 PM, vmarchetti at kshell.com wrote:
> >      >
> >      >
> >      >
> >      >> On Jul 4, 2020, at 9:26 PM, Don Brutzman <brutzman at nps.edu> wrote:
> >      >>
> >      >> Thanks for your note... yes we had a brief discussion some
> months ago, will look it up.  Didn't want to dwell on the topic then (each
> of us too busy!!) but now more metadata is getting created and this
> relaxation will help.
> >      >>
> >      >> My understanding of years of implementation-complexity
> objections heard to this XML encoding relaxation have not appeared to be
> grounded by practice, which in this case is quite simple.
> >      >>
> >      >> For example, pseudocode adaptable to ~any language that is
> parsing text into X3D objects:
> >      >>
> >      >>      if (!value.startsWith('"') and !value.endsWith('"')) // example approach #1
> >      >>      {
> >      >>              newMFStringValue = new MFString() of length 1;
> >      >>              newMFStringValue[0] = value;
> >      >>      }
> >      >>      else // continue parsing as before
> >      >>
> >      >>
> >      >
> >      > However, I think this approach of coming up with code or pseudo
> code to explain or document the standard doesn't account that implementors
> will want to (and I think should) use standardized code to do all the XML
> processing, including applying the XML rules for un-encoding the way that
> attribute values can be encoded in XML.
> >      > For example, given an XML fragment
> >      > <Element fruit='"apple"'/>
> >      >
> >      > However, the same could be
> >      >
> >      > Someone using the standard Python module xml.etree to parse the
> document would only see a Python object with a (Python) attribute named
> fruit and a value which was the unicode value "apple" . If this occurred in
> an X3D document,
> >      > this would be interpreted as an SFString with value "apple"  or
> an MFString of one element with value apple .
>
> all the best, Don
> --
> Don Brutzman  Naval Postgraduate School, Code USW/Br
> brutzman at nps.edu
> Watkins 270,  MOVES Institute, Monterey CA 93943-5000 USA   +1.831.656.2149
> X3D graphics, virtual worlds, navy robotics
> http://faculty.nps.edu/brutzman
>