[x3d-public] Mantis 1320: relax requirement for quoted single-string value in MFString array
Don Brutzman
brutzman at nps.edu
Sun Jul 5 14:19:29 PDT 2020
Not sure you are tracking my most recent proposal today... but appreciate the effort, am trying again.
On 7/5/2020 1:03 PM, Michalis Kamburelis wrote:
>
> 1. I have to admit that the new proposal also does not seem OK to me. Handling a string differently, based on a character that may be in the middle of the string, is non-intuitive. It would be surprising in any "normal" programming language, so I say it should be avoided in X3D XML too.
Allowing a special case typically has special conditions, in any language.
XML is a data language, not a programming language.
> In your proposal,
>
> <Text string='lorem ipsum lorem ipsum lorem ipsum lorem ipsum' />
>
> is treated very differently than
>
> <Text string='lorem ipsum lorem ipsum " lorem ipsum lorem ipsum' />
>
> Namely, the 1st would be interpreted as MFString with 1 value, while the 2nd would be an invalid MFString.
Yes that is what is proposed: we start allowing the first case.
Currently both cases are considered invalid X3D content.
So if you choose to be completely strict on the first case, you are failing content that is unambiguously parsable/recoverable. Greater robustness is possible.
> Imagine you had a rule in Java / C++ / Pascal / JavaScript like that, that a character in the middle of the string changes the rules of interpreting this string literal. This would be bad IMHO, it creates confusion for people that write this, because it's easy to introduce the double-quote without realizing that now you "switch" to a very different syntax for strings.
X3D software implementers are a special class of people who already must pay close attention to such detail. X3D is both very expressive and very strict (thank goodness).
A good rule for output is to always be strict. So an X3D-capable tool does not have to use this unquoted-singleton rule on serialized output, though for acceptable singleton cases it is simple to handle and more concise to read.
Tool outputs get a lot of scrutiny. Validators quickly detect whether a tool's output patterns are erroneous.
If X3D software implementers were the only stakeholders here, avoiding the common case might be sufficient justification (and has been in earlier versions of VRML/X3D). But we are attempting to integrate X3D with the entire Web infrastructure, which has many more stakeholders.
> Some additional notes:
>
> 2. You still need to account that &quot; and " should be detected. Since literal " may occur inside, and in all our MFString handling so far, they were equivalent. So if anything the pseudocode should be
>
> if (!value.isEmpty() and !value.contains('&quot;') and !value.contains('"')) ...
Not necessarily... that is a pseudocode condition, for clarity.
- XML parsers convert &quot; to " before your X3D parser sees the value. So you only need to test once for that specific character, using whatever syntax your chosen programming language has for the " character (often an escaped value).
- If you are writing your own direct parser for the raw text in a .x3d file, you must still follow XML rules and convert &quot; to " in your input stream.
> Because those 2 should be treated as equal:
>
> <Text string="something &quot; something" />
> <Text string='something " something' />
Yes, these two are equal. And yes, these two should remain erroneous.
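
A minimal sketch of that equivalence, assuming Python's standard xml.etree.ElementTree as the XML parser (the Scene wrapper and the helper name decoded_string_attributes are made up for illustration). It only shows that the entity form and the literal form reach the X3D layer as the same text; it does not say whether that text is a valid MFString.

```python
import xml.etree.ElementTree as ET

# Two encodings of the same attribute value: one uses the &quot; entity
# inside a double-quoted attribute, the other a literal " inside an
# apostrophe-quoted attribute.
DOC = """<Scene>
  <Text string="something &quot; something" />
  <Text string='something " something' />
</Scene>"""


def decoded_string_attributes(xml_text):
    """Return the 'string' attribute of every Text element, as delivered
    by the XML parser (entities already resolved)."""
    root = ET.fromstring(xml_text)
    return [node.get("string") for node in root.iter("Text")]


values = decoded_string_attributes(DOC)
print(values[0] == values[1])
```

Because the XML parser has already resolved the entity, any later X3D check needs to look for only one character.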
> Actually the pseudocode can be simpler, because at this point, the XML entities are resolved, thinking about a typical implementation:
>
> if (!value.isEmpty() and !value.contains('"')) ...
OK good we converged on same result. This matches what I am saying above too.
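
For concreteness, here is one hedged way the converged condition could look in Python. This is a sketch, not specification text: the names parse_mfstring and parse_quoted_mfstring are hypothetical, the input is assumed to be the attribute value after XML entity processing, and the "parse as before" branch is a simplified stand-in that treats commas as whitespace and honors only the \" and \\ escapes.

```python
def parse_quoted_mfstring(value):
    """Simplified stand-in for existing MFString parsing:
    a sequence of "double-quoted" strings separated by whitespace or commas."""
    items = []
    i, n = 0, len(value)
    while i < n:
        ch = value[i]
        if ch.isspace() or ch == ",":
            i += 1
            continue
        if ch != '"':
            raise ValueError(f"expected opening quote at offset {i}")
        i += 1  # skip the opening quote
        chars = []
        while True:
            if i >= n:
                raise ValueError("unterminated quoted string")
            ch = value[i]
            if ch == "\\" and i + 1 < n and value[i + 1] in '"\\':
                chars.append(value[i + 1])  # \" or \\ escape
                i += 2
            elif ch == '"':
                i += 1  # skip the closing quote
                break
            else:
                chars.append(ch)
                i += 1
        items.append("".join(chars))
    return items


def parse_mfstring(value):
    """Proposed relaxation: a non-empty value with no quotation marks
    is a single string element; everything else is parsed as before."""
    if value and '"' not in value:
        return [value]
    return parse_quoted_mfstring(value)


print(parse_mfstring("lorem ipsum lorem ipsum"))
print(parse_mfstring('"Hello" "World"'))
print(parse_mfstring(""))
```

Under this sketch a value with a stray quotation mark in the middle, such as 'something " something', still raises ValueError, which matches the intent that such content remains erroneous.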
> 3. You have addressed the issue of empty string in the new proposal (by checking "!value.isEmpty()") -- OK, I have no problem with that anymore :)
agreed, good improvement
> 4. In general I know it can be implemented easily in any X3D parsing implementation. No problem with that.
>
> My problem is that introducing such a rule is bad for consistency of X3D XML encoding, it is going to surprise the authors. In my eyes, we look at a difficult case ("how do I encode MFString") and instead of making the current specification more precise (as per the proposed https://github.com/michaliskambi/x3d-tests/wiki/Clarify-the-usage-of-quotes-and-backslashes-for-MFString-and-SFString-in-XML-encoding ), we instead add an exception to the rule. This feels like complicating the rules.
Stating 'here is a simple case, then follow all the same rules' seems like the simplest possible refinement, not a complication.
We'll work on considering the prose in your pages alongside specification prose at a future time. Making sure it is in Mantis means it is part of the review process and won't get forgotten.
Hmmm here is an interesting thought: WWVD, What Would VRML Do? Link and excerpt follow:
[4] Extensible 3D (X3D) encodings 19776-2, Part 2: Classic VRML encoding
5 Encoding of fields, 5.1.2 Description
https://www.web3d.org/documents/specifications/19776-2/V3.3/Part02/EncodingOfFields.html#Description
========================================
"This clause describes the syntax of field data type values.
There are two general classes of fields: fields that contain a single value (where, for example, a value may be a single number, a vector, or even an image), and fields that contain an ordered list of multiple values. Single-valued fields have names that begin with SF. Multiple-valued fields have names that begin with MF.
Multiple-valued fields are written as an ordered list of values enclosed in square brackets and separated by whitespace. If the field has zero values, only the square brackets ("[ ]") are written. The last value may optionally be followed by whitespace. If the field has exactly one value, the brackets may be omitted."
========================================
Aha. So VRML parsing allows presence of a singleton value in an MFString without encapsulating delimiters; separate grammar rules for parsing VRML avoid treating misquoted structures as valid.
Confirmed by ClassicVRML grammar:
[5] Extensible 3D (X3D) encodings 19776-2, Part 2: Classic VRML encoding
Annex A Grammar, A.4 Fields
https://www.web3d.org/documents/specifications/19776-2/V3.3/Part02/grammar.html#Fields
====================
mfstringValue ::=
sfstringValue |
[ ] |
[ sfstringValues ] ;
====================
Seems totally analogous to zero-quotes first case provided above that we are exploring for XML.
Rephrase: VRML encoding allows singleton SFString values for a one-element MFString, so XML encoding should as well.
So perhaps, in some sense, we are actually simplifying and making X3D more consistent across file encodings... 8)
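
To make the analogy concrete, here is a rough Python sketch of the mfstringValue production quoted above. It is an illustration under assumptions, not a conforming ClassicVRML parser: the name parse_vrml_mfstring is hypothetical, commas are treated as whitespace, and only the \" and \\ escapes are handled.

```python
import re

# One sfstringValue: a double-quoted run in which \" and \\ may appear.
_SFSTRING = re.compile(r'"((?:[^"\\]|\\.)*)"', re.DOTALL)
# Separators between values: whitespace and commas (possibly none).
_SEPARATORS = re.compile(r"[\s,]*")


def parse_vrml_mfstring(text):
    """mfstringValue ::= sfstringValue | [ ] | [ sfstringValues ]"""
    text = text.strip()
    bracketed = text.startswith("[")
    if bracketed:
        if not text.endswith("]"):
            raise ValueError("missing closing bracket")
        text = text[1:-1]
    items = []
    pos = _SEPARATORS.match(text, 0).end()
    while pos < len(text):
        match = _SFSTRING.match(text, pos)
        if match is None:
            raise ValueError(f"expected quoted string at offset {pos}")
        # Undo the two escapes handled by this sketch.
        items.append(re.sub(r'\\(["\\])', r"\1", match.group(1)))
        pos = _SEPARATORS.match(text, match.end()).end()
    if not bracketed and len(items) != 1:
        raise ValueError("an unbracketed value must be exactly one string")
    return items


print(parse_vrml_mfstring('"one line"'))
print(parse_vrml_mfstring("[ ]"))
print(parse_vrml_mfstring('[ "one line" "second line" ]'))
```

Note one difference from the XML proposal: in ClassicVRML the singleton is still a quoted sfstringValue, and only the enclosing brackets are omitted.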
> 5. You're welcome to link my wiki to Mantis.
>
> I prefer to keep my proposal on a wiki page -- this way it is public. If any future X3D browser will have trouble with interpreting this part of X3D XML spec, they can find my wiki page by searching the Internet, and it also summarizes "what the existing X3D browsers are doing". So I like to have this information public.
agreed, was thinking the same thing, thanks
> 6. My clarifications on https://github.com/michaliskambi/x3d-tests/wiki/Clarify-the-usage-of-quotes-and-backslashes-for-MFString-and-SFString-in-XML-encoding do not say explicitly "XML entities are handled" because that is obvious IMHO, i.e. every XML reader will do this, no choice :) Also, this was never a source of confusion, everyone seems to understand that XML entities are handled in an XML attribute.
Actually I think that lack of awareness of XML preprocessing has led to some confusion as this XML-encoded MFString issue has floated around. If saying a bit more up front might clarify that, why not? Whatever works. Your page, your call.
> 7. I'm happy to defer it and/or talk about it more :)
>
> I understand your reasons, of course. The way MFString is encoded in X3D XML is not perfect. Additional double-quotes *are* surprising, they require explanation.
>
> But the way I see it, your proposal makes it more complicated and "surprising" for authors. That is why I object :)
We are all quite welcome to object, and (as pointed out several times already) also welcome to live with the consequences of such unquoted MFString content occurring on a regular basis.
> 8. Dreaming:
>
> If we would design X3D XML from scratch, I would say to go for a completely different, brand new approach: use XML elements to specify MFString. Like this:
>
> <Text>
> <string>
> <item>one line</item>
> </string>
> </Text>
>
> or
>
> <Text>
> <string>
> <item>one line</item>
> <item>second line</item>
> </string>
> </Text>
>
> This is very verbose -- yes. But it is also perfectly clear, unambiguous, to a human and machine both. It would avoid any additional discussion about the double quotes, backslashes, and so on. Because we would use XML to express a list, instead of "forcing" a list inside a single XML attribute (as we do now with MFString).
>
> However, this is now only an impossible dream. We already have the current MFString encoding in XML. It's easy to point out the design mistake in hindsight, of course -- when designing it, I'm sure it was not obvious that it's a suboptimal approach.
Thanks for stepping back from the ledge!! If you want another "rabbit hole" to fall into, you are welcome to read an opinion piece from twenty years ago:
[6] Wrapper Tags Considered Harmful, Don Brutzman, 6 February 2000
https://www.web3d.org/x3d/content/examples/development/WrapperTagsConsideredHarmful.html
> Best regards,
> Michalis
We keep ratcheting closer on unquoted XML MFString singleton, to good effect each time. Again thanks, take care.
> Sun, 5 Jul 2020 at 21:22 Don Brutzman <brutzman at nps.edu> wrote:
>
> Vince and Michalis: thanks for steady strain on the line! Responses follow in a block. We're having fun now... :0
>
> a. Too complicated? OK. When in doubt, simplify. If any degree of embedded quotation marks is troublesome, then we simply add to the XML encoding specification prose a statement like [1]:
>
> [1] "Non-empty MFString values that do not contain any quotation marks are treated as a single string element."
>
> Relevant pseudocode repeated, there are many ways to implement this in any language:
>
> if (!value.isEmpty() and !value.contains('"')) // example approach #3
> {
> newMFStringValue = new MFString() of length 1;
> newMFStringValue[0] = value; // processed text from XML parser
> }
> else // continue parsing as before
>
> Am appreciating this ultra-simple approach even more than before since it can't mistakenly accept a regular MFString definition that erroneously omits a single quotation mark. Let's avoid any complicating details whatsoever.
>
> Wondering, will we be able to live with something like that? Persevering forward...
>
> ---
>
> b. Regarding pseudocode, not sure how it can be an "error" per se since it is just trying to illustrate an algorithm.
>
> If you mean the intended algorithm is insufficient, then OK I understand your reply.
>
> ---
>
> c. Problem-solving alignment: we can sharpen focus of any questions like "How would [JavaScript/Java/Python/JSON/VRML/C++/ObjectPascal/etc.] handle this example?" solely on reading and parsing XML text into an MFString object.
>
> Step by step parsing chain:
> c.1. An MFString value in a text file is read and processed as text by the XML parser, producing plain text (&quot; becomes ", etc.).
> c.2. That raw-string value is then handed off to X3D parser which applies X3D parsing rules to create MFString object.
>
> Since XML is well defined and parsing libraries are well supported, high-quality implementation paths are expected to be possible for each programming language.
>
> Side effect benefit: our large regression suite and diverse Quality Assurance (QA) tools mean that any problematic or erroneous XML values are detectable and testable (and often fixable too).
>
> ---
>
> d. Regarding your attempted addition of a second line to an unquoted value: that is clearly a 2-element MFString array and not relevant. Any such case is a regular MFString construct of quoted single-string elements, for example value='"Hello" "World"'
>
> ---
>
> e. Emptiness:
>
> e.1. MFString array of length 1 holding empty single-string remains unchanged:
> value=' "" ' or value='""' or value='&quot;&quot;' or value="&quot;&quot;" etc.
>
> e.2. Empty MFString remains unchanged: value='' remains the empty MFString array (i.e. empty list) and also the default MFString value. Equally legal is complete removal of the value='' empty attribute, which any XML processor may optionally omit and which is therefore never seen by the X3D parser.
>
> Summary: no changes in existing MFString approach to (e.1) singleton empty string, or (e.2) empty list.
>
> ---
>
> f. Whitespace.
>
> We might optionally disallow whitespace for this approach (for example value=' ') by amending the [1] specification statement above as
>
> [2] "Non-empty MFString values that do not contain any quotation marks, and containing characters other than whitespace, are treated as a single string element."
>
> This might be more robust since it ignores fields with perhaps-inadvertent stray whitespace while honoring actual visible text. But we're talking about rules for plain-old whitespace, which seems like overreaching...
>
> Equivalent existing (and clearly deliberate) example: value='" "'
>
> If specification sentence [2] is considered complicated or troublesome, then let's stick with specification sentence [1].
>
> ---
>
> g. I hope the recent HAnimHumanoid info -> MetadataSet example helps illustrate the significant merits of value terseness.
>
> Getting this issue done well seems critical for future of metadata with X3D models.
>
> ---
>
> h. Next steps: we are making progress but am thinking we need to work together more effectively on this issue so that we proceed towards resolution, rather than circling.
>
> Mantis issue now raised, distilled issue summaries/resolutions can go there. Let's link prior emails back in there too.
>
> [3] Mantis 1320: relax requirement for quoted single-string value in MFString array
> https://www.web3d.org/member-only/mantis/view.php?id=1320
>
> Let's link your pages in Mantis, Michalis. They are correct, as far as they go, but am still not seeing much detail on important considerations: XML preprocessing of characters before values are parsed as MFString.
>
> Steadily improving our mutual understanding and XML-syntax examples will help.
>
> ---
>
> i. Given the furious pace of current X3D4 activity, let's again defer any "deep dive" and have a meeting post-SIGGRAPH dedicated to reviewing this topic. Entry criteria:
>
> - Mantis 1320 issues/alternatives/links updated in a distilled manner (e.g. your pages),
> - Suggested specification prose (such as [1] and [2] above),
> - Example XML content that we agree should either work or fail (such as values defined above).
>
> ---
>
> j. Motivation revisited.
>
> Whether any implementation is convinced about merits of defining a simple unquoted MFString value, or not, the X3D ecosystem will definitely encounter (likely many) such examples in actual practice. Being robust means handling them appropriately whenever reasonably possible, rather than failing (which is unpleasant to all concerned) or processing inconsistently (which is dangerous).
>
> So we all share a strong incentive to find a solution here. We appear to have gotten a few steps closer on this current round of scrutiny, which is great!
>
> Again thanks for close care and diligent examination of the issue alternatives.
>
> v/r Don
>
>
> On 7/5/2020 4:48 AM, Michalis Kamburelis wrote:
> > Don,
> >
> > About your implementation and samples, there are 2 errors. Which shows my point -- this issue has complicated details :)
> >
> > 1. You need to account that the character may also be " (literal double quote); it doesn't need to be expressed using an XML entity when you surround the XML attribute using apostrophes. (One point of our proposed text, on https://github.com/michaliskambi/x3d-tests/wiki/Clarify-the-usage-of-quotes-and-backslashes-for-MFString-and-SFString-in-XML-encoding , was to clarify this).
> >
> > 2. Your example <Text string='we \"love\" you Ma!'/> does not work like you probably think. Since now this is treated like an SFString, the backslashes are *not* special, and so they would be visible. Please see my discussion of backslashes on the bottom of https://github.com/michaliskambi/x3d-tests/wiki/Clarify-the-usage-of-quotes-and-backslashes-for-MFString-and-SFString-in-XML-encoding . This was discussed on this mailing list, with a consensus, and my description there matches what browsers are already doing.
> >
> > And there remains a problem Vince noted: what is an empty string now? Previously it was MFString with 0 items. Now it will be MFString with 1 item equal to an empty string. This difference matters in some edge cases e.g. "Text" node will have a different size now, as it has 1 row.
> >
> > In general, the rationale """be liberal in what you accept""" is not convincing IMHO. That is not what a usual programming language is doing. You don't want to liberally accept everything that may make sense -- because this leads to a language with lots of special rules and allowances, which leads to a language that is hard to understand / predict. Instead, the usual guideline is that language should apply the rules consistently.
> >
> > My practical problem is that this new rule would create a confusion for authors. Your own example shows this, as you were confused with how <Text string='we \"love\" you Ma!'/> works.
> >
> > I imagine these scenarios:
> >
> > 1. I start with a <Text string="bla bla" /> . Cool, it just works.
> >
> > - But now I want to add a second line. So I need to learn to surround the first line in double quotes now anyway: <Text string="&quot;bla bla&quot; &quot;another line&quot;" />.
> >
> > 2. I start with a <Text string='we "love" you Ma!' />. Cool, it just works.
> >
> > - But now I want to add a second line. So I need to learn to surround the first line in double quotes now anyway: <Text string='"we "love" you Ma!" "another line"' />
> >
> > - And now I need to learn about backslashes, because the above example will fail. Both &quot; and " inside delimit string parts, so I actually need <Text string='"we \"love\" you Ma!" "another line"' />
> >
> > These cases show how an author would get confused. Having to do the correct thing (with quotes and backslashes) at the beginning, means that you have a harder start (you need to understand these quotes and backslashes), but then you have a smooth way ahead, because it's obvious how to add a 2nd string.
> >
> > And existing tools (both writing and reading) have learned to cope with this throughout the years. I doubt that these additional quotes are a blocker for X3D adoption -- it's trivial to explain them, "they are there because you may have multiple strings, and we have a general way of writing many strings inside one XML attribute".
> >
> > All in all, I'm still not convinced.
> >
> > Regards,
> > Michalis
> >
> > Sun, 5 Jul 2020 at 05:01 vmarchetti at kshell.com <vmarchetti at kshell.com> wrote:
> >
> > Apologies, this fragment of a reply was sent prematurely and is incomplete, ignore it for now.
> >
> > Vince
> >
> >
> > > On Jul 4, 2020, at 10:58 PM, vmarchetti at kshell.com wrote:
> > >
> > >
> > >
> > >> On Jul 4, 2020, at 9:26 PM, Don Brutzman <brutzman at nps.edu> wrote:
> > >>
> > >> Thanks for your note... yes we had a brief discussion some months ago, will look it up. Didn't want to dwell on the topic then (each of us too busy!!) but now more metadata is getting created and this relaxation will help.
> > >>
> > >> In my understanding, the implementation-complexity objections heard over the years to this XML encoding relaxation have not appeared to be grounded in practice, which in this case is quite simple.
> > >>
> > >> For example, pseudocode adaptable to ~any language that is parsing text into X3D objects:
> > >>
> > >> if (!value.startsWith('"') and !value.endsWith('"')) // example approach #1
> > >> {
> > >> newMFStringValue = new MFString() of length 1;
> > >> newMFStringValue[0] = value;
> > >> }
> > >> else // continue parsing as before
> > >>
> > >>
> > >
> > > However, I think this approach of coming up with code or pseudocode to explain or document the standard doesn't account for the fact that implementors will want to (and I think should) use standardized code to do all the XML processing, including applying the XML rules for decoding attribute values as they are encoded in XML.
> > > For example, given an XML fragment
> > > <Element fruit='"apple"'/>
> > >
> > > However, the same could be
> > >
> > > Someone using the standard Python module xml.etree to parse the document would only see a Python object with a (Python) attribute named fruit and a value which was the unicode value "apple" . If this occurred in an X3D document,
> > > this would be interpreted as an SFString with value "apple" or an MFString of one element with value apple .
>
> all the best, Don
> --
> Don Brutzman Naval Postgraduate School, Code USW/Br brutzman at nps.edu
> Watkins 270, MOVES Institute, Monterey CA 93943-5000 USA +1.831.656.2149
> X3D graphics, virtual worlds, navy robotics http://faculty.nps.edu/brutzman
>
all the best, Don
--
Don Brutzman Naval Postgraduate School, Code USW/Br brutzman at nps.edu
Watkins 270, MOVES Institute, Monterey CA 93943-5000 USA +1.831.656.2149
X3D graphics, virtual worlds, navy robotics http://faculty.nps.edu/brutzman