[x3d-public] Values for MFString fields in auto generated Clause 6 Encoding of nodes for XML encoding 19776-1; quotation escaping and empty string arrays

Roy Walmsley roy.walmsley at ntlworld.com
Sat May 30 11:37:37 PDT 2015


Don,

Great feedback below.

I totally agree that XML will allow multiple forms for string handling. My
rationale is that the specification must be consistent. So we have the
statement in 19776-1 5.15 about the format for SFString and MFString.
Therefore, the example encoding of nodes must agree with that. Having said
that, I think your comment about extra spaces is excellent. I will add that
to my output.

Referring back to XML permitting different forms, I think it is very
appropriate to raise a specification comment / Mantis issue proposing that
the specification be modified to allow the different legal forms.

Roy

-----Original Message-----
From: Don Brutzman [mailto:brutzman at nps.edu] 
Sent: 30 May 2015 19:09
To: Roy Walmsley
Cc: x3d at web3d.org; X3D Graphics public mailing list
Subject: Re: Values for MFString fields in auto generated Clause 6 Encoding
of nodes for XML encoding 19776-1; quotation escaping and empty string
arrays

cc: x3d-public since text encodings are a common point of puzzlement.

Excellent analysis of SFString/MFString defaults Roy.  Looks like our review
is nearly complete.

First, here is a ton of background information that explains how this is all
sorted out.

Relevant issue for XML: attribute values can be enclosed by either 'single
quotes' or "double quotes".  Further, the user agent is in charge of which
approach is used, meaning that an author can write an XML file with one or
both types of quotes but downstream applications are allowed to substitute
the alternate type of quotes.

Here are generic definitions of terms; more precise definitions can be found
in W3C Recommendations or IETF RFCs.
	http://en.wiktionary.org/wiki/user_agent
	(computing) A client application used by an end user, typically for
a network protocol such as HTTP or FTP.

	Hyponym:  web browser
	http://en.wiktionary.org/wiki/web_browser
	"A computer program used to navigate the World Wide Web, chiefly by
viewing web pages and following hyperlinks."

Escape characters are substitute codes for special characters.  Only a few
are built into HTML and XSLT parsers, but all characters have such
definitions.  References:

	X3D Scene Authoring Hints: HTML
	
http://www.web3d.org/x3d/content/examples/X3dSceneAuthoringHints.html#HTML

	Character entity references in HTML
	defines encoding values for special characters in XML/HTML files.
	http://www.w3.org/TR/html4/sgml/entities.html

	Character Entity Reference Chart
	http://dev.w3.org/html5/html-author/charref

If needed, character-escape codes can be used to represent a character that
would otherwise be treated as a terminator.  Equivalent examples for X3D:

	<Text string='"Hello" "World!"'/>
	<Text string=' "Hello" "World!" '/>
	<Text string=" &quot;Hello&quot; &quot;World!&quot; "/>
	<Text string=' &quot;Hello&quot; &quot;World!&quot; '/>
	etc.

These substitutions are easily tested using the HelloWorld.x3d example, or
any other scene with a Text node.
http://www.web3d.org/x3d/content/examples/HelloWorld.x3d
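
The same equivalence can also be checked outside an X3D browser by handing
each form to an XML parser and comparing what comes back.  The following is
only a minimal Python sketch using the standard library; the variable names
are illustrative, and the padding whitespace is stripped before comparing
because some of the forms add spaces inside the delimiters.

```python
import xml.etree.ElementTree as ET

# Four ways of writing the same Text string attribute in an X3D file.
forms = [
    """<Text string='"Hello" "World!"'/>""",
    """<Text string=' "Hello" "World!" '/>""",
    """<Text string=" &quot;Hello&quot; &quot;World!&quot; "/>""",
    """<Text string=' &quot;Hello&quot; &quot;World!&quot; '/>""",
]

# The parser removes the attribute delimiters and resolves &quot;,
# so only the padding whitespace can differ between the results.
values = [ET.fromstring(form).get("string").strip() for form in forms]

for value in values:
    print(value)
```

Each line printed is the same two-element value, "Hello" "World!", which is
why a downstream tool is free to swap one delimiter style for the other.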

Coming up with a preferred form is very important, both for specification
clarity and also for keeping version-control examples consistent.  Thus the
preferred form is also part of X3D Canonicalization (C14N).

	ISO 19776-3, Part 3: Compressed binary encoding
	4.2.3 X3D canonical form
	
http://www.web3d.org/documents/specifications/19776-3/V3.3//Part03/concepts.html#X3DCanonicalForm

Getting all of this correct is also fundamentally important for
Internationalization (I18N) and Localization (L10N), which is the inclusion
of non-English language content in X3D scenes.

	http://www.w3.org/International/questions/qa-i18n

Further details and examples can be found in Chapter 2 Geometry Primitives
of _X3D for Web Authors_ and supporting slideset.
http://x3dgraphics.com/slidesets/X3dForWebAuthors/Chapter02-GeometryPrimitives.pdf
https://www.movesinstitute.org/Video/Courses/X3dForWebAuthors/X3dForWebAuthorsVideo.html#2

Example scene:
X3D Example Archives: X3D for Web Authors, Chapter 02 - Geometry Primitives,
Text Special Characters
http://x3dgraphics.com/examples/X3dForWebAuthors/Chapter02-GeometryPrimitives/_pages/page09.html
http://x3dgraphics.com/examples/X3dForWebAuthors/Chapter02-GeometryPrimitives/TextSpecialCharacters.x3d
http://x3dgraphics.com/examples/X3dForWebAuthors/Chapter02-GeometryPrimitives/TextSpecialCharacters.html

Just in case anyone is not confused yet... 8)  things get trickier still
when the author's string needs to include an apostrophe (&apos; or ') or a
quotation mark (&quot; or ") as part of the string content itself.  Many
possible combinations ensue, but only some are legal.  X3D/VRML distinguishes
an embedded quotation mark by preceding it with the X3D escape character,
which is a backslash \ character.  Hence:

	He said, "Immel did it!"
becomes:
	<Text string='"He said, \"Immel did it!\""' />
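
For anyone generating such values in code, the escaping step can be sketched
roughly as follows.  This is a minimal Python illustration, not the X3D-Edit
implementation: encode_mfstring is a hypothetical helper name, and doubling a
literal backslash is an assumption made here so that the escape character
itself stays unambiguous.

```python
from xml.sax.saxutils import quoteattr

def encode_mfstring(strings):
    """Join Python strings into one X3D MFString value (hypothetical helper)."""
    quoted = []
    for s in strings:
        # Escape the backslash first (assumption), then the embedded quote.
        escaped = s.replace("\\", "\\\\").replace('"', '\\"')
        quoted.append('"' + escaped + '"')
    return " ".join(quoted)

value = encode_mfstring(['He said, "Immel did it!"'])
print(value)

# quoteattr adds the XML delimiters, choosing single quotes when the
# value contains double quotes but no apostrophes.
print("<Text string=" + quoteattr(value) + "/>")
```

The second print produces the same single-quoted attribute as the example
above.  If the text also contained an apostrophe, quoteattr would switch to
double-quote delimiters and write the inner quotation marks as &quot;.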

Slivers of hope: to facilitate all this roundabout-ness with quotable
quotes, X3D-Edit's string editor strives to let an author simply type what
they want. X3D-Edit then handles all escaping chores, both for saving and
loading X3D text.  Works well for me.  Problem cases/examples are welcome,
will further improve the code to handle them.  String output is in canonical
form.
	https://savage.nps.edu/X3D-Edit

The X3D Canonicalization tool also handles and normalizes quotation marks.
It is offered as the C14N button in X3D-Edit.

	
https://svn.code.sf.net/p/x3d/code/www.web3d.org/x3d/tools/canonical/
	
https://svn.code.sf.net/p/x3d/code/www.web3d.org/x3d/tools/canonical/doc/x3dTools.htm

Thus a lot of different requirements have been successfully reconciled,
keeping X3D scenes fully internationalizable and fully compatible with HTML
and other XML-based Web Standards.  Thank goodness.  Happy deep diving!  8)

Now, onward:

Given this background, I think some small refinements in phrasing are needed
below for complete clarity & correctness.

On 5/30/2015 2:25 AM, Roy Walmsley wrote:
> Don,
>
> I was starting to check through the auto generated XML encoding and, on
> looking at the first node which happens to be Anchor there is an MFString
> field /url/.
>
> Currently this is presented in the auto generated file as
>
>                  url=""

First a warning: word processors will often use a stylized/curly quotation
mark such as that above.  It is not usable as an XML delimiter.  Some
references:

	http://dbaron.org/www/quotes

	http://www.dwheeler.com/essays/quotes-in-html.html

Of course I will assume you mean the "straight up and down" double-quote
quotation mark (aka &quot; or &#34; character).
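
A quick way to confirm that curly quotation marks are rejected as delimiters
is to try parsing both variants.  The following is a small Python sketch
using the standard-library parser; is_well_formed is just an illustrative
helper name.

```python
import xml.etree.ElementTree as ET

straight = '<Anchor url=""/>'
curly = "<Anchor url=\u201c\u201d/>"  # U+201C and U+201D used as delimiters

def is_well_formed(xml_text):
    """Return True if the standard-library parser accepts the text."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

print(is_well_formed(straight))
print(is_well_formed(curly))
```

The straight-quote form parses, while the curly-quote form is reported as
not well-formed.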

Now that we "have things straight" so to speak, there are several
possibilities for defining an empty XML attribute as well:

	url=""
	url=''
	[omit, XML treats the attribute value as empty.  this is canonical
form for conciseness and file-size reduction.]

All three forms are legal and equivalent.  An intermediate processor (user
agent, such as a DOM tree reader or Web browser) can choose whichever it
wants when it reads your source.
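
To illustrate, here is a minimal Python sketch that reads all three forms
with the standard-library parser.  One caveat worth stating: a non-validating
parser reports the omitted attribute as absent rather than empty, so the
sketch assumes the reading application falls back to the field default (the
empty value here).

```python
import xml.etree.ElementTree as ET

double_quoted = ET.fromstring('<Anchor url=""/>')
single_quoted = ET.fromstring("<Anchor url=''/>")
omitted = ET.fromstring("<Anchor/>")

# Both quoted forms yield the same empty attribute value.
print(repr(double_quoted.get("url")))
print(repr(single_quoted.get("url")))

# The omitted attribute is reported as absent (None); an X3D-aware
# reader would substitute the field default, which is modeled here
# by passing "" as the fallback.
print(repr(omitted.get("url")))
print(repr(omitted.get("url", "")))
```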

I've opted for the first form above, so that our specifications are clear
and consistent.  (That also lets us know that our generation code hasn't
overlooked any fields.)

> Take another example, the GeoCoordinate node. It has a /geoSystem/ 
> field. The auto generated file presents this as
>
>                  geoSystem=""GD" "WE""
>
> Are either of these correct? I am thinking the answer is no. Neither is
> correct.

Good question.  The canonical (and I think preferred) correct form is

	geoSystem='"GD" "WE"'

or possibly, for readability (since whitespace between SFString entries in
an MFString array is ignored),

	geoSystem=' "GD" "WE" '

I sometimes use the extra whitespace in examples so that people's eyes don't
cross when looking at adjacent single/double quotes... they are typically
hard to disambiguate visually.

Emphasizing terse/canonical form in specification wherever possible seems
like the best approach.
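
The point that whitespace between SFString entries is ignored can be shown
with a small reader for MFString attribute values.  This is only a sketch in
Python under stated assumptions: parse_mfstring is a hypothetical name, it is
strict about requiring double quotes around every element, and it treats only
whitespace as a separator.

```python
def parse_mfstring(value):
    """Split an MFString attribute value into a list of SFString values.

    Hypothetical strict parser: every element must be wrapped in double
    quotes, and a backslash escapes the character that follows it.
    """
    strings = []
    i, n = 0, len(value)
    while i < n:
        if value[i].isspace():
            i += 1  # whitespace between elements is ignored
            continue
        if value[i] != '"':
            raise ValueError("expected opening double quote at index %d" % i)
        i += 1
        chars = []
        while i < n and value[i] != '"':
            if value[i] == "\\" and i + 1 < n:
                i += 1  # skip the escape character, keep what follows
            chars.append(value[i])
            i += 1
        if i >= n:
            raise ValueError("unterminated string")
        i += 1  # step past the closing quote
        strings.append("".join(chars))
    return strings

print(parse_mfstring('"GD" "WE"'))
print(parse_mfstring(' "GD" "WE" '))
```

Both calls return the same two-element list, so the padded form costs nothing
but a few bytes.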

> To see why, refer to the ISO/IEC specification 19776-1, 5.15 SFString and
> MFString
> <http://www.web3d.org/documents/specifications/19776-1/V3.3/Part01/EncodingOfFields.html#SFString>.
>
> The second sentence reads "SFString specifies a single string encoded as a
> sequence of UTF-8 octets enclosed in double quotes (e.g., *"string"*)." So I
> am agreed that the default value for an SFString field can be *""*, an empty
> string.

yes.

this approach also matches default XML/HTML and all of the other
non-MFString field encodings.  so now we are down to one exceptional case,
MFString.

> The third sentence in 5.15 reads "The MFString specifies zero or more 
> SFStrings enclosed in single quotes (e.g., '"string1" "string2"')." 
> The first NOTE immediately following exemplifies this by saying 
> *"string3"* is not a valid instance of an MFString. It is properly 
> specified as *'"string3"'*. It then goes on to give further examples. 
> So, the /geoSystem/ field above is clearly wrong. It should read
>
>                  geoSystem='"GD" "WE"'

Here is that note again with properly copied quoting:

http://www.web3d.org/documents/specifications/19776-1/V3.3/Part01/EncodingOfFields.html#SFString
=======================
NOTE  The construct

     "string3"

is not a valid instance of an MFString. Such an MFString would be properly
specified as

     '"string3"'
=======================

This is an example of the insistence on strict quoting of SFString values
within an MFString array.  A recent working-group email thread also refers
to that:

[x3d] Status of PickableGroup node in Schema V3.3: quoting MFString defaults
even if a single SFString value
http://web3d.org/mailman/private/x3d_web3d.org/2015-May/003069.html

I would prefer that we consider the two forms of string3 above to be
consistent, i.e. MFString values without embedded quotes become allowed so
that the author's intent (which is quite clear) results in an MFString with
a single SFString element, rather than an error.

sidebar: should we reopen this as an issue in the XML Encoding?  Am
expecting it will probably be essential for proper alignment of X3D/X3DOM
with HTML5.

Anyway, for the current work, we must remain completely strict of course and
so the NOTE remains correct.

> Similarly, we conclude that the /url/ field above is also wrong. It 
> should read
>
>                  url='""'

*This is an MFString array that contains a single SFString element, which
itself is an empty string.*

> But this says that the default gives the field one value which happens 
> to be the empty string. In fact, the third sentence above stated 
> "_zero_ or more SFStrings enclosed in single quotes". So, if we want 
> to specify the field to have no values the correct presentation is
>
>                  url=''

*This is an empty MFString array.*

The two examples you present are different values.  I agree that the second
form is correct for empty string array.
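
The difference between the two values shows up as soon as the quoted elements
are counted.  A rough Python sketch, assuming a regular expression is an
adequate stand-in for a real MFString reader (count_elements and SFSTRING are
illustrative names):

```python
import re

# Matches one double-quoted SFString, allowing backslash-escaped characters.
SFSTRING = re.compile(r'"((?:[^"\\]|\\.)*)"')

def count_elements(mfstring_value):
    """Count the quoted SFString elements in an MFString attribute value."""
    return len(SFSTRING.findall(mfstring_value))

print(count_elements('""'))  # attribute value of url='""'
print(count_elements(''))    # attribute value of url=''
```

The first value holds one element (an empty string) and the second holds
none, so only the second represents an empty array.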

> I shall, therefore, modify the XSLT file to properly present the defaults
> for MFString values.

Hopefully that equals empty MFString value

	url=''

OK will look at those implementation details separately - thanks.

Precisely phrased summary of spec form: since multiple approaches are
legally equivalent, we are presenting defaults for MFString values in
preferred form.

Whew!  Hope this rationale is correctly explained and makes sense for you &
everyone else too.  Feedback/questions/improvements always welcome.

all the best, Don
-- 
Don Brutzman  Naval Postgraduate School, Code USW/Br       brutzman at nps.edu
Watkins 270,  MOVES Institute, Monterey CA 93943-5000 USA   +1.831.656.2149
X3D graphics, virtual worlds, navy robotics http://faculty.nps.edu/brutzman




