[x3d-public] Values for MFString fields in auto generated Clause 6 Encoding of nodes for XML encoding 19776-1; quotation escaping and empty string arrays

Don Brutzman brutzman at nps.edu
Sat May 30 11:08:44 PDT 2015


cc: x3d-public since text encodings are a common point of puzzlement.

Excellent analysis of SFString/MFString defaults, Roy.  Looks like our review is nearly complete.

First, here is a ton of background information that explains how this is all sorted out.

Relevant issue for XML: attribute values can be enclosed by either 'single quotes' or "double quotes".  Further, the user agent is in charge of which approach is used, meaning that an author can write an XML file with one or both types of quotes, but downstream applications are allowed to substitute the alternate type of quotes.

Here are generic definitions of terms; more precise definitions can be found in W3C Recommendations or IETF RFCs.
	http://en.wiktionary.org/wiki/user_agent
	(computing) A client application used by an end user, typically for a network protocol such as HTTP or FTP.

	Hyponym:  web browser
	http://en.wiktionary.org/wiki/web_browser
	"A computer program used to navigate the World Wide Web, chiefly by viewing web pages and following hyperlinks."

Escape characters are substitute codes for special characters.  Only a few are built into HTML and XSLT parsers, but all characters have such definitions.  References:

	X3D Scene Authoring Hints: HTML
	http://www.web3d.org/x3d/content/examples/X3dSceneAuthoringHints.html#HTML

	Character entity references in HTML
	defines encoding values for special characters in XML/HTML files.
	http://www.w3.org/TR/html4/sgml/entities.html

	Character Entity Reference Chart
	http://dev.w3.org/html5/html-author/charref

If needed, character-escape codes can be used to represent a character that would otherwise be treated as a terminator.  Equivalent examples for X3D:

	<Text string='"Hello" "World!"'/>
	<Text string=' "Hello" "World!" '/>
	<Text string=" &quot;Hello&quot; &quot;World!&quot; "/>
	<Text string=' &quot;Hello&quot; &quot;World!&quot; '/>
	etc.
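
The equivalence of these forms can be checked with any conforming XML parser.  Here is a minimal sketch, assuming Python and its standard xml.etree.ElementTree module (neither is part of the X3D tooling discussed here), that parses three spellings of the same Text node and compares the attribute values the application receives:

```python
import xml.etree.ElementTree as ET

# Three spellings of the same Text node; only delimiters, escapes
# and padding whitespace differ.
variants = [
    """<Text string='"Hello" "World!"'/>""",
    """<Text string=' "Hello" "World!" '/>""",
    """<Text string=" &quot;Hello&quot; &quot;World!&quot; "/>""",
]

# The parser resolves &quot; and drops the delimiters; strip() removes
# the optional padding whitespace before comparing.
values = [ET.fromstring(v).get("string").strip() for v in variants]
all_equal = len(set(values)) == 1

print(values[0])
print(all_equal)
```

After parsing, the choice of delimiter is no longer visible to the application, which is why a downstream tool is free to rewrite it.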

These substitutions are easily tested using the HelloWorld.x3d example, or any other scene with a Text node.
http://www.web3d.org/x3d/content/examples/HelloWorld.x3d

Coming up with a preferred form is very important, both for specification clarity and also for keeping version-control examples consistent.  Thus the preferred form is also part of X3D Canonicalization (C14N).

	ISO 19776-3, Part 3: Compressed binary encoding
	4.2.3 X3D canonical form
	http://www.web3d.org/documents/specifications/19776-3/V3.3//Part03/concepts.html#X3DCanonicalForm

Getting all of this correct is also fundamentally important for Internationalization (I18N) and Localization (L10N), which is the inclusion of non-English language content in X3D scenes.

	http://www.w3.org/International/questions/qa-i18n

Further details and examples can be found in Chapter 2 Geometry Primitives of _X3D for Web Authors_ and supporting slideset.
http://x3dgraphics.com/slidesets/X3dForWebAuthors/Chapter02-GeometryPrimitives.pdf
https://www.movesinstitute.org/Video/Courses/X3dForWebAuthors/X3dForWebAuthorsVideo.html#2

Example scene:
X3D Example Archives: X3D for Web Authors, Chapter 02 - Geometry Primitives, Text Special Characters
http://x3dgraphics.com/examples/X3dForWebAuthors/Chapter02-GeometryPrimitives/_pages/page09.html
http://x3dgraphics.com/examples/X3dForWebAuthors/Chapter02-GeometryPrimitives/TextSpecialCharacters.x3d
http://x3dgraphics.com/examples/X3dForWebAuthors/Chapter02-GeometryPrimitives/TextSpecialCharacters.html

Just in case anyone is not confused yet... 8)  things get even trickier when the author wants to include an apostrophe (' ') or a quotation mark (" ") as part of the string content itself.  Many possible combinations ensue, and only some are legal.  X3D/VRML distinguishes an embedded quotation mark by preceding it with the X3D escape character, which is a backslash \ character.  Hence:

	He said, "Immel did it!"
becomes:
	<Text string='"He said, \"Immel did it!\""' />
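
The escaping step above can be sketched in a few lines.  This is only an illustration, assuming Python; the function name mfstring_to_x3d is hypothetical and is not X3D-Edit's code.  It also assumes that a literal backslash is itself escaped by doubling it, which goes beyond the quotation-mark case described above.  The standard-library helper xml.sax.saxutils.quoteattr then picks the XML delimiter.

```python
from xml.sax.saxutils import quoteattr

def mfstring_to_x3d(strings):
    """Join SFString values into one MFString value (hypothetical helper).

    Each value is wrapped in double quotes; embedded double quotes are
    preceded by a backslash.  Doubling literal backslashes is an
    assumption of this sketch.
    """
    quoted = []
    for s in strings:
        escaped = s.replace("\\", "\\\\").replace('"', '\\"')
        quoted.append('"' + escaped + '"')
    return " ".join(quoted)

value = mfstring_to_x3d(['He said, "Immel did it!"'])

# quoteattr chooses single-quote delimiters when the value contains
# double quotes and no apostrophes.
attr = "string=" + quoteattr(value)
print(attr)
```

If the content also contains an apostrophe, quoteattr switches to double-quote delimiters and writes the inner quotation marks as &quot;, which is one of the equivalent forms listed earlier.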

Slivers of hope: to facilitate all this roundabout-ness with quotable quotes, X3D-Edit's string editor strives to let an author simply type what they want. X3D-Edit then handles all escaping chores, both for saving and loading X3D text.  Works well for me.  Problem cases/examples are welcome, will further improve the code to handle them.  String output is in canonical form.
	https://savage.nps.edu/X3D-Edit

The X3D Canonicalization tool also handles and normalizes quotation marks.  It is offered as the C14N button in X3D-Edit.

	https://svn.code.sf.net/p/x3d/code/www.web3d.org/x3d/tools/canonical/
	https://svn.code.sf.net/p/x3d/code/www.web3d.org/x3d/tools/canonical/doc/x3dTools.htm

Thus a lot of different requirements have been successfully reconciled, keeping X3D scenes fully internationalizable and fully compatible with HTML and other XML-based Web Standards.  Thank goodness.  Happy deep diving!  8)

Now, onward:

Given this background, I think some small refinements in phrasing are needed below for complete clarity & correctness.

On 5/30/2015 2:25 AM, Roy Walmsley wrote:
> Don,
>
> I was starting to check through the auto generated XML encoding and, on looking at the first node which happens to be Anchor there is an MFString field /url/.
>
> Currently this is presented in the auto generated file as
>
>                  url=””

First a warning: word processors will often use a stylized/curly quotation mark such as that above.  It is not usable as an XML delimiter.  Some references:

	http://dbaron.org/www/quotes

	http://www.dwheeler.com/essays/quotes-in-html.html

Of course I will assume you mean the "straight up and down" double-quote quotation mark (aka &quot; or &#34; character).

Now that we "have things straight" so to speak, there are several possibilities for defining an empty XML attribute as well:

	url=""
	url=''
	[omit, XML treats the attribute value as empty.  this is canonical form for conciseness and file-size reduction.]

All three forms are legal and equivalent.  An intermediate processor (user agent, such as a DOM tree reader or Web browser) can choose whichever it wants when it reads your source.
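
A quick check of the three forms, as a minimal sketch assuming Python's standard xml.etree.ElementTree.  One caveat: ElementTree reads no DTD or schema, so it reports the omitted attribute as absent rather than empty.  Substituting "" for the missing attribute, as done here, assumes the field's default value is empty.

```python
import xml.etree.ElementTree as ET

forms = ['<Anchor url=""/>', "<Anchor url=''/>", "<Anchor/>"]

# get() returns the supplied default ("") when the attribute is absent.
values = [ET.fromstring(f).get("url", "") for f in forms]
print(values)
```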

I've opted for the first form above, so that our specifications are clear and consistent.  (That also lets us know that our generation code hasn't overlooked any fields.)

> Take another example, the GeoCoordinate node. It has a /geoSystem/ field. The auto generated file presents this as
>
>                  geoSystem=””GD” “WE””
>
> Are either of these correct? I am thinking the answer is no. Neither is correct.

Good question.  The canonical (and I think preferred) correct form is

	geoSystem='"GD" "WE"'

or possibly, for readability (since whitespace between SFString entries in an MFString array is ignored),

	geoSystem=' "GD" "WE" '

I sometimes use the extra whitespace in examples so that people's eyes don't cross when looking at adjacent single/double quotes... they are typically hard to disambiguate visually.

Emphasizing terse/canonical form in specification wherever possible seems like the best approach.

> To see why, refer to the ISO/IEC specification 19776-1, 5.15 SFString and MFString <http://www.web3d.org/documents/specifications/19776-1/V3.3/Part01/EncodingOfFields.html#SFString>.
>
> The second sentence reads “SFString specifies a single string encoded as a sequence of UTF-8 octets enclosed in double quotes (e.g., *"string"*).” So I am agreed that the default value for an SFString field can be *“”*, an empty string.

yes.

this approach also matches default XML/HTML and all of the other non-MFString field encodings.  so now we are down to one exceptional case, MFString.

> The third sentence in 5.15 reads “The MFString specifies zero or more SFStrings enclosed in single quotes (e.g., '"string1" "string2"').” The first NOTE immediately following exemplifies this by saying *“string3”* is not a valid instance of an MFString. It is properly specified as *‘”string3”’*. It then goes on to give further examples. So, the /geoSystem/ field above is clearly wrong. It should read
>
>                  geoSystem=’”GD” “WE”’

Here is that note again with properly copied quoting:

http://www.web3d.org/documents/specifications/19776-1/V3.3/Part01/EncodingOfFields.html#SFString
=======================
NOTE  The construct

     "string3"

is not a valid instance of an MFString. Such an MFString would be properly specified as

     '"string3"'
=======================

This is an example of the insistence on strict quoting of SFString values within an MFString array.  A recent working-group email thread also refers to that:

[x3d] Status of PickableGroup node in Schema V3.3: quoting MFString defaults even if a single SFString value
http://web3d.org/mailman/private/x3d_web3d.org/2015-May/003069.html

I would prefer that we consider the two forms of string3 above to be consistent, i.e. MFString values without embedded quotes become allowed so that author's intent (which is quite clear) results in an MFString with a single SFString element, rather than an error.

sidebar: should we reopen this as an issue in the XML Encoding?  Am expecting it will probably be essential for proper alignment of X3D/X3DOM with HTML5.

Anyway, for the current work, we must remain completely strict of course and so the NOTE remains correct.

> Similarly, we conclude that the /url/ field above is also wrong. It should read
>
>                  url=’””’

*This is an MFString array that contains a single SFString element, which itself is an empty string.*

> But this says that the default gives the field one value which happens to be the empty string. In fact, the third sentence above stated “_zero_ or more SFStrings enclosed in single quotes”. So, if we want to specify the field to have no values the correct presentation is
>
>                  url=’’

*This is an empty MFString array.*

The two examples you present are different values.  I agree that the second form is correct for empty string array.
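
The difference is easy to see once the attribute value is split into its SFString elements.  Here is a minimal sketch of a strict reader, assuming Python; parse_mfstring is a hypothetical name, not an API of any X3D library.  It takes the attribute value after XML parsing (outer delimiters already removed), ignores whitespace between elements, and rejects unquoted content as the NOTE requires.

```python
def parse_mfstring(value):
    """Split an MFString attribute value into a list of strings.

    Strict reading: every element must be enclosed in double quotes.
    A backslash makes the next character literal.
    """
    result = []
    i, n = 0, len(value)
    while i < n:
        ch = value[i]
        if ch.isspace():
            i += 1  # whitespace between elements is ignored
            continue
        if ch != '"':
            raise ValueError("unquoted content at position %d" % i)
        i += 1  # skip opening quote
        chars = []
        while i < n and value[i] != '"':
            if value[i] == "\\" and i + 1 < n:
                i += 1  # keep the escaped character itself
            chars.append(value[i])
            i += 1
        if i >= n:
            raise ValueError("unterminated string")
        i += 1  # skip closing quote
        result.append("".join(chars))
    return result

print(parse_mfstring('""'))            # one element, the empty string
print(parse_mfstring(''))              # no elements at all
print(parse_mfstring(' "GD" "WE" '))   # padding whitespace is ignored
```

Under this strict reading the bare value string3 raises an error.  Relaxing that single check would be one way to experiment with the more lenient behavior suggested earlier.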

> I shall, therefore, modify the XSLT file to properly present the defaults for MFString values.

Hopefully that equals empty MFString value

	url=''

OK will look at those implementation details separately - thanks.

Precisely phrased summary of spec form: since multiple approaches are legally equivalent, we are presenting defaults for MFString values in preferred form.

Whew!  Hope this rationale is correctly explained and makes sense for you & everyone else too.  Feedback/questions/improvements always welcome.

all the best, Don
-- 
Don Brutzman  Naval Postgraduate School, Code USW/Br       brutzman at nps.edu
Watkins 270,  MOVES Institute, Monterey CA 93943-5000 USA   +1.831.656.2149
X3D graphics, virtual worlds, navy robotics http://faculty.nps.edu/brutzman


