[x3d-public] X3D file encoding considerations: UTF-8/16/32 etc.

Don Brutzman brutzman at nps.edu
Thu May 12 09:25:00 PDT 2016

1. Following up on a discussion in yesterday's weekly X3D working group teleconference, this morning we are discussing whether an @encoding attribute should be part of the X3D header in the JSON encoding.  Investigation took us in multiple directions.

Of note: this analysis appears to apply only to files.  How applications handle their in-memory data structures is essentially a separate issue, once a scene document has been properly loaded.


2. The X3D abstract specification doesn't explicitly address file encodings.  (This seems appropriate.)

4.8 Data encodings

> The X3D run-time architecture is independent of the data encoding format. X3D content and applications can be authored in a variety of encodings, including textual (XML and Classic VRML encodings) and binary, either compressed or uncompressed. ISO/IEC 19775 contains an abstract encoding specification that defines the structure of the X3D scene: hierarchical relationships among objects, initial values for objects, and dataflow connections between objects. All concrete data encodings for X3D shall conform to this abstract specification.
> Browsers and generators may support any or all of the standard encoding formats, depending on their application needs and the conformance requirements of a specific component or profile.
> X3D encodings are fully specified in the parts of ISO/IEC 19776.


3. Of note: XML version info is required, but the encoding declaration is optional.

Extensible Markup Language (XML) 1.0 (Fifth Edition) 2.8 Prolog and Document Type Declaration

4.3 Parsed Entities

4.3.1 The Text Declaration

> Prolog
> [22]   	prolog	   ::=   	XMLDecl? Misc* (doctypedecl Misc*)?
> [23]   	XMLDecl	   ::=   	'<?xml' VersionInfo EncodingDecl? SDDecl? S? '?>'

Interestingly, the XML spec later points out that the encoding declared in a file might be overridden by external transport information such as HTTP or MIME.

4.3.3 Character Encoding in Entities

> In the absence of information provided by an external transport protocol (e.g. HTTP or MIME), it is a fatal error for an entity including an encoding declaration to be presented to the XML processor in an encoding other than that named in the declaration, or for an entity which begins with neither a Byte Order Mark nor an encoding declaration to use an encoding other than UTF-8. Note that since ASCII is a subset of UTF-8, ordinary ASCII entities do not strictly need an encoding declaration.
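
To make the precedence in that paragraph concrete, here is a minimal Python sketch (not taken from any X3D tool). The function name xml_entity_encoding and the transport_charset parameter are hypothetical, and the sketch only handles a BOM or an ASCII-compatible encoding declaration; a full XML processor does considerably more.

```python
import codecs
import re

# Matches an encoding declaration inside a leading '<?xml ... ?>' (ASCII-compatible input only).
_DECL = re.compile(rb'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']')


def xml_entity_encoding(data, transport_charset=None):
    """Pick an encoding name for an XML entity (illustrative sketch only)."""
    # 1. Information from an external transport (e.g. an HTTP charset) comes first.
    if transport_charset:
        return transport_charset
    # 2. A Byte Order Mark identifies the Unicode encoding form.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8"
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    # 3. Otherwise use the optional encoding declaration, if one is present.
    match = _DECL.match(data)
    if match:
        return match.group(1).decode("ascii")
    # 4. Neither BOM nor declaration: the entity has to be UTF-8.
    return "utf-8"
```

With this sketch, a file that starts with a plain '<?xml version="1.0"?>' and no BOM resolves to UTF-8, which is why the encoding declaration can safely be optional.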


4. JSON allows UTF-8, UTF-16 or UTF-32 encodings, but does not allow writers to add a Byte Order Mark (BOM).

Background information and current status for X3D JSON encoding:
X3D to JSON Stylesheet Converter -  Design Correspondences

"Encoding. JSON files are implicitly allowed to have an encoding of UTF-8, UTF-16 or UTF-32. The "X3D" object is given a field such as "encoding":"UTF-8" to indicate the scene author's expected encoding. This expression of file encoding corresponds to the header statement syntax required for the VRML97, ClassicVRML and XML file formats."

The notion of preserving the scene author's expected encoding seems mistaken, since downstream file transport and applications might legitimately and correctly change the encoding.

Minimum change: make the encoding attribute optional.

Worth considering: we might follow the approach used by the JSON specification, i.e. don't declare it, and since "it is what it is" let the text processor deduce it.
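
As an illustration of "let the text processor deduce it", the following Python sketch guesses the encoding from the pattern of zero bytes in the first four octets. This is the heuristic described in the older RFC 4627 (section 3), not something RFC 7159 requires, and it assumes the text starts with ASCII characters (true for an X3D JSON document, which begins with "{") and carries no BOM. The function name deduce_json_encoding is hypothetical.

```python
import json


def deduce_json_encoding(data):
    """Guess utf-8/16/32 from zero-byte positions (RFC 4627 heuristic, sketch only)."""
    head = data[:4]
    if len(head) == 4:
        if head[0] == 0 and head[1] == 0 and head[2] == 0:
            return "utf-32-be"   # 00 00 00 xx
        if head[1] == 0 and head[2] == 0 and head[3] == 0:
            return "utf-32-le"   # xx 00 00 00
    if len(head) >= 2:
        if head[0] == 0:
            return "utf-16-be"   # 00 xx 00 xx
        if head[1] == 0:
            return "utf-16-le"   # xx 00 xx 00
    return "utf-8"               # xx xx xx xx


def load_json_bytes(data):
    """Decode with the deduced encoding, then parse."""
    return json.loads(data.decode(deduce_json_encoding(data)))
```

Under these assumptions the same scene parses identically whichever of the three encodings it arrives in, so an "encoding" field inside the "X3D" object adds no information the reader needs.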

Additional JSON references:

Standard ECMA-404, The JSON Data Interchange Format

> [...] It is expected that other standards will refer to this one, strictly adhering to the JSON text format, while imposing restrictions on various encoding details.

9 String
> [...] To escape a code point that is not in the Basic Multilingual Plane, the character is represented as a twelve-character sequence, encoding the UTF-16 surrogate pair. So for example, a string containing only the G clef character (U+1D11E) may be represented as "\uD834\uDD1E".
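
The arithmetic behind that twelve-character sequence can be sketched in a few lines of Python. The helper name json_escape is hypothetical; only the standard json module is used to check the round trip.

```python
import json


def json_escape(code_point):
    """Return the JSON \\u escape for one Unicode code point (sketch)."""
    if code_point < 0x10000:
        # Basic Multilingual Plane: a single six-character escape.
        return "\\u%04X" % code_point
    # Outside the BMP: split the 20-bit offset into a UTF-16 surrogate pair.
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits
    return "\\u%04X\\u%04X" % (high, low)


escaped = json_escape(0x1D11E)               # G clef: \uD834\uDD1E
g_clef = json.loads('"' + escaped + '"')     # parses back to one character
```

The escape is the same regardless of whether the file itself is stored as UTF-8, UTF-16 or UTF-32, which is one more reason the file encoding need not be declared inside the document.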


RFC 7159, JavaScript Object Notation (JSON) Data Interchange Format
8.1.  Character Encoding

>    JSON text SHALL be encoded in UTF-8, UTF-16, or UTF-32.  The default
>    encoding is UTF-8, and JSON texts that are encoded in UTF-8 are
>    interoperable in the sense that they will be read successfully by the
>    maximum number of implementations; there are many implementations
>    that cannot successfully read texts in other encodings (such as
>    UTF-16 and UTF-32).
>    Implementations MUST NOT add a byte order mark to the beginning of a
>    JSON text.  In the interests of interoperability, implementations
>    that parse JSON texts MAY ignore the presence of a byte order mark
>    rather than treating it as an error.
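
A small Python sketch of what that BOM guidance might look like in practice: the reader tolerates (and strips) a leading BOM, while the writer emits UTF-8 with no BOM. The function names are hypothetical, and falling back to UTF-8 when no BOM is present is an assumption of this sketch rather than a requirement quoted above.

```python
import codecs
import json

# UTF-32 BOMs are checked before UTF-16 because BOM_UTF32_LE begins with BOM_UTF16_LE.
_BOMS = (
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
)


def load_json_tolerating_bom(data):
    """Parse JSON bytes, ignoring a leading BOM instead of raising an error."""
    for bom, encoding in _BOMS:
        if data.startswith(bom):
            return json.loads(data[len(bom):].decode(encoding))
    return json.loads(data.decode("utf-8"))  # assumption: no BOM means UTF-8


def dump_json_without_bom(obj):
    """Serialize to UTF-8 bytes; the plain utf-8 codec never adds a BOM."""
    return json.dumps(obj).encode("utf-8")
```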


5. Next steps for X3D XML:
- confirm and correct the corresponding sections for X3D XML and X3D ClassicVRML encodings
- make encoding optional in X3D DTD and Schema and guidance


6. Next steps for X3D working group members:

Roy has created a Mantis project to begin recording design issues for the X3D JSON Encoding.

This complements the GitHub project for the draft 19776-3 JSON Encoding specification.

It is great that we might finally be approaching clarity and consistency on a long-standing (and sometimes mysterious) issue.

An acquired taste perhaps, but... Have fun with Web Standards!

all the best, Don
Don Brutzman  Naval Postgraduate School, Code USW/Br       brutzman at nps.edu
Watkins 270,  MOVES Institute, Monterey CA 93943-5000 USA   +1.831.656.2149
X3D graphics, virtual worlds, navy robotics http://faculty.nps.edu/brutzman
