[x3d-public] X3D file encoding considerations: UTF-8/16/32 etc.

Joe D Williams joedwil at earthlink.net
Fri May 13 12:46:52 PDT 2016


> 4. JSON allows UTF-8, 16 or 32 encodings, but does not allow Byte 
> Order Mark (BOM).

* Specify server delivery. Either rely on the server default (ANSI, 
ASCII or UTF-8?), or configure the server, if it is not already 
configured, to send UTF-8 with or without BOM.

* A BOM may or may not stop processing; it depends on the tool.

* A tool may use the presence or absence of a BOM, or other 
inspection, to determine the bit encoding (8, 16 or 32) and then use 
whatever it gets.

* Each UTF is reversible, thus every UTF supports lossless round 
tripping. There are no 'extra' or different standard characters in the 
16 or 32 encodings.

* An author delivering anything other than UTF-8 without BOM may have 
a problem or even an error, but there is typically no problem if the 
file is served as ANSI or ASCII (send problems to the list, because 
there should be no problems ever again).

* Wherever characters appear, all X3D SF/MFString values and any 
characters composing the human-readable user code will be treated as 
if the character encoding is UTF-8.  When the author-defined 
characters to be displayed as text are not in the set defined in the 
tool, a substitute for the character is displayed.

* Use UTF-16 or UTF-32 at author time at your own risk, at this time, 
because consumer applications may not recognize 16/32 or bother to 
convert to UTF-8.
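
As a rough illustration of the BOM and round-trip points above, here is 
a minimal Python sketch. The names sniff_bom, to_utf8_no_bom and 
round_trips are made up for this example and do not come from any X3D 
tool; falling back to UTF-8 when no BOM is found is an assumption.

```python
import codecs

# Longer BOMs first: the UTF-32-LE BOM (FF FE 00 00) begins with the
# UTF-16-LE BOM (FF FE), so the checking order matters.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(data: bytes):
    """Return (encoding, bom_length), or (None, 0) when no BOM is present."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0


def to_utf8_no_bom(data: bytes, default: str = "utf-8") -> bytes:
    """Re-encode the input as UTF-8 without BOM (hypothetical helper)."""
    encoding, skip = sniff_bom(data)
    # Assumption: input without a BOM is decoded with the default encoding.
    text = data[skip:].decode(encoding or default)
    return text.encode("utf-8")


def round_trips(text: str) -> bool:
    """Check that each UTF encodes and decodes the text without loss."""
    return all(
        text.encode(enc).decode(enc) == text
        for enc in ("utf-8", "utf-16-le", "utf-32-be")
    )
```

A tool that only understands UTF-8 could run something like 
to_utf8_no_bom on whatever it receives before parsing.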

Thanks and Best,
Joe



----- Original Message ----- 
From: "Don Brutzman" <brutzman at nps.edu>
Cc: "X3D Graphics public mailing list" <x3d-public at web3d.org>
Sent: Thursday, May 12, 2016 9:25 AM
Subject: [x3d-public] X3D file encoding considerations: UTF-8/16/32 
etc.


> 1. Following up on a discussion in yesterday's weekly X3D working 
> group teleconference, this morning we are discussing whether an 
> @encoding attribute should be part of the X3D header in the JSON 
> encoding.  Investigation took us in multiple directions.
>
> Of note is that this analysis only appears to apply to files.  The 
> nature of how applications handle their in-memory data structures is 
> essentially a separate issue, once a scene document has been 
> properly loaded.
>
> ==================================================================
>
> 2.  X3D abstract specification doesn't explicitly address file 
> encodings.  (This seems appropriate.)
>
> 4.8 Data encodings
> http://www.web3d.org/documents/specifications/19775-1/V3.3/Part01/concepts.html#Dataencodings
>
>> The X3D run-time architecture is independent of the data encoding 
>> format. X3D content and applications can be authored in a variety 
>> of encodings, including textual (XML and Classic VRML encodings) 
>> and binary, either compressed or uncompressed. ISO/IEC 19775 
>> contains an abstract encoding specification that defines the 
>> structure of the X3D scene: hierarchical relationships among 
>> objects, initial values for objects, and dataflow connections 
>> between objects. All concrete data encodings for X3D shall conform 
>> to this abstract specification.
>>
>> Browsers and generators may support any or all of the standard 
>> encoding formats, depending on their application needs and the 
>> conformance requirements of a specific component or profile.
>>
>> X3D encodings are fully specified in the parts of ISO/IEC 19776.
>
> ==================================================================
>
> 3. Of note: XML version info is required, but the encoding 
> declaration is optional.
>
> Extensible Markup Language (XML) 1.0 (Fifth Edition) 2.8 Prolog and 
> Document Type Declaration
>
> 4.3 Parsed Entities
> https://www.w3.org/TR/REC-xml/#TextEntities
>
> 4.3.1 The Text Declaration
> https://www.w3.org/TR/REC-xml/#NT-TextDecl
>
>> Prolog
>> [22]   prolog    ::=   XMLDecl? Misc* (doctypedecl Misc*)?
>> [23]   XMLDecl   ::=   '<?xml' VersionInfo EncodingDecl? SDDecl? S? '?>'
>
> Interestingly, later in the XML spec, it points out that the 
> declared encoding in a file might be overridden by HTTP or MIME 
> transport.
>
> 4.3.3 Character Encoding in Entities
> https://www.w3.org/TR/REC-xml/#charencoding
>
>> In the absence of information provided by an external transport 
>> protocol (e.g. HTTP or MIME), it is a fatal error for an entity 
>> including an encoding declaration to be presented to the XML 
>> processor in an encoding other than that named in the declaration, 
>> or for an entity which begins with neither a Byte Order Mark nor an 
>> encoding declaration to use an encoding other than UTF-8. Note that 
>> since ASCII is a subset of UTF-8, ordinary ASCII entities do not 
>> strictly need an encoding declaration.
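
The precedence described in that passage might be sketched roughly as 
follows in Python. This is a simplified reading and not a conforming 
XML processor: the function name xml_encoding and its transport_charset 
parameter are hypothetical, UTF-32 is ignored, and the regular 
expression only works when the declaration is in an ASCII-compatible 
encoding.

```python
import codecs
import re

# Matches the encoding name in an XML declaration at the start of the
# data. \x22 and \x27 are the double and single quote characters.
_DECL = re.compile(
    rb"<\?xml\s[^>]*?encoding\s*=\s*[\x22\x27]([A-Za-z][A-Za-z0-9._-]*)[\x22\x27]"
)


def xml_encoding(data: bytes, transport_charset=None) -> str:
    """Guess the encoding of an XML entity (simplified sketch)."""
    # 1. Information from an external transport (e.g. an HTTP charset
    #    parameter) takes priority over what the file itself declares.
    if transport_charset:
        return transport_charset
    # 2. A UTF-16 BOM settles the question; a UTF-8 BOM is skipped so
    #    that the declaration can still be inspected.
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "UTF-16"
    if data.startswith(codecs.BOM_UTF8):
        data = data[len(codecs.BOM_UTF8):]
    # 3. Otherwise use the optional encoding declaration.
    match = _DECL.match(data)
    if match:
        return match.group(1).decode("ascii")
    # 4. No BOM and no declaration: the entity has to be UTF-8.
    return "UTF-8"
```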
>
> Next steps for X3D XML:
> - confirm and correct the corresponding sections for X3D XML and X3D 
> ClassicVRML encodings
> - make encoding optional in X3D DTD and Schema and guidance
>
> ==================================================================
>
> 4. JSON allows UTF-8, 16 or 32 encodings, but does not allow Byte 
> Order Mark (BOM).
>
> Background information and current status for X3D JSON encoding:
>  X3D to JSON Stylesheet Converter -  Design Correspondences
> http://www.web3d.org/x3d/stylesheets/X3dToJson.html#DesignCorrespondences
>
> "Encoding. JSON files are implicitly allowed to have an encoding of 
> UTF-8, UTF-16 or UTF-32. The "X3D" object is given a field such as 
> "encoding":"UTF-8" to indicate the scene author's expected encoding. 
> This expression of file encoding corresponds to the header statement 
> syntax required for the VRML97, ClassicVRML and XML file formats."
>
> The notion of preserving the scene author's expected encoding seems 
> mistaken, since downstream file transport and applications might 
> legitimately and correctly change the encoding.
>
> Minimum change: make the encoding attribute optional.
>
> Worth considering: we might follow the approach used by the JSON 
> specification, i.e. don't declare it, and since "it is what it is" 
> let the text processor deduce it.
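
One way a text processor can deduce the encoding is the null-byte 
heuristic described in the older RFC 4627 (section 3). The Python 
sketch below is only an illustration of that idea: the function name 
guess_json_encoding is made up, and it assumes the text has no BOM and 
begins with two ASCII characters, as an X3D JSON document starting 
with {" would.

```python
def guess_json_encoding(data: bytes) -> str:
    """Guess UTF-8/16/32 from the pattern of zero bytes at the start.

    Assumes no BOM and that the first two characters are ASCII.
    """
    if len(data) >= 4:
        # 00 00 00 xx: one ASCII character in big-endian UTF-32
        if data[0] == 0 and data[1] == 0 and data[2] == 0:
            return "utf-32-be"
        # xx 00 00 00: one ASCII character in little-endian UTF-32
        if data[1] == 0 and data[2] == 0 and data[3] == 0:
            return "utf-32-le"
    if len(data) >= 2:
        # 00 xx: big-endian UTF-16
        if data[0] == 0:
            return "utf-16-be"
        # xx 00: little-endian UTF-16
        if data[1] == 0:
            return "utf-16-le"
    # No zero bytes among the leading octets: treat as UTF-8.
    return "utf-8"
```

With something like this in the loader, the "encoding" field inside the 
"X3D" object is not needed to read the file.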
>
> Additional JSON references:
>
> Standard ECMA-404, The JSON Data Interchange Format
> http://www.ecma-international.org/publications/standards/Ecma-404.htm
>
> Introduction
>> [...] It is expected that other standards will refer to this one, 
>> strictly adhering to the JSON text format, while imposing 
>> restrictions on various encoding details.
>
> 9 String
>> [...] To escape a code point that is not in the Basic Multilingual 
>> Plane, the character is represented as a twelve-character sequence, 
>> encoding the UTF-16 surrogate pair. So for example, a string 
>> containing only the G clef character (U+1D11E) may be represented 
>> as "\uD834\uDD1E".
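
The surrogate-pair arithmetic behind that G clef example can be checked 
with a few lines of Python. The function name surrogate_escape is 
invented for this illustration; the arithmetic itself is the standard 
UTF-16 rule.

```python
import json


def surrogate_escape(code_point: int) -> str:
    """Return the JSON \\uXXXX escape(s) for one Unicode code point."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("not a Unicode code point")
    if code_point < 0x10000:
        # Inside the Basic Multilingual Plane: one six-character escape.
        return "\\u%04X" % code_point
    # Outside the BMP: split the 20-bit offset into two 10-bit halves.
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # high (lead) surrogate
    low = 0xDC00 + (offset & 0x3FF)   # low (trail) surrogate
    return "\\u%04X\\u%04X" % (high, low)


# The G clef from the ECMA-404 text quoted above.
escaped = surrogate_escape(0x1D11E)
assert escaped == "\\uD834\\uDD1E"

# Python's json module joins the pair back into one character on load.
assert json.loads('"' + escaped + '"') == "\U0001D11E"
```

The escape is the same twelve characters whether the file itself is 
stored as UTF-8, UTF-16 or UTF-32.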
>
> and
>
> RFC 7159, JavaScript Object Notation (JSON) Data Interchange Format
> 8.1.  Character Encoding
> https://tools.ietf.org/html/rfc7159#section-8.1
>
>>    JSON text SHALL be encoded in UTF-8, UTF-16, or UTF-32.  The default
>>    encoding is UTF-8, and JSON texts that are encoded in UTF-8 are
>>    interoperable in the sense that they will be read successfully by the
>>    maximum number of implementations; there are many implementations
>>    that cannot successfully read texts in other encodings (such as
>>    UTF-16 and UTF-32).
>>
>>    Implementations MUST NOT add a byte order mark to the beginning of a
>>    JSON text.  In the interests of interoperability, implementations
>>    that parse JSON texts MAY ignore the presence of a byte order mark
>>    rather than treating it as an error.
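
In Python, that write-without-BOM, read-tolerantly behavior could look 
something like the sketch below. The function names dumps_utf8 and 
loads_utf8_tolerant are hypothetical, and the sketch assumes the input 
is UTF-8; it does not try to handle UTF-16 or UTF-32.

```python
import codecs
import json


def dumps_utf8(obj) -> bytes:
    """Serialize to UTF-8 bytes; the "utf-8" codec never writes a BOM."""
    return json.dumps(obj, ensure_ascii=False).encode("utf-8")


def loads_utf8_tolerant(data: bytes):
    """Parse UTF-8 JSON bytes, skipping a leading BOM if one is present."""
    # "utf-8-sig" drops one leading BOM on decode and otherwise behaves
    # like plain "utf-8".
    return json.loads(data.decode("utf-8-sig"))


scene = {"X3D": {"encoding": "UTF-8"}}
raw = dumps_utf8(scene)
assert not raw.startswith(codecs.BOM_UTF8)
assert loads_utf8_tolerant(raw) == scene
assert loads_utf8_tolerant(codecs.BOM_UTF8 + raw) == scene
```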
>
> ==================================================================
>
> 5. Next steps for X3D XML:
> - confirm and correct the corresponding sections for X3D XML and X3D 
> ClassicVRML encodings
> - make encoding optional in X3D DTD and Schema and guidance
>
> ==================================================================
>
> 6. Next steps for X3D working group members:
>
> Roy has created a Mantis project to begin recording design issues 
> for the X3D JSON Encoding.
>
> This complements the GitHub project for the draft 19776-3 JSON 
> Encoding specification.
>
> It is great that we might finally be approaching clarity and 
> consistency on a long-standing (and sometimes mysterious) issue.
>
> An acquired taste perhaps, but... Have fun with Web Standards!
>
> all the best, Don
> -- 
> Don Brutzman  Naval Postgraduate School, Code USW/Br 
> brutzman at nps.edu
> Watkins 270,  MOVES Institute, Monterey CA 93943-5000 USA 
> +1.831.656.2149
> X3D graphics, virtual worlds, navy robotics 
> http://faculty.nps.edu/brutzman
>
> _______________________________________________
> x3d-public mailing list
> x3d-public at web3d.org
> http://web3d.org/mailman/listinfo/x3d-public_web3d.org 



