[x3d-public] X3D file encoding considerations: UTF-8/16/32 etc.

Joe D Williams joedwil at earthlink.net
Fri May 13 14:32:49 PDT 2016


The Unicode standard says transformation between UTF-8, UTF-16, 
and UTF-32 is 100 percent lossless. I don't think there are characters 
in ISO 10646 that are not covered in all three forms. The 16 and 32 
forms may be convenient for internal processing, but output to the web 
will be in UTF-8. I don't think there is any problem as long as you 
output UTF-8 with no BOM.
All will be fine. I think Python just switched from ASCII-only, so they 
probably have the transcodings correct.
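A minimal Python 3 sketch of the round-trip claim, using only the standard library (the sample string is arbitrary). Note that Python's generic "utf-16" and "utf-32" codecs prepend a BOM when encoding, while the plain "utf-8" codec never does, which is what "UTF-8 with no BOM" needs:

```python
import codecs

def round_trips(text: str) -> bool:
    """Check that text survives UTF-8 -> UTF-16 -> UTF-32 -> UTF-8 unchanged."""
    via16 = text.encode("utf-8").decode("utf-8").encode("utf-16")
    via32 = via16.decode("utf-16").encode("utf-32")
    back = via32.decode("utf-32").encode("utf-8").decode("utf-8")
    return back == text

# ASCII, a Latin-1 letter and a character outside the BMP (G clef, U+1D11E)
sample = "A\u00e9\U0001D11E"
print(round_trips(sample))                                 # True
print(sample.encode("utf-8").startswith(codecs.BOM_UTF8))  # False: no BOM written
print(sample.encode("utf-16")[:2] in (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE))  # True
```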
Good Luck,
Joe

----- Original Message ----- 
From: "Vincent Marchetti" <vmarchetti at ameritech.net>
To: "Joe D Williams" <joedwil at earthlink.net>; "X3D Graphics public 
mailing list" <x3d-public at web3d.org>
Sent: Friday, May 13, 2016 1:26 PM
Subject: Re: [x3d-public] X3D file encoding considerations: 
UTF-8/16/32 etc.



> On May 13, 2016, at 3:46 PM, Joe D Williams <joedwil at earthlink.net> 
> wrote:
>
>> 4. JSON allows UTF-8, 16 or 32 encodings, but does not allow Byte 
>> Order Mark (BOM).
>

> * Wherever characters appear, all X3D SF/MFString values and any 
> characters composing the human-readable user code will be treated as 
> if the character encoding is UTF-8. When the author-defined 
> characters to be displayed as text are not in the set defined in the 
> tool, a substitute for the character is displayed.
>
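One way a tool might apply the substitution rule above, sketched in Python 3. The function names and the idea of a limited "tool charset" are hypothetical; U+FFFD on decode and "?" on encode are simply what Python's errors="replace" handler produces, not something the X3D specification mandates:

```python
def decode_as_utf8(raw: bytes) -> str:
    """Read bytes as UTF-8; invalid sequences become U+FFFD REPLACEMENT CHARACTER."""
    return raw.decode("utf-8", errors="replace")

def fit_to_charset(text: str, charset: str = "ascii") -> str:
    """Replace characters the (hypothetical) tool charset lacks with '?'."""
    return text.encode(charset, errors="replace").decode(charset)

print(decode_as_utf8(b"caf\xc3\xa9"))                  # café (valid UTF-8)
print(decode_as_utf8(b"caf\xe9") == "caf\ufffd")       # True: Latin-1 byte is not UTF-8
print(fit_to_charset("caf\u00e9 \U0001D11E"))          # caf? ?
```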

What are the consequences of using the Python standard json parser, 
which parses JSON strings into values of the unicode data type? Python 
internally represents Unicode code points as 2-byte or 4-byte values, 
depending on a compile-time switch, and I would need to use the 4-byte 
option to allow all the characters in the ISO 10646 character set. I 
understand that when I write my X3D content to a file, or send it out 
as a string of bytes, I need to use an encoding (probably UTF-8) that my 
receiving system is expecting; but are there any consequences to using 
an internal representation of SFString that is not UTF-8?
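A minimal sketch of how this looks in current Python, assuming Python 3.3 or later: there the narrow/wide build switch no longer exists (PEP 393), the in-memory str is a sequence of full code points, and the byte encoding only matters at output time. The helper name to_wire is hypothetical:

```python
import json
import sys

# Since Python 3.3 (PEP 393), str always holds full code points; the
# narrow/wide compile-time switch applied to Python 2 and Python 3.0-3.2.
assert sys.maxunicode == 0x10FFFF

# json.loads yields a str; the JSON escape is a surrogate pair for U+1D11E.
sfstring = json.loads('"A\\u00e9\\uD834\\uDD1E"')

def to_wire(value: str, encoding: str = "utf-8") -> bytes:
    """Hypothetical helper: choose the byte encoding only at output time."""
    return value.encode(encoding)

print(len(sfstring))                        # 3 code points in memory
print(len(to_wire(sfstring)))               # 7 bytes as UTF-8
print(len(to_wire(sfstring, "utf-16-le")))  # 8 bytes as UTF-16 (no BOM)
print(len(to_wire(sfstring, "utf-32-le")))  # 12 bytes as UTF-32 (no BOM)
```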

> * Use UTF-16 or UTF-32 at authoring time at your own risk, for now, 
> because consumer applications may not recognize UTF-16/32 or bother 
> to convert to UTF-8.
>
> Thanks and Best,
> Joe
>
>
>
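For authors who do produce UTF-16 or UTF-32 files, a minimal Python 3 sketch of converting them to UTF-8 before publishing. The helper name is hypothetical, and it assumes that UTF-16/32 input carries a BOM and that anything without one is UTF-8:

```python
import codecs

def to_utf8_no_bom(raw: bytes) -> bytes:
    """Hypothetical helper: transcode BOM-marked UTF-16/UTF-32 (or UTF-8) to UTF-8 with no BOM."""
    # Test UTF-32 first: the UTF-32-LE BOM (FF FE 00 00) starts with the UTF-16-LE BOM (FF FE).
    if raw.startswith((codecs.BOM_UTF32_LE, codecs.BOM_UTF32_BE)):
        text = raw.decode("utf-32")      # codec reads and strips the BOM
    elif raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        text = raw.decode("utf-16")      # codec reads and strips the BOM
    else:
        text = raw.decode("utf-8-sig")   # strips a UTF-8 BOM if one is present
    return text.encode("utf-8")          # plain "utf-8" never writes a BOM
```

For example, to_utf8_no_bom("A\u00e9".encode("utf-16")) returns the same bytes as "A\u00e9".encode("utf-8").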
> ----- Original Message ----- From: "Don Brutzman" <brutzman at nps.edu>
> Cc: "X3D Graphics public mailing list" <x3d-public at web3d.org>
> Sent: Thursday, May 12, 2016 9:25 AM
> Subject: [x3d-public] X3D file encoding considerations: UTF-8/16/32 
> etc.
>
>
>> 1. Following up on a discussion in yesterday's weekly X3D working 
>> group teleconference, this morning we are discussing whether an 
>> @encoding attribute should be part of the X3D header in the JSON 
>> encoding.  Investigation took us in multiple directions.
>>
>> Of note is that this analysis only appears to apply to files.  The 
>> nature of how applications handle their in-memory data structures 
>> is essentially a separate issue, once a scene document has been 
>> properly loaded.
>>
>> ==================================================================
>>
>> 2.  X3D abstract specification doesn't explicitly address file 
>> encodings.  (This seems appropriate.)
>>
>> 4.8 Data encodings
>> http://www.web3d.org/documents/specifications/19775-1/V3.3/Part01/concepts.html#Dataencodings
>>
>>> The X3D run-time architecture is independent of the data encoding 
>>> format. X3D content and applications can be authored in a variety 
>>> of encodings, including textual (XML and Classic VRML encodings) 
>>> and binary, either compressed or uncompressed. ISO/IEC 19775 
>>> contains an abstract encoding specification that defines the 
>>> structure of the X3D scene: hierarchical relationships among 
>>> objects, initial values for objects, and dataflow connections 
>>> between objects. All concrete data encodings for X3D shall conform 
>>> to this abstract specification.
>>>
>>> Browsers and generators may support any or all of the standard 
>>> encoding formats, depending on their application needs and the 
>>> conformance requirements of a specific component or profile.
>>>
>>> X3D encodings are fully specified in the parts of ISO/IEC 19776.
>>
>> ==================================================================
>>
>> 3. Of note: XML version info is required, but the encoding 
>> declaration is optional.
>>
>> Extensible Markup Language (XML) 1.0 (Fifth Edition) 2.8 Prolog and 
>> Document Type Declaration
>>
>> 4.3 Parsed Entities
>> https://www.w3.org/TR/REC-xml/#TextEntities
>>
>> 4.3.1 The Text Declaration
>> https://www.w3.org/TR/REC-xml/#NT-TextDecl
>>
>>> Prolog
>>> [22]   prolog    ::=   XMLDecl? Misc* (doctypedecl Misc*)?
>>> [23]   XMLDecl    ::=   '<?xml' VersionInfo EncodingDecl? SDDecl? S? '?>'
>>
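As a rough illustration of production [23], here is a simplified Python sketch that extracts the optional EncodingDecl, using the EncName pattern from production [81]. It is not a conforming XML parser: it assumes the text is already decoded to a str and that the declaration starts at offset 0:

```python
import re
from typing import Optional

# VersionNum is '1.' [0-9]+; EncName is [A-Za-z] ([A-Za-z0-9._] | '-')*
XML_DECL = re.compile(
    r"""<\?xml\s+version\s*=\s*(["'])1\.\d+\1"""
    r"""(?:\s+encoding\s*=\s*(["'])(?P<enc>[A-Za-z][A-Za-z0-9._-]*)\2)?"""
)

def declared_encoding(document: str) -> Optional[str]:
    """Return the EncodingDecl value, or None if the XMLDecl or EncodingDecl is absent."""
    match = XML_DECL.match(document)
    return match.group("enc") if match else None

print(declared_encoding('<?xml version="1.0" encoding="UTF-8"?><X3D/>'))  # UTF-8
print(declared_encoding("<?xml version='1.0'?><X3D/>"))                   # None
```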
>> Interestingly, a later section of the XML spec points out that the 
>> declared encoding in a file might be overridden by HTTP or MIME 
>> transport.
>>
>> 4.3.3 Character Encoding in Entities
>> https://www.w3.org/TR/REC-xml/#charencoding
>>
>>> In the absence of information provided by an external transport 
>>> protocol (e.g. HTTP or MIME), it is a fatal error for an entity 
>>> including an encoding declaration to be presented to the XML 
>>> processor in an encoding other than that named in the declaration, 
>>> or for an entity which begins with neither a Byte Order Mark nor 
>>> an encoding declaration to use an encoding other than UTF-8. Note 
>>> that since ASCII is a subset of UTF-8, ordinary ASCII entities do 
>>> not strictly need an encoding declaration.
>>
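A small Python sketch of the three cases in that paragraph, using the standard library's Expat-based xml.etree.ElementTree (the X3D fragment is made up for illustration): UTF-8 with no declaration, ISO-8859-1 with a matching declaration, and ISO-8859-1 with neither BOM nor declaration, which the parser rejects:

```python
import xml.etree.ElementTree as ET

TITLE = "Caf\u00e9"
BODY = f'<X3D><head><meta name="title" content="{TITLE}"/></head></X3D>'

def read_title(raw: bytes) -> str:
    """Parse raw bytes; Expat applies any BOM or encoding declaration, else assumes UTF-8."""
    root = ET.fromstring(raw)
    return root.find("head/meta").get("content")

no_decl_utf8 = BODY.encode("utf-8")
declared_latin1 = ('<?xml version="1.0" encoding="ISO-8859-1"?>' + BODY).encode("iso-8859-1")
undeclared_latin1 = BODY.encode("iso-8859-1")  # neither BOM nor declaration

print(read_title(no_decl_utf8))     # Café
print(read_title(declared_latin1))  # Café
try:
    read_title(undeclared_latin1)
except ET.ParseError as err:        # byte 0xE9 is not valid UTF-8 here: fatal error
    print("fatal:", err)
```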
>> ==================================================================
>>
>> 4. JSON allows UTF-8, 16 or 32 encodings, but does not allow Byte 
>> Order Mark (BOM).
>>
>> Background information and current status for X3D JSON encoding:
>> X3D to JSON Stylesheet Converter -  Design Correspondences
>> http://www.web3d.org/x3d/stylesheets/X3dToJson.html#DesignCorrespondences
>>
>> "Encoding. JSON files are implicitly allowed to have an encoding of 
>> UTF-8, UTF-16 or UTF-32. The "X3D" object is given a field such as 
>> "encoding":"UTF-8" to indicate the scene author's expected 
>> encoding. This expression of file encoding corresponds to the 
>> header statement syntax required for the VRML97, ClassicVRML and 
>> XML file formats."
>>
>> The notion of preserving the scene author's expected encoding seems 
>> mistaken, since downstream file transport and applications might 
>> legitimately and correctly change the encoding.
>>
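To make the point concrete, a small Python sketch (assuming Python 3.6 or later, whose json.loads accepts UTF-8/16/32 bytes). The scene object is illustrative only: the same X3D JSON text carrying an "encoding":"UTF-8" field can be re-saved as UTF-16 by a downstream tool, after which the field no longer describes the bytes, yet the parser still reads the scene correctly:

```python
import json

scene = {"X3D": {"encoding": "UTF-8", "@profile": "Immersive", "@version": "3.3"}}
text = json.dumps(scene, ensure_ascii=False)

# A downstream tool legitimately re-saves the same text as UTF-16 (little-endian, no BOM).
resaved = text.encode("utf-16-le")

parsed = json.loads(resaved)      # Python 3.6+ detects UTF-8/16/32 from the bytes
print(parsed["X3D"]["encoding"])  # still says UTF-8 ...
print(resaved[:4])                # ... but the bytes are UTF-16: b'{\x00"\x00'
```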
>> Minimum change: make the encoding attribute optional.
>>
>> Worth considering: we might follow the approach used by the JSON 
>> specification, i.e. don't declare it, and since "it is what it is" 
>> let the text processor deduce it.
>>
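How a text processor can deduce the encoding is sketched below in Python. This is the null-byte heuristic described in RFC 4627 section 3, which assumes the first two characters of the JSON text are ASCII; the function name is hypothetical and inputs shorter than four bytes are simply treated as UTF-8:

```python
import json

def sniff_json_encoding(raw: bytes) -> str:
    """Guess the UTF form of BOM-less JSON from its null-byte pattern (RFC 4627 heuristic)."""
    if len(raw) >= 4:
        b0, b1, b2, b3 = raw[:4]
        if b0 == 0 and b1 == 0 and b2 == 0:
            return "utf-32-be"  # 00 00 00 xx
        if b0 == 0 and b2 == 0:
            return "utf-16-be"  # 00 xx 00 xx
        if b1 == 0 and b2 == 0 and b3 == 0:
            return "utf-32-le"  # xx 00 00 00
        if b1 == 0 and b3 == 0:
            return "utf-16-le"  # xx 00 xx 00
    return "utf-8"              # xx xx xx xx, or too short to tell (simplification)

doc = {"X3D": {"@version": "3.3"}}
for enc in ("utf-8", "utf-16-le", "utf-16-be", "utf-32-le", "utf-32-be"):
    raw = json.dumps(doc).encode(enc)
    guess = sniff_json_encoding(raw)
    print(enc, guess, json.loads(raw.decode(guess)) == doc)
```

Python 3.6 and later perform essentially this detection inside json.loads when given bytes, so in practice the raw bytes can be passed straight to the parser.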
>> Additional JSON references:
>>
>> Standard ECMA-404, The JSON Data Interchange Format
>> http://www.ecma-international.org/publications/standards/Ecma-404.htm
>>
>> Introduction
>>> [...] It is expected that other standards will refer to this one, 
>>> strictly adhering to the JSON text format, while imposing 
>>> restrictions on various encoding details.
>>
>> 9 String
>>> [...] To escape a code point that is not in the Basic Multilingual 
>>> Plane, the character is represented as a twelve-character 
>>> sequence, encoding the UTF-16 surrogate pair. So for example, a 
>>> string containing only the G clef character (U+1D11E) may be 
>>> represented as "\uD834\uDD1E".
>>
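The surrogate-pair arithmetic behind that example can be checked with a short Python sketch; the helper name is hypothetical, while json.loads and json.dumps are the standard library functions:

```python
import json

def to_surrogate_escape(code_point: int) -> str:
    """Build the twelve-character JSON escape for a code point outside the BMP."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only supplementary-plane code points need a surrogate pair")
    v = code_point - 0x10000
    high = 0xD800 + (v >> 10)   # top 10 bits
    low = 0xDC00 + (v & 0x3FF)  # bottom 10 bits
    return f"\\u{high:04X}\\u{low:04X}"

escape = to_surrogate_escape(0x1D11E)
print(escape)                                     # \uD834\uDD1E
print(json.loads(f'"{escape}"') == "\U0001D11E")  # True
print(json.dumps("\U0001D11E"))                   # "\ud834\udd1e"
```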
>> and
>>
>> RFC 7159, JavaScript Object Notation (JSON) Data Interchange Format
>> 8.1.  Character Encoding
>> https://tools.ietf.org/html/rfc7159#section-8.1
>>
>>>   JSON text SHALL be encoded in UTF-8, UTF-16, or UTF-32.  The default
>>>   encoding is UTF-8, and JSON texts that are encoded in UTF-8 are
>>>   interoperable in the sense that they will be read successfully by the
>>>   maximum number of implementations; there are many implementations
>>>   that cannot successfully read texts in other encodings (such as
>>>   UTF-16 and UTF-32).
>>>
>>>   Implementations MUST NOT add a byte order mark to the beginning of a
>>>   JSON text.  In the interests of interoperability, implementations
>>>   that parse JSON texts MAY ignore the presence of a byte order mark
>>>   rather than treating it as an error.
>>
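A minimal Python sketch of both halves of that rule, under the assumption that output is UTF-8: the writer never adds a BOM, and the reader tolerates one if it arrives anyway. Function names and the scene object are illustrative only:

```python
import codecs
import json

def write_json_bytes(obj) -> bytes:
    """Serialize as UTF-8 JSON text; the plain "utf-8" codec never emits a BOM."""
    return json.dumps(obj, ensure_ascii=False).encode("utf-8")

def read_json_bytes(raw: bytes):
    """Parse UTF-8 JSON, ignoring a leading BOM as RFC 7159 permits parsers to do."""
    if raw.startswith(codecs.BOM_UTF8):
        raw = raw[len(codecs.BOM_UTF8):]
    return json.loads(raw.decode("utf-8"))

scene = {"X3D": {"@version": "3.3", "head": {"meta": [{"@content": "Caf\u00e9"}]}}}
data = write_json_bytes(scene)
print(data.startswith(codecs.BOM_UTF8))                  # False
print(read_json_bytes(codecs.BOM_UTF8 + data) == scene)  # True
```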
>> ==================================================================
>>
>> 5. Next steps for X3D XML:
>> - confirm and correct the corresponding sections for X3D XML and 
>> X3D ClassicVRML encodings
>> - make encoding optional in X3D DTD and Schema and guidance
>>
>> ==================================================================
>>
>> 6. Next steps for X3D working group members:
>>
>> Roy has created a Mantis project to begin recording design issues 
>> for the X3D JSON Encoding.
>>
>> This complements the GitHub project for the draft 19776-3 JSON 
>> Encoding specification.
>>
>> It is great that we might finally be approaching clarity and 
>> consistency on a long-standing (and sometimes mysterious) issue.
>>
>> An acquired taste perhaps, but... Have fun with Web Standards!
>>
>> all the best, Don
>> -- 
>> Don Brutzman  Naval Postgraduate School, Code USW/Br 
>> brutzman at nps.edu
>> Watkins 270,  MOVES Institute, Monterey CA 93943-5000 USA 
>> +1.831.656.2149
>> X3D graphics, virtual worlds, navy robotics 
>> http://faculty.nps.edu/brutzman
>>
>> _______________________________________________
>> x3d-public mailing list
>> x3d-public at web3d.org
>> http://web3d.org/mailman/listinfo/x3d-public_web3d.org
>

Vincent Marchetti
vmarchetti at ameritech.net





