[X3D-Public] Fwd: Re: [X3D] X3D HTML5 meeting discussions:Declarative 3D interest group at W3C

Sun Jan 2 02:58:54 PST 2011

Hi,

A few comments:

The "human readable encoding" is a bit more than that, particularly for
X3D, XML3D or the DOM. It's not just a static model of the 2D/3D scene
but really a full runtime with very detailed semantics. This is also why
you cannot compare these models to Collada (static model including
predefined animations). It's not just about rendering some models.

Having said this, even the encoding is not as simple as one might think.
My pet example is the IndexedFaceSet: It has many, many options of how
the available data can be mapped to something that can get rendered
(say, OGL vertex buffers). But since the API gives you full access to
the source data, you in general have to keep a copy of the original data
around in user space -- it cannot be reconstructed from the data in the
vertex buffers.
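
To illustrate the point, here is a minimal Python sketch of one such
mapping. It is not code from any X3D or XML3D implementation: the
function name and the simplified data layout (triangles only, one
normal per face) are assumptions made for illustration. Indexed faces
are flattened into a non-indexed, interleaved buffer, and the vertex
sharing expressed by the index list is no longer stored in the result.

```python
# Hypothetical, simplified stand-in for an IndexedFaceSet with
# triangles only and one normal per face.
def expand_indexed_faces(coords, coord_index, face_normals):
    """Flatten indexed triangles into an interleaved [x, y, z, nx, ny, nz] buffer."""
    buffer = []
    for face, corners in enumerate(coord_index):
        nx, ny, nz = face_normals[face]
        for i in corners:
            x, y, z = coords[i]
            # Each corner gets its own copy of the position and the face normal.
            buffer.extend([x, y, z, nx, ny, nz])
    return buffer


# A quad made of two triangles that share two of their corners.
coords = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)]
coord_index = [(0, 1, 2), (0, 2, 3)]
face_normals = [(0, 0, 1), (0, 0, 1)]

vertex_buffer = expand_indexed_faces(coords, coord_index, face_normals)
vertex_count = len(vertex_buffer) // 6  # 6 floats per expanded vertex
```

Four source coordinates become six buffer vertices here. An API that
still has to answer queries about coords and coord_index therefore has
to keep them around in addition to the buffer.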

In XML3D we have chosen to stay as close to HW representations as
possible (just as WebGL does by necessity), and so we can get away with
having just a single copy of the data, on the GPU only (this might not
always be the best option performance-wise, but at least it is possible
to do so).

Regarding the "Magic", I believe that we have to distinguish between
data-intensive computing and event processing (in X3D they are mostly
the same thing, implemented via Routes). I commented on the event
processing part a bit earlier today. So let me comment a bit on the
data-intensive parts:

A lot of this is stuff that you do not really want to do on the CPU, or
at least you want to have the option to offload this processing to some
more powerful processor. This is typically the GPU (specifically on
mobile devices). Here we run into the same dilemma as with programmable
shaders: There are simply too many variants for programming these
devices, and there most likely will be many more coming.

So this is an ideal candidate for a declarative approach: instead of
having to specify exactly which code needs to get executed, we only
specify the semantics. This is exactly what XFlow does at the moment.
Note that the <data> model from XML3D is the perfect interface for
XFlow, while the two are kept completely separate from each other. Also
note that the <data> model (which is also the representation of all
rendered geometry) maps extremely well to typed arrays from WebGL. This
is by design!

For future versions, we have designed XFlow to also be extensible with
scripts, and we are developing compiler technology that can take a common
high-level description for these special "scripts" and generate output in
a variety of formats including CUDA, OpenCL, GLSL, or others. The OpenCL
and GLSL backends are already working in prototypes, by the way.

So, to put these things together, here is a high-level overview of how
XML3D works: We maintain the DOM elements as rather lightweight
wrappers (ideally none of the 3D data should be stored here, but some
browsers (namely WebKit) currently insist on maintaining a text copy of
it -- we see this as a temporary implementation issue that will get
resolved in time). The wrappers forward all the data to a 3D scene graph
running within the browser. There, renderers watch the graph for
changes (again using a simple callback scheme) and maintain their
optimized version for rendering (e.g. sorted by shader instances for
rasterization, maintaining a spatial index structure for ray tracing, or
such). Ideally, the 3D data is directly forwarded to the GPU and only
maintained there.
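
As a rough illustration of this layering, here is a minimal Python
sketch. It is not XML3D code: the class and method names (SceneNode,
ElementWrapper, Renderer, on_change and so on) are hypothetical and
only meant to show a wrapper that stores nothing itself, a scene graph
node that notifies watchers, and renderers that track what changed.

```python
class SceneNode:
    """Scene graph node that owns the data and notifies watchers on change."""

    def __init__(self, name):
        self.name = name
        self.fields = {}
        self._listeners = []

    def on_change(self, callback):
        self._listeners.append(callback)

    def set_field(self, field, value):
        self.fields[field] = value
        for callback in self._listeners:
            callback(self, field)


class ElementWrapper:
    """DOM-side wrapper: keeps no 3D data, only forwards to the scene graph."""

    def __init__(self, node):
        self._node = node

    def set_attribute(self, field, value):
        self._node.set_field(field, value)


class Renderer:
    """Watches nodes and records which fields need re-uploading."""

    def __init__(self):
        self.dirty = set()

    def watch(self, node):
        node.on_change(self._mark_dirty)

    def _mark_dirty(self, node, field):
        self.dirty.add((node.name, field))

    def render_frame(self):
        # A real renderer would refresh its optimized structures here.
        updated = sorted(self.dirty)
        self.dirty.clear()
        return updated


mesh = SceneNode("mesh")
rasterizer, ray_tracer = Renderer(), Renderer()
rasterizer.watch(mesh)
ray_tracer.watch(mesh)

ElementWrapper(mesh).set_attribute("position", [0.0, 1.0, 0.0])
first_frame = rasterizer.render_frame()
second_frame = rasterizer.render_frame()
```

Each renderer keeps its own dirty set, so a rasterizer and a ray tracer
can watch the same graph and update their own structures independently.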

XFlow ties nicely into this. Its declarative description of the
data-processing allows us to perform the processing also on the GPU. It
maintains a dependency graph of the processing to be performed if some
data in the graph changes, which will eventually update the vertex
arrays that will be used during rendering. Yes, this is similar to
Routes but used only for declarative, platform-independent,
data-intensive processing. It does not tie into any event processing.
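
The dependency-graph idea can be sketched in a few lines of Python.
This is not the XFlow API or its implementation: DataNode, morph and
the field names are hypothetical, and everything runs on the CPU. The
sketch only shows that changing an input marks dependent results as
dirty, and that they are recomputed the next time they are requested.

```python
class DataNode:
    """A value that is either set directly or computed from input nodes."""

    def __init__(self, compute=None, inputs=(), value=None):
        self._compute = compute
        self._inputs = list(inputs)
        self._dependents = []
        self._value = value
        self._dirty = compute is not None
        self.evaluations = 0
        for node in self._inputs:
            node._dependents.append(self)

    def set(self, value):
        self._value = value
        self._invalidate_dependents()

    def _invalidate_dependents(self):
        for dependent in self._dependents:
            if not dependent._dirty:
                dependent._dirty = True
                dependent._invalidate_dependents()

    def get(self):
        if self._dirty:
            args = [node.get() for node in self._inputs]
            self._value = self._compute(*args)
            self._dirty = False
            self.evaluations += 1
        return self._value


def morph(base, delta, weight):
    # Declared operation: blend an offset into the base positions.
    return [b + weight * d for b, d in zip(base, delta)]


rest = DataNode(value=[0.0, 1.0, 2.0])
offsets = DataNode(value=[1.0, 1.0, 1.0])
weight = DataNode(value=0.0)
positions = DataNode(morph, [rest, offsets, weight])

first = positions.get()
unchanged = positions.get()  # served from cache, no recomputation
weight.set(0.5)              # marks positions as dirty
updated = positions.get()
```

Because the graph only records what is to be computed from what, the
body of an operation such as morph could be swapped for a GPU
implementation without changing how the graph is declared.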

Note that all of this can also work simply at the CPU layer (which it
does in the prototype right now). As a next step we can just as easily
replace (incrementally) some of the CPU implementations with CUDA/OpenCL
or such. We have already done something very similar in a previous
system, which was published last year. So we know that this is all
possible and works with very little overhead. Note that this also nicely
protects the data from rogue stuff on the Internet.

Hope this helps give you the broader picture of how things work
together. As I said before, it does not include the event processing that
you seem to be looking for, though. Maybe we can discuss the need (or
non-need) for this separately.

	Philipp

On 01.01.2011 04:27, Joe D Williams wrote:
>> Let me continue politely answering technical questions and comments on
>> this list as long as people raise them.
>>
> 
> well, i shouldn't have thrown proto in there, but the thing is, whatever,
> we need a highly interactive DOM for the HTML crew, at least until the
> 3D can really be sufficiently embedded that we don't have to call it a
> plugin anymore.
> That is a basic idea, right. Since interactive 3D processing can take
> all resources there are, and since there is a natural proven access
> model for the 3D visualization, then we know we need another 'engine'
> down there to provide highest performance processing for the 3D object,
> mostly because it is made up of some hierarchy of other 3D objects.
> 
> anyway, with WebGL, it doesn't get complicated until there is
> interaction with whatever data and event tree/graph.
> like if there is a webgl script that modifies the data structures. If you
> have ten things changing then you start to need the majik 'plugin' that
> does the data intensive stuff in some knowable order.
> That, my friend is the 'engine' of the 3D visualization device of
> interest here.
> What 'engine' should be picked?
> Does it depend upon the encoding of the input data?
> Yes but to no great extent since most all is really all the same data,
> there is usually a simple front end to transcode to whatever you wish -
> if there is a common vocabulary - most all the numbers are the same. So
> whether the vertices are described in X3D, Collada, or maybe even a more
> modern and improved syntax, no matter, all it has to do is produce a DOM
> with nodes (elements) accessible using DOM techniques.
> So then what I might call the render (data) graph (DOM Tree/X3D DAG) is
> fairly simple and known with several open variations to choose from,
> and so it is the behaviour graph - or in X3D the event graph - where it
> starts to get really interesting. Sure, the processing you do to get the
> rendering is all very fine and becoming very advanced with many new
> hardware accelerated techniques and like special accelerators for
> fantastically neat (possibly enforceably platform dependent) stuff like
> raytracing
> 
> Maybe gog angle or something like that is getting some hints that may
> pass down into the typical html browser. until then there is the need
> for a special piece of software that is optimized for processing the 3D
> data structure and enlivening the 3D event structure.
> 
> Thus, in this case, if all is open for discussion, the best place to
> start the discussion is describing details of both the DOM structure and
> (if different) the rendering data structure organization(s) and then
> showing the sequence of processing used to prepare the data structures
> for producing the next frame. Please know that it is natural to expect
> some future version will incorporate 3D or nD physics for html so we
> need to consider how to incorporate that possibility into this event graph
> processing.
> 
> 
> To summarize, then I will keep looking at any alternative encodings
> presented and give some opinion since it is so easy to get way off in
> the human-readable encoding part, but really not much care unless really
> silly and not informed, but that part is minor
> relative to the need for clear hifi definitions for the actual processing
> and updating of all those numbers.
> 
> So then, we are down to choosing an engine to run this so called
> htmlized thus webified version. Assuming this engine is aimed at fairly
> real time interactions with some complex enough to be interesting GUI
> and scenic features, like not just to produce a photorealistic rendering
> of some spinning tennis shoe but a compelling and somewhat immersive
> multimodal interaction, then the first step in defining the 'engine' is
> to figure out how it must process events. That is, what does it do when
> some external requests a change, and what happens when some internal
> requests some change?
> Anyway, that is why I thought I might get a laugh when I said the first
> thing an X3D engine does for each new frame is to check then move the
> viewpoint,  then work to figure out the result. (Heck, I haven't looked
> to see if there is even the concept of a viewpoint or animated camera in
> this new stuff.)
> 
>> It is certainly not our intention
>> to use the X3D mailing list for the discussions of the future W3C XG. We
>> just asked for your input and comments.
>>
> 
> oh, sorry if any of the above is off topic:) just rambling on this
> newyear's evening.
> 
>> I agree that prototypes are a valuable feature that do not exist in this
>> form in HTML.
> 
> well that was wrong of me to use that where I did. I hope you see the
> connection and complication with DEF which is another thing html does
> not do but html tried that with a Declare, but it never caught on.
> 
> The important thing is that right now, in html, X3D is playing in the
> sandbox as a distinct context. Maybe here we can be more outside the
> sandbox and a more accessible distinct context.
> 
> 
>> However, on the HTML side there is something called XBL
> 
> OK, I will check that.
> 
> summary:
> * data syntax not that important,
> * data structure (transform hierarchy) is important.
> * show event system frame to frame processing steps
> 
> 
> Thanks and Best Regards,
> Joe
> 
> 
> 
>