Discussions for Merging X3D AR Proposals
As described in Plans for Merging X3D AR Proposals, here we discuss and produce a merged proposal for each functional component by investigating its functional features step by step.
1. Camera video stream image into the scene (texture and background)
Node structure
There are three options for designing the new node structure that supports a camera video stream in an X3D scene.
Option 1. Describe sensors explicitly
- Define a node that represents the camera/image sensor, then route its output to other nodes (e.g. a PixelTexture node or a new Background node such as ImageBackground or MovieBackground).
All three proposals (KC1, KC2 and IR) support this model, differing slightly in the details.
- Pros:
- Open to other uses in the future (more extensible)
- Cons:
- Scenes are relatively more complicated to write, and browsers are more complicated to implement
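To make Option 1 concrete, the fragment below is a minimal sketch of the explicit-sensor pattern. The `CameraSensor` node name and its `image` output field are assumptions made for illustration and are not taken from KC1, KC2 or IR. Only `PixelTexture`, `Shape`, `Appearance`, `Box` and `ROUTE` are existing X3D constructs.

```xml
<Scene>
  <!-- Hypothetical sensor node: the name and the 'image' field are assumptions -->
  <CameraSensor DEF="cam" enabled="true"/>
  <Shape>
    <Appearance>
      <!-- Standard PixelTexture that receives each camera frame -->
      <PixelTexture DEF="camTex"/>
    </Appearance>
    <Box/>
  </Shape>
  <!-- Explicit wiring from the sensor output to the texture input -->
  <ROUTE fromNode="cam" fromField="image" toNode="camTex" toField="set_image"/>
</Scene>
```

The extra sensor node and ROUTE are what make the scene longer to write, but the same sensor output could later be routed to other nodes.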
Option 2. Describe sensors implicitly
- Define a node that represents a "background" or "texture" dedicated to showing user media (either from a camera device or from a user-selected file).
KC1 proposes this option as an alternative with a simpler structure for browser implementation and scene writing.
- Pros:
- Simpler from the content creator's perspective
- Easier to implement and test, since there is less interaction with other nodes
- Cons:
- Single-purpose node that is unlikely to be useful for anything else
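For comparison, Option 2 might look like the sketch below. `UserMediaBackground` and `UserMediaTexture` are placeholder names invented for this illustration, not node names from KC1. The point is only that no sensor node or ROUTE appears in the scene.

```xml
<Scene>
  <!-- Hypothetical background node: the browser decides which camera or file feeds it -->
  <UserMediaBackground/>
  <Shape>
    <Appearance>
      <!-- Hypothetical texture node showing the same kind of user media -->
      <UserMediaTexture/>
    </Appearance>
    <Box/>
  </Shape>
</Scene>
```

The scene carries no information about the device, which keeps it short but leaves nothing that other nodes could reuse.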
Option 3. Allow both
- Pros:
- Lets users choose the option that meets their needs
- Cons:
- Browser developers bear the cost of implementing both
Selecting video source
- Reference: Adobe Flash and HTML5 getUserMedia() API
The scene writer does not know the hardware setup on the viewer's side, and accessing a camera on the user's device can be a privacy issue. Both Adobe Flash and HTML5 deal with this by asking the user for permission before the browser uses camera input. In addition, they ask which camera or video file to use.
2. Tracking (including support for general tracking devices)
As with selecting a video source, the tracking device configuration is unknown to the scene writer, so it should be handled by the browser on the user's side. X3D nodes should therefore just provide an interface for receiving tracking results, which are essentially transform information.
To that end, a special transform node could be defined; when a browser encounters this node, it should automatically map it to an available tracker or ask the user to choose which one to use.
<TrackedTransform type="PositionAndOrientation" target="hand" />
URN classes could be developed to categorize the tracking targets (e.g. hand, head, viewpoint) so that users can more easily identify which tracking device to choose.
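The TrackedTransform element shown above has no children. As a sketch only, and assuming the node would behave as a grouping node like the standard Transform (the proposal text does not state this), tracked content could be attached as follows:

```xml
<Scene>
  <!-- Assumption: TrackedTransform groups children the way Transform does -->
  <TrackedTransform type="PositionAndOrientation" target="hand">
    <Shape>
      <Appearance>
        <Material diffuseColor="1 0 0"/>
      </Appearance>
      <!-- Small marker that follows whichever tracker the browser maps to "hand" -->
      <Sphere radius="0.05"/>
    </Shape>
  </TrackedTransform>
</Scene>
```

The scene names only the target category. Which physical device supplies the transform is left to the browser and the user.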