AR Proposal Public Review
-Working draft
By Augmented Reality Working Group, Web3D Consortium
Feb 22, 2013
The Augmented Reality Continuum Working Group (ARC WG) has been developing a proposal for extending X3D to support augmented and mixed reality visualization. As the proposal nears completion, the working group has decided to open it to the public and collect feedback from Web3D members, other working groups, and anyone interested in AR and Web3D technology. The ARC WG welcomes all feedback that helps consolidate the proposal and advance it to the next stage: extending the X3D specification to support AR and MR visualization.
- Review period: Feb 27 ~ Apr 10, 2013
- How to give feedback:
  - Use the "discussion" tab at the top of this page to give feedback and start discussions.
  - If you prefer e-mail, please send your feedback to Gun Lee (ARC WG co-chair, endovert[at]postech.ac.kr)
Extending X3D for MR Visualization - Unified Proposal
1. Introduction
This document gives an overview of the unified proposal for extending the X3D standard to support Mixed Reality (MR) visualization. Mixed Reality includes both Augmented Reality (AR) and Augmented Virtuality (AV). The extension of the X3D standard proposed in this document is based on the comparison of three proposals: two from Web3D Korea Chapter (KC1 and KC2) and one from InstantReality (IR), Fraunhofer IGD. The details of the comparison can be found on the following public wiki page: http://web3d.org/x3d/wiki/index.php/Comparison_of_X3D_AR_Proposals.
In this document we focus on the three main components that are necessary to achieve basic MR visualization: sensors, video stream rendering, and camera calibration. We try to minimize changes to the current specification while keeping the solution generic enough to apply to future applications beyond MR visualization.
In order to focus on consolidating the fundamental features, we leave out the following items/functions from the original proposals as future work.
- High-level events for tracking from proposal KC2
- Supporting color keying in texture from proposal KC1
- Supporting correct occlusion between virtual and physical objects (Ghost object from proposal KC1 and Color Mask + sortKey from IR)
- Supporting generic types of sensors, including those that are not directly related to AR/MR visualization (Direct sensor nodes in IR)
2. Sensors
To achieve MR visualization, sensing the real environment is crucial. Two types of sensors are necessary to support MR visualization: one that acquires video stream images from a real camera, and one that acquires motion tracking information of physical objects. While sensors could be generalized to acquire any type of information from the real world, this proposal focuses on these two.
Two new nodes, CalibratedCameraSensor and TrackingSensor, are proposed as interfaces for these types of sensors.
Since different end users view the X3D scene on different browsers and with different hardware and software setups, it is not appropriate to specify particular devices or tracking technology within the scene. In fact, the author of the X3D scene may have no knowledge of what hardware or software is available on the user’s side. Therefore, the X3D scene should only state the high-level purpose of the sensor, as a hint for the browser (or the user) to choose hardware or software on the user’s setup that meets the intended use. The “description” field is used to describe this intention. At run-time, the browser shows the value of the "description" field to the user in a dialog box and asks the user to choose an appropriate sensor from the list of sensors available in the local hardware/software setup. In this way, users can view the X3D scene with the best hardware/software sensors available in their environment. Asking the user to choose the sensor also provides a way to obtain the user's consent for using sensors on the device, which addresses privacy concerns. In addition, browsers can offer an option to reuse the sensors that the user chose in a previous run of the scene.
2.1 CalibratedCameraSensor node
The CalibratedCameraSensor node provides an interface to a camera device. The main information provided to the X3D scene through this node is the image stream captured with the camera, which is available through the ‘image’ field. In addition to the image stream, the node should also provide the internal parameters of the camera, so that the Viewpoint parameters can be calibrated to achieve correct composition of the MR scene. Four fields (focalPoint, fieldOfView, fovMode, and aspectRatio) provide these parameters, corresponding to the fields used in the Viewpoint node. Each field is described in detail in section 4, where the Viewpoint node is described.
CalibratedCameraSensor : X3DSensorNode {
  SFBool   [in,out] enabled     TRUE
  SFNode   [in,out] metadata    NULL [X3DMetadataObject]
  SFBool   [out]    isActive
  SFString [in,out] description ""
  SFImage  [out]    image
  SFVec2f  [out]    focalPoint
  SFFloat  [out]    fieldOfView
  SFString [out]    fovMode
  SFFloat  [out]    aspectRatio
}
When more than one camera device is available, the browser should ask the user to choose which camera to use for which node, interactively through the user interface (e.g. a dialog box).
2.2 TrackingSensor node
The TrackingSensor node provides an interface for motion tracking information. The main information provided by this node is the position and orientation of the tracked physical object. These values are provided through the ‘position’ and ‘rotation’ fields, respectively. The ‘isPositionAvailable’ and ‘isRotationAvailable’ fields are TRUE if the tracking target is successfully tracked and the value of the ‘position’ or ‘rotation’ field, respectively, is valid.
TrackingSensor : X3DSensorNode {
  SFVec3f    [out]    position
  SFRotation [out]    rotation
  SFBool     [out]    isPositionAvailable FALSE
  SFBool     [out]    isRotationAvailable FALSE
  SFBool     [in,out] isActive            FALSE
  MFString   [in,out] purpose
}
The “purpose” string field defines the intended use of the tracking sensor.
TABLE HERE
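The availability flags described above suggest that a consumer of TrackingSensor output should update a tracked object's pose only when the corresponding flag is TRUE. The following Python sketch illustrates one possible reading. The names TrackingSample and apply_tracking are hypothetical and not part of the proposal, and keeping the last valid value while tracking is lost is an assumption, since the proposal does not define that behavior.

```python
from dataclasses import dataclass
from typing import Tuple

Vec3 = Tuple[float, float, float]
AxisAngle = Tuple[float, float, float, float]  # SFRotation: axis x, y, z and angle


@dataclass
class TrackingSample:
    """Hypothetical snapshot of the TrackingSensor output fields."""
    position: Vec3
    rotation: AxisAngle
    is_position_available: bool
    is_rotation_available: bool


def apply_tracking(current_pose, sample):
    """Return the new (position, rotation) pair for a tracked object.

    Assumption: a value is copied only when its availability flag is True;
    otherwise the previous value is kept.
    """
    position, rotation = current_pose
    if sample.is_position_available:
        position = sample.position
    if sample.is_rotation_available:
        rotation = sample.rotation
    return position, rotation


# Usage: the position is tracked, but the rotation is not valid in this sample.
pose = ((0.0, 0.0, 0.0), (0.0, 0.0, 1.0, 0.0))
sample = TrackingSample((1.0, 2.0, 3.0), (0.0, 1.0, 0.0, 1.57), True, False)
new_pose = apply_tracking(pose, sample)
```

In this usage, only the position of the pose changes, because isRotationAvailable is FALSE for the sample.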
3. Rendering video stream from camera
To visualize an MR scene, the video stream image acquired from a sensor node should be rendered in the virtual scene. For AR visualization, the video stream should be rendered as the background of the virtual environment, while in AV visualization, the video stream is used as a texture of a virtual object.
3.1 Using video stream as a Texture
To use the video stream image as a texture, no extension of the standard is needed. We can use the PixelTexture node, which is already available in the current version of the X3D specification. The video stream image from the “image” field of the CalibratedCameraSensor node can be routed to the “image” field of the PixelTexture node. The following example shows how this routing works.
<CalibratedCameraSensor DEF='camera' />
...
<PixelTexture DEF='tex' />
...
<ROUTE fromNode='camera' fromField='image' toNode='tex' toField='image'/>
3.2 Using video stream as a Background
The Background nodes in the current X3D specification cover only environmental backgrounds in 3D space. Both Background and TextureBackground describe an environment around the user’s viewpoint, represented as a colored sphere or a textured cube around the user. In both cases the background of the virtual scene is updated depending on the viewing direction of the user. For AR visualization, however, the background of the virtual scene should always show the video stream from the camera sensor. While the Background and TextureBackground nodes represent a three-dimensional environmental background around the user, the AR background should work as a two-dimensional backdrop of the viewport on which the 3D scene is rendered. For this purpose we need a new node type that represents a background working as a 2D backdrop of the scene. We propose two new nodes: BackdropBackground and ImageBackdropBackground. The structure of these nodes is as follows:
BackdropBackground : X3DBackgroundNode {
  SFColor  [in,out] color
  MFString [in,out] url
}
ImageBackdropBackground : X3DBackgroundNode {
  SFColor [in,out] color
  SFImage [in,out] image
}
While only ImageBackdropBackground is necessary for AR applications, we also define the BackdropBackground node as a higher-level node that corresponds to the Background node. Again, the video stream image from the camera sensor can be fed to the ImageBackdropBackground node by routing the ‘image’ field of the CalibratedCameraSensor node to the ‘image’ field of the ImageBackdropBackground node.
<CalibratedCameraSensor DEF='camera' />
...
<ImageBackdropBackground DEF='bg' />
...
<ROUTE fromNode='camera' fromField='image' toNode='bg' toField='image'/>
The ImageBackdropBackground node automatically scales the image, fitting either its width or its height to that of the viewport while retaining the aspect ratio. As a result, the background image fills the entire viewport, leaving no blank region uncovered.
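The scaling rule above can be sketched as a small calculation: choose the larger of the two per-axis scale factors so that the scaled image covers the viewport in both directions. The Python function below is a minimal illustration, not part of the proposal. The name cover_scale is hypothetical, and centering the image in the viewport is an assumption, since the proposal does not say how the overflowing part of the image is positioned.

```python
def cover_scale(image_w, image_h, viewport_w, viewport_h):
    """Return (scale, offset_x, offset_y) for a backdrop that fills the viewport.

    The uniform scale is the larger of the width and height ratios, so one
    image dimension matches the viewport exactly and the other one overflows.
    Assumption: the image is centered, so the overflow is cropped equally on
    both sides (negative offsets mean the image starts outside the viewport).
    """
    if min(image_w, image_h, viewport_w, viewport_h) <= 0:
        raise ValueError("image and viewport sizes must be positive")
    scale = max(viewport_w / image_w, viewport_h / image_h)
    offset_x = (viewport_w - image_w * scale) / 2.0
    offset_y = (viewport_h - image_h * scale) / 2.0
    return scale, offset_x, offset_y


# Usage: a 640x480 camera image shown in a 1920x1080 viewport.
result = cover_scale(640, 480, 1920, 1080)
```

For this input, the width ratio (3.0) is larger than the height ratio (2.25), so the image is scaled to 1920x1440 and 180 pixels are cropped at the top and at the bottom.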
4. Camera calibration
To ensure that the virtual world appears correctly registered to the real world in the MR scene, the parameters of the virtual camera should be calibrated to match those of the real camera. There are two types of camera parameters: internal and external. The external parameters are the position and orientation of the camera in the world reference frame, while the internal parameters represent the projection of the 3D scene onto a 2D plane to produce a rendered image of the 3D scene. The external parameters of a real camera are measured with tracking sensors, while the internal parameters are determined by the optical features of the real camera. Both external and internal parameters of the real camera can be fed into the X3D scene through the sensor nodes defined in section 2 (TrackingSensor for the external parameters, CalibratedCameraSensor for the internal parameters).
The Viewpoint node in the X3D specification represents a virtual camera in the virtual scene. While the fields (or properties) of the Viewpoint node cover the full set of external parameters (position and orientation), they cover only limited aspects of the internal parameters. To meet the minimum requirements for achieving MR visualization, we propose adding two fields, fovMode and aspectRatio, to the Viewpoint node.
Viewpoint : X3DViewpointNode {
  SFVec3f    [in,out] centerOfRotation
  SFFloat    [in,out] fieldOfView
  SFRotation [in,out] orientation
  SFVec3f    [in,out] position
  SFString   [in,out] fovMode
  SFFloat    [in,out] aspectRatio
}
In the current X3D specification, the “fieldOfView” field represents the minimum field of view (either vertical or horizontal) that the virtual camera will have. This is insufficient for MR visualization, which needs precise calibration of the field of view (FOV). The straightforward approach, adding separate horizontal and vertical FOV fields, is not compatible with the current specification. To keep backward compatibility, we propose a “fovMode” field that designates what the value of the “fieldOfView” field represents. The “fovMode” field can have one of the following values: MINIMUM, VERTICAL, HORIZONTAL, or DIAGONAL. MINIMUM is the default value and means that the value of “fieldOfView” is treated as a minimum FOV (either vertical or horizontal), as in the current specification. When “fovMode” is VERTICAL, HORIZONTAL, or DIAGONAL, the value of “fieldOfView” is treated as the exact FOV in the vertical, horizontal, or diagonal direction, respectively. In addition, the aspect ratio of the FOV of a real camera does not necessarily follow the aspect ratio of the image it produces. To accommodate this, the “aspectRatio” field is introduced, which represents the ratio of the vertical FOV to the horizontal FOV (vertical/horizontal).
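To make the interplay of “fieldOfView”, “fovMode”, and “aspectRatio” concrete, the following Python sketch resolves the three values into a horizontal and a vertical FOV. It is one possible reading, not a normative algorithm. The function name resolve_fov is hypothetical. The sketch assumes that angles are in radians, that “aspectRatio” is literally the ratio of the two angles (vertical/horizontal), that MINIMUM applies to the smaller of the two angles implied by “aspectRatio”, and that DIAGONAL follows a pinhole camera model.

```python
import math


def resolve_fov(field_of_view, fov_mode="MINIMUM", aspect_ratio=1.0):
    """Return (horizontal_fov, vertical_fov) in radians.

    Assumption: aspect_ratio = vertical_fov / horizontal_fov (ratio of angles).
    """
    if not 0.0 < field_of_view < math.pi:
        raise ValueError("field_of_view must lie in (0, pi)")
    if aspect_ratio <= 0.0:
        raise ValueError("aspect_ratio must be positive")

    if fov_mode == "HORIZONTAL":
        return field_of_view, field_of_view * aspect_ratio
    if fov_mode == "VERTICAL":
        return field_of_view / aspect_ratio, field_of_view
    if fov_mode == "MINIMUM":
        # Assumption: the given value applies to the smaller of the two angles.
        if aspect_ratio <= 1.0:
            return field_of_view / aspect_ratio, field_of_view
        return field_of_view, field_of_view * aspect_ratio
    if fov_mode == "DIAGONAL":
        # Pinhole assumption: tan^2(d/2) = tan^2(h/2) + tan^2(v/2), v = a * h.
        # The left side grows with h, so h can be found by bisection.
        target = math.tan(field_of_view / 2.0)
        lo = 0.0
        hi = min(math.pi, math.pi / aspect_ratio) - 1e-12
        for _ in range(100):
            mid = (lo + hi) / 2.0
            diag = math.hypot(math.tan(mid / 2.0),
                              math.tan(aspect_ratio * mid / 2.0))
            if diag < target:
                lo = mid
            else:
                hi = mid
        horizontal = (lo + hi) / 2.0
        return horizontal, horizontal * aspect_ratio
    raise ValueError("unknown fovMode: " + repr(fov_mode))


# Usage: a camera reporting a horizontal FOV of 1.0 rad and aspectRatio 0.75.
h_fov, v_fov = resolve_fov(1.0, "HORIZONTAL", 0.75)
```

Under these assumptions, the HORIZONTAL and VERTICAL modes are simple rescalings by “aspectRatio”, while DIAGONAL needs a projection model, which is why the sketch states the pinhole assumption explicitly.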
5. Use cases
The following example X3D scene shows how to describe a simple AR scene using the proposed nodes.
...
<CalibratedCameraSensor DEF='camera' />
<ImageBackdropBackground DEF='bg' />
<ROUTE fromNode='camera' fromField='image' toNode='bg' toField='image'/>
<Viewpoint DEF='arview' position='0 0 0' />
<ROUTE fromNode='camera' fromField='fieldOfView' toNode='arview' toField='fieldOfView'/>
<ROUTE fromNode='camera' fromField='fovMode' toNode='arview' toField='fovMode'/>
<ROUTE fromNode='camera' fromField='aspectRatio' toNode='arview' toField='aspectRatio'/>
<TrackingSensor DEF='tracker1' purpose='urn:web3d:tracking_sensor_purpose:object_to_viewpoint' />
<Transform DEF='tracked_object'>
  <Shape>
    <Appearance><Material diffuseColor='1 0 0' /></Appearance>
    <Box />
  </Shape>
</Transform>
<ROUTE fromNode='tracker1' fromField='position' toNode='tracked_object' toField='translation'/>
<ROUTE fromNode='tracker1' fromField='rotation' toNode='tracked_object' toField='rotation'/>
...