Extensible 3D (X3D)
Part 1: Architecture and base components

16 Sound component

16.1 Introduction

16.1.1 Name

The name of this component is "Sound". This name shall be used when referring to this component in the COMPONENT statement (see 7.2.5.4 Component statement).

16.1.2 Overview

This clause describes the Sound component of this part of ISO/IEC 19775. This includes how sound is delivered to an X3D world as well as how sounds are accessed. Table 16.1 provides links to the major topics in this clause.

Table 16.1 — Topics

16.2 Concepts

16.2.1 Sound priority

If the browser does not have the resources to play all of the currently active sounds, it is recommended that the browser sort the active sounds into an ordered list using the following sort keys in the order specified:

  1. decreasing priority;
  2. for sounds with priority > 0.5, increasing (now-startTime);
  3. decreasing intensity at viewer location (intensity × "intensity attenuation");

where priority is the priority field of the Sound node, now represents the current time, startTime is the startTime field of the audio source node specified in the source field, and "intensity attenuation" refers to the intensity multiplier derived from the linear decibel attenuation ramp between inner and outer ellipsoids.

It is important that sort key 2 be used for the high priority (event and cue) sounds so that new cues will be heard even when the browser is "full" of currently active high priority sounds. Sort key 2 should not be used for normal priority sounds, so selection among them will be based on sort key 3 (intensity at the location of the viewer).

The browser shall play as many sounds from the beginning of this sorted list as it can given available resources and allowable latency between rendering. On most systems, the resources available for MIDI streams are different from those for playing sampled sounds, thus it may be beneficial to maintain a separate list to handle MIDI data.
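
The recommended ordering can be sketched in Python as a composite sort key. This is an illustrative sketch only, not part of this specification: the ActiveSound record, its field names, and the max_voices limit are hypothetical, and the attenuation value is assumed to have been computed already as a linear multiplier in [0, 1].

```python
from dataclasses import dataclass


@dataclass
class ActiveSound:
    """Hypothetical snapshot of one active Sound node (not an X3D API)."""
    name: str
    priority: float      # Sound.priority, in [0, 1]
    start_time: float    # startTime of the audio source node, in seconds
    intensity: float     # Sound.intensity, in [0, 1]
    attenuation: float   # assumed linear intensity multiplier at the viewer


def sort_key(sound, now):
    # Sort key 1: decreasing priority.
    key1 = -sound.priority
    # Sort key 2: only for priority > 0.5, increasing (now - startTime),
    # so the most recently started cue comes first.
    key2 = (now - sound.start_time) if sound.priority > 0.5 else 0.0
    # Sort key 3: decreasing intensity at the viewer location.
    key3 = -(sound.intensity * sound.attenuation)
    return (key1, key2, key3)


def select_sounds(sounds, now, max_voices):
    """Return the sounds to play, given room for max_voices of them."""
    ordered = sorted(sounds, key=lambda s: sort_key(s, now))
    return ordered[:max_voices]
```

Because sounds are only compared on key 2 when their priorities are equal, setting key 2 to a constant for normal priority sounds leaves their relative order to key 3, as described above.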

16.2.2 Sound attenuation and spatialization

In order to create a linear decrease in loudness as the viewer moves from the inner to the outer ellipsoid of the sound, the attenuation must be based on a linear decibel ramp. To make the falloff consistent across browsers, the decibel ramp is to vary from 0 dB at the minimum ellipsoid to -20 dB at the outer ellipsoid. Sound nodes with an outer ellipsoid that is ten times larger than the minimum will display the inverse square intensity drop-off that approximates sound attenuation in an anechoic environment.

Browsers may support spatial localization of sounds whose spatialize field is TRUE as well as their underlying sound libraries will allow. Browsers shall at least support stereo panning of non-MIDI sounds based on the angle between the viewer and the source. This angle is obtained by projecting the Sound location (in global space) onto the XZ plane of the viewer. Determine the angle between the Z-axis and the vector from the viewer to the transformed location, and assign a pan value in the range [0.0, 1.0] as depicted in Figure 16.1. Given this pan value, left and right channel levels can be obtained using the following equations:

    leftPanFactor  = 1 - pan²
    rightPanFactor = 1 - (1 - pan)²

Figure 16.1 — Stereo panning

Using this technique, the loudness of the sound is modified by the intensity field value, then distance attenuation to obtain the unspatialized audio output. The values in the unspatialized audio output are then scaled by leftPanFactor and rightPanFactor to determine the final left and right output signals. The use of more sophisticated localization techniques is encouraged, but not required (see [SNDB]).
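
As a minimal sketch of the panning step, the following Python fragment applies the two pan equations to a list of normalized samples. The function names are hypothetical, the derivation of pan from the viewer angle (Figure 16.1) is not reproduced, and attenuation is assumed to be a linear multiplier already derived from the distance attenuation.

```python
def pan_factors(pan):
    """Return (leftPanFactor, rightPanFactor) for a pan value in [0, 1]."""
    if not 0.0 <= pan <= 1.0:
        raise ValueError("pan must be in the range [0.0, 1.0]")
    left = 1.0 - pan ** 2
    right = 1.0 - (1.0 - pan) ** 2
    return left, right


def spatialize(samples, intensity, attenuation, pan):
    """Scale normalized samples into (left, right) output pairs."""
    left, right = pan_factors(pan)
    # Unspatialized output: intensity first, then distance attenuation.
    unspatialized = [s * intensity * attenuation for s in samples]
    # Final signals: the unspatialized output scaled by each pan factor.
    return [(u * left, u * right) for u in unspatialized]
```

With these equations a pan of 0.0 yields factors (1.0, 0.0), a pan of 1.0 yields (0.0, 1.0), and a pan of 0.5 yields 0.75 on both channels.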

16.3 Abstract types

16.3.1 X3DSoundNode

X3DSoundNode : X3DChildNode { 
  SFNode [in,out] metadata NULL [X3DMetadataObject]
}

This abstract node type is the base for all sound nodes.

16.3.2 X3DSoundSourceNode

X3DSoundSourceNode : X3DTimeDependentNode { 
  SFString [in,out] description      ""
  SFBool   [in,out] loop             FALSE
  SFNode   [in,out] metadata         NULL [X3DMetadataObject]
  SFTime   [in,out] pauseTime        0    (-∞,∞)
  SFFloat  [in,out] pitch            1.0  (0,∞)
  SFTime   [in,out] resumeTime       0    (-∞,∞)
  SFTime   [in,out] startTime        0    (-∞,∞)
  SFTime   [in,out] stopTime         0    (-∞,∞)
  SFTime   [out]    duration_changed
  SFTime   [out]    elapsedTime
  SFBool   [out]    isActive
  SFBool   [out]    isPaused
}

This abstract node type is used to derive node types that can emit audio data.

16.4 Node reference

16.4.1 AudioClip

AudioClip : X3DSoundSourceNode, X3DUrlObject {
  SFString [in,out] description      ""
  SFBool   [in,out] loop             FALSE
  SFNode   [in,out] metadata         NULL  [X3DMetadataObject]
  SFTime   [in,out] pauseTime        0     (-∞,∞)
  SFFloat  [in,out] pitch            1.0   (0,∞)
  SFTime   [in,out] resumeTime       0     (-∞,∞)
  SFTime   [in,out] startTime        0     (-∞,∞)
  SFTime   [in,out] stopTime         0     (-∞,∞)
  MFString [in,out] url              []    [URI]
  SFTime   [out]    duration_changed
  SFTime   [out]    elapsedTime
  SFBool   [out]    isActive
  SFBool   [out]    isPaused
}

An AudioClip node specifies audio data that can be referenced by Sound nodes.

The description field specifies a textual description of the audio source. A browser is not required to display the description field but may choose to do so in addition to playing the sound.

The url field specifies the URL from which the sound is loaded. Browsers shall support at least the wavefile format in uncompressed PCM format (see [WAV]). It is recommended that browsers also support the MIDI file type 1 sound format (see 2.[MIDI]) and the MP3 compressed format (see 2.[I11172-1]). MIDI files are presumed to use the General MIDI patch set. 9.2.1 URLs contains details on the url field.

The loop, pauseTime, resumeTime, startTime, and stopTime inputOutput fields and the elapsedTime, isActive, and isPaused outputOnly fields, and their effects on the AudioClip node, are discussed in detail in 8 Time component. The "cycle" of an AudioClip is the length of time in seconds for one playing of the audio at the specified pitch.

The pitch field specifies a multiplier for the rate at which sampled sound is played. Values for the pitch field shall be greater than zero. Changing the pitch field affects both the pitch and playback speed of a sound. A set_pitch event to an active AudioClip is ignored and no pitch_changed event is generated. If pitch is set to 2.0, the sound shall be played one octave higher than normal and played twice as fast. For a sampled sound, the pitch field alters the sampling rate at which the sound is played. The proper implementation of pitch control for MIDI (or other note sequence sound clips) is to multiply the tempo of the playback by the pitch value and adjust the MIDI Coarse Tune and Fine Tune controls to achieve the proper pitch change.

A duration_changed event is sent whenever there is a new value for the "normal" duration of the clip. Typically, this will only occur when the current url in use changes and the sound data has been loaded, indicating that the clip is playing a different sound source. The duration is the length of time in seconds for one cycle of the audio for a pitch set to 1.0. Changing the pitch field will not trigger a duration_changed event. A duration value of "−1" implies that the sound data has not yet loaded or the value is unavailable for some reason. A duration_changed event shall be generated if the AudioClip node is loaded when the X3D file is read or the AudioClip node is added to the scene graph.

The isActive field may be used by other nodes to determine if the clip is currently active. If an AudioClip is active, it shall be playing the sound corresponding to the sound time (i.e., in the sound's local time system with sample 0 at time 0):

    t = (now − startTime) modulo (duration / pitch)
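
The sound time formula can be illustrated with a short Python sketch. The function name and argument names are hypothetical. The sketch assumes that the clip is active and that duration holds a loaded, positive value rather than −1.

```python
def sound_time(now, start_time, duration, pitch=1.0):
    """Return t = (now - startTime) modulo (duration / pitch), in seconds."""
    if pitch <= 0.0:
        raise ValueError("pitch shall be greater than zero")
    if duration <= 0.0:
        raise ValueError("duration is unavailable (sound data not loaded)")
    cycle = duration / pitch  # length of one cycle at the specified pitch
    return (now - start_time) % cycle
```

For a 4 second clip started at time 2.0, the value at time 12.5 is 2.5 with a pitch of 1.0 and 0.5 with a pitch of 2.0, since the cycle shortens to 2 seconds.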

16.4.2 Sound

Sound : X3DSoundNode {
  SFVec3f [in,out] direction  0 0 1 (-∞,∞)
  SFFloat [in,out] intensity  1     [0,1]
  SFVec3f [in,out] location   0 0 0 (-∞,∞)
  SFFloat [in,out] maxBack    10    [0,∞)
  SFFloat [in,out] maxFront   10    [0,∞)
  SFNode  [in,out] metadata   NULL  [X3DMetadataObject]
  SFFloat [in,out] minBack    1     [0,∞)
  SFFloat [in,out] minFront   1     [0,∞)
  SFFloat [in,out] priority   0     [0,1]
  SFNode  [in,out] source     NULL  [X3DSoundSourceNode]
  SFBool  []       spatialize TRUE
}

The Sound node specifies the spatial presentation of a sound in an X3D scene. The sound is located at a point in the local coordinate system and emits sound in an elliptical pattern (defined by two ellipsoids). The ellipsoids are oriented in a direction specified by the direction field. The shape of the ellipsoids may be modified to provide more or less directional focus from the location of the sound.

The source field specifies the sound source for the Sound node. If the source field is not specified, the Sound node will not emit audio. The source field shall specify either an AudioClip node or a MovieTexture node. If a MovieTexture node is specified as the sound source, the MovieTexture shall refer to a movie format that supports sound (EXAMPLE  MPEG-1 Systems, see ISO/IEC 11172-1).

The intensity field adjusts the loudness (decibels) of the sound emitted by the Sound node.  The intensity field has a value that ranges from 0.0 to 1.0 and specifies a factor which shall be used to scale the normalized sample data of the sound source during playback. A Sound node with an intensity of 1.0 shall emit audio at its maximum loudness (before attenuation), and a Sound node with an intensity of 0.0 shall emit no audio. Between these values, the loudness should increase linearly from a -20 dB change approaching an intensity of 0.0 to a 0 dB change at an intensity of 1.0.

NOTE  This is different from the traditional definition of intensity with respect to sound; see [SNDA].

The priority field provides a hint for the browser to choose which sounds to play when there are more active Sound nodes than can be played at once due to either limited system resources or system load. 16.2 Concepts describes a recommended algorithm for determining which sounds to play under such circumstances. The priority field ranges from 0.0 to 1.0, with 1.0 being the highest priority and 0.0 the lowest priority.

The location field determines the location of the sound emitter in the local coordinate system. A Sound node's output is audible only if it is part of the traversed scene. Sound nodes that are descended from LOD, Switch, or any grouping or prototype node that disables traversal (i.e., drawing) of its children are not audible unless they are traversed. If a Sound node is disabled by a Switch or LOD node, and later it becomes part of the traversal again, the sound shall resume where it would have been had it been playing continuously.

The Sound node has an inner ellipsoid that defines a volume of space in which the maximum level of the sound is audible. Within this ellipsoid, the normalized sample data is scaled by the intensity field and there is no attenuation. The inner ellipsoid is defined by extending the direction vector through the location. The minBack and minFront fields specify distances behind and in front of the location along the direction vector respectively. The inner ellipsoid has one of its foci at location (the second focus is implicit) and intersects the direction vector at minBack and minFront.

The Sound node has an outer ellipsoid that defines a volume of space that bounds the audibility of the sound. No sound can be heard outside of this outer ellipsoid. The outer ellipsoid is defined by extending the direction vector through the location. The maxBack and maxFront fields specify distances behind and in front of the location along the direction vector respectively. The outer ellipsoid has one of its foci at location (the second focus is implicit) and intersects the direction vector at maxBack and maxFront.

The minFront, maxFront, minBack, and maxBack fields are defined in local coordinates, and shall be greater than or equal to zero. The minBack field shall be less than or equal to maxBack, and minFront shall be less than or equal to maxFront. The ellipsoid parameters are specified in the local coordinate system but the ellipsoids' geometry is affected by ancestors' transformations.
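
The constraints on the four ellipsoid fields can be checked as in the following Python sketch. The function name and the message strings are hypothetical. How a browser reports or recovers from invalid values is not specified here.

```python
def check_ellipsoid_fields(min_front, max_front, min_back, max_back):
    """Return a list of constraint violations (empty if the values are valid)."""
    errors = []
    fields = (
        ("minFront", min_front),
        ("maxFront", max_front),
        ("minBack", min_back),
        ("maxBack", max_back),
    )
    # Each distance shall be greater than or equal to zero.
    for name, value in fields:
        if value < 0.0:
            errors.append(name + " shall be greater than or equal to zero")
    # The inner ellipsoid shall not extend beyond the outer ellipsoid.
    if min_back > max_back:
        errors.append("minBack shall be less than or equal to maxBack")
    if min_front > max_front:
        errors.append("minFront shall be less than or equal to maxFront")
    return errors
```

The default field values (minFront 1, maxFront 10, minBack 1, maxBack 10) produce an empty list.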

Between the two ellipsoids, there shall be a linear attenuation ramp in loudness, from 0 dB at the minimum ellipsoid to -20 dB at the maximum ellipsoid:

    attenuation = -20 × (d' / d")

where d' is the distance along the location-to-viewer vector, measured from the transformed minimum ellipsoid boundary to the viewer, and d" is the distance along the location-to-viewer vector from the transformed minimum ellipsoid boundary to the transformed maximum ellipsoid boundary (see Figure 16.2).

Figure 16.2 — Sound Node Geometry
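
The attenuation ramp can be sketched in Python under some simplifying assumptions that are not stated in this clause. Each ellipsoid is treated as a surface of revolution about the direction vector with one focus at location, so the boundary distance along a ray follows the focal polar equation of an ellipse. All values are taken in a single coordinate system, so ancestor transformations are ignored. The direction vector is assumed to be non-zero. The conversion from decibels to a linear multiplier uses the amplitude convention (20 dB per factor of ten). All function names are hypothetical.

```python
import math


def boundary_distance(front, back, cos_theta):
    """Distance from location to the ellipsoid boundary along a ray.

    cos_theta is the cosine of the angle between the ray and direction.
    The result equals front at cos_theta = 1 and back at cos_theta = -1.
    """
    total = front + back
    if total == 0.0:
        return 0.0
    eccentricity = (front - back) / total
    semi_latus = 2.0 * front * back / total
    denom = 1.0 - eccentricity * cos_theta
    if denom == 0.0:
        # Degenerate case: front or back is zero and the ray lies on the axis.
        return max(front, back)
    return semi_latus / denom


def attenuation_db(location, direction, viewer,
                   min_front=1.0, min_back=1.0,
                   max_front=10.0, max_back=10.0):
    """Return the attenuation in dB, or None outside the outer ellipsoid."""
    offset = [v - l for v, l in zip(viewer, location)]
    d = math.sqrt(sum(c * c for c in offset))
    if d == 0.0:
        return 0.0  # viewer is at the sound location
    dir_len = math.sqrt(sum(c * c for c in direction))
    cos_theta = sum(o * c for o, c in zip(offset, direction)) / (d * dir_len)
    r_min = boundary_distance(min_front, min_back, cos_theta)
    r_max = boundary_distance(max_front, max_back, cos_theta)
    if d <= r_min:
        return 0.0   # inside the inner ellipsoid: no attenuation
    if d > r_max:
        return None  # outside the outer ellipsoid: not audible
    # attenuation = -20 x (d' / d")
    return -20.0 * (d - r_min) / (r_max - r_min)


def db_to_gain(db):
    """Convert decibels to a linear multiplier (amplitude convention)."""
    return 10.0 ** (db / 20.0)
```

With the default field values and a viewer 5.5 units in front of the sound, d' is 4.5 and d" is 9, which gives an attenuation of −10 dB.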

The spatialize field specifies if the sound is perceived as being directionally located relative to the viewer. If the spatialize field is TRUE and the viewer is located between the transformed inner and outer ellipsoids, the viewer's direction and the relative location of the Sound node should be taken into account during playback. Details outlining the minimum required spatialization functionality can be found in 16.2.2 Sound attenuation and spatialization. If the spatialize field is FALSE, directional effects are ignored, but the ellipsoid dimensions and intensity will still affect the loudness of the sound. If the sound source is multi-channel (EXAMPLE  stereo), the source shall retain its channel separation during playback.

16.5 Support levels

The Sound component provides one level of support as specified in Table 16.2.

Table 16.2 — Sound component support levels

Level  Prerequisites                         Nodes/Features                 Support
1      Core 1, Time 1, Rendering 1, Shape 1  X3DSoundSourceNode (abstract)  n/a
                                             X3DSoundNode (abstract)        n/a
                                             AudioClip                      All fields fully supported.
                                             Sound                          All fields fully supported.
