[x3d-public] Audio and sound meeting 26 August 2020: audio graph relationships

Don Brutzman brutzman at nps.edu
Wed Aug 26 10:47:53 PDT 2020


Attendees: Efi, Thanos, Dick, Don.  Once again, group minutes follow.

1. Efi put together an interesting example (attached).

Proposed addition:
MFNode [in, out] audioGraph [] [X3DChildNode]

Candidate input nodes:
•	AudioClip
•	AudioBufferSource
•	OscillatorSource
•	StreamAudioSource
•	MicrophoneSource

Curiously, these are all sound sources in their own right, and modifying them might not make much sense.

However, we do want audio graphs to be composable as small components in their own right, so perhaps an "AudioGraphSource" node is appropriate.  Or perhaps a SpatialSound node can simply contain other SpatialSound nodes?  Interesting.

Efi's representation of Example 1 in X3D might thus look like

{SpatialSound or AudioGraphSource?}
audioGraph=[<SpatialSound gain='0.6' panningModel='HRTF' refDistance='1' rolloffFactor='1'/>,
		   <Analyser gain='0.4' fftSize='2048'/>,
		   <AudioDestination channelInterpretation='speakers'/>]
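
For comparison, here is a minimal Web Audio API sketch of what Example 1's chain might look like in JavaScript.  The ordering (panner feeding analyser feeding destination) and the separate GainNode stages are our assumptions for illustration, since a Web Audio PannerNode has no built-in gain field:
=================================================
// A minimal sketch of Example 1 (assumptions: the chain runs
// panner -> analyser -> destination, and each 'gain' value becomes
// a GainNode, since PannerNode itself has no gain attribute).
const ctx = new AudioContext();
const panner   = new PannerNode(ctx,   { panningModel: 'HRTF', refDistance: 1, rolloffFactor: 1 });
const gainA    = new GainNode(ctx,     { gain: 0.6 });   // gain paired with the SpatialSound stage
const analyser = new AnalyserNode(ctx, { fftSize: 2048 });
const gainB    = new GainNode(ctx,     { gain: 0.4 });   // gain paired with the Analyser stage
ctx.destination.channelInterpretation = 'speakers';

// (a source node would feed the panner; omitted here)
panner.connect(gainA).connect(analyser).connect(gainB).connect(ctx.destination);
=================================================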

----

2. Gain.  All gain values represent the amplification of input prior to sending to output.

We do need a simple way to represent a gain by itself with no other effect.

a. One way is a BiQuadFilter with a linear characteristic and no other filtering effects.
b. Alternatively, since Analyser has no effect on sound characteristics, giving it a "gain" field might make sense, and it also provides a good monitoring point.
c. A fallback approach is to define a Gain node in X3D that has no other effect (see the sketch below).
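
For what it is worth, the Web Audio API already provides exactly this as GainNode.  A minimal sketch of option (c) as a plain gain stage (the direct X3D-to-GainNode mapping is our assumption, not a settled design):
=================================================
// A minimal sketch of option (c): a gain stage with no other effect.
// In Web Audio API terms this is simply a GainNode.
const ctx = new AudioContext();
const source = new OscillatorNode(ctx);          // any source will do
const gain = new GainNode(ctx, { gain: 0.5 });   // amplification applied before output

source.connect(gain).connect(ctx.destination);
source.start();
=================================================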

---

3.  Table of correspondences between X3D nodes and Web Audio API classes (the classes themselves are often named "nodes", a potential source of confusion).

a. Web Audio API is a set of JavaScript classes with well-defined individual class semantics, and also inter-object connection relationships.

b. X3D nodes (strictly defined in the X3D Architecture) can be equivalently and consistently represented in any file encoding or language binding, independent of JavaScript, XML, ClassicVRML, etc.

c. If we define mappings from Web Audio API graph relationships to X3D nodes, we can express those graph relationships in any X3D representation.  Indeed, that is part of our implement/evaluate criteria.

d. We are not preventing implementations from using other media APIs... but we are not necessarily adopting their constructs either, because the Web Audio API has already harmonized functionality that is now implemented multiple times, and it has the maturity of a W3C Candidate Recommendation.

e. We are attempting to define X3D functionality and semantics that directly map to the functionality and semantics of the Web Audio API.  Of note, that API uses the term "audio graph" as the name for such constructs.  We think the term needs a stricter definition to match the Recommendation.  (This happens to be one of our planned feedback points to the W3C Audio working group.)

Thus a table of correspondences, which shows where X3D semantics match Web Audio API semantics, can be informative and useful right now.  We do not want to duplicate definitions of all Web Audio API concepts since that can be confusing.
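
As a starting point, an initial sketch of such a table follows.  The Web Audio API class names are established in the Candidate Recommendation, but the pairings themselves are our working assumptions rather than settled mappings:

	X3D node (current or proposed)    Web Audio API class (assumed)
	------------------------------    -----------------------------
	AudioClip                         AudioBufferSourceNode (or MediaElementAudioSourceNode)
	AudioBufferSource                 AudioBufferSourceNode
	OscillatorSource                  OscillatorNode
	StreamAudioSource                 MediaStreamAudioSourceNode
	MicrophoneSource                  MediaStreamAudioSourceNode (microphone stream)
	Analyser                          AnalyserNode
	BiQuadFilter                      BiquadFilterNode
	Gain (fallback option c above)    GainNode
	ChannelSplitter                   ChannelSplitterNode
	ChannelMerger                     ChannelMergerNode
	SpatialSound                      PannerNode (plus gain and listener semantics)
	ListenerPoint                     AudioListener
	AudioDestination                  AudioDestinationNode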

Whether such an informative table goes into the X3D Architecture specification itself is questionable and subject for future consideration.

---

4. Graph representation of node-field relationships, continued.

Dick described how we might consider using Sound and SpatialSound as "master controller" nodes.  This matches the current X3D design, where a Sound node defines the location at which virtual sound is produced.

A user's viewpoint is typically the location where that virtual sound is perceived.  We have added a ListenerPoint node in X3D as a way to define an additional sampling point for perceived sound at another virtual location.
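
For reference, the Web Audio API models the perception point with a single AudioListener per context, normally tracking the user's viewpoint; ListenerPoint would conceptually generalize this to additional sampling locations.  A minimal sketch (coordinate values are purely illustrative):
=================================================
// Web Audio API provides exactly one AudioListener per context.
// ListenerPoint would add further sampling points, which the
// Web Audio API does not directly support today.
const ctx = new AudioContext();
const listener = ctx.listener;
listener.positionX.value = 0;     // illustrative viewpoint coordinates only
listener.positionY.value = 1.6;
listener.positionZ.value = 10;
=================================================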

Monophonic versus stereo seems a bit complex, but if we get everything lined up satisfactorily for a single channel, then slight modifications to add stereo might be possible.  Can we proceed with mono for now, then examine stereo later?

Don shared some graph relationships in a diagram earlier (again attached).  These can work in XML or any of the encodings.

Here is a crude example using ClassicVRML syntax:
=================================================
DEF myMix ChannelMerger {
	# the following are provided as mix inputs to the parent ChannelMerger node
	inputs [ USE AudioSource1, USE MicrophoneSource2, USE ChannelSplitter3 ]
}

DEF myMasterVirtual SpatialSound {
	# here is where the mix output goes
	inputs [ USE myMix ]
}

=================================================
alternative example, avoiding USE:

DEF myMasterVirtual SpatialSound {
	inputs [DEF AudioSource1      AudioSource{},
		DEF MicrophoneSource2 MicrophoneSource{},
		DEF ChannelSplitter3  ChannelSplitter {
			inputs [
			  # additional inputs for multiple channels here etc.
			  # can contain nested relationships for more effects, more audio subgraphs, etc.
			]
		}
	]
}
=================================================
alternative (equivalent) XML:

<SpatialSound DEF="myMasterVirtual">
	<AudioSource      DEF="AudioSource1"      containerField="inputs"/>
	<MicrophoneSource DEF="MicrophoneSource2" containerField="inputs"/>
	<ChannelSplitter  DEF="ChannelSplitter3"  containerField="inputs">
		<!-- additional inputs for multiple channels here etc. -->
		<!-- can contain nested relationships for more effects, more audio subgraphs, etc. -->
	</ChannelSplitter>
</SpatialSound>

=================================================

Interestingly, with this approach we might not even need an 'outputs' field, since the parent-child relationship already defines that a child node's expected outputs feed the 'inputs' field of its parent node.
		
Motivation for this design is similar to other nodes in X3D.  For example, a set of Shape nodes are children of a Transform... they do not need to "send" anything or redundantly declare their parents; the parent-child field relationship ("a Transform node has children nodes") is sufficient to describe each child node's role.

Of note in this kind of approach: the ordering of nodes in a scene can be quite relaxed, yet parent-child field relationships explicitly define the graph, and thus explicitly define how each node's outputs are connected to its parent node's inputs.
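
To make the mapping concrete, here is a minimal Web Audio API sketch of how the XML example above might be realized, assuming each child listed in 'inputs' simply yields one child.connect(parent) call (the node pairings are again our assumptions):
=================================================
// A minimal sketch, assuming each child in a parent's 'inputs' field
// maps to one connect() call: child.connect(parent).
async function buildGraph() {
  const ctx = new AudioContext();
  const spatial = new PannerNode(ctx);              // myMasterVirtual (SpatialSound analog)
  const source1 = new AudioBufferSourceNode(ctx);   // AudioSource1
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  const mic = new MediaStreamAudioSourceNode(ctx, { mediaStream: stream });  // MicrophoneSource2
  const splitter = new ChannelSplitterNode(ctx);    // ChannelSplitter3

  // each child's output feeds its parent's input; no separate 'outputs' field needed
  source1.connect(spatial);
  mic.connect(spatial);
  splitter.connect(spatial);
  spatial.connect(ctx.destination);
}
=================================================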

Note that we are focused on monophonic channels at present.  Hopefully this MFNode parent-child relationship will help when we look to add stereophonic capabilities.  Definitions of channels need to be examined closely.  We shall see; intentionally deferred for now.

Treating SpatialSound as comparable to Sound might handle most backwards compatibility issues for how X3D/VRML authors (and implementers) already employ Sound in scenes.  Further scrutiny will be needed.

As ever, building examples is the "best test" of whether a design approach is workable.

We were all encouraged by the steady progress achieved today.

---

5. TODO: complete Web3D 2020 Conference tutorial proposal prior to 12 SEP 2020.

We meet again next week, Wednesday, same time.

Have fun with X3D audio graphs, whatever they are!!   8)

all the best, Don
-- 
Don Brutzman  Naval Postgraduate School, Code USW/Br       brutzman at nps.edu
Watkins 270,  MOVES Institute, Monterey CA 93943-5000 USA   +1.831.656.2149
X3D graphics, virtual worlds, navy robotics http://faculty.nps.edu/brutzman
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ExamplesSound_26_08_2020.pdf
Type: application/pdf
Size: 583156 bytes
Desc: not available
URL: <http://web3d.org/pipermail/x3d-public_web3d.org/attachments/20200826/f722c016/attachment-0001.pdf>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: X3dAudioGraphExample1.png
Type: image/png
Size: 166345 bytes
Desc: not available
URL: <http://web3d.org/pipermail/x3d-public_web3d.org/attachments/20200826/f722c016/attachment-0001.png>

