[X3D-Ecosystem] ACM > motion capture for sign language
John Carlson
yottzumm at gmail.com
Fri Jan 2 22:17:00 PST 2026
So looking for software that produces local joint rotations, I queried
Google AI:
In 2026, several Software Development Kits (SDKs) are capable of producing
local joint rotations directly from a standard webcam or video feed. These
SDKs typically provide rotational data in the form of
*Quaternions* or *Euler angles*, which represent the rotation of a "child"
bone relative to its "parent" bone in a skeletal hierarchy.
1. ZED SDK (Stereolabs)
The ZED SDK is a specialized tool for body tracking that can produce *local
rotations between neighboring bones*.
- *Key Feature:* It outputs both the 3D location [X, Y, Z] and the
orientation [X, Y, Z, W] as a quaternion for each tracked joint.
- *Compatibility:* While often used with ZED stereo cameras, its
body-tracking module is highly optimized for providing a full skeletal
hierarchy with relative rotational data for simulation and robotics.
2. MediaPipe (Google AI Edge)
MediaPipe remains the most widely used free SDK for real-time webcam pose
estimation in 2026.
- *Rotational Output:* The *Pose Landmarker* task provides 3D "world
coordinates" for 33 landmarks. To get local rotations, developers typically
use these coordinates to calculate bone vectors and then derive the rotation
using Rodrigues' rotation formula.
- *Face & Hands:* The *Face Landmarker* specifically provides *transformation
matrices* and blendshape scores, which are used to infer detailed
surfaces and rotations for virtual avatars.
3. Move.ai SDK
Move.ai provides an SDK focused on turning real-world video into
high-fidelity 3D animation data without requiring expensive sensors.
- *Key Feature:* It supports "Local Processing" (Move Pro 2.0) which can
be run on high-end NVIDIA GPUs to extract complex motion data directly from
a PC setup.
- *Output:* It generates industry-standard animation data (often FBX)
that includes the necessary joint rotations for retargeting to 3D
characters in Unreal Engine or Maya.
4. DeepMotion SDK
DeepMotion specializes in AI-powered motion capture and provides an API/SDK
for generating 3D skeletal data from video.
- *Key Feature:* Their *Animate 3D* service and SDK provide 3D skeletal
joint positions and rotations (often exported in XYZ Euler order).
- *Avatars:* It includes specialized support for head and eyeball
rotations using the ARKit Blendshape standard.
5. Movement SDK (Meta Horizon OS)
Designed for developers building on Meta's ecosystem, this SDK includes
advanced processors for body tracking.
- *Key Feature:* It includes a *TwistProcessor* specifically to manage
joint rotations, fixing issues like "candy wrapping" around wrists by
applying local rotational corrections to the character's joints.
Summary of Output Formats

SDK              Primary Rotation Format         Best For
*ZED SDK*        Quaternions                     Robotics & Spatial Apps
*MediaPipe*      3D Coordinates (Vector-based)   Web & Mobile Apps
*Move.ai*        FBX Animation Data              Professional Animation
*DeepMotion*     XYZ Euler / FBX                 Game Development (Unity/Unreal)
*Meta Movement*  Twist/Corrective Rotations      VR/XR Avatars
==================================
I currently use MediaPipe/OpenCV. So next, I queried what Rodrigues’
formula was.
==================================
*Rodrigues’ rotation formula* is an efficient algorithm used in 3D geometry
and computer vision to rotate a vector in space by a specific angle around
a fixed axis. In the context of webcam joint tracking, it is the primary
mathematical tool used to convert 3D coordinates into rotational data
(axis-angle representation).
1. The Vector Formula
Given a vector v, a unit rotation axis k, and a rotation angle θ, the
formula for the rotated vector v_rot is:

v_rot = v cos θ + (k × v) sin θ + k (k ⋅ v)(1 − cos θ)
- *v cos θ*: Scales the original vector.
- *(k × v) sin θ*: Handles the portion of the rotation perpendicular to
the axis.
- *k (k ⋅ v)(1 − cos θ)*: Accounts for the component of the vector that is
parallel to the rotation axis, which remains unchanged by the rotation.
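The vector formula is short enough to sanity-check directly in NumPy. A minimal sketch (the function name rodrigues_rotate is illustrative, not from any SDK):

```python
import numpy as np

def rodrigues_rotate(v, k, theta):
    """Rotate vector v by angle theta (radians) around unit axis k."""
    k = k / np.linalg.norm(k)  # make sure the axis is a unit vector
    return (v * np.cos(theta)
            + np.cross(k, v) * np.sin(theta)
            + k * np.dot(k, v) * (1.0 - np.cos(theta)))

# Rotating the X-axis 90 degrees around the Z-axis should give the Y-axis
v_rot = rodrigues_rotate(np.array([1.0, 0.0, 0.0]),
                         np.array([0.0, 0.0, 1.0]),
                         np.pi / 2)
```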
2. The Matrix Form
For computational efficiency in software (like OpenCV or Unity), the
formula is often converted into a *3x3 rotation matrix R*:

R = I + (sin θ) K + (1 − cos θ) K²

Where:
- *I* is the identity matrix.
- *K* is the skew-symmetric "cross-product matrix" of the rotation axis k.
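The matrix form can be sketched the same way: build K from the axis components and assemble R. (Pure NumPy; cv2.Rodrigues performs the equivalent conversion from an axis-angle vector to a matrix.)

```python
import numpy as np

def rodrigues_matrix(k, theta):
    """Build the rotation matrix R = I + (sin θ) K + (1 - cos θ) K²."""
    kx, ky, kz = k / np.linalg.norm(k)
    # Skew-symmetric cross-product matrix of the axis: K @ v == np.cross(k, v)
    K = np.array([[0.0, -kz,  ky],
                  [ kz, 0.0, -kx],
                  [-ky,  kx, 0.0]])
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

# 90-degree rotation around Z: should map the X-axis onto the Y-axis
R = rodrigues_matrix(np.array([0.0, 0.0, 1.0]), np.pi / 2)
```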
3. Application in Webcam Tracking
When a webcam SDK (like MediaPipe) provides the 3D positions of an elbow
and a shoulder, Rodrigues' formula is used in *reverse*:
1. *Calculate the Vector*: Find the vector between the two joints.
2. *Define the Goal*: Determine how much that vector has moved compared
to a "neutral" pose.
3. *Extract Rotation*: Use the inverse of Rodrigues' formula to extract
the specific angle (θ) and axis (k) needed to reach that new position.
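A minimal sketch of that reverse step: recover the axis via the cross product and the angle via the dot product of the neutral and current bone directions (pure NumPy; the anti-parallel case would need extra handling in a real implementation):

```python
import numpy as np

def extract_axis_angle(v_neutral, v_current):
    """Recover the axis k and angle theta rotating v_neutral onto v_current."""
    a = v_neutral / np.linalg.norm(v_neutral)
    b = v_current / np.linalg.norm(v_current)
    axis = np.cross(a, b)
    norm = np.linalg.norm(axis)
    if norm < 1e-8:  # parallel vectors: no unique axis (anti-parallel needs care)
        return np.array([1.0, 0.0, 0.0]), 0.0
    angle = np.arccos(np.clip(np.dot(a, b), -1.0, 1.0))
    return axis / norm, angle

k, theta = extract_axis_angle(np.array([0.0, -1.0, 0.0]),   # neutral: arm down
                              np.array([1.0, 0.0, 0.0]))    # current: arm out
```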
4. Key Advantages
- *Avoids Gimbal Lock*: Unlike Euler angles (Pitch/Yaw/Roll), this
formula does not suffer from the mathematical "locking" that occurs when
two axes align.
- *Computational Efficiency*: For a single axis-angle rotation it can be
cheaper than converting through quaternions or composing full
trigonometric rotation matrices.
- *Standardized*: It is the foundation of the cv2.Rodrigues() function
in the widely used OpenCV library.
==================================
So it looks like Rodrigues’ formula is built into OpenCV. I’m halfway
there. Now I just ask how to integrate Rodrigues function into
MediaPipe/OpenCV!
=================================
To apply the *Rodrigues function* across an entire skeleton in Python, you
must define a *bone hierarchy* and iterate through it. In 2026, the
standard approach is to calculate the rotation of each "child" bone
relative to its "parent" to produce *local rotations*, which are required
for standard 3D animation rigs.
1. Define the Bone Hierarchy
MediaPipe landmarks are flat indices (0–32). To get skeletal rotations, you
must map these into pairs that represent bones.
python
# Standard bone mappings (Parent Index, Child Index)
BONE_HIERARCHY = {
    "upper_arm_r": (12, 14),  # Right Shoulder to Elbow
    "forearm_r": (14, 16),    # Right Elbow to Wrist
    "upper_arm_l": (11, 13),  # Left Shoulder to Elbow
    "forearm_l": (13, 15),    # Left Elbow to Wrist
    "thigh_r": (24, 26),      # Right Hip to Knee
    "calf_r": (26, 28),       # Right Knee to Ankle
}
2. General Function for Any Bone
The following function takes two landmark points, calculates the rotation
vector (rvec) needed to align a reference vector (e.g., pointing down the
Y-axis) with the current bone's direction, and uses cv2.Rodrigues() to
return a rotation matrix.
python
import numpy as np
import cv2
def get_bone_rotation(p1, p2, reference_vector=np.array([0, -1, 0])):
    """Calculates the rotation matrix for a bone defined by points p1 and p2."""
    # 1. Create the bone vector and normalize it
    bone_vec = p2 - p1
    bone_vec = bone_vec / np.linalg.norm(bone_vec)
    # 2. Find the rotation axis (cross product) and angle (dot product)
    rotation_axis = np.cross(reference_vector, bone_vec)
    rotation_axis /= (np.linalg.norm(rotation_axis) + 1e-6)  # avoid division by zero
    dot_product = np.clip(np.dot(reference_vector, bone_vec), -1.0, 1.0)
    angle = np.arccos(dot_product)
    # 3. Create the rotation vector (axis * angle)
    rvec = rotation_axis * angle
    # 4. Convert to 3x3 Matrix via Rodrigues
    rotation_matrix, _ = cv2.Rodrigues(rvec)
    return rotation_matrix
3. Iterating Through All Bones
To process the entire skeleton from a MediaPipe result, iterate through
your hierarchy and store the matrices. In 2026, most developers use
*world_landmarks* for these calculations because they are provided in
meters and are independent of the image's pixel dimensions.
python
def process_skeleton(pose_world_landmarks):
    landmarks = np.array([[lm.x, lm.y, lm.z]
                          for lm in pose_world_landmarks.landmark])
    bone_rotations = {}
    for bone_name, (parent_idx, child_idx) in BONE_HIERARCHY.items():
        p_start = landmarks[parent_idx]
        p_end = landmarks[child_idx]
        # Calculate the rotation matrix for this specific bone
        bone_rotations[bone_name] = get_bone_rotation(p_start, p_end)
    return bone_rotations
4. Handling Local vs. Global Rotation
The code above calculates *global* rotations (relative to the camera). If
you need *local* rotations (e.g., the elbow's rotation relative to the
shoulder's orientation), you must use matrix inversion:
- *Formula:* R_local = R_parent⁻¹ × R_child
- Since rotation matrices are orthogonal, the inverse is simply the
*transpose*: R_local = parent_matrix.T @ child_matrix.
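A quick NumPy check of that identity: compose a known parent and local rotation, then recover the local one with the transpose (the rot_z/rot_x helpers are illustrative single-axis rotation builders):

```python
import numpy as np

def rot_z(t):
    """Rotation matrix about the Z-axis by angle t (radians)."""
    return np.array([[np.cos(t), -np.sin(t), 0.0],
                     [np.sin(t),  np.cos(t), 0.0],
                     [0.0, 0.0, 1.0]])

def rot_x(t):
    """Rotation matrix about the X-axis by angle t (radians)."""
    return np.array([[1.0, 0.0, 0.0],
                     [0.0, np.cos(t), -np.sin(t)],
                     [0.0, np.sin(t),  np.cos(t)]])

parent = rot_z(0.7)     # global orientation of the parent bone
local = rot_x(0.3)      # the child's rotation relative to the parent
child = parent @ local  # global orientation of the child bone

# R_local = R_parent^-1 @ R_child; for orthogonal matrices, inverse = transpose
recovered = parent.T @ child
```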
Summary of Workflow
1. *Extract:* Use results.pose_world_landmarks to get 3D coordinates.
2. *Define:* Map parent-child indices to represent bones (e.g., 11 → 13
for the left upper arm).
3. *Calculate:* Use the Rodrigues formula via cv2.Rodrigues() to turn
the vector between those joints into a rotation matrix.
4. *Filter:* Apply a smoothing filter (like a moving average) to the
matrices to remove webcam jitter before applying them to a 3D model.
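Step 4 can be as simple as a moving average over the rotation vectors before the final cv2.Rodrigues conversion. A minimal sketch with a fixed-size window (the window size of 3 is an illustrative choice):

```python
import numpy as np
from collections import deque

class MovingAverageFilter:
    """Averages the last `window` rotation vectors (axis * angle) per bone."""
    def __init__(self, window=5):
        self.history = deque(maxlen=window)

    def __call__(self, rvec):
        self.history.append(np.asarray(rvec, dtype=float))
        return np.mean(self.history, axis=0)

smooth = MovingAverageFilter(window=3)
smooth(np.array([0.0, 0.0, 0.9]))
smooth(np.array([0.0, 0.0, 1.1]))
smoothed = smooth(np.array([0.0, 0.0, 1.0]))  # mean of the last 3 rvecs
```

Note that naively averaging rotation vectors is only reasonable for small frame-to-frame changes; for large or fast rotations, quaternion-based smoothing (or the One-Euro filter discussed below) behaves better.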
==================================
So the bones are defined like this:
def sendLines(self, connections, prefix):
    for connection in connections:
        # make sure not to draw these lines
        if prefix == "p" and connection[0] == poselm.LEFT_SHOULDER and \
           connection[1] == poselm.LEFT_HIP:
            pass
        elif prefix == "p" and connection[0] == poselm.LEFT_SHOULDER and \
             connection[1] == poselm.RIGHT_SHOULDER:
            pass
        elif prefix == "p" and connection[0] == poselm.RIGHT_SHOULDER and \
             connection[1] == poselm.RIGHT_HIP:
            pass
        else:
            self.sendMPLine(f"{prefix}{connection[0]}", f"{prefix}{connection[1]}")
    # draw these lines instead
    if prefix == "p":
        self.sendMPLine(f"p34", f"p11")  # patch the vc7 to the left shoulder
        self.sendMPLine(f"p34", f"p12")  # patch the vc7 to the right shoulder
        self.sendMPLine(f"p33", f"p34")  # patch the sacroiliac to vc7
        self.sendMPLine(f"p13", f"l0")   # patch the left elbow to the left wrist
        self.sendMPLine(f"p14", f"r0")   # patch the right elbow to the right wrist
def sendAll(self, image, landmarks, suffix, connections, lmlist):
    # send lines to refresh the screen
    self.sendLines(connections, suffix)  # left hand
    # construct each time, because they disappear
    # self.constructPoints(landmarks, "_"+suffix, lmlist)
    # send coordinates
    self.sendPoints(image, landmarks, "_"+suffix, connections, lmlist)

# Comment out these lines as desired. Please don't delete them
self.sendAll(image, results.left_hand_landmarks, "l",
             mp_holistic.HAND_CONNECTIONS, hand)
self.sendAll(image, results.right_hand_landmarks, "r",
             mp_holistic.HAND_CONNECTIONS, hand)
self.sendAll(image, results.pose_landmarks, "p",
             mp_holistic.POSE_CONNECTIONS, pose)
# These are a big performance hit
# self.sendAll(image, results.face_landmarks, "t",
#              mp_holistic.FACEMESH_TESSELATION, face)
# self.sendAll(image, results.face_landmarks, "c",
#              mp_holistic.FACEMESH_CONTOURS, face)
Code is here:
https://github.com/coderextreme/deafblind/blob/main/osc.py
So all that’s really needed is to take the connections, above, and apply
Rodrigues’ functions as defined in previous sections.
I can probably get AI to produce something in a matter of hours. Then hook
in OSC/VMC and hook OSC/VMC up to socket.io/JSONvirse to get a working
avatar in JSONvirse.
A leftover step is to get a moving average of the joint/bone:
==================================
In 2026, the
*MediaPipe Holistic* model is the preferred solution for full-body tracking
because it synchronizes 33 pose landmarks with 21 high-fidelity landmarks
for each hand.
To apply the Rodrigues rotation and smoothing to hands, you must define the
finger hierarchies (Carpal → MCP → PIP → DIP) and manage independent
filters for each hand's landmarks.
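Those per-finger chains can also be generated programmatically instead of typed out by hand. This sketch follows MediaPipe's 21-landmark hand layout (wrist = 0, four landmarks per finger) and reproduces the same (parent, child) pairs as the hierarchy dictionary used later in this message:

```python
# MediaPipe hand landmark indices per finger (wrist = 0 is the chain root)
FINGERS = {"thumb": [1, 2, 3, 4], "index": [5, 6, 7, 8],
           "middle": [9, 10, 11, 12], "ring": [13, 14, 15, 16],
           "pinky": [17, 18, 19, 20]}

def hand_bone_pairs(prefix):
    """Build {bone_name: (parent_idx, child_idx)} for one hand."""
    bones = {}
    for finger, idxs in FINGERS.items():
        chain = [0] + idxs  # each finger chain starts at the wrist
        for i in range(len(idxs)):
            bones[f"{prefix}_{finger}_{i + 1}"] = (chain[i], chain[i + 1])
    return bones

right = hand_bone_pairs("r")  # e.g. right["r_index_1"] == (0, 5)
```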
1. Updated Hierarchy for Hands and Pose
This dictionary maps the parent and child indices for both the core body
and the internal hand structure.
python
# Holistic Hierarchy: (Parent Index, Child Index)
BONE_HIERARCHY = {
    # Core Body (Pose Landmarks 0-32)
    "upper_arm_r": (12, 14), "forearm_r": (14, 16),
    "upper_arm_l": (11, 13), "forearm_l": (13, 15),
    # Right Hand (Holistic Hand Landmarks 0-20)
    "r_thumb_1": (0, 1), "r_thumb_2": (1, 2), "r_thumb_3": (2, 3), "r_thumb_4": (3, 4),
    "r_index_1": (0, 5), "r_index_2": (5, 6), "r_index_3": (6, 7), "r_index_4": (7, 8),
    "r_middle_1": (0, 9), "r_middle_2": (9, 10), "r_middle_3": (10, 11), "r_middle_4": (11, 12),
    "r_ring_1": (0, 13), "r_ring_2": (13, 14), "r_ring_3": (14, 15), "r_ring_4": (15, 16),
    "r_pinky_1": (0, 17), "r_pinky_2": (17, 18), "r_pinky_3": (18, 19), "r_pinky_4": (19, 20),
}
# Duplicate for the left hand with a "l_" prefix if processing both
2. Unified Processing Function
This function takes the Holistic results and applies the smoothing filter
to every defined joint.
python
import time
import numpy as np
import cv2

# Store filters in a persistent dictionary to maintain state across frames
filters = {bone: OneEuroFilter(min_cutoff=1.0, beta=0.007)
           for bone in BONE_HIERARCHY.keys()}

def process_holistic_rotations(results):
    """Processes pose, left_hand, and right_hand landmarks into
    smoothed rotations."""
    rotations = {}
    current_time = time.time()
    # 1. Prepare coordinate sets (handling 'None' if a hand is not in frame)
    pose_lms = results.pose_world_landmarks.landmark \
        if results.pose_world_landmarks else None
    r_hand_lms = results.right_hand_landmarks.landmark \
        if results.right_hand_landmarks else None
    l_hand_lms = results.left_hand_landmarks.landmark \
        if results.left_hand_landmarks else None
    for bone, (p_idx, c_idx) in BONE_HIERARCHY.items():
        # Determine which landmark set to use based on the bone prefix
        if bone.startswith("r_"):
            lms = r_hand_lms
        elif bone.startswith("l_"):
            lms = l_hand_lms
        else:
            lms = pose_lms
        if lms:
            p1 = np.array([lms[p_idx].x, lms[p_idx].y, lms[p_idx].z])
            p2 = np.array([lms[c_idx].x, lms[c_idx].y, lms[c_idx].z])
            # Use the previous 'get_smoothed_bone_rotation' logic:
            # - Calculate raw rvec from (p2 - p1)
            # - Apply filters[bone](current_time, rvec)
            # - cv2.Rodrigues(smoothed_rvec)
            rotations[bone] = calculate_rodrigues_smoothed(p1, p2, bone,
                                                           current_time)
    return rotations
3. Key Implementation Details for 2026
- *Handedness Mirroring:* When calculating rotations for the left hand,
the Z-axis is often inverted by default in MediaPipe's internal coordinate
system. You may need to flip the X-component of your reference vector to
ensure the rotation direction (clockwise/counter-clockwise) remains
consistent with the right hand.
- *Occlusion Handling:* Holistic hand models use a re-crop model to
estimate landmarks even when the hand is partially hidden. The visibility
score provided by the SDK should be used to "pause" the One-Euro filter if
confidence drops below 0.5 to prevent the filter from "drifting" toward
inaccurate positions.
- *Reference Vectors:* For fingers, the reference vector is usually
pointing "up" (along the finger's length in a flat palm pose), whereas for
arms, it is usually "out" (T-pose). Modify the reference_vector in your
calculation based on whether the bone_name indicates a finger or a limb.
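The reference-vector point can be handled with a small helper keyed off the bone name (a sketch; the specific vectors are assumptions to be tuned against your rig's rest pose):

```python
import numpy as np

FINGER_PREFIXES = ("r_thumb", "r_index", "r_middle", "r_ring", "r_pinky",
                   "l_thumb", "l_index", "l_middle", "l_ring", "l_pinky")

def reference_vector_for(bone_name):
    """Pick the rest-pose direction used as the Rodrigues reference vector."""
    if bone_name.startswith(FINGER_PREFIXES):
        # Fingers: along the finger's length in a flat-palm pose (assumed axis)
        return np.array([0.0, 0.0, -1.0])
    if "arm" in bone_name:
        # Arms: out to the side, as in a T-pose (sign chosen per side)
        return np.array([-1.0, 0.0, 0.0]) if bone_name.endswith("_l") \
            else np.array([1.0, 0.0, 0.0])
    # Legs/spine: straight down
    return np.array([0.0, -1.0, 0.0])

ref = reference_vector_for("r_index_2")
```

This could then be passed as ref_vec_override when calling the smoothed-rotation function for each bone.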
=================================
“Complete” code:
import numpy as np
import cv2
import mediapipe as mp
import time
# --- 1. Define the Bone Hierarchy for Holistic Landmarks ---
# This maps parent and child indices for pose and right hand (RH)
# Left hand (LH) indices are treated separately during processing.
BONE_HIERARCHY = {
    # Core Body (Pose Landmarks 0-32)
    "upper_arm_r": (12, 14),
    "forearm_r": (14, 16),
    "upper_arm_l": (11, 13),
    "forearm_l": (13, 15),
    "thigh_r": (24, 26),
    "calf_r": (26, 28),
    # Right Hand (Holistic Hand Landmarks 0-20)
    # 0 is the wrist/palm base.
    "r_thumb_1": (0, 1), "r_thumb_2": (1, 2), "r_thumb_3": (2, 3), "r_thumb_4": (3, 4),
    "r_index_1": (0, 5), "r_index_2": (5, 6), "r_index_3": (6, 7), "r_index_4": (7, 8),
    "r_middle_1": (0, 9), "r_middle_2": (9, 10), "r_middle_3": (10, 11), "r_middle_4": (11, 12),
    "r_ring_1": (0, 13), "r_ring_2": (13, 14), "r_ring_3": (14, 15), "r_ring_4": (15, 16),
    "r_pinky_1": (0, 17), "r_pinky_2": (17, 18), "r_pinky_3": (18, 19), "r_pinky_4": (19, 20),
}
# Add left hand definitions by duplicating and changing 'r_' to 'l_' in keys;
# indices are the same (0-20 for LH landmarks)
left_hand_hierarchy = {k.replace('r_', 'l_'): v for k, v in
                       BONE_HIERARCHY.items() if k.startswith('r_')}
BONE_HIERARCHY.update(left_hand_hierarchy)
# --- 2. One-Euro Filter Class ---
class OneEuroFilter:
    def __init__(self, min_cutoff=1.0, beta=0.007, d_cutoff=1.0):
        self.min_cutoff = float(min_cutoff)
        self.beta = float(beta)
        self.d_cutoff = float(d_cutoff)
        self.x_prev = None
        self.dx_prev = None
        self.t_prev = None

    def __call__(self, t, x):
        if self.x_prev is None:
            self.x_prev, self.t_prev = x, t
            self.dx_prev = np.zeros_like(x)
            return x
        te = t - self.t_prev
        if te == 0:
            return self.x_prev
        ad = 1.0 / (1.0 + 1.0 / (2 * np.pi * self.d_cutoff * te))
        dx = (x - self.x_prev) / te
        dx_hat = ad * dx + (1.0 - ad) * self.dx_prev
        cutoff = self.min_cutoff + self.beta * np.linalg.norm(dx_hat)
        a = 1.0 / (1.0 + 1.0 / (2 * np.pi * cutoff * te))
        x_hat = a * x + (1.0 - a) * self.x_prev
        self.x_prev, self.t_prev, self.dx_prev = x_hat, t, dx_hat
        return x_hat

# Initialize filters for all bones in our hierarchy
bone_filters = {bone: OneEuroFilter() for bone in BONE_HIERARCHY.keys()}
# --- 3. Function to Calculate Smoothed Rodrigues Rotation ---
def calculate_rodrigues_smoothed(p1, p2, bone_name, current_time,
                                 ref_vec_override=None):
    """
    Calculates the rotation matrix for a bone (p1 to p2) using Rodrigues,
    applying the One-Euro filter to the rotation vector first.
    """
    # Key Implementation Detail: the reference vector needs to change based
    # on context. We use (0, -1, 0) as a default reference for vertical
    # bones like arms/legs pointing down. For hands/fingers, this might
    # need tuning.
    reference_vector = ref_vec_override if ref_vec_override is not None \
        else np.array([0.0, -1.0, 0.0])
    # 1. Calculate the raw bone vector and normalize it
    bone_vec = p2 - p1
    bone_vec_norm = np.linalg.norm(bone_vec)
    if bone_vec_norm < 1e-6:
        return np.eye(3)  # degenerate bone: return identity rotation
    bone_vec = bone_vec / bone_vec_norm
    # 2. Find the rotation axis (cross product) and angle (dot product)
    rotation_axis = np.cross(reference_vector, bone_vec)
    rotation_axis /= (np.linalg.norm(rotation_axis) + 1e-6)
    angle = np.arccos(np.clip(np.dot(reference_vector, bone_vec), -1.0, 1.0))
    # 3. Smooth the rotation vector (axis * angle) with the One-Euro filter
    rvec = bone_filters[bone_name](current_time, rotation_axis * angle)
    # 4. Convert to a 3x3 matrix via Rodrigues
    rotation_matrix, _ = cv2.Rodrigues(rvec)
    return rotation_matrix

# --- 4. Main Capture Loop ---
def main():
    mp_holistic = mp.solutions.holistic
    cap = cv2.VideoCapture(0)
    all_rotations = {}
    with mp_holistic.Holistic(min_detection_confidence=0.5,
                              min_tracking_confidence=0.5) as holistic:
        while cap.isOpened():
            success, image = cap.read()
            if not success:
                continue
            results = holistic.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
            current_time = time.time()
            for bone_name, (p_idx, c_idx) in BONE_HIERARCHY.items():
                # Pick the landmark set this bone belongs to
                if bone_name.startswith("r_"):
                    lms = results.right_hand_landmarks.landmark \
                        if results.right_hand_landmarks else None
                elif bone_name.startswith("l_"):
                    lms = results.left_hand_landmarks.landmark \
                        if results.left_hand_landmarks else None
                else:
                    lms = results.pose_world_landmarks.landmark \
                        if results.pose_world_landmarks else None
                if lms and lms[p_idx].visibility > 0.5 and lms[c_idx].visibility > 0.5:
                    p1 = np.array([lms[p_idx].x, lms[p_idx].y, lms[p_idx].z])
                    p2 = np.array([lms[c_idx].x, lms[c_idx].y, lms[c_idx].z])
                    # Calculate and store the smoothed rotation matrix
                    all_rotations[bone_name] = calculate_rodrigues_smoothed(
                        p1, p2, bone_name, current_time)
                elif bone_name in all_rotations:
                    # If landmarks lost, keep previous rotation matrix in
                    # dict for continuity
                    pass
            # --- Visualization (Optional) ---
            # You can now use `all_rotations` (a dictionary of 3x3 matrices)
            # to drive a 3D model in a game engine or animation software.
            # We will just draw the standard MP landmarks here to prove
            # it's running.
            mp.solutions.drawing_utils.draw_landmarks(
                image, results.pose_landmarks, mp_holistic.POSE_CONNECTIONS)
            mp.solutions.drawing_utils.draw_landmarks(
                image, results.right_hand_landmarks, mp_holistic.HAND_CONNECTIONS)
            mp.solutions.drawing_utils.draw_landmarks(
                image, results.left_hand_landmarks, mp_holistic.HAND_CONNECTIONS)
            cv2.imshow('Holistic Model Rotations (Rodrigues)', image)
            if cv2.waitKey(5) & 0xFF == 27:
                break
    cap.release()
    cv2.destroyAllWindows()
if __name__ == "__main__":
    main()
On Thu, Jan 1, 2026 at 6:30 PM GPU Group <gpugroup at gmail.com> wrote:
> I see some papers on your topic
>
> https://dl.acm.org/action/doSearch?fillQuickSearch=false&target=advanced&expand=dl&field1=AllField&text1=motion+capture
>