A set of React components and hooks to help with multimodal input.

`npm install @kortexa-ai/react-multimodal`

Effortlessly Integrate Camera, Microphone, and AI-Powered Body/Hand Tracking into Your React Applications.
react-multimodal is a comprehensive React library designed to simplify the integration of various media inputs and advanced AI-driven tracking capabilities into your web applications. It provides a set of easy-to-use React components and hooks, abstracting away the complexities of managing media streams, permissions, and real-time AI model processing (like MediaPipe for hand and body tracking).
Live Demo - Interactive hand tracking demo showcasing MediaPipe integration
Why react-multimodal?

- Simplified Media Access: Get up and running with camera and microphone feeds in minutes.
- Advanced AI Features: Seamlessly integrate cutting-edge hand and body tracking without deep AI/ML expertise.
- Unified API: Manage multiple media sources (video, audio, hands, body) through a consistent and declarative API.
- React-Friendly: Built with React developers in mind, leveraging hooks and context for a modern development experience.
- Performance Conscious: Designed to be efficient, especially for real-time AI processing tasks.
react-multimodal offers the following key components and hooks:
- 🎥 CameraProvider & useCamera: Access and manage camera video streams. Provides the raw MediaStream for direct use or rendering with helper components.
- 🎤 MicrophoneProvider & useMicrophone: Access and manage microphone audio streams. Provides the raw MediaStream.
- 🖐️ HandsProvider & useHands: Implements real-time hand tracking and gesture recognition using MediaPipe Tasks Vision. Provides detailed landmark data and built-in gesture detection for common hand gestures.
- 🤸 BodyProvider & useBody: (Coming Soon/Conceptual) Intended for real-time body pose estimation.
- 🧩 MediaProvider & useMedia: The central, unified provider. Combines access to camera, microphone, hand tracking, and body tracking. This is the recommended way to use multiple modalities.
  - Easily enable or disable specific media types (video, audio, hands, body).
  - Manages underlying providers and their lifecycles.
  - Provides a consolidated context with all active media data and control functions (startMedia, stopMedia).
Additionally, there are a couple of reusable components in the examples:
- 🖼️ CameraView: A utility component to easily render a video stream (e.g., from CameraProvider or MediaProvider) onto a canvas, often used for overlaying tracking visualizations. (/src/examples/common/src/CameraView.jsx)
- 🎤 MicrophoneView: A utility component that renders a simple visualization of an audio stream (e.g., from MicrophoneProvider or MediaProvider) onto a canvas. (/src/examples/common/src/MicrophoneView.jsx)
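
To give a sense of what such an audio visualization does under the hood, here is a minimal sketch of a level meter built directly on the Web Audio API. This is not the library's MicrophoneView implementation, just the general pattern it illustrates; the component name and styling are assumptions:

```jsx
// Sketch: a bare-bones audio level meter, illustrating the kind of
// canvas visualization MicrophoneView provides. Not the library's
// actual implementation, just the underlying Web Audio pattern.
import { useEffect, useRef } from "react";

function AudioLevelMeter({ audioStream }) {
    const canvasRef = useRef(null);

    useEffect(() => {
        if (!audioStream) return;
        const audioCtx = new AudioContext();
        const analyser = audioCtx.createAnalyser();
        audioCtx.createMediaStreamSource(audioStream).connect(analyser);
        const data = new Uint8Array(analyser.frequencyBinCount);
        let rafId;

        const draw = () => {
            analyser.getByteFrequencyData(data);
            // Average the spectrum into a rough 0..1 loudness value.
            const level = data.reduce((sum, v) => sum + v, 0) / data.length / 255;
            const ctx = canvasRef.current?.getContext("2d");
            if (ctx) {
                ctx.clearRect(0, 0, 200, 20);
                ctx.fillStyle = "limegreen";
                ctx.fillRect(0, 0, level * 200, 20); // bar width tracks loudness
            }
            rafId = requestAnimationFrame(draw);
        };
        draw();

        return () => {
            cancelAnimationFrame(rafId);
            audioCtx.close();
        };
    }, [audioStream]);

    return <canvas ref={canvasRef} width={200} height={20} />;
}
```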
```bash
npm install @kortexa-ai/react-multimodal
# or
yarn add @kortexa-ai/react-multimodal
```
You will also need to install peer dependencies if you plan to use features like hand tracking:
```bash
npm install @mediapipe/tasks-vision
# or
yarn add @mediapipe/tasks-vision
```
Here's how you can quickly get started with react-multimodal:
Wrap your application or relevant component tree with MediaProvider.
```jsx
// App.js or your main component
import { MediaProvider } from "@kortexa-ai/react-multimodal";
import MyComponent from "./MyComponent";

function App() {
    return (
        <MediaProvider
            cameraProps={{}} // enable camera
            microphoneProps={{}} // enable microphone
            handsProps={{}} // enable hand tracking
        >
            <MyComponent />
        </MediaProvider>
    );
}

export default App;
```
Use the useMedia hook within a component wrapped by MediaProvider.
```jsx
// MyComponent.jsx
import { useEffect } from "react";
import { useMedia } from "@kortexa-ai/react-multimodal";
// Assuming CameraView is imported from your project or the library's examples
// import CameraView from './CameraView';

function MyComponent() {
    const {
        videoStream,
        audioStream,
        handsData, // Will be null or empty if handsProps is not provided
        isMediaReady,
        isStarting,
        startMedia,
        stopMedia,
        currentVideoError,
        currentAudioError,
        currentHandsError,
    } = useMedia();

    useEffect(() => {
        // Automatically start media when the component mounts
        // Or trigger with a button click: startMedia();
        if (!isMediaReady && !isStarting) {
            startMedia();
        }
        return () => {
            // Clean up when the component unmounts
            stopMedia();
        };
    }, [startMedia, stopMedia, isMediaReady, isStarting]);

    if (currentVideoError) return <p>Video Error: {currentVideoError.message}</p>;
    if (currentAudioError) return <p>Audio Error: {currentAudioError.message}</p>;
    if (currentHandsError) return <p>Hands Error: {currentHandsError.message}</p>;

    return (
        <div>
            {isMediaReady && videoStream && <p>Camera is active.</p>}
            {isMediaReady &&
                handsData &&
                handsData.detectedHands &&
                handsData.detectedHands.length > 0 &&
                handsData.detectedHands.map((hand, i) => (
                    <p key={i}>Landmarks: {hand.landmarks.length} points</p>
                ))}
            {isMediaReady && audioStream && <p>Microphone is active.</p>}
            {!isMediaReady && !isStarting && <p>Click "Start Media" to begin.</p>}
        </div>
    );
}

export default MyComponent;
```
The handsData object from useMedia (available when handsProps is provided) contains landmarks. You can use these with a CameraView component (like the one in /src/examples/common/src/CameraView.jsx) or a custom canvas solution to draw overlays.
```jsx
// Conceptual: Inside a component using CameraView for drawing
// import { CameraView } from '@kortexa-ai/react-multimodal/examples'; // Adjust path as needed

// ... (inside a component that has access to videoStream and handsData)
// {isMediaReady && videoStream && (
//     <CameraView
//         videoStream={videoStream} // prop name may differ; see CameraView.jsx
//         width="640"
//         height="480"
//         handsData={handsData} // Pass handsData to CameraView for rendering overlays
//     />
// )}
// ...
```
Refer to the CameraView.jsx in the examples directory for a practical implementation of drawing hand landmarks.
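
If you prefer a custom canvas overlay instead, a minimal sketch might look like the following. It assumes each entry in handsData.detectedHands has a landmarks array of points with x and y normalized to [0, 1], as produced by MediaPipe Tasks Vision; the component name, canvas size, and drawing style are illustrative:

```jsx
// Sketch: draw normalized hand landmarks onto a canvas overlay.
// Assumes landmarks are { x, y } values normalized to [0, 1].
import { useEffect, useRef } from "react";

function LandmarkOverlay({ handsData, width = 640, height = 480 }) {
    const canvasRef = useRef(null);

    useEffect(() => {
        const canvas = canvasRef.current;
        if (!canvas) return;
        const ctx = canvas.getContext("2d");
        ctx.clearRect(0, 0, width, height);

        for (const hand of handsData?.detectedHands ?? []) {
            for (const point of hand.landmarks) {
                // Scale normalized coordinates up to canvas pixels.
                ctx.beginPath();
                ctx.arc(point.x * width, point.y * height, 4, 0, 2 * Math.PI);
                ctx.fillStyle = "lime";
                ctx.fill();
            }
        }
    }, [handsData, width, height]);

    return <canvas ref={canvasRef} width={width} height={height} />;
}
```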
The library now includes built-in gesture recognition powered by MediaPipe Tasks Vision. The following gestures are automatically detected:
- pointing_up - Index finger pointing upward
- pointing_down - Index finger pointing downward
- pointing_left - Index finger pointing left
- pointing_right - Index finger pointing right
- thumbs_up - Thumb up gesture
- thumbs_down - Thumb down gesture
- victory - Peace sign (V shape)
- open_palm - Open hand/stop gesture
- closed_fist - Closed fist
- call_me - Pinky and thumb extended
- rock - Rock and roll sign
- love_you - I love you sign
Each detected gesture includes a confidence score and can be accessed through the gestures property of each detected hand.
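
For example, picking out the most confident gesture per hand might look like this sketch. The gesture entry field names (name, confidence) are assumptions; check the HandsData typings for the exact shape:

```jsx
// Sketch: log the highest-confidence gesture for each detected hand.
// Field names (`name`, `confidence`) are assumptions, not confirmed API.
function logTopGestures(handsData) {
    for (const hand of handsData?.detectedHands ?? []) {
        let top = null;
        for (const gesture of hand.gestures ?? []) {
            if (!top || gesture.confidence > top.confidence) top = gesture;
        }
        if (top) console.log(`${top.name}: ${top.confidence.toFixed(2)}`);
    }
}
```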
MediaProvider is the primary way to integrate multiple media inputs.
Props:
- cameraProps?: UseCameraProps (optional): Provide an object (even an empty {}) to enable camera functionality. Omit or pass undefined to disable. Refer to UseCameraProps (from src/camera/useCamera.ts) for configurations like defaultFacingMode, requestedWidth, etc.
- microphoneProps?: UseMicrophoneProps (optional): Provide an object (even an empty {}) to enable microphone functionality. Omit or pass undefined to disable. Refer to UseMicrophoneProps (from src/microphone/types.ts) for configurations like sampleRate.
- handsProps?: HandsProviderProps (optional): Provide an object (even an empty {}) to enable hand tracking and gesture recognition. Omit or pass undefined to disable. Key options include:
  - enableGestures?: boolean (default: true): Enable built-in gesture recognition.
  - gestureOptions?: Fine-tune gesture detection settings.
  - onGestureResults?: Callback for gesture-specific events.
  - options?: MediaPipe settings (e.g., maxNumHands, minDetectionConfidence).
- bodyProps?: any (optional, future): Configuration for body tracking. Provide an object to enable, omit to disable.
- startBehavior?: "proceed" | "halt" (optional, default: "proceed"): Advanced setting to control initial auto-start behavior within the orchestrator.
- onMediaReady?: () => void: Callback when all requested media streams are active.
- onMediaError?: (errorType: 'video' | 'audio' | 'hands' | 'body' | 'general', error: Error) => void: Callback for media errors, specifying the type of error.
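
Putting those props together, a more fully configured MediaProvider might look like this sketch. The prop names come from the list above; the specific option values are illustrative assumptions, not documented defaults:

```jsx
import { MediaProvider } from "@kortexa-ai/react-multimodal";

function App() {
    return (
        <MediaProvider
            cameraProps={{ defaultFacingMode: "user", requestedWidth: 1280 }}
            microphoneProps={{ sampleRate: 44100 }}
            handsProps={{
                enableGestures: true,
                options: { maxNumHands: 2, minDetectionConfidence: 0.5 },
            }}
            onMediaReady={() => console.log("All requested media is ready")}
            onMediaError={(type, error) => console.error(type, error.message)}
        >
            {/* ... your components ... */}
        </MediaProvider>
    );
}
```

Note that omitting bodyProps, as here, simply leaves body tracking disabled.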
Context via useMedia():
- videoStream?: MediaStream: The camera video stream.
- audioStream?: MediaStream: The microphone audio stream.
- handsData?: HandsData: Hand tracking and gesture recognition results from MediaPipe. HandsData contains { detectedHands: DetectedHand[] }, where each DetectedHand includes landmarks, world landmarks, handedness, and detected gestures.
- bodyData?: any: Body tracking results (future).
- isMediaReady: boolean: True if all requested media streams are active and ready.
- isStarting: boolean: True if media is currently in the process of starting.
- startMedia: () => Promise: Function to initialize and start all enabled media.
- stopMedia: () => void: Function to stop all active media and release resources.
- currentVideoError?: Error: Current error related to video.
- currentAudioError?: Error: Current error related to audio.
- currentHandsError?: Error: Current error related to hand tracking.
- currentBodyError?: Error: Current error related to body tracking (future).
While MediaProvider is recommended for most use cases, individual providers like HandsProvider or CameraProvider can be used if you only need a specific modality. They offer a more focused context (e.g., useHands() for HandsProvider, useCamera() for CameraProvider). Their API structure is similar, providing specific data, ready states, start/stop functions, and error states for their respective modality.
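
For instance, a camera-only setup might look like the sketch below. The exact context field names returned by useCamera are assumptions modeled on the useMedia() context described above; check the library typings for the real ones:

```jsx
// Sketch: camera-only usage with CameraProvider/useCamera.
// `videoStream` is an assumed field name, mirroring useMedia().
import { useEffect, useRef } from "react";
import { CameraProvider, useCamera } from "@kortexa-ai/react-multimodal";

function CameraPreview() {
    const { videoStream } = useCamera();
    const videoRef = useRef(null);

    useEffect(() => {
        // Attach the MediaStream to a plain <video> element.
        if (videoRef.current && videoStream) {
            videoRef.current.srcObject = videoStream;
        }
    }, [videoStream]);

    return <video ref={videoRef} autoPlay playsInline muted />;
}

function CameraOnlyApp() {
    return (
        <CameraProvider>
            <CameraPreview />
        </CameraProvider>
    );
}
```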
For more detailed and interactive examples, please check out the /examples directory within this repository. It includes demonstrations of:
- Using MediaProvider with CameraView.
- Visualizing hand landmarks and connections.
- Controlling media start/stop and handling states.
Don't forget to deduplicate @mediapipe/tasks-vision in your Vite config:
```ts
// vite.config.ts
import { defineConfig } from "vite";

export default defineConfig({
    resolve: {
        dedupe: [
            "react",
            "react-dom",
            "@kortexa-ai/react-multimodal",
            "@mediapipe/tasks-vision",
        ],
    },
});
```
---
Ā© 2025 kortexa.ai