A set of React components and hooks to help with multimodal input.

`npm install @kortexa-ai/react-multimodal`

Effortlessly Integrate Camera, Microphone, and AI-Powered Body/Hand Tracking into Your React Applications.
react-multimodal is a comprehensive React library designed to simplify the integration of various media inputs and advanced AI-driven tracking capabilities into your web applications. It provides a set of easy-to-use React components and hooks, abstracting away the complexities of managing media streams, permissions, and real-time AI model processing (like MediaPipe for hand and body tracking).
Live Demo - Interactive hand tracking demo showcasing MediaPipe integration
Why react-multimodal?

- Simplified Media Access: Get up and running with camera and microphone feeds in minutes.
- Advanced AI Features: Seamlessly integrate cutting-edge hand and body tracking without deep AI/ML expertise.
- Unified API: Manage multiple media sources (video, audio, hands, body) through a consistent and declarative API.
- React-Friendly: Built with React developers in mind, leveraging hooks and context for a modern development experience.
- Performance Conscious: Designed to be efficient, especially for real-time AI processing tasks.
react-multimodal offers the following key components and hooks:
- 🎥 CameraProvider & useCamera: Access and manage camera video streams. Provides the raw MediaStream for direct use or rendering with helper components.
- 🎤 MicrophoneProvider & useMicrophone: Access and manage microphone audio streams. Provides the raw MediaStream.
- 🖐️ HandsProvider & useHands: Implements real-time hand tracking and gesture recognition using MediaPipe Tasks Vision. Provides detailed landmark data and built-in gesture detection for common hand gestures.
- 🤸 BodyProvider & useBody: (Coming Soon/Conceptual) Intended for real-time body pose estimation.
- 🧩 MediaProvider & useMedia: The central, unified provider. Combines access to camera, microphone, hand tracking, and body tracking. This is the recommended way to use multiple modalities.
  - Easily enable or disable specific media types (video, audio, hands, body).
  - Manages underlying providers and their lifecycles.
  - Provides a consolidated context with all active media data and control functions (startMedia, stopMedia).
Additionally, there are a couple of reusable components in the examples:
- 🖼️ CameraView: A utility component to easily render a video stream (e.g., from CameraProvider or MediaProvider) onto a canvas, often used for overlaying tracking visualizations. (/src/examples/common/src/CameraView.jsx)
- 🎤 MicrophoneView: A utility component that renders a simple visualization of an audio stream (e.g., from MicrophoneProvider or MediaProvider) onto a canvas. (/src/examples/common/src/MicrophoneView.jsx)
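
To give a sense of what such an audio visualization does under the hood, here is a minimal sketch of a level meter built directly on the Web Audio API. This is not the library's MicrophoneView implementation, just the general pattern it illustrates; the component name and styling are assumptions:

```jsx
// Sketch: a bare-bones audio level meter, illustrating the kind of
// canvas visualization MicrophoneView provides. Not the library's
// actual implementation, just the underlying Web Audio pattern.
import { useEffect, useRef } from "react";

function AudioLevelMeter({ audioStream }) {
    const canvasRef = useRef(null);

    useEffect(() => {
        if (!audioStream) return;
        const audioCtx = new AudioContext();
        const analyser = audioCtx.createAnalyser();
        audioCtx.createMediaStreamSource(audioStream).connect(analyser);
        const data = new Uint8Array(analyser.frequencyBinCount);
        let rafId;

        const draw = () => {
            analyser.getByteFrequencyData(data);
            // Average the spectrum into a rough 0..1 loudness value.
            const level = data.reduce((sum, v) => sum + v, 0) / data.length / 255;
            const ctx = canvasRef.current?.getContext("2d");
            if (ctx) {
                ctx.clearRect(0, 0, 200, 20);
                ctx.fillStyle = "limegreen";
                ctx.fillRect(0, 0, level * 200, 20); // bar width tracks loudness
            }
            rafId = requestAnimationFrame(draw);
        };
        draw();

        return () => {
            cancelAnimationFrame(rafId);
            audioCtx.close();
        };
    }, [audioStream]);

    return <canvas ref={canvasRef} width={200} height={20} />;
}
```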
```bash
npm install @kortexa-ai/react-multimodal
# or
yarn add @kortexa-ai/react-multimodal
```
You will also need to install peer dependencies if you plan to use features like hand tracking:
```bash
npm install @mediapipe/tasks-vision
# or
yarn add @mediapipe/tasks-vision
```
Here's how you can quickly get started with react-multimodal:
Wrap your application or relevant component tree with MediaProvider.
```jsx
// App.js or your main component
import { MediaProvider } from "@kortexa-ai/react-multimodal";
import MyComponent from "./MyComponent";

function App() {
    return (
        <MediaProvider
            cameraProps={{}} // enable camera
            microphoneProps={{}} // enable microphone
            handsProps={{}} // enable hand tracking
        >
            <MyComponent />
        </MediaProvider>
    );
}

export default App;
```
Use the useMedia hook within a component wrapped by MediaProvider.
```jsx
// MyComponent.jsx
import { useEffect } from "react";
import { useMedia } from "@kortexa-ai/react-multimodal";
// Assuming CameraView is imported from your project or the library's examples
// import CameraView from './CameraView';

function MyComponent() {
    const {
        videoStream,
        audioStream,
        handsData, // Will be null or empty if handsProps is not provided
        isMediaReady,
        isStarting,
        startMedia,
        stopMedia,
        currentVideoError,
        currentAudioError,
        currentHandsError,
    } = useMedia();

    useEffect(() => {
        // Automatically start media when the component mounts
        // Or trigger with a button click: startMedia();
        if (!isMediaReady && !isStarting) {
            startMedia();
        }
        return () => {
            // Clean up when the component unmounts
            stopMedia();
        };
    }, [startMedia, stopMedia, isMediaReady, isStarting]);

    if (currentVideoError) return <p>Video Error: {currentVideoError.message}</p>;
    if (currentAudioError) return <p>Audio Error: {currentAudioError.message}</p>;
    if (currentHandsError) return <p>Hands Error: {currentHandsError.message}</p>;

    return (
        <div>
            {isMediaReady && videoStream && <p>Camera is active.</p>}
            {isMediaReady &&
                handsData &&
                handsData.detectedHands &&
                handsData.detectedHands.length > 0 &&
                handsData.detectedHands.map((hand, i) => (
                    <p key={i}>Landmarks: {hand.landmarks.length} points</p>
                ))}
            {isMediaReady && audioStream && <p>Microphone is active.</p>}
            {!isMediaReady && !isStarting && <p>Click "Start Media" to begin.</p>}
        </div>
    );
}

export default MyComponent;
```
The handsData object from useMedia (available when handsProps is provided) contains landmarks. You can use these with a CameraView component (like the one in /src/examples/common/src/CameraView.jsx) or a custom canvas solution to draw overlays.
```jsx
// Conceptual: Inside a component using CameraView for drawing
// import { CameraView } from '@kortexa-ai/react-multimodal/examples'; // Adjust path as needed

// ... (inside a component that has access to videoStream and handsData)
// {isMediaReady && videoStream && (
//     <CameraView
//         videoStream={videoStream} // prop name may differ; see CameraView.jsx
//         width="640"
//         height="480"
//         handsData={handsData} // Pass handsData to CameraView for rendering overlays
//     />
// )}
// ...
```
Refer to the CameraView.jsx in the examples directory for a practical implementation of drawing hand landmarks.
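
If you prefer a custom canvas overlay instead, a minimal sketch might look like the following. It assumes each entry in handsData.detectedHands has a landmarks array of points with x and y normalized to [0, 1], as produced by MediaPipe Tasks Vision; the component name, canvas size, and drawing style are illustrative:

```jsx
// Sketch: draw normalized hand landmarks onto a canvas overlay.
// Assumes landmarks are { x, y } values normalized to [0, 1].
import { useEffect, useRef } from "react";

function LandmarkOverlay({ handsData, width = 640, height = 480 }) {
    const canvasRef = useRef(null);

    useEffect(() => {
        const canvas = canvasRef.current;
        if (!canvas) return;
        const ctx = canvas.getContext("2d");
        ctx.clearRect(0, 0, width, height);

        for (const hand of handsData?.detectedHands ?? []) {
            for (const point of hand.landmarks) {
                // Scale normalized coordinates up to canvas pixels.
                ctx.beginPath();
                ctx.arc(point.x * width, point.y * height, 4, 0, 2 * Math.PI);
                ctx.fillStyle = "lime";
                ctx.fill();
            }
        }
    }, [handsData, width, height]);

    return <canvas ref={canvasRef} width={width} height={height} />;
}
```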
The library now includes built-in gesture recognition powered by MediaPipe Tasks Vision. The following gestures are automatically detected:
- pointing_up - Index finger pointing upward
- pointing_down - Index finger pointing downward
- pointing_left - Index finger pointing left
- pointing_right - Index finger pointing right
- thumbs_up - Thumb up gesture
- thumbs_down - Thumb down gesture
- victory - Peace sign (V shape)
- open_palm - Open hand/stop gesture
- closed_fist - Closed fist
- call_me - Pinky and thumb extended
- rock - Rock and roll sign
- love_you - I love you sign
Each detected gesture includes a confidence score and can be accessed through the gestures property of each detected hand.
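
For example, picking out the most confident gesture per hand might look like this sketch. The gesture entry field names (name, confidence) are assumptions; check the HandsData typings for the exact shape:

```jsx
// Sketch: log the highest-confidence gesture for each detected hand.
// Field names (`name`, `confidence`) are assumptions, not confirmed API.
function logTopGestures(handsData) {
    for (const hand of handsData?.detectedHands ?? []) {
        let top = null;
        for (const gesture of hand.gestures ?? []) {
            if (!top || gesture.confidence > top.confidence) top = gesture;
        }
        if (top) console.log(`${top.name}: ${top.confidence.toFixed(2)}`);
    }
}
```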
MediaProvider is the primary way to integrate multiple media inputs.
Props:
- cameraProps?: UseCameraProps (optional): Provide an object (even an empty {}) to enable camera functionality. Omit or pass undefined to disable. Refer to UseCameraProps (from src/camera/useCamera.ts) for configurations like defaultFacingMode, requestedWidth, etc.
- microphoneProps?: UseMicrophoneProps (optional): Provide an object (even an empty {}) to enable microphone functionality. Omit or pass undefined to disable. Refer to UseMicrophoneProps (from src/microphone/types.ts) for configurations like sampleRate.
- handsProps?: HandsProviderProps (optional): Provide an object (even an empty {}) to enable hand tracking and gesture recognition. Omit or pass undefined to disable. Key options include:
  - enableGestures?: boolean (default: true): Enable built-in gesture recognition.
  - gestureOptions?: Fine-tune gesture detection settings.
  - onGestureResults?: Callback for gesture-specific events.
  - options?: MediaPipe settings (e.g., maxNumHands, minDetectionConfidence).
- bodyProps?: any (optional, future): Configuration for body tracking. Provide an object to enable, omit to disable.
- startBehavior?: "proceed" | "halt" (optional, default: "proceed"): Advanced setting to control initial auto-start behavior within the orchestrator.
- onMediaReady?: () => void: Callback when all requested media streams are active.
- onMediaError?: (errorType: 'video' | 'audio' | 'hands' | 'body' | 'general', error: Error) => void: Callback for media errors, specifying the type of error.
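
Putting those props together, a more fully configured MediaProvider might look like this sketch. The prop names come from the list above; the specific option values are illustrative assumptions, not documented defaults:

```jsx
import { MediaProvider } from "@kortexa-ai/react-multimodal";

function App() {
    return (
        <MediaProvider
            cameraProps={{ defaultFacingMode: "user", requestedWidth: 1280 }}
            microphoneProps={{ sampleRate: 44100 }}
            handsProps={{
                enableGestures: true,
                options: { maxNumHands: 2, minDetectionConfidence: 0.5 },
            }}
            onMediaReady={() => console.log("All requested media is ready")}
            onMediaError={(type, error) => console.error(type, error.message)}
        >
            {/* ... your components ... */}
        </MediaProvider>
    );
}
```

Note that omitting bodyProps, as here, simply leaves body tracking disabled.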
Context via useMedia():
- videoStream?: MediaStream: The camera video stream.
- audioStream?: MediaStream: The microphone audio stream.
- handsData?: HandsData: Hand tracking and gesture recognition results from MediaPipe. HandsData contains { detectedHands: DetectedHand[] }, where each DetectedHand includes landmarks, world landmarks, handedness, and detected gestures.
- bodyData?: any: Body tracking results (future).
- isMediaReady: boolean: True if all requested media streams are active and ready.
- isStarting: boolean: True if media is currently in the process of starting.
- startMedia: () => Promise: Function to initialize and start all enabled media.
- stopMedia: () => void: Function to stop all active media and release resources.
- currentVideoError?: Error: Current error related to video.
- currentAudioError?: Error: Current error related to audio.
- currentHandsError?: Error: Current error related to hand tracking.
- currentBodyError?: Error: Current error related to body tracking (future).
While MediaProvider is recommended for most use cases, individual providers like HandsProvider or CameraProvider can be used if you only need a specific modality. They offer a more focused context (e.g., useHands() for HandsProvider, useCamera() for CameraProvider). Their API structure is similar, providing specific data, ready states, start/stop functions, and error states for their respective modality.
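
For instance, a camera-only setup might look like the sketch below. The exact context field names returned by useCamera are assumptions modeled on the useMedia() context described above; check the library typings for the real ones:

```jsx
// Sketch: camera-only usage with CameraProvider/useCamera.
// `videoStream` is an assumed field name, mirroring useMedia().
import { useEffect, useRef } from "react";
import { CameraProvider, useCamera } from "@kortexa-ai/react-multimodal";

function CameraPreview() {
    const { videoStream } = useCamera();
    const videoRef = useRef(null);

    useEffect(() => {
        // Attach the MediaStream to a plain <video> element.
        if (videoRef.current && videoStream) {
            videoRef.current.srcObject = videoStream;
        }
    }, [videoStream]);

    return <video ref={videoRef} autoPlay playsInline muted />;
}

function CameraOnlyApp() {
    return (
        <CameraProvider>
            <CameraPreview />
        </CameraProvider>
    );
}
```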
For more detailed and interactive examples, please check out the /examples directory within this repository. It includes demonstrations of:
- Using MediaProvider with CameraView.
- Visualizing hand landmarks and connections.
- Controlling media start/stop and handling states.
Don't forget to deduplicate @mediapipe/tasks-vision in your Vite config:
```ts
// vite.config.ts
import { defineConfig } from "vite";

export default defineConfig({
    resolve: {
        dedupe: [
            "react",
            "react-dom",
            "@kortexa-ai/react-multimodal",
            "@mediapipe/tasks-vision",
        ],
    },
});
```
---
Ā© 2025 kortexa.ai