# 🎙️ MBZ Voice SDK

Easily add voice recognition, Gemini-based AI replies, and text-to-speech (TTS) to any web app.
## Installation

### Using npm

```bash
npm install mbz-voice-sdk
npx mbz-voice-sdk init
```
### Using yarn

```bash
yarn add mbz-voice-sdk
```
### From a cloned repository

```bash
cd mbz-voice-sdk/sdk
npm install
```
## ⚙️ Backend Setup Guide

This SDK requires a backend API endpoint connected to Gemini (Google AI). A ready-to-use FastAPI backend is provided in the `/backend` folder.

### 1. Go to the backend folder

From the `sdk` folder:

```bash
cd ../backend
```
### 2. Install Python dependencies

```bash
pip install -r requirements.txt
```
### 3. Add your Gemini API key

Create a `.env` file in the `backend` folder and add your Gemini API key:

```plaintext
GEMINI_API_KEY=your_google_gemini_api_key_here
```

🔑 Get your key from: https://makersuite.google.com/app/apikey
### 4. Start the server

```bash
uvicorn main:app --reload
```

Your backend is now running at:

```plaintext
http://localhost:8000/ask
```
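The README does not document what the `/ask` endpoint expects. As a hedged sketch, assuming it accepts a POST with a JSON body like `{ "question": "..." }` (the field name `question` is a guess; check `backend/main.py` for the real one), a frontend request could be built like this. `buildAskRequest` is a hypothetical helper, not part of the SDK. Note that only the user's text is sent; the Gemini key never leaves the server.

```javascript
// Hypothetical helper: builds the fetch() arguments for the /ask endpoint.
// ASSUMPTION: the backend expects POST with a JSON body { question }.
// Check backend/main.py for the actual field names.
function buildAskRequest(apiUrl, question) {
  const text = String(question).trim();
  if (!text) {
    throw new Error("question must be a non-empty string");
  }
  return {
    url: apiUrl,
    options: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      // Only the user's question is sent; no API key is involved here.
      body: JSON.stringify({ question: text }),
    },
  };
}

// Usage (browsers and Node 18+ provide a global fetch):
// const { url, options } = buildAskRequest("http://localhost:8000/ask", "Hello!");
// const res = await fetch(url, options);
// if (!res.ok) throw new Error(`Backend returned ${res.status}`);
// const data = await res.json();
```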
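## 🧠 SDK Usage Example

### Vanilla JavaScript

```javascript
import { MBZVoiceAgent } from "mbz-voice-sdk";

// Assumes the page contains <button id="start-btn">Start</button>.
// The bare "mbz-voice-sdk" import needs a bundler (or an import map).
const agent = new MBZVoiceAgent({
  apiUrl: "http://localhost:8000/ask",
  lang: "en-US",
  speak: true
});

agent.onTranscript((text) => {
  console.log("User said:", text);
});

agent.onResponse((reply) => {
  console.log("AI replied:", reply);
});

agent.onError((error) => {
  console.error("Voice agent error:", error);
});

document.getElementById("start-btn").addEventListener("click", () => agent.listen());
```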
### React

```jsx
import React, { useEffect, useState } from 'react';
import { MBZVoiceAgent } from 'mbz-voice-sdk';

function VoiceAssistant() {
  const [transcript, setTranscript] = useState('');
  const [response, setResponse] = useState('');
  const [isListening, setIsListening] = useState(false);
  const [agent, setAgent] = useState(null);

  useEffect(() => {
    // Initialize the agent once, when the component mounts
    const voiceAgent = new MBZVoiceAgent({
      apiUrl: "http://localhost:8000/ask",
      lang: "en-US",
      speak: true
    });

    // Set up event handlers
    voiceAgent.onTranscript((text) => setTranscript(text));
    voiceAgent.onResponse((reply) => setResponse(reply));
    voiceAgent.onListeningChange((listening) => setIsListening(listening));

    setAgent(voiceAgent);

    // Clean up resources and event listeners on unmount
    return () => {
      voiceAgent.cleanup();
    };
  }, []);

  const handleListen = () => {
    if (agent) {
      agent.listen();
    }
  };

  return (
    <div>
      <button
        onClick={handleListen}
        disabled={!agent}
        className={isListening ? 'listening' : ''}
      >
        {isListening ? '🔴 Listening...' : '🎙️ Start Talking'}
      </button>

      {transcript && (
        <p>
          <strong>You said:</strong> {transcript}
        </p>
      )}

      {response && (
        <p>
          <strong>AI response:</strong> {response}
        </p>
      )}
    </div>
  );
}

export default VoiceAssistant;
```
## 🧪 HTML Quick Test
## 📚 API Documentation

### `MBZVoiceAgent`

The main class for interacting with the SDK.

#### Constructor

```javascript
const agent = new MBZVoiceAgent(options);
```
#### Options
| Option | Type | Default | Description
|-----|-----|-----|-----
| apiUrl | String | Required | The URL of your backend API endpoint
| lang | String | 'en-US' | The language for speech recognition
| speak | Boolean | true | Whether to speak the AI's response
| voiceIndex | Number | 0 | Index of the voice to use for speech synthesis
| pitch | Number | 1.0 | The pitch of the voice (0.1 to 2.0)
| rate | Number | 1.0 | The speed of the voice (0.1 to 10.0)
| volume | Number | 1.0 | The volume of the voice (0.0 to 1.0)
| maxHistory | Number | 3 | Maximum number of Q&A pairs to store in history
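The table implies a few validation rules. Whether the SDK clamps, rejects, or ignores out-of-range values is not documented, so the following is only a sketch: `resolveOptions` is a hypothetical helper that merges user options over the listed defaults, clamps `pitch`, `rate`, and `volume` into the listed ranges, and rejects a missing `apiUrl`.

```javascript
// Hypothetical option resolver, NOT the SDK's code. Defaults and ranges are
// copied from the Options table above.
const DEFAULTS = {
  lang: "en-US",
  speak: true,
  voiceIndex: 0,
  pitch: 1.0,
  rate: 1.0,
  volume: 1.0,
  maxHistory: 3,
};

const RANGES = {
  pitch: [0.1, 2.0],
  rate: [0.1, 10.0],
  volume: [0.0, 1.0],
};

function resolveOptions(options = {}) {
  if (typeof options.apiUrl !== "string" || options.apiUrl.trim() === "") {
    throw new Error("MBZVoiceAgent: apiUrl is required");
  }
  // Start from the defaults; ignore options explicitly set to undefined.
  const resolved = { ...DEFAULTS };
  for (const [key, value] of Object.entries(options)) {
    if (value !== undefined) resolved[key] = value;
  }
  // Clamp numeric voice settings into their documented ranges,
  // falling back to the default when a value is not a finite number.
  for (const [key, [min, max]] of Object.entries(RANGES)) {
    const n = Number(resolved[key]);
    resolved[key] = Number.isFinite(n) ? Math.min(max, Math.max(min, n)) : DEFAULTS[key];
  }
  return resolved;
}
```

Clamping rather than throwing means a UI slider bound to `rate` cannot crash the agent; a stricter variant could throw on out-of-range input instead.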
#### Methods
| Method | Parameters | Description
|-----|-----|-----
| listen() | None | Start listening for voice input
| stop() | None | Stop listening for voice input
| mute() | None | Mute the voice response
| unmute() | None | Unmute the voice response
| cleanup() | None | Clean up resources and event listeners
| onTranscript(callback) | Function | Set callback for transcript events
| onResponse(callback) | Function | Set callback for AI response events
| onListeningChange(callback) | Function | Set callback for listening state changes
| onError(callback) | Function | Set callback for error events
| getHistory() | None | Get the conversation history
| clearHistory() | None | Clear the conversation history
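The `maxHistory` option together with `getHistory()` and `clearHistory()` suggests a bounded list of Q&A pairs, and the Tools Used section mentions `localStorage` for persistence. A minimal sketch under those assumptions; the storage key `mbz-voice-history`, the `{ question, answer }` entry shape, and `createHistory` itself are hypothetical. The storage object is injected, so the same code works with `window.localStorage` in a browser or an in-memory stand-in elsewhere.

```javascript
// Hypothetical bounded conversation history, not the SDK's actual code.
// `storage` is anything with getItem/setItem/removeItem, e.g. window.localStorage.
const HISTORY_KEY = "mbz-voice-history"; // assumed key name

function createHistory(storage, maxHistory = 3) {
  // Read and parse stored entries, tolerating missing or corrupted data.
  function read() {
    try {
      const parsed = JSON.parse(storage.getItem(HISTORY_KEY) || "[]");
      return Array.isArray(parsed) ? parsed : [];
    } catch {
      return [];
    }
  }

  return {
    // Append the newest Q&A pair, keeping only the last `maxHistory` pairs.
    add(question, answer) {
      const entries =
        maxHistory > 0 ? [...read(), { question, answer }].slice(-maxHistory) : [];
      storage.setItem(HISTORY_KEY, JSON.stringify(entries));
      return entries;
    },
    get: read,
    clear() {
      storage.removeItem(HISTORY_KEY);
    },
  };
}

// In-memory stand-in for localStorage, useful for tests or server-side rendering.
function createMemoryStorage() {
  const data = new Map();
  return {
    getItem: (key) => (data.has(key) ? data.get(key) : null),
    setItem: (key, value) => {
      data.set(key, String(value));
    },
    removeItem: (key) => {
      data.delete(key);
    },
  };
}
```

In a browser this would be used as `createHistory(window.localStorage, 3)`.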
## 🔧 Troubleshooting

### Microphone not working
- Ensure your browser has permission to access the microphone
- Check if your microphone is properly connected and working
- Try using a different browser (Chrome and Edge have the best support)
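When the microphone fails, the browser's `SpeechRecognition` fires an `error` event whose `error` property holds a standard code such as `not-allowed` (permission denied) or `audio-capture` (no usable microphone). Whether the SDK's `onError` callback forwards these codes is not documented, so this sketch targets the raw browser API; `explainRecognitionError` is a hypothetical helper.

```javascript
// Standard SpeechRecognition error codes from the Web Speech API, mapped to
// troubleshooting hints. Forwarding of these codes by mbz-voice-sdk is an assumption.
const RECOGNITION_HINTS = {
  "not-allowed": "Microphone permission was denied. Allow it in the browser's site settings.",
  "service-not-allowed": "The browser does not allow speech recognition on this page.",
  "audio-capture": "No usable microphone was found. Check that it is connected and not in use.",
  "no-speech": "No speech was detected. Check the microphone input level.",
  network: "Recognition failed because of a network error. Check your connection.",
  "language-not-supported": "The configured lang is not supported for recognition.",
  aborted: "Listening was stopped before a result arrived.",
};

function explainRecognitionError(code) {
  // Own-property check so codes like "toString" do not hit Object.prototype.
  if (Object.prototype.hasOwnProperty.call(RECOGNITION_HINTS, code)) {
    return RECOGNITION_HINTS[code];
  }
  return `Unexpected speech recognition error: ${code}`;
}

// Usage with the raw browser API:
// const Recognition = window.SpeechRecognition || window.webkitSpeechRecognition;
// const recognition = new Recognition();
// recognition.onerror = (event) => console.warn(explainRecognitionError(event.error));
```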
### Speech recognition not working
- Make sure you're using a supported browser (Chrome, Edge, Safari)
- Check your internet connection
- Verify that your site is served over HTTPS (required for production)
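The browser checks above can be automated before calling `listen()`. A minimal sketch that assumes nothing about the SDK's internals: `checkVoiceSupport` is a hypothetical helper that takes the `window` object (or a stand-in for testing) and reports which prerequisites are missing.

```javascript
// Hypothetical pre-flight check for the browser features a voice agent relies on.
// Pass `window` in a browser; any object with the same properties works for tests.
function checkVoiceSupport(win) {
  const problems = [];
  // Some browsers, including Chrome, expose only the webkit-prefixed constructor.
  if (!win.SpeechRecognition && !win.webkitSpeechRecognition) {
    problems.push("SpeechRecognition is not available in this browser.");
  }
  if (!win.speechSynthesis) {
    problems.push("speechSynthesis (text-to-speech) is not available.");
  }
  // Microphone access requires a secure context (HTTPS, or localhost in development).
  if (win.isSecureContext === false) {
    problems.push("Page is not a secure context; microphone access will be blocked.");
  }
  return problems;
}

// Usage in the browser:
// const problems = checkVoiceSupport(window);
// if (problems.length > 0) console.warn(problems.join("\n"));
```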
### No response from the backend
- Confirm your backend server is running
- Check for CORS issues (the backend should allow requests from your frontend)
- Verify your API URL is correct in the SDK initialization
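Many "no response" problems come down to the `apiUrl` value itself. A hedged sketch of a sanity check, `checkApiUrl` (hypothetical, not part of the SDK), that flags a missing or non-HTTP scheme, a path other than the `/ask` route served by the bundled backend, and an `http://` API called from an `https://` page, which browsers block as mixed content (this sketch treats loopback hosts such as `localhost` as exempt). CORS itself can only be fixed on the backend.

```javascript
// Hypothetical sanity check for the apiUrl passed to MBZVoiceAgent.
// Returns likely problems; an empty list does not prove the backend is running.
const LOOPBACK_HOSTS = ["localhost", "127.0.0.1", "[::1]"];

function checkApiUrl(apiUrl, pageProtocol = "http:") {
  let url;
  try {
    url = new URL(apiUrl);
  } catch {
    return [`"${apiUrl}" is not an absolute URL (include http:// or https://).`];
  }
  if (url.protocol !== "http:" && url.protocol !== "https:") {
    return [`"${apiUrl}" must start with http:// or https://.`];
  }
  const problems = [];
  if (!url.pathname.endsWith("/ask")) {
    problems.push(`Path is "${url.pathname}"; the bundled backend serves /ask.`);
  }
  // Browsers usually block http:// requests from an https:// page (mixed content).
  if (pageProtocol === "https:" && url.protocol === "http:" && !LOOPBACK_HOSTS.includes(url.hostname)) {
    problems.push("An https:// page calling an http:// API is likely blocked as mixed content.");
  }
  return problems;
}

// Usage in the browser:
// console.log(checkApiUrl("http://localhost:8000/ask", window.location.protocol));
```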
### No voice output
- Check if your device's volume is turned on
- Make sure the speak option is set to true
- Try using a different voice by changing the voiceIndex
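`voiceIndex` presumably indexes into the browser's voice list (`speechSynthesis.getVoices()`), which differs across browsers and operating systems and can be empty until the `voiceschanged` event fires. A hedged sketch of choosing a voice safely; `pickVoice` is hypothetical, and its fallback order (requested index, then a voice matching `lang`, then the browser default) is an assumption rather than documented SDK behavior.

```javascript
// Picks a SpeechSynthesisVoice-like object ({ name, lang }) for TTS.
// Falls back to a voice matching `lang` when voiceIndex is out of range, and
// returns null so the browser's default voice is used as a last resort.
function pickVoice(voices, voiceIndex = 0, lang = "en-US") {
  if (!Array.isArray(voices) || voices.length === 0) {
    return null; // voices not loaded yet; retry after "voiceschanged"
  }
  if (Number.isInteger(voiceIndex) && voiceIndex >= 0 && voiceIndex < voices.length) {
    return voices[voiceIndex];
  }
  const prefix = lang.split("-")[0].toLowerCase();
  return (
    voices.find((v) => v.lang === lang) ||
    voices.find((v) => String(v.lang).toLowerCase().startsWith(prefix)) ||
    null
  );
}

// Browser usage (some browsers load voices asynchronously):
// speechSynthesis.addEventListener("voiceschanged", () => {
//   const voices = speechSynthesis.getVoices();
//   voices.forEach((v, i) => console.log(i, v.name, v.lang)); // find a voiceIndex
//   const utterance = new SpeechSynthesisUtterance("Hello!");
//   utterance.voice = pickVoice(voices, 2, "en-US");
//   speechSynthesis.speak(utterance);
// });
```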
## 🤝 Contributing
Contributions are welcome! Here's how you can help:
1. Fork the repository
2. Create a feature branch:
   ```bash
   git checkout -b feature/amazing-feature
   ```
3. Commit your changes:
   ```bash
   git commit -m 'Add some amazing feature'
   ```
4. Push to the branch:
   ```bash
   git push origin feature/amazing-feature
   ```
5. Open a Pull Request
### Development setup

```bash
# Clone the repository
git clone https://github.com/ProMBZ/mbz-voice-sdk.git

# Install dependencies
cd mbz-voice-sdk
npm install

# Run the development server
npm run dev

# Build for production
npm run build
```
## 🔐 Security Notice

This SDK does not ship with a built-in Gemini key.

🔑 You are responsible for adding your own Gemini key to the backend.
Never include your Gemini key in frontend code.
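One way to catch a leaked key during development is a small guard that scans the options passed to the SDK for strings shaped like a Google API key. This is a hedged sketch: `findLeakedKeys` is a hypothetical helper, not part of the SDK, and the pattern (`AIza` followed by 35 URL-safe characters, the usual Google API key format) is a heuristic, so a clean result does not guarantee that no secret is present.

```javascript
// Hypothetical development-time guard, not part of mbz-voice-sdk.
// Flags config values that look like a Google API key (heuristic only).
const GOOGLE_KEY_PATTERN = /AIza[0-9A-Za-z_-]{35}/;

function findLeakedKeys(config) {
  const leaks = [];
  for (const [name, value] of Object.entries(config)) {
    if (typeof value === "string" && GOOGLE_KEY_PATTERN.test(value)) {
      leaks.push(name);
    }
  }
  return leaks;
}

// Usage: fail fast before creating the agent.
// const config = { apiUrl: "http://localhost:8000/ask", lang: "en-US" };
// const leaks = findLeakedKeys(config);
// if (leaks.length > 0) throw new Error(`Possible API key in: ${leaks.join(", ")}`);
```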
## 🧰 Tools Used

- Frontend:
  - JavaScript (Web Speech API: SpeechRecognition, plus speechSynthesis for TTS)
  - localStorage for conversation persistence
  - Rollup for bundling
- Backend:
  - FastAPI (Python)
  - Google Generative AI SDK (Gemini 1.5 Flash)
  - python-dotenv for environment variables
## 📄 License

MIT © 2025. Developed by Muhammad (MBZ-Voice-SDK).

🔗 GitHub: @ProMBZ
## 💬 Support

If you have questions or suggestions, or want to collaborate:

- 📧 Email: muhammadzohaib1415@gmail.com
- 🌐 Portfolio: https://kzml8bqhnxp4cn0duf08.lite.vusercontent.net/

---
Made with ❤️ by Muhammad