# 🎙️ MBZ Voice SDK

> Speak. Think. Respond. Seamlessly.

MBZ-Voice-SDK lets you add voice input, AI understanding (via Gemini), and spoken responses to any modern web app. Whether you're building a chatbot, an AI assistant, or a voice-powered UI, the SDK is designed to be plug-and-play.

---

## 📋 Table of Contents

- Features
- Requirements
- Installation
- Backend Setup
- Usage Examples
- API Documentation
- Troubleshooting
- Contributing
- Security Notice
- Tools Used
- License
- Support

---

## 🔥 Features

- ✅ Voice Input: Capture user speech through the browser microphone using the Web Speech API
- ✅ AI Processing: Gemini-powered AI backend built with FastAPI
- ✅ Voice Response: Convert AI text responses to spoken words using Web Speech TTS
- ✅ Audio Controls: Toggle mute/unmute
- ✅ Conversation Memory: Store the last 3 Q&A exchanges using localStorage
- ✅ Framework Agnostic: Integrate with plain JavaScript, React, Vue, or any modern frontend framework
- ✅ Customizable: Configure language, voice type, and response behavior
- ✅ Lightweight: Minimal dependencies

## 💻 Requirements

- Modern web browser with support for:
  - Web Speech API (SpeechRecognition)
  - Web Speech API (SpeechSynthesis)
  - localStorage
- Node.js 14+ (for development)
- Python 3.8+ (for backend)
- Gemini API key from Google AI Studio
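
Before creating an agent, it can help to confirm that the browser actually exposes these APIs. The check below is a minimal sketch and not part of the SDK: `detectVoiceSupport` and `missingVoiceFeatures` are hypothetical helper names, and they only look for the standard globals (`SpeechRecognition` or the prefixed `webkitSpeechRecognition`, `speechSynthesis`, and `localStorage`).

```javascript
// Hypothetical helpers (not part of mbz-voice-sdk): report which of the
// required browser APIs are present on a window-like object.
function detectVoiceSupport(env = globalThis) {
  return {
    // Some browsers only expose the prefixed constructor.
    recognition: Boolean(env.SpeechRecognition || env.webkitSpeechRecognition),
    synthesis: "speechSynthesis" in env,
    storage: "localStorage" in env,
  };
}

// Lists the names of the missing APIs so the UI can show a helpful message.
function missingVoiceFeatures(env = globalThis) {
  const support = detectVoiceSupport(env);
  return Object.keys(support).filter((name) => !support[name]);
}
```

In a page, `missingVoiceFeatures(window)` returning an empty array suggests the requirements are met; presence of a global does not guarantee that microphone permission will be granted.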

## 📦 Install the SDK

### npm

After publishing on npm:

```bash
npx mbz-voice-sdk init
```

### Yarn

```bash
yarn add mbz-voice-sdk
```

### From source

```bash
cd mbz-voice-sdk/sdk
npm install
```



## ⚙️ Backend Setup Guide

This SDK requires a backend API endpoint connected to Gemini (Google AI). A ready-to-use FastAPI backend is provided in the `/backend` folder.



### 1. Go to the backend folder

```bash
cd ../backend
```

### 2. Install dependencies

```bash
pip install -r requirements.txt
```





### 3. Add your Gemini API key

Create a `.env` file in the backend folder and paste your Gemini API key:

```plaintext
GEMINI_API_KEY=your_google_gemini_api_key_here
```





👉 Get your key from: https://makersuite.google.com/app/apikey



### 4. Run the server

```bash
uvicorn main:app --reload
```

Your backend is now live at:

```plaintext
http://localhost:8000/ask
```
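
To confirm from JavaScript that the server is up, one option is to probe the pages FastAPI serves by default: the interactive docs at `/docs` and the schema at `/openapi.json`. The sketch below assumes the bundled backend has not disabled or renamed them; `backendDocsUrl` and `isBackendReachable` are hypothetical helper names, not SDK functions.

```javascript
// Hypothetical helpers (not part of mbz-voice-sdk). They rely on FastAPI's
// default behaviour of serving Swagger UI at /docs and the schema at
// /openapi.json; both paths can be disabled or renamed in the backend.
function backendDocsUrl(apiUrl) {
  return new URL("/docs", apiUrl).href;
}

// Resolves to true when the backend answers the schema request with a 2xx
// status, and to false on network errors or non-2xx responses.
async function isBackendReachable(apiUrl, fetchImpl = fetch) {
  try {
    const res = await fetchImpl(new URL("/openapi.json", apiUrl).href);
    return res.ok;
  } catch {
    return false;
  }
}
```

Opening `backendDocsUrl("http://localhost:8000/ask")` in a browser also shows the request body that `/ask` expects. When `isBackendReachable` is called from a page on another origin, a `false` result can mean a CORS restriction rather than a stopped server.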





## 🧠 SDK Usage Example

### Plain JavaScript

```javascript
import { MBZVoiceAgent } from "mbz-voice-sdk";

const agent = new MBZVoiceAgent({
  apiUrl: "http://localhost:8000/ask",
  lang: "en-US",
  speak: true
});

agent.onTranscript((text) => {
  console.log("User said:", text);
});

agent.onResponse((reply) => {
  console.log("AI replied:", reply);
});

document.getElementById("start-btn").onclick = () => agent.listen();
```
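
The same agent can back a mute button. The helper below is a minimal sketch: it assumes only that the agent exposes `mute()` and `unmute()` as listed in the API documentation, and it tracks the muted state itself because the source does not document a way to read it back. `createMuteToggle` is a hypothetical name.

```javascript
// Hypothetical helper: wraps an agent that exposes mute() and unmute()
// and keeps track of the current state locally.
function createMuteToggle(agent, initiallyMuted = false) {
  let muted = initiallyMuted;
  return {
    isMuted: () => muted,
    // Flips the state, forwards it to the agent, and returns the new state.
    toggle() {
      muted = !muted;
      if (muted) {
        agent.mute();
      } else {
        agent.unmute();
      }
      return muted;
    },
  };
}
```

A click handler can then call `toggle()` and use the returned boolean to switch the button label between "Mute" and "Unmute".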

### React

```jsx
import React, { useEffect, useState } from 'react';
import { MBZVoiceAgent } from 'mbz-voice-sdk';

function VoiceAssistant() {
  const [transcript, setTranscript] = useState('');
  const [response, setResponse] = useState('');
  const [isListening, setIsListening] = useState(false);
  const [agent, setAgent] = useState(null);

  useEffect(() => {
    // Initialize the agent
    const voiceAgent = new MBZVoiceAgent({
      apiUrl: "http://localhost:8000/ask",
      lang: "en-US",
      speak: true
    });

    // Set up event handlers
    voiceAgent.onTranscript((text) => {
      setTranscript(text);
    });

    voiceAgent.onResponse((reply) => {
      setResponse(reply);
    });

    voiceAgent.onListeningChange((listening) => {
      setIsListening(listening);
    });

    setAgent(voiceAgent);

    // Cleanup on unmount
    return () => {
      voiceAgent.cleanup();
    };
  }, []);

  const handleListen = () => {
    if (agent) {
      agent.listen();
    }
  };

  // Minimal placeholder markup (assumed); replace with your own elements and styling.
  return (
    <div>
      <button onClick={handleListen} disabled={isListening}>
        {isListening ? 'Listening...' : 'Start Listening'}
      </button>

      {transcript && (
        <div>
          <strong>You said:</strong>
          <p>{transcript}</p>
        </div>
      )}

      {response && (
        <div>
          <strong>AI response:</strong>
          <p>{response}</p>
        </div>
      )}
    </div>
  );
}

export default VoiceAssistant;
```








## 📚 API Documentation

### MBZVoiceAgent

The main class for interacting with the SDK.

#### Constructor

```javascript
const agent = new MBZVoiceAgent(options);
```





#### Options



| Option | Type | Default | Description |
|-----|-----|-----|-----|
| `apiUrl` | String | Required | The URL of your backend API endpoint |
| `lang` | String | `'en-US'` | The language for speech recognition |
| `speak` | Boolean | `true` | Whether to speak the AI's response |
| `voiceIndex` | Number | `0` | Index of the voice to use for speech synthesis |
| `pitch` | Number | `1.0` | The pitch of the voice (0.1 to 2.0) |
| `rate` | Number | `1.0` | The speed of the voice (0.1 to 10.0) |
| `volume` | Number | `1.0` | The volume of the voice (0.0 to 1.0) |
| `maxHistory` | Number | `3` | Maximum number of Q&A pairs to store in history |
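
The source does not say whether the SDK validates these values itself, so it may be worth normalizing them before constructing the agent. The following is a sketch under that assumption: `normalizeOptions` is a hypothetical helper, and the defaults and ranges are copied from the table.

```javascript
// Defaults and ranges copied from the options table; the helper itself is
// hypothetical and not part of mbz-voice-sdk.
const DEFAULTS = {
  lang: "en-US",
  speak: true,
  voiceIndex: 0,
  pitch: 1.0,
  rate: 1.0,
  volume: 1.0,
  maxHistory: 3,
};
const RANGES = { pitch: [0.1, 2.0], rate: [0.1, 10.0], volume: [0.0, 1.0] };

function normalizeOptions(options = {}) {
  if (typeof options.apiUrl !== "string" || options.apiUrl === "") {
    throw new TypeError("apiUrl is required");
  }
  const merged = { ...DEFAULTS, ...options };
  for (const [key, [min, max]] of Object.entries(RANGES)) {
    const value = Number(merged[key]);
    // Fall back to the default for non-numeric input, otherwise clamp.
    merged[key] = Number.isFinite(value)
      ? Math.min(max, Math.max(min, value))
      : DEFAULTS[key];
  }
  return merged;
}
```

The result can be passed straight to the constructor, for example `new MBZVoiceAgent(normalizeOptions({ apiUrl, rate: 1.2 }))`.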





#### Methods



| Method | Parameters | Description |
|-----|-----|-----|
| `listen()` | None | Start listening for voice input |
| `stop()` | None | Stop listening for voice input |
| `mute()` | None | Mute the voice response |
| `unmute()` | None | Unmute the voice response |
| `cleanup()` | None | Clean up resources and event listeners |
| `onTranscript(callback)` | Function | Set callback for transcript events |
| `onResponse(callback)` | Function | Set callback for AI response events |
| `onListeningChange(callback)` | Function | Set callback for listening state changes |
| `onError(callback)` | Function | Set callback for error events |
| `getHistory()` | None | Get the conversation history |
| `clearHistory()` | None | Clear the conversation history |
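
To illustrate how `maxHistory`, `getHistory()`, and `clearHistory()` could fit together, here is a minimal sketch of a capped history kept in a localStorage-like store. It is not the SDK's implementation: `createHistoryStore`, the storage key `"mbz-voice-history"`, and the `{ question, answer }` entry shape are all assumptions made for the example.

```javascript
// Hypothetical sketch of capped conversation history. `storage` is any object
// with getItem/setItem/removeItem (localStorage in the browser); the key name
// and the entry shape are assumptions, not the SDK's actual format.
function createHistoryStore(storage, maxHistory = 3, key = "mbz-voice-history") {
  const read = () => {
    try {
      const parsed = JSON.parse(storage.getItem(key) || "[]");
      return Array.isArray(parsed) ? parsed : [];
    } catch {
      return []; // corrupted JSON: start with an empty history
    }
  };
  return {
    add(question, answer) {
      const all = [...read(), { question, answer }];
      // Keep only the newest `maxHistory` exchanges (none when it is 0 or less).
      const next = maxHistory > 0 ? all.slice(-maxHistory) : [];
      storage.setItem(key, JSON.stringify(next));
      return next;
    },
    getHistory: read,
    clearHistory: () => storage.removeItem(key),
  };
}
```

In a browser, `createHistoryStore(window.localStorage)` keeps the last three exchanges across page reloads.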





## 🔧 Troubleshooting

### Microphone not working

- Ensure your browser has permission to access the microphone
- Check that your microphone is properly connected and working
- Try using a different browser (Chrome and Edge have the best support)





### Speech recognition not working

- Make sure you're using a supported browser (Chrome, Edge, Safari)
- Check your internet connection
- Verify that your site is served over HTTPS (required for production)
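
Many of these failures surface as a standard `SpeechRecognitionErrorEvent.error` code in the browser. The sketch below maps those codes to the checks listed here; `explainSpeechError` is a hypothetical helper, and whether the SDK's `onError` callback passes these codes through unchanged is an assumption, so inspect what your callback actually receives first.

```javascript
// Standard Web Speech API error codes mapped to troubleshooting hints.
// The helper is hypothetical and not part of mbz-voice-sdk.
const SPEECH_ERROR_HINTS = new Map([
  ["not-allowed", "Microphone permission was denied. Allow access in the browser's site settings."],
  ["service-not-allowed", "The browser blocked the speech service. Try Chrome or Edge."],
  ["audio-capture", "No microphone was found. Check that one is connected and working."],
  ["network", "The recognition service could not be reached. Check your internet connection."],
  ["no-speech", "No speech was detected. Try again closer to the microphone."],
  ["language-not-supported", "The configured lang value is not supported by this browser."],
]);

// Returns a hint for known codes and a generic message for anything else.
function explainSpeechError(code) {
  return SPEECH_ERROR_HINTS.get(code) || `Unrecognized speech error: ${String(code)}`;
}
```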





### API connection issues

- Confirm your backend server is running
- Check for CORS issues (the backend should allow requests from your frontend)
- Verify your API URL is correct in the SDK initialization





### No voice response

- Check that your device's volume is turned on
- Make sure the `speak` option is set to `true`
- Try using a different voice by changing the `voiceIndex`
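
To find a usable `voiceIndex`, you can list the voices the browser reports through `speechSynthesis.getVoices()`. The helpers below are a sketch with hypothetical names; they assume the index refers to a position in that array, which the source implies but does not state.

```javascript
// Hypothetical helper: chooses a voice from the array returned by
// speechSynthesis.getVoices(), falling back to the first voice when the
// requested index is out of range. Returns null when no voices are loaded.
function pickVoice(voices, voiceIndex = 0) {
  if (!Array.isArray(voices) || voices.length === 0) return null;
  const inRange =
    Number.isInteger(voiceIndex) && voiceIndex >= 0 && voiceIndex < voices.length;
  return voices[inRange ? voiceIndex : 0];
}

// Formats the available voices so a valid voiceIndex can be looked up.
function describeVoices(voices) {
  return voices.map((voice, index) => `${index}: ${voice.name} (${voice.lang})`);
}
```

In some browsers `getVoices()` returns an empty array until the `voiceschanged` event has fired, so run `describeVoices(speechSynthesis.getVoices())` after that event if the list comes back empty.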







## 🤝 Contributing

Contributions are welcome! Here's how you can help:

1. Fork the repository
2. Create a feature branch:

   ```bash
   git checkout -b feature/amazing-feature
   ```

3. Commit your changes:

   ```bash
   git commit -m 'Add some amazing feature'
   ```

4. Push to the branch:

   ```bash
   git push origin feature/amazing-feature
   ```

5. Open a Pull Request





### Development Setup

```bash
# Clone the repository
git clone https://github.com/ProMBZ/mbz-voice-sdk.git

# Install dependencies
cd mbz-voice-sdk
npm install

# Run development server
npm run dev

# Build for production
npm run build
```





## 🔐 Security Notice



This SDK does not use any built-in Gemini key.



🔐 You are responsible for adding your own Gemini key to the backend.



Never include your Gemini key in frontend code.



## 🧰 Tools Used

- Frontend:
  - JavaScript (SpeechRecognition + TTS APIs)
  - localStorage for conversation persistence
  - Rollup for bundling
- Backend:
  - FastAPI (Python)
  - Google Generative AI SDK (Gemini 1.5 Flash)
  - Python-dotenv for environment variables











## 📄 License

MIT © 2025. Developed by Muhammad (MBZ-Voice-SDK)

🔗 GitHub: @ProMBZ



## 💬 Support

If you have questions, suggestions, or want to collaborate:

- 📧 Email: muhammadzohaib1415@gmail.com
- 🌍 Portfolio: https://kzml8bqhnxp4cn0duf08.lite.vusercontent.net/



---



Made with ❤️ by Muhammad
