Python library to control drone and other intelligent functions.

DroneBuddy can be used as a helper library to program your own drone. It is an offline library, so you can use it without an internet connection, as is necessary when you are connected to a Tello drone.

The complete documentation can be found at Drone Buddy documentation.

Installation Guide


DroneBuddy envisions empowering everyone with the ability to personally program their intelligent drones, enriching them with desired features. At its core, DroneBuddy offers a suite of fundamental building blocks, enabling users to seamlessly integrate these elements to bring their drone to flight.

Functioning as an intuitive interface, DroneBuddy simplifies complex algorithms, stripping away the intricacies to offer straightforward input-output modalities. This approach ensures that users can accomplish their objectives efficiently, without getting bogged down in technical complexities. With DroneBuddy, the focus is on user-friendliness and ease of use, making drone programming accessible and hassle-free.


DroneBuddy behaves like any other Python library and can be installed using pip:

pip install dronebuddylib

The installation of DroneBuddy needs the following prerequisites:

  1. Python 3.9 or higher
  2. Compatible pip version


Running pip install dronebuddylib installs only the DroneBuddy library and its required dependencies, which are:

  • requests
  • numpy
  • cython
  • setuptools
  • packaging
  • pyparsing
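
The prerequisites and required dependencies listed above can be checked before any optional feature is installed. The following is a minimal sketch using only the standard library; the helper names `python_is_supported` and `missing_modules` are hypothetical and not part of DroneBuddy.

```python
import importlib.util
import sys

# Import names of the required dependencies listed above.
REQUIRED_MODULES = ["requests", "numpy", "cython", "setuptools", "packaging", "pyparsing"]


def python_is_supported(version_info=sys.version_info, minimum=(3, 9)):
    """Return True if the interpreter meets the minimum Python version."""
    return tuple(version_info[:2]) >= minimum


def missing_modules(module_names):
    """Return the names from module_names that cannot be found for import."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    if not python_is_supported():
        print("DroneBuddy needs Python 3.9 or higher.")
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing dependencies:", ", ".join(missing))
    else:
        print("All required dependencies are importable.")
```

Running the script prints either the missing dependencies or a confirmation that all of them can be imported.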

Face Recognition

Face-recognition is an open-source Python library that provides face detection, face alignment, and face recognition capabilities. The official documentation can be found here.


face_recognition has the following prerequisite:

  1. dlib

dlib Installation

To install dlib, you need to ensure that you meet the following specifications:

  • Operating System: dlib is compatible with Windows, macOS, and Linux operating systems.
  • Python Version: dlib works with Python 2.7 or Python 3.x versions.
  • Compiler: You need a C++ compiler to build and install dlib. For Windows, you can use Microsoft Visual C++ (MSVC) or MinGW. On macOS, Xcode Command Line Tools are required. On Linux, the GNU C++ Compiler (g++) is typically used.
  • Dependencies: dlib relies on a few external dependencies, including Boost and CMake. These dependencies need to be installed beforehand to successfully build dlib.

The official installation instructions are found here.

  • On Windows, install the tools needed to build dlib first:
    1. Download the CMake Windows installer from here.
    2. While installing CMake, select "Add CMake to the system PATH" to avoid errors in the next steps.
    3. Install Visual C++, if not installed previously.

Then run the following commands to install face_recognition:

  • cmake installation
    pip install cmake
  • dlib installation
    pip install dlib
  • face_recognition
    pip install face_recognition

macOS Installation

macOS installation is pretty straightforward.

pip install face_recognition


Add Faces to the Memory

Before it can recognize anyone, the algorithm needs encodings of the known faces. The remember_face method adds a face to the memory under a given name.

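import cv2
from dronebuddylib.models.engine_configurations import EngineConfigurations
# FaceRecognitionEngine and FaceRecognitionAlgorithm also come from dronebuddylib

engine_configs = EngineConfigurations({})
image = cv2.imread('test_clear.jpg')
engine = FaceRecognitionEngine(FaceRecognitionAlgorithm.FACE_RECC, engine_configs)
result = engine.remember_face(image, "Jane")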

You can check if the images and names are added to the library by simply going to the location where the library is installed.


Recognize Faces

engine_configs = EngineConfigurations({})
image = cv2.imread('test_jane.jpg')
engine = FaceRecognitionEngine(FaceRecognitionAlgorithm.FACE_RECC, engine_configs)
result = engine.recognize_face(image)


The output is a list of names. If no people are spotted in the frame, an empty list is returned. If people are spotted but not recognized, 'unknown' is added as a list item for each of them.
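
The returned list can be turned into a short summary before the drone reacts to it. This is a minimal sketch that assumes the output format described above (a list of names, with 'unknown' for unrecognized people); `summarize_faces` and `describe` are hypothetical helpers, not part of DroneBuddy.

```python
def summarize_faces(names):
    """Split a recognition result into known names and a count of unknown faces."""
    known = [name for name in names if name != "unknown"]
    unknown_count = len(names) - len(known)
    return known, unknown_count


def describe(names):
    """Build a human-readable sentence from a recognition result."""
    if not names:
        return "No people in the frame."
    known, unknown_count = summarize_faces(names)
    parts = []
    if known:
        parts.append("Recognized: " + ", ".join(known))
    if unknown_count:
        parts.append(f"{unknown_count} unrecognized")
    return "; ".join(parts) + "."


print(describe(["Jane", "unknown"]))  # Recognized: Jane; 1 unrecognized.
```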


Voice Generation

Pyttsx3 Voice Generation

pyttsx3 is a Python library that provides a simple and convenient interface for performing text-to-speech synthesis. It allows you to convert text into spoken words using various speech synthesis engines available on your system. The official documentation can be found here.


To install pyttsx3 Integration, run the following snippet, which will install the required dependencies:

pip install dronebuddylib[SPEECH_GENERATION]


engine_configs = EngineConfigurations({})
engine = SpeechGenerationEngine(, engine_configs)
result = engine.read_phrase("Read aloud phrase")

Object Detection

Mediapipe Object Detection

The official documentation for Mediapipe can be found here.


To install Mediapipe Integration, run the following snippet, which will install the required dependencies:

pip install dronebuddylib[OBJECT_DETECTION_MP]


The Mediapipe integration module requires no configurations to function.

Code Example

engine_configs = EngineConfigurations({})
engine = MPObjectDetectionImpl(engine_configs)
# mp_image is the image to run detection on
detected_objects = engine.get_detected_objects(mp_image)


The output will be given in the following JSON format:

{
  "message": "",
  "result": {
    "object_names": [
      ""
    ],
    "detected_objects": [
      {
        "detected_categories": [
          {
            "category_name": "",
            "confidence": 0
          }
        ],
        "bounding_box": {
          "origin_x": 0,
          "origin_y": 0,
          "width": 0,
          "height": 0
        }
      }
    ]
  }
}
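
Assuming the output is delivered as a JSON string with the fields shown above, the names of confidently detected objects can be extracted as in the sketch below. The sample payload, the 0.5 threshold, and the helper name `confident_objects` are illustrative assumptions, not part of DroneBuddy.

```python
import json

# Illustrative payload following the field names of the format above.
SAMPLE_OUTPUT = """
{
  "message": "",
  "result": {
    "object_names": ["person", "chair"],
    "detected_objects": [
      {
        "detected_categories": [{"category_name": "person", "confidence": 0.91}],
        "bounding_box": {"origin_x": 40, "origin_y": 20, "width": 120, "height": 300}
      },
      {
        "detected_categories": [{"category_name": "chair", "confidence": 0.32}],
        "bounding_box": {"origin_x": 200, "origin_y": 150, "width": 80, "height": 90}
      }
    ]
  }
}
"""


def confident_objects(payload, min_confidence=0.5):
    """Return (category_name, confidence) pairs at or above min_confidence."""
    data = json.loads(payload)
    matches = []
    for detected in data["result"]["detected_objects"]:
        for category in detected["detected_categories"]:
            if category["confidence"] >= min_confidence:
                matches.append((category["category_name"], category["confidence"]))
    return matches


print(confident_objects(SAMPLE_OUTPUT))  # [('person', 0.91)]
```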

YOLO Object Detection

The official documentation for YOLO can be found here.


To install YOLO Integration, run the following snippet, which will install the required dependencies:

pip install dronebuddylib[OBJECT_DETECTION_YOLO]


The YOLO integration module requires the following configurations to function:

  • OBJECT_DETECTION_YOLO_VERSION - This refers to the model that you want to use for detection purposes. The list of versions can be found here.

Code Example

image = cv2.imread('test_image.jpg')

engine_configs = EngineConfigurations({})
engine_configs.add_configuration(Configurations.OBJECT_DETECTION_YOLO_VERSION, "")
engine = ObjectDetectionEngine(VisionAlgorithm.YOLO, engine_configs)
objects = engine.get_detected_objects(image)


The output will be given in the following JSON format:

{
  "message": "",
  "result": {
    "object_names": [
      ""
    ],
    "detected_objects": [
      {
        "detected_categories": [
          {
            "category_name": "",
            "confidence": 0
          }
        ],
        "bounding_box": {
          "origin_x": 0,
          "origin_y": 0,
          "width": 0,
          "height": 0
        }
      }
    ]
  }
}
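
The bounding_box values can be used to work out where an object sits in the frame, for example to decide which way the drone should turn. The sketch below assumes that origin_x and origin_y mark the top-left corner of the box in pixels, which the format above does not state; `box_center` and `horizontal_offset` are hypothetical helpers.

```python
def box_center(bounding_box):
    """Return the (x, y) center of a bounding box dict.

    Assumes origin_x/origin_y are the top-left corner, in pixels.
    """
    center_x = bounding_box["origin_x"] + bounding_box["width"] / 2
    center_y = bounding_box["origin_y"] + bounding_box["height"] / 2
    return center_x, center_y


def horizontal_offset(bounding_box, frame_width):
    """Return how far the box center is from the frame center, in pixels.

    Negative means the object is left of center, positive means right.
    """
    center_x, _ = box_center(bounding_box)
    return center_x - frame_width / 2


box = {"origin_x": 100, "origin_y": 50, "width": 200, "height": 100}
print(box_center(box))              # (200.0, 100.0)
print(horizontal_offset(box, 960))  # -280.0
```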

Voice Recognition

Multi Algorithm Recognition

Built on the third-party speech_recognition library. The official documentation can be found here. The library performs well in multi-threaded environments.

Officially Supported Algorithms

  • CMU Sphinx (works offline)
  • Google Speech Recognition
  • Google Cloud Speech API
  • Microsoft Azure Speech
  • Microsoft Bing Voice Recognition (Deprecated)
  • Houndify API
  • IBM Speech to Text
  • Snowboy Hotword Detection (works offline)
  • TensorFlow
  • Vosk API (works offline)
  • OpenAI Whisper (works offline)
  • Whisper API


To install the Multi Algorithm integration, run the following snippet, which will install the required dependencies:

pip install dronebuddylib[SPEECH_RECOGNITION_MULTI]


The Multi Algorithm integration module requires the following configurations to function:

Required Configurations

  • SPEECH_RECOGNITION_MULTI_ALGO_ALGORITHM_NAME - The name of the algorithm you wish to use.

Optional Configurations

  • SPEECH_RECOGNITION_MULTI_ALGO_ALGO_MIC_TIMEOUT - The maximum number of seconds the microphone listens before timing out.
  • SPEECH_RECOGNITION_MULTI_ALGO_ALGO_PHRASE_TIME_LIMIT - The maximum duration for a single phrase before cutting off.
  • SPEECH_RECOGNITION_MULTI_ALGO_IBM_KEY - The IBM API key for using IBM speech recognition.

Code Example

engine_configs = EngineConfigurations({})
engine = SpeechRecognitionEngine(SpeechRecognitionAlgorithm.MULTI_ALGO_SPEECH_RECOGNITION, engine_configs)

result = engine.recognize_speech(audio_steam=data)

How to Use with the Mic

engine_configs = EngineConfigurations({})
engine = SpeechRecognitionEngine(SpeechRecognitionAlgorithm.MULTI_ALGO_SPEECH_RECOGNITION, engine_configs)

while True:
    with speech_microphone as source:
        try:
            result = engine.recognize_speech(source)
            if result.recognized_speech is not None:
                intent = recognize_intent_gpt(intent_engine, result.recognized_speech)
                execute_drone_functions(intent, drone_instance, face_recognition_engine, object_recognition_engine,
                                        text_recognition_engine, voice_engine)
            else:
                logger.log_warning("TEST", "Not Recognized: voice ")
        except speech_recognition.WaitTimeoutError:
            # No speech was heard before the microphone timed out; keep listening
            pass
        time.sleep(1)  # Sleep to simulate work and prevent a tight loop


The output will be given in the following JSON format:

{
    "recognized_speech": "",
    "total_billed_time": ""
}


  • recognized_speech - Text with the recognized speech.
  • total_billed_time - If a paid service, the billed time.
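
A small wrapper can make the two fields easier to work with. The sketch below assumes the result arrives as a JSON string with exactly the fields listed above, both as strings; the `SpeechResult` class and `parse_speech_result` function are hypothetical and not part of DroneBuddy.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class SpeechResult:
    recognized_speech: Optional[str]
    total_billed_time: Optional[str]


def parse_speech_result(payload):
    """Parse the JSON output, treating empty strings as missing values."""
    data = json.loads(payload)
    text = (data.get("recognized_speech") or "").strip()
    billed = (data.get("total_billed_time") or "").strip()
    return SpeechResult(text or None, billed or None)


result = parse_speech_result('{"recognized_speech": " take off ", "total_billed_time": ""}')
print(result.recognized_speech)  # take off
print(result.total_billed_time)  # None
```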

Google Voice Recognition

The official documentation for Google Speech-to-Text can be found here. Follow the steps there to set up a project in the Google Cloud console.

Steps for Usage

  1. Installation: To use Google Speech Recognition, you first need to set up the Google Cloud environment and install necessary SDKs or libraries in your development environment.
  2. API Key and Setup: Obtain an API key from Google Cloud and configure it in your application. This key is essential for authenticating and accessing Google’s speech recognition services.
  3. Audio Input and Processing: Your application should be capable of capturing audio input, which can be sent to Google’s speech recognition service. The audio data needs to be in a format compatible with Google’s system.
  4. Handling the Output: Once Google processes the audio, it returns a text transcription. This output can be used in various ways, such as command interpretation, text analysis, or as input for other systems.
  5. Customization: Google Speech Recognition allows customization for specific vocabulary or industry terms, enhancing recognition accuracy for specialized applications.
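
Step 3 calls for audio in a compatible format. The code example in this section configures the LINEAR16 encoding at 44100 Hz, which is uncompressed 16-bit signed little-endian PCM. As a rough sketch using only the standard library, the following converts floating-point samples into that format and wraps them in an in-memory WAV file; the helper names and the generated test tone are illustrative assumptions.

```python
import io
import math
import struct
import wave

SAMPLE_RATE = 44100  # Hz, matching the sample rate configured in this section


def floats_to_linear16(samples):
    """Convert floats in [-1.0, 1.0] to 16-bit signed little-endian PCM bytes."""
    clipped = [max(-1.0, min(1.0, s)) for s in samples]
    ints = [int(round(s * 32767)) for s in clipped]
    return struct.pack("<%dh" % len(ints), *ints)


def to_wav_bytes(pcm, sample_rate=SAMPLE_RATE):
    """Wrap mono 16-bit PCM bytes in a WAV container held in memory."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(1)   # mono
        wav_file.setsampwidth(2)   # 2 bytes = 16 bits per sample
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm)
    return buffer.getvalue()


# A 0.1 second, 440 Hz test tone standing in for captured microphone audio.
tone = [math.sin(2 * math.pi * 440 * n / SAMPLE_RATE) for n in range(SAMPLE_RATE // 10)]
pcm_bytes = floats_to_linear16(tone)
wav_bytes = to_wav_bytes(pcm_bytes)
print(len(pcm_bytes))  # 8820
```

Values outside the range -1.0 to 1.0 are clipped rather than wrapped, which avoids loud artifacts if the input is slightly too hot.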


To install Google Integration, run the following snippet, which will install the required dependencies:

pip install dronebuddylib[SPEECH_RECOGNITION_GOOGLE]


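The Google integration module requires the following configurations to function:

  • SPEECH_RECOGNITION_GOOGLE_SAMPLE_RATE_HERTZ - The sample rate of the audio, in hertz.
  • SPEECH_RECOGNITION_GOOGLE_LANGUAGE_CODE - The language of the speech, for example "en-US".
  • SPEECH_RECOGNITION_GOOGLE_ENCODING - The encoding of the audio, for example "LINEAR16".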


Code Example

engine_configs = EngineConfigurations({})
engine_configs.add_configuration(Configurations.SPEECH_RECOGNITION_GOOGLE_SAMPLE_RATE_HERTZ, 44100)
engine_configs.add_configuration(Configurations.SPEECH_RECOGNITION_GOOGLE_LANGUAGE_CODE, "en-US")
engine_configs.add_configuration(Configurations.SPEECH_RECOGNITION_GOOGLE_ENCODING, "LINEAR16")

engine = SpeechToTextEngine(SpeechRecognitionAlgorithm.GOOGLE_SPEECH_RECOGNITION, engine_configs)
result = engine.recognize_speech(audio_steam=data)

How to Use with the Mic

# Assumption: sr is the speech_recognition package
import speech_recognition as sr

engine_configs = EngineConfigurations({})
engine_configs.add_configuration(Configurations.SPEECH_RECOGNITION_GOOGLE_SAMPLE_RATE_HERTZ, 44100)
engine_configs.add_configuration(Configurations.SPEECH_RECOGNITION_GOOGLE_LANGUAGE_CODE, "en-US")
engine_configs.add_configuration(Configurations.SPEECH_RECOGNITION_GOOGLE_ENCODING, "LINEAR16")

engine = SpeechToTextEngine(SpeechRecognitionAlgorithm.GOOGLE_SPEECH_RECOGNITION, engine_configs)

# Recognizer from speech_recognition, used here only to capture audio from the microphone
recognizer = sr.Recognizer()

with sr.Microphone() as source:
    print("Listening for commands...")
    audio = recognizer.listen(source)

    try:
        # Recognize speech using Google Speech Recognition
        command = engine.recognize_speech(audio)
        print(f"Recognized command: {command}")

        # Process and execute the command
    except Exception as e:
        print(f"Could not recognize the command: {e}")


The output will be given in the following JSON format:

{
    "recognized_speech": "",
    "total_billed_time": ""
}


  • recognized_speech - Text with the recognized speech.
  • total_billed_time - If a paid service, the billed time.

VOSK Voice Recognition

The official documentation for VOSK can be found here.


To install VOSK Integration, run the following snippet, which will install the required dependencies:

pip install dronebuddylib[SPEECH_RECOGNITION_VOSK]


The VOSK integration module requires the following configurations to function:

  • SPEECH_RECOGNITION_VOSK_LANGUAGE_MODEL_PATH - The path to the model that you have downloaded. If it is not provided, the default English model (vosk-model-small-en-us-0.15) is used, so the path is compulsory for any other language. VOSK supported languages can be found here.
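
Because a wrong model path is easy to miss, it can help to check the directory before passing it to the configuration. This is a minimal sketch, assuming the model is an unpacked directory on disk; `resolve_model_path` is a hypothetical helper, and returning None is this sketch's way of signalling that the default model should be used.

```python
from pathlib import Path


def resolve_model_path(model_path=None):
    """Return an absolute model directory as a string, or None for the default.

    Raises FileNotFoundError if a path is given but is not a directory.
    """
    if not model_path:
        return None  # no path given: rely on the default English model
    path = Path(model_path).expanduser()
    if not path.is_dir():
        raise FileNotFoundError(f"VOSK model directory not found: {path}")
    return str(path.resolve())
```

The returned string can then be used as the value of SPEECH_RECOGNITION_VOSK_LANGUAGE_MODEL_PATH.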

Code Example

engine_configs = EngineConfigurations({})
engine_configs.add_configuration(Configurations.SPEECH_RECOGNITION_VOSK_LANGUAGE_MODEL_PATH, "C:/users/project/resources/speechrecognition/vosk-model-small-en-us-0.15")

engine = SpeechToTextEngine(SpeechRecognitionAlgorithm.VOSK_SPEECH_RECOGNITION, engine_configs)
result = engine.recognize_speech(audio_steam=data)

How to Use with the Mic

import pyaudio
from dronebuddylib.atoms.speechrecognition.speech_to_text_engine import SpeechToTextEngine
from dronebuddylib.models.engine_configurations import EngineConfigurations
from dronebuddylib.models.enums import Configurations, SpeechRecognitionAlgorithm

mic = pyaudio.PyAudio()

# initialize speech to text engine
engine_configs = EngineConfigurations({})
engine_configs.add_configuration(Configurations.SPEECH_RECOGNITION_VOSK_LANGUAGE_MODEL_PATH, "C:/users/project/resources/speechrecognition/vosk-model-small-en-us-0.15")

engine = SpeechToTextEngine(SpeechRecognitionAlgorithm.VOSK_SPEECH_RECOGNITION, engine_configs)

# this method receives the audio input from pyaudio and returns the command
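def get_command():
    listening = True
    # Assumption: 16-bit samples (pyaudio.paInt16), opened on the PyAudio instance created above
    stream = mic.open(format=pyaudio.paInt16, channels=1, rate=44100, input=True, frames_per_buffer=8192)

    while listening:
        try:
            # chunks the audio stream to a byte stream
            # Assumption: the chunk size matches frames_per_buffer
            data = stream.read(8192)
            recognized = engine.recognize_speech(audio_steam=data)
            if recognized is not None:
                listening = False
                return recognized
        except Exception as e:
            print(e)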


The output will be given in the following JSON format:

{
    "recognized_speech": "",
    "total_billed_time": ""
}


  • recognized_speech - Text with the recognized speech.
  • total_billed_time - If a paid service, the billed time, but for VOSK this will be empty.

Text Recognition Module Installation

Currently, DroneBuddy supports several algorithms for text recognition:

  1. pyttsx3 - Offline package

To use each of these, you can customize the installation according to your needs.


Google Vision Integration

For integrating Google Vision into DroneBuddy for text recognition capabilities, please follow the specific instructions outlined in the "Google Vision Integration" guide. This module allows for robust text detection and recognition functionalities leveraging Google's cloud-based vision APIs.

For detailed installation and usage instructions, refer to the separate guide dedicated to Google Vision Integration within DroneBuddy's documentation.




