Dataset Information
The Realtime Sign Language Detection Using LSTM Model doesn't rely on a pre-recorded dataset. Instead, the dataset is generated dynamically by capturing keypoints of sign language gestures with a camera. These keypoints represent landmarks detected by the MediaPipe framework (face, hands, and pose), and they are then used to train the model to recognize and interpret gestures in real time.
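To make the keypoint vector mentioned later in this section concrete, here is a minimal sketch of how a single MediaPipe Holistic result could be flattened into one 1662-value vector. The helper name `extract_keypoints` and the handling of missing landmarks are illustrative assumptions, not necessarily the repository's exact code.

```python
import numpy as np

def extract_keypoints(results):
    """Flatten one MediaPipe Holistic result into a single 1662-value vector (hypothetical helper)."""
    # Pose: 33 landmarks x (x, y, z, visibility) = 132 values
    pose = (np.array([[lm.x, lm.y, lm.z, lm.visibility] for lm in results.pose_landmarks.landmark]).flatten()
            if results.pose_landmarks else np.zeros(33 * 4))
    # Face mesh: 468 landmarks x (x, y, z) = 1404 values
    face = (np.array([[lm.x, lm.y, lm.z] for lm in results.face_landmarks.landmark]).flatten()
            if results.face_landmarks else np.zeros(468 * 3))
    # Each hand: 21 landmarks x (x, y, z) = 63 values
    lh = (np.array([[lm.x, lm.y, lm.z] for lm in results.left_hand_landmarks.landmark]).flatten()
          if results.left_hand_landmarks else np.zeros(21 * 3))
    rh = (np.array([[lm.x, lm.y, lm.z] for lm in results.right_hand_landmarks.landmark]).flatten()
          if results.right_hand_landmarks else np.zeros(21 * 3))
    # 132 + 1404 + 63 + 63 = 1662 keypoint values per frame
    return np.concatenate([pose, face, lh, rh])
```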
The first step involves defining the sign language actions that you want the model to recognize. In the code, this is done by specifying the actions as follows:
actions = np.array(['cat', 'food', 'help'])
This array defines the sign language gestures (or "actions") that will be captured from the user's input. These actions correspond to specific gestures such as "cat," "food," and "help."
The next step involves setting up folders where the keypoints (or landmarks) will be captured. These keypoints are the data points that the model detects from the camera feed. Each gesture will have its own folder for storing the keypoints that correspond to it.
signs = ['cat','food','help']
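As a rough sketch of that folder structure (the data root name MP_Data and the count of 30 sequences per sign are illustrative assumptions, not values confirmed by the repository):

```python
import os

DATA_PATH = 'MP_Data'      # root folder for the collected keypoints (assumed name)
signs = ['cat', 'food', 'help']
no_sequences = 30          # number of example sequences recorded per sign (assumption)

# One sub-folder per sign and per sequence, e.g. MP_Data/cat/0, MP_Data/cat/1, ...
for sign in signs:
    for sequence in range(no_sequences):
        os.makedirs(os.path.join(DATA_PATH, sign, str(sequence)), exist_ok=True)
```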
In this phase, keypoints are captured from the camera feed and stored as NumPy arrays (.npy files). These keypoints are crucial for training the model. The program captures 30 frames of keypoint data per gesture, with each frame containing 1662 keypoints, resulting in a dataset of sequential data that represents the gesture.
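A collection loop along these lines could produce those .npy files. It reuses the constants and the `extract_keypoints` helper from the sketches above, and the whole block is an illustrative outline rather than the repository's exact script.

```python
import os
import cv2
import numpy as np
import mediapipe as mp

DATA_PATH = 'MP_Data'          # same assumed layout as the folder sketch above
signs = ['cat', 'food', 'help']
no_sequences = 30              # sequences recorded per sign (assumption)
sequence_length = 30           # frames of keypoints captured per sequence

cap = cv2.VideoCapture(0)
with mp.solutions.holistic.Holistic(min_detection_confidence=0.5,
                                    min_tracking_confidence=0.5) as holistic:
    for sign in signs:
        for sequence in range(no_sequences):
            for frame_num in range(sequence_length):
                ret, frame = cap.read()
                if not ret:
                    continue
                # MediaPipe expects RGB input; OpenCV captures BGR frames
                results = holistic.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
                keypoints = extract_keypoints(results)   # 1662 values, see sketch above
                np.save(os.path.join(DATA_PATH, sign, str(sequence), str(frame_num)),
                        keypoints)                       # stored as <frame_num>.npy
cap.release()
```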
Once the keypoints have been captured, the data is divided into training and testing sets. A 95%/5% split is used, meaning that 95% of the data is used for training and 5% is used for testing the model's accuracy.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.05)
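Before that split can be made, X and y have to be assembled from the saved .npy files and the gesture labels. A rough sketch under the assumed folder layout and constants from the earlier sketches:

```python
import os
import numpy as np
from tensorflow.keras.utils import to_categorical

label_map = {label: num for num, label in enumerate(signs)}   # e.g. {'cat': 0, 'food': 1, 'help': 2}

sequences, labels = [], []
for sign in signs:
    for sequence in range(no_sequences):
        # Load the 30 per-frame keypoint vectors for this recording
        window = [np.load(os.path.join(DATA_PATH, sign, str(sequence), f'{frame_num}.npy'))
                  for frame_num in range(sequence_length)]
        sequences.append(window)
        labels.append(label_map[sign])

X = np.array(sequences)                   # shape: (num_sequences, 30, 1662)
y = to_categorical(labels).astype(int)    # one-hot encoded gesture labels for the split above
```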
The model is then trained using the captured keypoints and their corresponding labels (the sign language gestures). The LSTM model processes the sequential keypoint data, learning to recognize patterns in the hand movements that correspond to each gesture.
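The layer sizes, optimizer, and epoch count aren't specified here, so the following is only a plausible sketch of such an LSTM stack for (30, 1662) keypoint sequences, not the repository's confirmed architecture:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

model = Sequential([
    # Stacked LSTM layers read each sequence of 30 frames x 1662 keypoints
    LSTM(64, return_sequences=True, activation='relu', input_shape=(30, 1662)),
    LSTM(128, return_sequences=True, activation='relu'),
    LSTM(64, return_sequences=False, activation='relu'),
    Dense(64, activation='relu'),
    Dense(32, activation='relu'),
    # One output per gesture; softmax yields a probability for each action
    Dense(len(actions), activation='softmax'),
])

model.compile(optimizer='Adam', loss='categorical_crossentropy', metrics=['categorical_accuracy'])
model.fit(X_train, y_train, epochs=200)   # epoch count is illustrative
```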
After training, the model and its weights are saved so they can be reloaded later for inference:
model.save('./model.h5')
model.save_weights('./model_weights.h5')
Once the model is trained, you can perform inference on new data. There are two options for inference:
- Continuous Inference: If you are running the whole pipeline yourself and performing inference continuously, you'll need to train the model with your own dataset and use the relevant code.
- Pre-trained Model Inference: If you're simply loading a pre-trained model for inference, you can bypass the training process and use the saved model directly to detect gestures in real time (see the sketch below).
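For the second option, a minimal real-time loop might look like the sketch below. It reuses the `extract_keypoints` helper from the earlier sketch, and the 0.7 confidence threshold is an illustrative assumption.

```python
import cv2
import numpy as np
import mediapipe as mp
from tensorflow.keras.models import load_model

model = load_model('./model.h5')          # the model saved after training
actions = np.array(['cat', 'food', 'help'])
threshold = 0.7                           # minimum prediction confidence to accept (assumption)

sequence = []                             # rolling window of the last 30 frames of keypoints
cap = cv2.VideoCapture(0)
with mp.solutions.holistic.Holistic(min_detection_confidence=0.5,
                                    min_tracking_confidence=0.5) as holistic:
    while cap.isOpened():
        ret, frame = cap.read()
        if not ret:
            break
        results = holistic.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        sequence.append(extract_keypoints(results))   # helper from the sketch above
        sequence = sequence[-30:]                     # keep only the latest 30 frames
        if len(sequence) == 30:
            # Predict on the current 30-frame window of keypoints
            res = model.predict(np.expand_dims(sequence, axis=0))[0]
            if res[np.argmax(res)] > threshold:
                print(actions[np.argmax(res)])
        cv2.imshow('Sign detection', frame)
        if cv2.waitKey(10) & 0xFF == ord('q'):
            break
cap.release()
cv2.destroyAllWindows()
```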
This model is designed with flexibility in mind. You can modify the actions, add new gestures, and retrain the model with your own dataset. The system is built to be adaptable, allowing easy customization for different sign languages or additional gestures.
The project is open to contributions. If you find ways to improve the model (e.g., through better performance, new features, or improved documentation), feel free to fork the repository, make the changes, and submit a pull request.
The project was developed during the creator's master's thesis and has been maintained as an open-source project. While the current implementation works as expected, the developer welcomes contributions and is open to improvements from the community.