Dataset Information
The Realtime Sign Language Detection Using LSTM Model doesn't rely on a pre-recorded dataset. Instead, the dataset is generated dynamically by capturing keypoints of sign language gestures with a camera. These keypoints represent landmarks detected by the MediaPipe framework (face, hands, and pose), and they are then used to train the model to recognize and interpret gestures in real time.
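To make the keypoint vector mentioned later in this section concrete, here is a minimal sketch of how a single MediaPipe Holistic result could be flattened into one 1662-value vector. The helper name `extract_keypoints` and the handling of missing landmarks are illustrative assumptions, not necessarily the repository's exact code.

```python
import numpy as np

def extract_keypoints(results):
    """Flatten one MediaPipe Holistic result into a single 1662-value vector (hypothetical helper)."""
    # Pose: 33 landmarks x (x, y, z, visibility) = 132 values
    pose = (np.array([[lm.x, lm.y, lm.z, lm.visibility] for lm in results.pose_landmarks.landmark]).flatten()
            if results.pose_landmarks else np.zeros(33 * 4))
    # Face mesh: 468 landmarks x (x, y, z) = 1404 values
    face = (np.array([[lm.x, lm.y, lm.z] for lm in results.face_landmarks.landmark]).flatten()
            if results.face_landmarks else np.zeros(468 * 3))
    # Each hand: 21 landmarks x (x, y, z) = 63 values
    lh = (np.array([[lm.x, lm.y, lm.z] for lm in results.left_hand_landmarks.landmark]).flatten()
          if results.left_hand_landmarks else np.zeros(21 * 3))
    rh = (np.array([[lm.x, lm.y, lm.z] for lm in results.right_hand_landmarks.landmark]).flatten()
          if results.right_hand_landmarks else np.zeros(21 * 3))
    # 132 + 1404 + 63 + 63 = 1662 keypoint values per frame
    return np.concatenate([pose, face, lh, rh])
```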
The first step involves defining the sign language actions that you want the model to recognize. In the code, this is done by specifying the actions as follows:
actions = np.array(['cat', 'food', 'help'])
This array defines the sign language gestures (or "actions") that will be captured from the user's input. These actions correspond to specific gestures such as "cat," "food," and "help."
The next step involves setting up folders where the keypoints (or landmarks) will be captured. These keypoints are the data points that the model detects from the camera feed. Each gesture will have its own folder for storing the keypoints that correspond to it.
signs = ['cat','food','help']
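As a rough sketch of that folder structure (the data root name MP_Data and the count of 30 sequences per sign are illustrative assumptions, not values confirmed by the repository):

```python
import os

DATA_PATH = 'MP_Data'      # root folder for the collected keypoints (assumed name)
signs = ['cat', 'food', 'help']
no_sequences = 30          # number of example sequences recorded per sign (assumption)

# One sub-folder per sign and per sequence, e.g. MP_Data/cat/0, MP_Data/cat/1, ...
for sign in signs:
    for sequence in range(no_sequences):
        os.makedirs(os.path.join(DATA_PATH, sign, str(sequence)), exist_ok=True)
```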
In this phase, keypoints are captured from the camera feed and stored as NumPy arrays (.npy files). These keypoints are crucial for training the model. The program captures 30 frames of keypoint data per gesture, with each frame containing 1662 keypoints, resulting in a dataset of sequential data that represents the gesture.
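A collection loop along these lines could produce those .npy files. It reuses the constants and the `extract_keypoints` helper from the sketches above, and the whole block is an illustrative outline rather than the repository's exact script.

```python
import os
import cv2
import numpy as np
import mediapipe as mp

DATA_PATH = 'MP_Data'          # same assumed layout as the folder sketch above
signs = ['cat', 'food', 'help']
no_sequences = 30              # sequences recorded per sign (assumption)
sequence_length = 30           # frames of keypoints captured per sequence

cap = cv2.VideoCapture(0)
with mp.solutions.holistic.Holistic(min_detection_confidence=0.5,
                                    min_tracking_confidence=0.5) as holistic:
    for sign in signs:
        for sequence in range(no_sequences):
            for frame_num in range(sequence_length):
                ret, frame = cap.read()
                if not ret:
                    continue
                # MediaPipe expects RGB input; OpenCV captures BGR frames
                results = holistic.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
                keypoints = extract_keypoints(results)   # 1662 values, see sketch above
                np.save(os.path.join(DATA_PATH, sign, str(sequence), str(frame_num)),
                        keypoints)                       # stored as <frame_num>.npy
cap.release()
```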
Once the keypoints have been captured, the data is divided into training and testing sets. A 95%/5% split is used, meaning that 95% of the data is used for training and 5% is used for testing the model's accuracy.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.05)
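Before that split can be made, X and y have to be assembled from the saved .npy files and the gesture labels. A rough sketch under the assumed folder layout and constants from the earlier sketches:

```python
import os
import numpy as np
from tensorflow.keras.utils import to_categorical

label_map = {label: num for num, label in enumerate(signs)}   # e.g. {'cat': 0, 'food': 1, 'help': 2}

sequences, labels = [], []
for sign in signs:
    for sequence in range(no_sequences):
        # Load the 30 per-frame keypoint vectors for this recording
        window = [np.load(os.path.join(DATA_PATH, sign, str(sequence), f'{frame_num}.npy'))
                  for frame_num in range(sequence_length)]
        sequences.append(window)
        labels.append(label_map[sign])

X = np.array(sequences)                   # shape: (num_sequences, 30, 1662)
y = to_categorical(labels).astype(int)    # one-hot encoded gesture labels for the split above
```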
The model is then trained using the captured keypoints and their corresponding labels (the sign language gestures). The LSTM model processes the sequential keypoint data, learning to recognize patterns in the hand movements that correspond to each gesture.
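The layer sizes, optimizer, and epoch count aren't specified here, so the following is only a plausible sketch of such an LSTM stack for (30, 1662) keypoint sequences, not the repository's confirmed architecture:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

model = Sequential([
    # Stacked LSTM layers read each sequence of 30 frames x 1662 keypoints
    LSTM(64, return_sequences=True, activation='relu', input_shape=(30, 1662)),
    LSTM(128, return_sequences=True, activation='relu'),
    LSTM(64, return_sequences=False, activation='relu'),
    Dense(64, activation='relu'),
    Dense(32, activation='relu'),
    # One output per gesture; softmax yields a probability for each action
    Dense(len(actions), activation='softmax'),
])

model.compile(optimizer='Adam', loss='categorical_crossentropy', metrics=['categorical_accuracy'])
model.fit(X_train, y_train, epochs=200)   # epoch count is illustrative
```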
After training, the model and its weights are saved so they can be reloaded later for inference:
model.save('./model.h5')
model.save_weights('./model_weights.h5')
Once the model is trained, you can perform inference on new data. There are two options for inference:
- Continuous Inference: If you are running the whole pipeline yourself and performing inference continuously, you'll need to train the model with your own dataset and use the relevant code.
- Pre-trained Model Inference: If you're simply loading a pre-trained model for inference, you can bypass the training process and use the saved model directly to detect gestures in real time (see the sketch below).
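For the second option, a minimal real-time loop might look like the sketch below. It reuses the `extract_keypoints` helper from the earlier sketch, and the 0.7 confidence threshold is an illustrative assumption.

```python
import cv2
import numpy as np
import mediapipe as mp
from tensorflow.keras.models import load_model

model = load_model('./model.h5')          # the model saved after training
actions = np.array(['cat', 'food', 'help'])
threshold = 0.7                           # minimum prediction confidence to accept (assumption)

sequence = []                             # rolling window of the last 30 frames of keypoints
cap = cv2.VideoCapture(0)
with mp.solutions.holistic.Holistic(min_detection_confidence=0.5,
                                    min_tracking_confidence=0.5) as holistic:
    while cap.isOpened():
        ret, frame = cap.read()
        if not ret:
            break
        results = holistic.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        sequence.append(extract_keypoints(results))   # helper from the sketch above
        sequence = sequence[-30:]                     # keep only the latest 30 frames
        if len(sequence) == 30:
            # Predict on the current 30-frame window of keypoints
            res = model.predict(np.expand_dims(sequence, axis=0))[0]
            if res[np.argmax(res)] > threshold:
                print(actions[np.argmax(res)])
        cv2.imshow('Sign detection', frame)
        if cv2.waitKey(10) & 0xFF == ord('q'):
            break
cap.release()
cv2.destroyAllWindows()
```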
This model is designed with flexibility in mind. You can modify the actions, add new gestures, and retrain the model with your own dataset. The system is built to be adaptable, allowing easy customization for different sign languages or additional gestures.
The project is open to contributions. If you find ways to improve the model (e.g., through better performance, new features, or improved documentation), feel free to fork the repository, make the changes, and submit a pull request.
The project was developed during the creator's master's thesis and has been maintained as an open-source project. While the current implementation works as expected, the developer welcomes contributions and is open to improvements from the community.