Python scripts to convert labelme-generated JSON files to VOC/COCO-style datasets.
Scripts in this repository are used to convert labelme-annotated jsons into standard datasets in PASCAL VOC format or MS COCO format.
Scripts are written in Python.
Most of the scripts are based on the examples section of labelme. I then added some features for my own dataset, such as class name conversion and custom image names.
Attention: these scripts are not complicated. If you have a basic knowledge of Python, please go through the conversion workflows and make sure they fit your datasets. Some places in the code are annotated with `MARK`, which means they deserve attention and can be customized to fit your needs.
Customize: these scripts only cover the conversion of the data I currently have. If you want to convert datasets for other tasks, such as instance segmentation, semantic segmentation, or video annotation, please take a look at the examples section in labelme.
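To make the conversion workflow easier to check against your own data, here is a minimal sketch of the coordinate conversions that bounding-box datasets of this kind usually involve. It is not taken from this repository's code: the function names are hypothetical, and it only assumes that a labelme rectangle stores two corner points, that VOC boxes are `(xmin, ymin, xmax, ymax)`, and that COCO boxes are `[x, y, width, height]`. Rounding and pixel-index conventions are ignored.

```python
def rectangle_to_voc_bbox(points):
    """Turn the two corner points of a labelme rectangle into (xmin, ymin, xmax, ymax)."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return min(xs), min(ys), max(xs), max(ys)


def voc_bbox_to_coco(bbox):
    """Turn (xmin, ymin, xmax, ymax) into COCO-style [x, y, width, height]."""
    xmin, ymin, xmax, ymax = bbox
    return [xmin, ymin, xmax - xmin, ymax - ymin]


# A labelme-style shape entry (illustrative values); corners may come in any order.
shape = {
    "label": "cat",
    "shape_type": "rectangle",
    "points": [[120.0, 40.0], [20.0, 90.0]],
}
voc_bbox = rectangle_to_voc_bbox(shape["points"])
coco_bbox = voc_bbox_to_coco(voc_bbox)
```

Taking the minimum and maximum of the corner coordinates keeps the result valid even when the rectangle was drawn from bottom-right to top-left.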
- Gather the labelme-annotated JSON files into a folder. The following steps refer to this folder as `labelme_jsons_dir`.
- Prepare a text file that stores the class names in your dataset and name it `label_names.txt`. See `test/label_names.txt` for an example.
- If you need class name conversion, prepare a text file that stores the conversion rules and name it `label_dict.txt`. See `test/label_dict.txt` for an example.
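If you are unsure which class names appear in your annotations, a small helper along these lines can list them. This is a sketch, not part of the package: `collect_label_names` and its `label_map` argument are hypothetical, and the only assumption about the input is the usual labelme layout in which each JSON file has a `shapes` list whose entries carry a `label`. The conversion rules are passed as a plain Python dict, because the exact format of `label_dict.txt` is not assumed here.

```python
import json
from pathlib import Path


def collect_label_names(labelme_jsons_dir, label_map=None):
    """Return the sorted class names used in a folder of labelme JSON files.

    label_map optionally renames raw labels (raw name -> target name);
    labels without an entry are kept unchanged.
    """
    label_map = label_map or {}
    names = set()
    for json_path in sorted(Path(labelme_jsons_dir).glob("*.json")):
        with open(json_path, encoding="utf-8") as f:
            annotation = json.load(f)
        for shape in annotation.get("shapes", []):
            raw_label = shape["label"]
            names.add(label_map.get(raw_label, raw_label))
    return sorted(names)
```

The returned list can be compared against your `label_names.txt` to catch typos or unexpected classes before running the conversion.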
- Using a virtual environment to install the Python packages is recommended.

      conda create --name=labelme python=3.9
      conda activate labelme
      pip install -r requirements.txt

- Clone the repo.

      git clone git@github.com:veraposeidon/labelme2Datasets.git

- Install the package.

      cd labelme2Datasets
      # (preferred) install in editable mode, so that you can modify the package
      pip install -e .
      # or install in non-editable mode: you can use the package, but cannot modify it
      # python setup.py install

I also published a PyPI package named labelme2datasets. You can install it with `pip3 install labelme2datasets`.
If the baseline in this project does not work for your datasets, you can install the package in development (editable) mode and modify the code yourself.
- Convert a single JSON file into a dataset (`labelme_json2dataset.py`).

      labelme_json2dataset --json_file=data/sample.json \
        --output_dir=output/test_single_output

- Convert a folder of JSON files into a VOC-format dataset (`labelme_bbox_json2voc.py`).
  - Without label conversion:

        labelme_bbox_json2voc --json_dir=data/sample_jsons \
          --output_dir=output/test_voc_output --labels data/label_names.txt

  - With label conversion:

        labelme_bbox_json2voc --json_dir=data/sample_jsons \
          --output_dir=output/test_voc_output \
          --labels data/label_names.txt \
          --label_dict data/label_dict.txt

- Split the VOC dataset into a train set and a test set (`split_voc_datasets.py`).

      split_voc_datasets --voc_dir output/test_voc_output --test_ratio 0.3 --random_seed 42

  `train.txt` and `test.txt` should be generated in `voc_dir/ImageSets/Main/`.
- Convert the VOC-format dataset into a COCO-style dataset (`voc2coco.py`).

      voc2coco --voc_dir output/test_voc_output --coco_dir output/test_coco_output

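For readers who want to see what a seeded train/test split amounts to, the splitting step in the list above can be sketched roughly as follows. This is an illustration under stated assumptions rather than the code in `split_voc_datasets.py`: the function name `split_names` is hypothetical, and rounding the test-set size to the nearest integer is a choice made for this sketch.

```python
import random


def split_names(image_names, test_ratio=0.3, random_seed=42):
    """Shuffle image names reproducibly and return (train_names, test_names)."""
    # Sort first so the result depends only on the seed, not on the input order.
    names = sorted(image_names)
    random.Random(random_seed).shuffle(names)
    n_test = round(len(names) * test_ratio)
    return names[n_test:], names[:n_test]


# Ten illustrative image names; with test_ratio=0.3, three of them go to the test set.
all_names = [f"img_{i:03d}" for i in range(10)]
train, test = split_names(all_names, test_ratio=0.3, random_seed=42)
```

In a VOC-style layout, the two lists would then be written one name per line to `ImageSets/Main/train.txt` and `ImageSets/Main/test.txt`. Using a dedicated `random.Random` instance keeps the split reproducible without touching the global random state.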
- make all scripts pass pylint
- Chinese and English README
- modify project architecture
- publish as package
See the open issues for a full list of proposed features (and known issues).
Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.
If you have a suggestion that would make this better, please fork the repo and create a pull request. You can also simply open an issue with the tag "enhancement". Don't forget to give the project a star! Thanks again!
- Fork the Project
- Create your Feature Branch (`git checkout -b feature/AmazingFeature`)
- Commit your Changes (`git commit -m 'Add some AmazingFeature'`)
- Push to the Branch (`git push origin feature/AmazingFeature`)
- Open a Pull Request
Distributed under the MIT License. See `LICENSE` for more information.
veraposeidon - veraposeidon@gmail.com
Project Link: https://github.com/veraposeidon/labelme2Datasets