URGENT: Training with multiple datasets for text detector #14580
To address your queries about modifying the configuration for multiple datasets and dynamically generating datasets during training, let me provide detailed guidance.

**1. How to modify the config file to use multiple datasets during training?**

In PaddleOCR, you can train with multiple datasets by specifying them in the `label_file_list` of the `Train.dataset` section and weighting them with `ratio_list`. Changes to your config file:
Example:

```yaml
Train:
  dataset:
    name: SimpleDataSet
    data_dir: /home/jovyan/01_Paddle
    label_file_list:
      - /home/jovyan/01_Paddle/dataset1_labels.txt
      - /home/jovyan/01_Paddle/dataset2_labels.txt
      - /home/jovyan/01_Paddle/dataset3_labels.txt
    ratio_list:
      - 0.5  # 50% from dataset1
      - 0.3  # 30% from dataset2
      - 0.2  # 20% from dataset3
    transforms:
      - DecodeImage:
          img_mode: BGR
          channel_first: false
      - DetLabelEncode: null
      - IaaAugment:
          augmenter_args:
            - type: Fliplr
              args:
                p: 0.5
            - type: Affine
              args:
                rotate:
                  - -10
                  - 10
            - type: Resize
              args:
                size:
                  - 0.8
                  - 1.5
      - EastRandomCropData:
          size:
            - 960
            - 960
          max_tries: 50
          keep_ratio: true
      - MakeBorderMap:
          shrink_ratio: 0.4
          thresh_min: 0.3
          thresh_max: 0.7
      - MakeShrinkMap:
          shrink_ratio: 0.4
          min_text_size: 8
      - NormalizeImage:
          scale: 1./255.
          mean:
            - 0.485
            - 0.456
            - 0.406
          std:
            - 0.229
            - 0.224
            - 0.225
          order: hwc
      - ToCHWImage: null
      - KeepKeys:
          keep_keys:
            - image
            - threshold_map
            - threshold_mask
            - shrink_map
            - shrink_mask
  loader:
    shuffle: true
    drop_last: false
    batch_size_per_card: 16
    num_workers: 4
```

In this setup, `label_file_list` enumerates the annotation files and `ratio_list` sets the fraction of samples drawn from each file per epoch: 50% from dataset1, 30% from dataset2, and 20% from dataset3.
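For reference, each line in a label file follows the standard `SimpleDataSet` detection annotation format: an image path (relative to `data_dir`), a tab, and a JSON list of text regions. The file name and coordinates below are illustrative only:

```
dataset1/img_61.jpg	[{"transcription": "MASA", "points": [[310, 104], [416, 141], [418, 216], [312, 179]]}]
```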
**2. How to dynamically generate datasets during training (e.g., 2000 new samples per epoch)?**

To achieve dynamic dataset generation, especially for data augmentation during training, you can use custom data pipeline scripts or leverage the built-in augmentation transforms. Because random augmentation parameters are re-sampled every time an image is loaded, each epoch effectively sees a freshly generated variant of every base sample.
Example `transforms` pipeline with random augmentation:

```yaml
transforms:
  - DecodeImage:
      img_mode: BGR
      channel_first: false
  - DetLabelEncode: null
  - IaaAugment:
      augmenter_args:
        - type: Fliplr
          args:
            p: 0.5
        - type: Affine
          args:
            rotate:
              - -10
              - 10
        - type: Resize
          args:
            size:
              - 0.5
              - 2.0
  - EastRandomCropData:
      size:
        - 960
        - 960
      max_tries: 50
      keep_ratio: true
  - MakeBorderMap:
      shrink_ratio: 0.4
      thresh_min: 0.3
      thresh_max: 0.7
  - MakeShrinkMap:
      shrink_ratio: 0.4
      min_text_size: 8
  - NormalizeImage:
      scale: 1./255.
      mean:
        - 0.485
        - 0.456
        - 0.406
      std:
        - 0.229
        - 0.224
        - 0.225
      order: hwc
  - ToCHWImage: null
  - KeepKeys:
      keep_keys:
        - image
        - threshold_map
        - threshold_mask
        - shrink_map
        - shrink_mask
```
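The reason this pipeline yields "new" data every epoch is that the augmentation parameters (flip decision, rotation angle, resize scale) are re-sampled on every sample access. A toy sketch of that mechanism, where `random_rotate` is a hypothetical stand-in and not a PaddleOCR API:

```python
import random

def random_rotate(sample):
    """Toy stand-in for the Affine transform: re-samples the angle per call."""
    out = dict(sample)
    out["angle"] = random.uniform(-10, 10)  # mirrors rotate: [-10, 10] above
    return out

base = {"image": "img_61.jpg"}
# The same base sample produces a different variant on every access,
# so each epoch effectively trains on freshly generated data.
print([round(random_rotate(base)["angle"], 2) for _ in range(3)])
```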
Example of a Python dataset class:

```python
import random

from paddle.io import Dataset


class DynamicTextDataset(Dataset):
    """Serves a fixed number of randomly augmented samples per epoch."""

    def __init__(self, base_data, augmentations, num_samples):
        super().__init__()
        self.base_data = base_data          # list of raw (image, label) samples
        self.augmentations = augmentations  # callables applied to each sample
        self.num_samples = num_samples      # virtual dataset size per epoch

    def __len__(self):
        return self.num_samples

    def __getitem__(self, idx):
        # Select a random base sample
        sample = random.choice(self.base_data)
        # Apply augmentations to synthesize a new variant on the fly
        for aug in self.augmentations:
            sample = aug(sample)
        return sample
```

You can integrate this dataset class into your training pipeline by replacing the default dataset class used by the loader.
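A rough usage sketch follows. The `load_base_samples` helper and the augmentation callables are hypothetical placeholders you would implement yourself; `num_samples=2000` matches the 2000-samples-per-epoch goal from the question:

```python
from paddle.io import DataLoader

# Hypothetical: load your annotated base samples and define augmentation
# callables, each taking a sample dict and returning a new one.
base_data = load_base_samples("/home/jovyan/01_Paddle/dataset1_labels.txt")
augmentations = [random_flip, random_rotate, random_resize]

# 2000 freshly augmented samples per epoch.
train_dataset = DynamicTextDataset(base_data, augmentations, num_samples=2000)
train_loader = DataLoader(train_dataset, batch_size=16, shuffle=True, num_workers=4)

for batch in train_loader:
    ...  # forward/backward pass with the detector model
```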
Let me know if you need further clarification!

*Response generated by feifei-bot | chatgpt-4o-latest*
Dear Paddle Community,

I am currently fine-tuning a text detector and have used data augmentation during training, but I have not been able to get good results with either DB or DB++. I currently have 2000 data points for training the text detector.

Questions:

1. How do I modify the config file to use multiple datasets during training?
2. How can I dynamically generate datasets during training (e.g., 2000 new samples per epoch)?

I have shared my current config file below for perusal. Thanks for reading this!

I humbly request guidance from @GreatV @WenmuZhou @LDOUBLEV @MissPenguin @tink2123 @UserWangZz and others.
```yaml
Global:
  debug: false
  use_gpu: true
  epoch_num: 500
  log_smooth_window: 20
  print_batch_step: 10
  save_model_dir: /home/jovyan/01_Paddle/04_Paddle_Models/03_detector_pretrained/01_DB_Default_Det/v32_train
  save_epoch_step: 100
  eval_batch_step:
  cal_metric_during_train: false
  pretrained_model: /home/jovyan/01_Paddle/04_Paddle_Models/03_detector_pretrained/01_DB_Default_Det/en_PP-OCRv3_det_distill_train/student.pdparams
  checkpoints: null
  save_inference_dir: null
  use_visualdl: false
  infer_img: null
  save_res_path: null
  distributed: true

Architecture:
  freeze_params: true
  model_type: det
  algorithm: DB
  Transform: null
  Backbone:
    name: MobileNetV3
    scale: 0.5
    model_name: large
    disable_se: true
  Neck:
    name: RSEFPN
    out_channels: 96
    shortcut: true
  Head:
    name: DBHead
    k: 50

Loss:
  name: DBLoss
  balance_loss: true
  main_loss_type: DiceLoss
  alpha: 5
  beta: 10
  ohem_ratio: 3

Optimizer:
  name: Adam
  beta1: 0.9
  beta2: 0.999
  lr:
    name: Cosine
    learning_rate: 0.0005
    warmup_epoch: 5
  regularizer:
    name: L2
    factor: 0.0001

PostProcess:
  name: DBPostProcess
  thresh: 0.3
  box_thresh: 0.6
  max_candidates: 1000
  unclip_ratio: 1.5

Metric:
  name: DetMetric
  main_indicator: hmean

Train:
  dataset:
    name: SimpleDataSet
    data_dir: /home/jovyan/01_Paddle
    label_file_list:
    ratio_list:
    transforms:
      - DecodeImage:
          img_mode: BGR
          channel_first: false
      - IaaAugment:
          augmenter_args:
            - type: Fliplr
              args:
                p: 0.5
            - type: Affine
              args:
                rotate:
            - type: Resize
              args:
                size:
      - EastRandomCropData:
          size:
          max_tries: 50
          keep_ratio: true
      - MakeBorderMap:
          shrink_ratio: 0.4
          thresh_min: 0.3
          thresh_max: 0.7
      - MakeShrinkMap:
          shrink_ratio: 0.4
          min_text_size: 8
      - NormalizeImage:
          scale: 1./255.
          mean:
          std:
          order: hwc
      - KeepKeys:
          keep_keys:
  loader:
    shuffle: true
    drop_last: false
    batch_size_per_card: 16
    num_workers: 4

Eval:
  dataset:
    name: SimpleDataSet
    data_dir: /home/jovyan/01_Paddle
    label_file_list:
    transforms:
      - DecodeImage:
          img_mode: BGR
          channel_first: false
      - NormalizeImage:
          scale: 1./255.
          mean:
          std:
          order: hwc
      - KeepKeys:
          keep_keys:
  loader:
    shuffle: false
    drop_last: false
    batch_size_per_card: 1
    num_workers: 2

profiler_options: null
```