Hi there,

Thank you for your contribution to the speaker recognition field. I have been studying this code for almost a week now and have learned a great deal. The code itself and the way you maintain the issues are very friendly to a beginner like me; learning from it has been far less painful than working through a textbook. Thanks also for keeping this issue board such a constructive and positive place for discussion.
I am currently training on VoxCeleb1 with your deep speaker code on a single GTX 1080 GPU. Training has been running for about 12 hours and has reached step 20260:
[INFO] t1.py/main | == Presenting step #20260
[INFO] t1.py/main | == Processed in 0.22s by the network, training loss = 1.564599871635437.
Found 0146962 files with 01211 different speakers.
[INFO] t1.py/main | test training data EER = 0.254, F-measure = 0.152, Accuracy = 0.960
get batch time 2.86e-06s
forward process time 0.419s
beginning to select..........
select best batch time 0.052s
select_batch_time: 0.47753071784973145
The training loss started around 7 and now bounces between 2 and 5, which looks roughly like the loss curve in your readme.md. Is it reasonable to expect the loss and EER to improve further by 80000 steps?
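For later readers: EER (equal error rate) is the operating point where the false-acceptance rate equals the false-rejection rate, so lower is better. Below is a minimal sketch of computing it from verification trial scores; the labels and scores are made-up examples, not this repo's evaluation code.

```python
import numpy as np
from sklearn.metrics import roc_curve

def compute_eer(labels, scores):
    """Approximate EER: the point on the ROC curve where FPR == FNR."""
    fpr, tpr, _ = roc_curve(labels, scores)  # labels: 1 = same speaker, 0 = different
    fnr = 1.0 - tpr
    idx = np.nanargmin(np.abs(fnr - fpr))    # closest crossing of the two error rates
    return (fpr[idx] + fnr[idx]) / 2.0

# Hypothetical cosine-similarity scores for six verification trials.
labels = np.array([1, 1, 0, 1, 0, 0])
scores = np.array([0.82, 0.48, 0.40, 0.77, 0.55, 0.30])
print(f"EER = {compute_eer(labels, scores):.3f}")
```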
My first question: my machine has four GPUs, and I launch training with
CUDA_VISIBLE_DEVICES="0,1,2,3" python3 train.py
but only one GPU actually does any work; the other three each show a python3 process yet stay almost idle. (I have run other TensorFlow programs with this same command before and all four GPUs were used.) Do I need to add explicit multi-GPU code to this project? (A sketch of one common way to do this appears after my second question below.)

My second question: I want to do speaker recognition trained on both TIMIT and VoxCeleb. TIMIT is a fairly clean, noise-free corpus, while VoxCeleb is an in-the-wild corpus with real noise. Different corpora presumably have different channel characteristics (please correct me if I am wrong), so when I combine the two, do I need to account for this channel mismatch? The multi-corpus speaker-recognition papers I have read (e.g. SWB + TIMIT + LibriSpeech) do not mention possible channel differences between corpora.
If I ignore the channel issue and, say, use the TIMIT train set plus the VoxCeleb train set for training, with the VoxCeleb test set for enrollment and testing, would the resulting EER still be convincing and interpretable?
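On the multi-GPU point: as far as I know, CUDA_VISIBLE_DEVICES only controls which GPUs are visible to the process; the training script itself has to distribute the work. Below is a minimal sketch of the usual tf.keras pattern with tf.distribute.MirroredStrategy; build_model here is a hypothetical stand-in for the repo's network constructor, and the input shape and speaker count are assumptions.

```python
import tensorflow as tf

def build_model():
    # Hypothetical stand-in for the repo's speaker network; only the
    # strategy scope below is the point of this sketch.
    return tf.keras.Sequential([
        tf.keras.layers.Input(shape=(160, 64, 1)),          # assumed (frames, filterbanks, 1)
        tf.keras.layers.Conv2D(32, 5, strides=2, activation='relu'),
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(1211, activation='softmax'),  # 1211 speakers in VoxCeleb1 dev
    ])

# MirroredStrategy replicates the model onto every visible GPU and splits
# each batch across the replicas (synchronous data parallelism).
strategy = tf.distribute.MirroredStrategy()
print('Replicas in sync:', strategy.num_replicas_in_sync)

with strategy.scope():
    model = build_model()
    model.compile(optimizer='adam', loss='sparse_categorical_crossentropy')

# model.fit(...) then runs data-parallel; the global batch size is
# usually scaled by strategy.num_replicas_in_sync.
```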
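On the channel question, I am not sure there is a definitive answer, but one common mitigation (my own addition, not something this repo does as far as I know) is per-utterance cepstral mean and variance normalization: a stationary channel is multiplicative in the spectrum, hence additive in the log-spectral domain, so subtracting the per-utterance mean largely removes it. A minimal sketch:

```python
import numpy as np

def cmvn(features, eps=1e-8):
    """Per-utterance cepstral mean-variance normalization.

    features: (num_frames, num_coeffs) log filterbank or MFCC features.
    A stationary convolutional channel is additive in this domain, so
    subtracting the per-utterance mean largely cancels it.
    """
    mean = features.mean(axis=0, keepdims=True)
    std = features.std(axis=0, keepdims=True)
    return (features - mean) / (std + eps)

# Hypothetical utterance: 300 frames of 64 log filterbank coefficients.
utt = 5.0 + 2.0 * np.random.randn(300, 64)
norm = cmvn(utt)
print(norm.mean(), norm.std())  # ~0 and ~1
```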
Best wishes,
Thank you!
========================
9-21 update:
Training has now reached nearly 70000 steps; the training loss bounces between 1 and 2, and the EER is around 0.17.
Since the loss is still above 1, does that mean the training is not very effective on this dataset? Does it mean the network has not learned discriminative features, which is why the EER is still high?
2020-09-22 update:
Today I re-ran everything with LibriSpeech train-clean-100 and test-clean.
The improvement in training loss and EER was plain to see.
At only 26500 steps the loss is already around 1 and the EER around 0.07, much better than training on VoxCeleb.
[INFO] train.py/main | == Presenting step #26500
[INFO] train.py/main | == Processed in 0.22s by the network, training loss = 1.1397817134857178.
Found 0028539 files with 00251 different speakers.
[INFO] train.py/main | test training data EER = 0.073, F-measure = 0.679, Accuracy = 0.988
get batch time 2.62e-06s
forward process time 0.389s
beginning to select..........
select best batch time 0.065s
select_batch_time: 0.46043896675109863
[INFO] train.py/main | == Presenting step #26501
[INFO] train.py/main | == Processed in 0.22s by the network, training loss = 1.0623468160629272.
get batch time 7.15e-06s
forward process time 0.403s
beginning to select..........
select best batch time 0.0648s
select_batch_time: 0.4749116897583008
My guess at the reason: LibriSpeech is a clean, noise-free corpus, while VoxCeleb contains real environmental noise and possibly far-field recordings. Deep Speaker may simply perform better on clean speech.
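A cheap way to test this guess (my own suggestion, not something from this repo) would be to corrupt the clean LibriSpeech audio with additive noise at a fixed SNR and see how much the EER degrades toward the VoxCeleb numbers. A minimal mixing sketch:

```python
import numpy as np

def mix_at_snr(signal, noise, snr_db):
    """Return signal + noise, with noise scaled so the mix has the given SNR in dB."""
    noise = np.resize(noise, signal.shape)       # repeat/trim noise to the signal length
    sig_power = np.mean(signal ** 2)
    noise_power = np.mean(noise ** 2) + 1e-12
    scale = np.sqrt(sig_power / (noise_power * 10 ** (snr_db / 10.0)))
    return signal + scale * noise

# Hypothetical 1-second clip at 16 kHz: a pure tone plus white noise at 10 dB SNR.
sr = 16000
t = np.arange(sr) / sr
clean = np.sin(2 * np.pi * 220 * t)
noisy = mix_at_snr(clean, np.random.randn(sr), snr_db=10)
```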