Variational bayes speaker diarization #3

Open
wants to merge 245 commits into
base: master
Changes from 4 commits
e8d1287
[src] Fix 'sausage-time' issue which occurs with disabled MBR decodin…
KarelVesely84 Jan 18, 2019
99dc4d8
[egs] Add scripts for yomdle Russian (OCR task) (#2953)
aarora8 Jan 21, 2019
7e529ed
[egs] Simplify lexicon preparation in Fisher callhome Spanish (#2999)
GoVivace Jan 21, 2019
25f09e8
[egs] Update GALE Arabic recipe (#2934)
aarora8 Jan 22, 2019
4338004
[egs] Remove outdated NN results from Gale Arabic recipe (#3002)
aarora8 Jan 22, 2019
05d9a3d
[egs] Add RESULTS file for the tedlium s5_r3 (release 3) setup (#3003)
huangruizhe Jan 23, 2019
1dcdf80
[src] Fixes to grammar-fst code to handle LM-disambig symbols properl…
danpovey Jan 26, 2019
6f56512
[src] Cosmetic change to mel computation (fix option string) (#3011)
boeddeker Jan 30, 2019
56cfb95
[src] Fix Visual Studio error due to alternate syntactic form of nore…
daanzu Feb 1, 2019
9e35898
[egs] Fix location of sequitur installation (#3017)
jybaek Feb 1, 2019
a51bd96
[src] Fix w/ ifdef Visual Studio error from alternate syntactic form …
daanzu Feb 3, 2019
41ea8cf
[egs] Some fixes to getting data in heroico recipe (#3021)
danpovey Feb 3, 2019
fb514dc
[egs] BABEL script fix: avoid make_L_align.sh generating invalid file…
jtrmal Feb 4, 2019
afc5e78
[src] Fix to older online decoding code in online/ (OnlineFeInput; wa…
jdieguez Feb 6, 2019
226cbf7
[script] Fix unset bash variable in make_mfcc.sh (#3030)
oplatek Feb 8, 2019
6fc4c60
[scripts] Extend limit_num_gpus.sh to support --num-gpus 0. (#3027)
oplatek Feb 8, 2019
2f92bd9
[scripts] fix bug in utils/add_lex_disambig.pl when sil-probs and pro…
Teddyang Feb 15, 2019
403c5ee
[egs] Fix path in Tedlium r3 rnnlm training script (#3039)
francoishernandez Feb 18, 2019
abfbc56
[src] Thread-safety for GrammarFst (thx:armando.muscariello@gmail.com…
danpovey Feb 20, 2019
f09d48a
[scripts] Cosmetic fix to get_degs.sh (#3045)
Teddyang Feb 21, 2019
b0fc09d
[egs] Small bug fixes for IAM and UW3 recipes (#3048)
ChunChiehChang Feb 21, 2019
4494a85
[scripts] Nnet3 segmentation: fix default params (#3051)
danpovey Feb 26, 2019
bf33f1f
[scripts] Allow perturb_data_dir_speed.sh to work with utt2lang (#3055)
igrinis Feb 26, 2019
5f05d59
[scripts] Make beam in monophone training configurable (#3057)
xiaohui-zhang Feb 27, 2019
c0a555e
[scripts] Allow reverberate_data_dir.py to support unicode filenames …
rezame Feb 27, 2019
2e26464
[scripts] Make some cleanup scripts work with python3 (#3054)
vimalmanohar Mar 1, 2019
d21be2d
[scripts] bug fix to nnet2->3 conversion, fixes #886 (#3071)
jfainberg Mar 4, 2019
8fa9648
[src] Make copies occur in per-thread default stream (for GPUs) (#3068)
luitjens Mar 4, 2019
bd326dc
[src] Add GPU version of MergeTaskOutput().. relates to batch decodin…
luitjens Mar 4, 2019
17b7f3f
[src] Add device options to enable tensor core math mode. (#3066)
luitjens Mar 4, 2019
0a1f827
[src] Log nnet3 computation to VLOG, not std::cout (#3072)
kkm000 Mar 5, 2019
f2a89c2
[src] Allow upsampling in compute-mfcc-feats, etc. (#3014)
danpovey Mar 5, 2019
98b45c8
[src] fix problem with rand_r being undefined on Android (#3037)
keli78 Mar 5, 2019
197214d
[egs] Update swbd1_map_words.pl, fix them_1's -> them's (#3052)
Mar 5, 2019
991a75c
[src] Add const overload OnlineNnet2FeaturePipeline::IvectorFeature (…
kkm000 Mar 6, 2019
4432371
[src] Fix syntax error in egs/bn_music_speech/v1/local/make_musan.py …
antonstakhouski Mar 6, 2019
8460fa3
[src] Memory optimization for online feature extraction of long recor…
pzelasko Mar 6, 2019
b801b98
[build] fixed a bug in linux_configure_redhat_fat when use_cuda=no (#…
kan-bayashi Mar 7, 2019
ce97c47
[scripts] Add missing '. ./path.sh' to get_utt2num_frames.sh (#3076)
hhadian Mar 7, 2019
4d61452
[src,scripts,egs] Add count-based biphone tree tying for flat-start c…
hhadian Mar 7, 2019
01cef69
[scripts,egs] Remove sed from various scripts (avoid compatibility pr…
desh2608 Mar 8, 2019
2f95609
[src] Rework error logging for safety and cleanliness (#3064)
kkm000 Mar 8, 2019
bcfe3f8
[src] Change warp-synchronous to cub::BlockReduce (safer but slower) …
desh2608 Mar 10, 2019
1209c07
[src] Fix && and || uses where & and | intended, and other weird erro…
kkm000 Mar 11, 2019
5a5696f
[build] Some fixes to Makefiles (#3088)
kkm000 Mar 11, 2019
abd4869
[src] Fixed -Wreordered warnings in feat (#3090)
pzelasko Mar 12, 2019
9c8ba0f
[egs] Replace bc with perl -e (#3093)
entn-at Mar 12, 2019
8cbd582
[scripts] Fix python3 compatibility issue in data-perturbing script (…
nikhilm16 Mar 12, 2019
7435661
[doc] fix some typos in doc. (#3097)
csukuangfj Mar 12, 2019
5bdea69
[build] Make sure expf() speed probe times sensibly (#3089)
kkm000 Mar 12, 2019
b7a4fec
[scripts] Make sure merge_targets.py works in python3 (#3094)
XIAOYixuan Mar 12, 2019
94475d6
[src] ifdef to fix compilation failure on CUDA 8 and earlier (#3103)
desh2608 Mar 13, 2019
fc8c17b
[doc] fix typos and broken links in doc. (#3102)
csukuangfj Mar 13, 2019
3f8b6b2
[scripts] Fix frame_shift bug in egs/swbd/s5c/local/score_sclite_conf…
freewym Mar 13, 2019
633e61c
[src] Fix wrong assertion failure in nnet3-am-compute (#3106)
MartinKocour Mar 14, 2019
8cafd32
[src] Cosmetic changes to natural-gradient code (#3108)
danpovey Mar 14, 2019
b1b230c
[src,scripts] Python2 compatibility fixes and code cleanup for nnet1 …
KarelVesely84 Mar 14, 2019
9c875ef
[doc] Small documentation fixes; update on Kaldi history (#3031)
KarelVesely84 Mar 14, 2019
7a1908f
[src] Various mostly-cosmetic changes (copying from another branch) (…
danpovey Mar 15, 2019
fcd70d3
[scripts] Simplify text encoding in RNNLM scripts (now only support …
saikiranvalluri Mar 16, 2019
b4c7ab6
[egs] Add "formosa_speech" recipe (Taiwanese Mandarin ASR) (#2474)
yfliao Mar 16, 2019
461b50c
[egs] python3 compatibility in csj example script (#3123)
rickychanhoyin Mar 16, 2019
61637e6
[egs] python3 compatibility in example scripts (#3126)
danpovey Mar 17, 2019
1f068cd
[scripts] Bug-fix for removing deleted words (#3116)
psmit Mar 17, 2019
8d60ee3
[scripts] Add fix regarding num-jobs for segment_long_utterances*.sh(…
vimalmanohar Mar 17, 2019
7fb716a
[src] Enable allow_{upsample,downsample} with online features (#3139)
jtrmal Mar 18, 2019
80c1437
[src] Fix bad assert in fstmakecontextsyms (#3142)
Mar 19, 2019
0d6ead5
[src] Fix to "Fixes to grammar-fst & LM-disambig symbols" (#3000) (#3…
daanzu Mar 19, 2019
338b586
[build] Make sure PaUtils exported from portaudio (#3144)
jtrmal Mar 19, 2019
73720e6
[src] cudamatrix: fixing a synchronization bug in 'normalize-per-row'…
KarelVesely84 Mar 20, 2019
f9276a5
[src] Fix typo in comment (#3147)
csukuangfj Mar 20, 2019
252690f
[src] Add binary that functions as a TCP server (#2938)
danijel3 Mar 20, 2019
6134c29
[scripts] Fix bug in comment (#3152)
Shujian2015 Mar 21, 2019
ae3fe28
VB cleaning based diarization
saikiranvalluri Mar 21, 2019
aead118
[scripts] Fix bug in steps/segmentation/ali_to_targets.sh (#3155)
saikiranvalluri Mar 21, 2019
213ae52
[scripts] Avoid holding out more data than the requested num-utts (du…
kkm000 Mar 21, 2019
1ac8c92
[src,scripts] Add support for two-pass agglomerative clustering. (#3058)
dogancan Mar 24, 2019
6bd9dad
[src] Disable unget warning in PeekToken (and other small fix) (#3163)
kkm000 Mar 24, 2019
37f4f44
[build] Add new nvidia tools to windows build (#3159)
btiplitz Mar 24, 2019
77ac79f
[doc] Fix documentation errors and add more docs for tcp-server decod…
danijel3 Mar 24, 2019
27034a2
[scripts] Fix non-randomness in getting utt2uniq, introduced in #3142…
desh2608 Mar 27, 2019
f9828e9
[build] Don't build for Tegra sm_XX versions on x86/ppc and vice vers…
luitjens Mar 27, 2019
419e35c
[egs] Fixes Re encoding to IAM, uw3 recipes (#3012)
aarora8 Mar 29, 2019
2ebe976
[src] Efficiency improvement and extra checking for cudamarix, RE def…
luitjens Mar 30, 2019
abf7a8c
[egs] Fix small typo in tedlium download script (#3178)
Shujian2015 Mar 30, 2019
7691d00
[github] Add GitHub issue templates (#3187)
Mar 31, 2019
9ef700f
[build] Add missing dependency to Makefile (#3191)
danpovey Mar 31, 2019
5845334
[src] Fix bug in pruned lattice rescoring when input lattice has epsi…
hainan-xv Apr 1, 2019
be019cd
[scripts] Fix bug in extend_lang.sh regarding extra_disambig.txt (#3195)
armusc Apr 2, 2019
ab35a47
Fixes in callhome data prep and mem options
saikiranvalluri Apr 3, 2019
8610219
Appended VB resegmentation to v2 recipe
saikiranvalluri Apr 3, 2019
b600a3d
Remove VB_clean folder
saikiranvalluri Apr 3, 2019
4db0840
Small bug fix in python version
saikiranvalluri Apr 3, 2019
ffbe16b
[egs] Update Tedlium s5_r3 example with more up-to-date chain TDNN co…
jyhnnhyj Apr 3, 2019
b180707
[scripts] Fix bug in extend_lang.sh causing validation failure w/ ext…
jty016 Apr 3, 2019
9f6df79
Data preparation fixes for VB resegmentation
saikiranvalluri Apr 4, 2019
7093dfa
[scripts] Bug-fix in make_lexicon_fst.py, which failed when --sil-pro…
armusc Apr 4, 2019
6f0a3a2
[egs] Fix very small typo in run_tdnn_1b.sh (#3207)
Shujian2015 Apr 4, 2019
ddeac98
[build] Tensorflow version update (#3204)
langep Apr 4, 2019
beb0151
[src] Optimizations to CUDA kernels (#3209)
kangshiyin Apr 6, 2019
a3a190b
[src] Move curand handle out of CuRand class and into CuDevice. (#3196)
luitjens Apr 7, 2019
faa7ff8
[build] Make MKL the default BLAS library, add installation scripts (…
Apr 7, 2019
76bdf20
[build] check for i686 as a valid prefix for Android triplets (#3213)
Dr-Desty-Nova Apr 7, 2019
fe650c9
Added sre10 data prep script
saikiranvalluri Apr 8, 2019
4544497
Fix in make_callhome.sh
saikiranvalluri Apr 9, 2019
4ae4bb0
[build] Fix configure breakage from #3194 (MKL default)
Apr 9, 2019
b96cab7
[build] Add missing line continuation '\' in tfrnnlmbin/Makefile (#3218)
teinhonglo Apr 10, 2019
9b730e0
[src] Fix nnet2 DctComponent test failure (#3225)
huangruizhe Apr 12, 2019
4cfbd21
[src] Update CUDA code to avoid synchronization errors on compute ca…
kangshiyin Apr 12, 2019
df41d4c
[src] fix nnet2 DCTCompnent test failure -- removing anther dct_keep_…
huangruizhe Apr 12, 2019
ebfa3cb
[build] Remove references to deprecated MKL libs in gst_plugin (#3229)
Apr 14, 2019
4e8164c
[scripts] Fix default params in nnet3 segmentation script (#3230)
rezame Apr 14, 2019
0bfc307
[src] Correct sanity check in nnet-example-utils.cc (nnet3) (#3232)
KarelVesely84 Apr 16, 2019
f8021d7
Revert "[src] Update CUDA code to avoid synchronization errors on co…
danpovey Apr 16, 2019
06a21b1
[build] .gitignore autogenerated /tools/python/ (#3241)
mcalahan Apr 17, 2019
a2d0270
[scripts] Enhance argument checks in nnet3/align_lats.sh (#3243)
Apr 18, 2019
299b111
[egs] invoke 'python2.7' not 'python' when using mmseg (#3244)
naxingyu Apr 18, 2019
4ff77c5
[scripts] Make getting nnet3 model context more robust (#3247)
KarelVesely84 Apr 18, 2019
b3a6e17
[egs] Fix hkust_data_prep.sh w.r.t. iconv mac compatibility issue (#3…
zh794390558 Apr 19, 2019
84ecd0e
[egs] Update RM chain recipe with more recent configuration (#3237)
indra622 Apr 19, 2019
c3260f2
[egs] Make voxceleb recipe work with latest version of the dataset (…
sunshines14 Apr 19, 2019
5753806
Code fixes for final stage VB smoothing
saikiranvalluri Apr 20, 2019
1743a21
Added ivector-extractor-copy in ivectorbin
saikiranvalluri Apr 20, 2019
f107cdb
[egs] Improve chain example script for Resource Management (RM) (#3252)
indra622 Apr 21, 2019
2c25629
[src] GPU-related changes for speed and correctness on newer arch's. …
luitjens Apr 22, 2019
5a34a0a
[egs] Update voxceleb v1 preparation scripts (#3255)
jyhnnhyj Apr 23, 2019
96e7b0a
[build] Note default=MKL; cosmetic fix (#3257)
nshmyrev Apr 23, 2019
d47e36c
[egs] Fix to hkust_data_prep.sh w.r.t. how mmseg is checked for (#3240)
zh794390558 Apr 23, 2019
286e8af
[egs] In WSJ run_ivector_common.sh, expose i-vector #jobs config to r…
KarelVesely84 Apr 23, 2019
e3a9844
[egs] Add Spanish dimex100 example (#3254)
alx741 Apr 23, 2019
0cc941f
[build] Build and configure OpenBLAS; default to it on non-x64 machin…
Apr 25, 2019
9e9ae13
[scripts] Fix of a bug in segmentation.pl (#3256)
songyf Apr 25, 2019
f8cb5cc
[src] Fixes to cuda unit tests. (#3268)
luitjens Apr 25, 2019
b8a35fd
[src] Adding GPU/CUDA lattice batched decoder + binary (#3114)
hugovbraun Apr 26, 2019
fb7eb4e
VB_diarization code mirrored in ttps://github.com/GoVivaceInc/
saikiranvalluri Apr 27, 2019
da4e2b8
[src] Fix unit-test failure UnitTestCuMatrixSetRandn() (#3274)
DongjiGao Apr 27, 2019
203ce37
[src,build] Removed cusolver for now (not needed yet; caused build p…
huangruizhe Apr 27, 2019
3c612fd
David's changes phase 1
saikiranvalluri Apr 29, 2019
1da8ebd
[scripts] Make fix_data_dir.sh remove utterances which have bad durat…
hhadian Apr 30, 2019
939faf8
[scripts] Make generate_plots.py python3-compatible (#3280)
May 1, 2019
212474e
[scripts] Add --one-based option to split_scp.pl (#3279)
xsawyerx May 1, 2019
a045314
[scripts] Allow UTF utterance-ids by removing unnecessary assert (#3283)
rezame May 1, 2019
b1569db
[src] Keep nnet output in the [-30,30] range required by chain denomi…
danpovey May 2, 2019
b17fc84
[scripts] Clean up filehandle usage in split_scp.pl (#3285)
xsawyerx May 2, 2019
230992f
[src] Fix to bug in online-feature.cc that caused crash at end of utt…
danpovey May 2, 2019
f7117db
[scripts] Use correct compile-time regex syntax in split_scp.pl (#3287)
xsawyerx May 2, 2019
df1ebbc
[scripts] Fix a typo in steps/dict/learn_lexicon_bayesian.sh (#3288)
xiaohui-zhang May 2, 2019
155c658
[egs,scripts] Scripts and an example of BPE-based sub-word decoding (…
DongjiGao May 5, 2019
f2670c3
[scripts] Add trainer option --trainer.optimization.num-jobs-step (#3…
May 7, 2019
9702cbc
[egs] Add MGB-5 recipe; https://arabicspeech.org/mgb5 (#3299)
May 8, 2019
5ae3c19
Revert "[scripts] Clean up filehandle usage in split_scp.pl (#3285)" …
danpovey May 9, 2019
20fb648
[src] Fix bug in GeneralMatrix::Uncompress() (#3304)
bringtree May 9, 2019
a5695e9
[src] nnet1: lstm training, introducing cursors when slicing the trai…
KarelVesely84 May 9, 2019
9424f7a
[doc] add an omission in Doxyfile (#3309)
May 10, 2019
ba165c8
[scripts] Fix utils/split_scp.pl breakage (#3308)
May 10, 2019
4d7fe3b
[egs] Bug-fix to shebang in fisher_callhome_spanish (#3312)
saikiranvalluri May 11, 2019
19c88ac
[scripts] Fix error messages in run.pl (#3314)
May 11, 2019
e922333
[egs] New chime-5 recipe (#2893)
vimalmanohar May 12, 2019
a861e56
[scripts,egs] Made changes to the augmentation script to make it work…
phanisankar-nidadavolu May 13, 2019
cec8958
[egs] updated local/musan.sh to steps/data/make_musan.sh in speaker i…
phanisankar-nidadavolu May 13, 2019
d40222e
[src] Fix sample rounding errors in extract-segments (#3321)
May 14, 2019
35f96db
[src,scripts]Store frame_shift, utt2{dur,num_frames}, .conf with feat…
May 14, 2019
a2e7ba3
[build] Initial version of Docker images for (CPU and GPU versions) (…
mdoulaty May 15, 2019
91609c7
[scripts] fix typo/bug in make_musan.py (#3327)
wonkyuml May 15, 2019
95e81c0
[scripts] Fixed misnamed variable in data/make_musan.py (#3324)
phanisankar-nidadavolu May 15, 2019
c5aa3a9
[scripts] Trust frame_shift and utt2num_frames if found (#3313)
May 16, 2019
0ff318b
[scripts] typo fix in augmentation script (#3329)
wonkyuml May 16, 2019
62ebb44
[scripts] handle frame_shit and utt2num_frames in utils/ (#3323)
May 16, 2019
c8b93bc
[scripts] Extend combine_ali_dirs.sh to combine alignment lattices (#…
May 17, 2019
528e072
[src] Fix rare case when segment end rounding overshoots file end in …
alumae May 17, 2019
8397e05
[scripts] Change --modify-spk-id default to False; back-compatibility…
phanisankar-nidadavolu May 20, 2019
8b54ef8
[build] Add easier configure option in failure message of configure (…
danpovey May 20, 2019
ce8798b
[scripts,minor] Fix typo in comment (#3338)
Shujian2015 May 22, 2019
9e0a7f6
[src,egs] Add option for applying SVD on trained models (#3272)
saikiranvalluri May 23, 2019
0e5e07b
[src] Add interfaces to nnet-batch-compute that expects device input.…
luitjens May 23, 2019
52e7ecf
[build] Update GCC support check for CUDA toolkit 10.1 (#3345)
entn-at May 27, 2019
29f3c14
[egs] Fix to aishell1 v1 download script (#3344)
naxingyu May 27, 2019
a5dd6bd
[scripts] Support utf-8 files in some scripts (#3346)
vimalmanohar May 28, 2019
8c6cd31
[src] Fix potential underflow bug in MFCC, RE energy floor, thx: Zolt…
huangruizhe May 28, 2019
e643c73
[scripts]: add warning to nnet3/chain/train.py about ineffective opti…
bringtree May 28, 2019
8706f06
[scripts] Fix regarding UTF handling in cleanup script (#3352)
vimalmanohar May 29, 2019
800924d
[scripts] Change encoding to utf-8 in data augmentation scripts (#3360)
hhadian Jun 1, 2019
eedd9fa
[src] Add CUDA accelerated MFCC computation. (#3348)
luitjens Jun 3, 2019
0b443bd
[src] Optimizations for batch nnet3. The issue fixed here is that (#…
luitjens Jun 3, 2019
16097b4
[scripts,minor] Remove outdated comment (#3361)
Shujian2015 Jun 3, 2019
ced53e1
[egs] A kaldi recipe based on the corpus named "aidatatang_200zh". (#…
DatatangAI Jun 4, 2019
f8a4376
[src] nnet1: changing end-rule in 'nnet-train-multistream', (#3358)
KarelVesely84 Jun 4, 2019
9c734a5
[scripts] Fix how the empty (faulty?) segments are handled in data-cl…
jtrmal Jun 4, 2019
b276d70
[src] Fix to bug in ivector extraction causing assert failure, thx: s…
danpovey Jun 4, 2019
de4a3e3
[src] Fix to bug in ivector extraction causing assert failure, thx: s…
danpovey Jun 4, 2019
1a4aa52
[scripts] add script to compute dev PPL on kaldi-rnnlm (#3340)
hainan-xv Jun 4, 2019
1735003
[scripts,egs] Small fixes to diarization scripts (#3366)
HuangZiliAndy Jun 4, 2019
338cc58
[egs] Modify split_scp.pl usage to match its updated code (#3371)
danpovey Jun 5, 2019
254d636
[src] Fix non-cuda `make depend` build by putting compile guards arou…
luitjens Jun 6, 2019
3648df5
[build] Docker docs update and minor changes to the Docker files (#3…
mdoulaty Jun 6, 2019
0071003
[egs] Scripts for MATERIAL ASR (#2165)
mahsa7823 Jun 6, 2019
acff3f6
[src] Batch nnet3 optimizations. Batch some of the copies in and cop…
luitjens Jun 6, 2019
23ba982
[build] Widen cuda guard in cudafeat makefile. (#3379)
langep Jun 7, 2019
04cf43b
[scripts] nnet1: updating the scripts to support 'online-cmvn', (#3383)
KarelVesely84 Jun 10, 2019
c10e02f
[build,src] Enhancements to the cudamatrix/cudavector classes. (#3373)
luitjens Jun 11, 2019
b0a6e76
[egs] Fix perl `use encoding` deprecation (#3386)
naxingyu Jun 11, 2019
63c54e2
[scripts] Add max_active to align_fmllr_lats.sh to prevent rare crash…
hhadian Jun 11, 2019
7c7a176
[src] Implemented CUDA acclerated online cmvn. (#3370)
luitjens Jun 11, 2019
c7876a3
[egs] Fixed file path RE augmentation, in aspire recipe (#3388)
Shujian2015 Jun 12, 2019
0552e22
[scripts] Update taint_ctm_edits.py, RE utf-8 encoding (#3392)
vimalmanohar Jun 13, 2019
63b3849
[src] Change nnet3-am-copy to allow more manipulations (#3393)
danpovey Jun 14, 2019
c216385
[egs] Remove confusing setting of overridden num_epochs variable in a…
Shujian2015 Jun 15, 2019
bd1da14
[build] Add a missing dependency for "decoder" in Makefile (#3397)
hhadian Jun 17, 2019
674410e
[src] CUDA decoder performance patch (#3391)
hugovbraun Jun 19, 2019
10f2fcb
[build,scripts] Dependency fix; add cross-references to scripts (#3400)
danpovey Jun 19, 2019
76557e9
[egs] Fix cleanup-after-partial-download bug in aishell (#3404)
naxingyu Jun 20, 2019
5abb1a0
[src] Change functions like AppiyLog() to all work out-of-place (#3185)
YiwenShaoStephen Jun 20, 2019
777f8c1
[src] Make stack trace display more user friendly (#3406)
rosun82 Jun 21, 2019
d5a1451
[egs] Fix to separators in Aspire reverb recipe (#3408)
danpovey Jun 22, 2019
09697c3
[egs] Fix to separators in Aspire, related to #3408 (#3409)
Shujian2015 Jun 22, 2019
fe541d2
[src] online2-tcp, add option to display start/end times (#3399)
KarelVesely84 Jun 23, 2019
c5c09e9
[src] Remove debugging assert in cuda feature extraction code (#3411)
luitjens Jun 24, 2019
837839a
[scripts] Fix to checks in adjust_unk_graph.sh (#3410)
hhadian Jun 24, 2019
563b258
[src] Added GPU feature extraction (will improve speed of GPU decodin…
luitjens Jun 24, 2019
ec13b71
[src] Fix build error introducted by race condition in PR requests/ac…
luitjens Jun 24, 2019
5b4d2c9
[src] Added error string to CUDA allocation errors. (#3413)
luitjens Jun 25, 2019
00963e2
[src] Fix CUDA_VSERION number in preprocessor checks (#3414)
LvHang Jun 25, 2019
14cc156
[src] Fix build of online feature extraction with older CUDA version …
luitjens Jun 26, 2019
5cc7ce0
[src] Update Insert function of hashlist and decoders (#3402)
LvHang Jun 26, 2019
524db19
[src] Fix spelling mistake in #3415 (#3416)
LvHang Jun 26, 2019
36a7e99
[build] Fix configure bug RE CuSolver (#3417)
LvHang Jun 26, 2019
533469c
[src] Enable an option to use the GPU for feature extraction in GPU d…
luitjens Jun 26, 2019
42315f3
[egs] Replace $cuda_cmd with $train_cmd for FarsDat (#3426)
rezame Jun 27, 2019
31df26c
[src] Remove outdated comment (#3148) (#3422)
cloudhan Jun 27, 2019
2e02eb7
[src] Adding missing thread.join in CUDA decoder and fixing two todos…
hugovbraun Jun 27, 2019
f5a5f84
[build] Add missing lib dependency in cudafeatbin (#3427)
danpovey Jun 27, 2019
fa2e8c3
[egs] Small fix to aspire run_tdnn_7b.sh (#3429)
Shujian2015 Jun 28, 2019
21c0d9b
[build] Fix to cuda makefiles, thanks: yiyidhuang@gmail.com (#3431)
danpovey Jun 28, 2019
f5d34d7
[build] Add missing deps to cuda makefiles, thanks: yiyidhuang@gmail.…
danpovey Jun 28, 2019
8c0277e
[egs] Fix encoding issues in Chinese ASR recipe (#3430) (#3434)
boystray Jun 29, 2019
b7845dd
Revert "[src] Update Insert function of hashlist and decoders (#3402)…
danpovey Jun 29, 2019
0dcc2c9
[src] Update Insert function of hashlist and decoders (#3402) (#3438)
LvHang Jun 29, 2019
c449031
[build] Fix the cross-compiling issue for Android under MacOS (#3435)
rayworks Jun 30, 2019
9a38007
[src] Marking operator as __host__ __device__ to avoid build issues (…
luitjens Jul 2, 2019
ab4eca0
[egs] Fix perl encoding bug (was causing crashes) (#3442)
naxingyu Jul 2, 2019
893181f
[src] Cuda decoder fixes, efficiency improvements (#3443)
hugovbraun Jul 3, 2019
a4b6388
[scripts] Fix shebang of taint_ctm_edits.py to invoke python3 directl…
naxingyu Jul 3, 2019
e15f689
[src] Fix to a check in nnet-compute code (#3447)
danpovey Jul 3, 2019
f53556e
[src,scripts] Various typo fixes and stylistic fixes (#3153)
csukuangfj Jul 4, 2019
6ebcb76
Merge branch 'master' into VariationalBayes_SpeakerDiarization
saikiranvalluri Jul 4, 2019
308 changes: 308 additions & 0 deletions egs/callhome_diarization/v1/local/VB_resegmentation.py
@@ -0,0 +1,308 @@
#!/usr/bin/env python

import numpy as np
import VB_diarization
import pickle
import kaldi_io
import sys
import argparse
import commands

def get_utt_list(utt2spk_filename):
    utt_list = []
    with open(utt2spk_filename, 'r') as fh:
        content = fh.readlines()
    for line in content:
        line = line.strip('\n')
        line_split = line.split()
        utt_list.append(line_split[0])
    print("{} UTTERANCES IN TOTAL".format(len(utt_list)))
    return utt_list

def utt_num_frames_mapping(utt2num_frames_filename):
    utt2num_frames = {}
    with open(utt2num_frames_filename, 'r') as fh:
        content = fh.readlines()
    for line in content:
        line = line.strip('\n')
        line_split = line.split()
        utt2num_frames[line_split[0]] = int(line_split[1])
    return utt2num_frames

def create_ref_file(uttname, utt2num_frames, full_rttm_filename, temp_dir, rttm_filename):
    utt_rttm_file = open("{}/{}".format(temp_dir, rttm_filename), 'w')

    num_frames = utt2num_frames[uttname]

    # We use 0 to denote silence frames and 1 to denote overlapping frames.
    ref = np.zeros(num_frames)
    speaker_dict = {}
    num_spk = 0

    with open(full_rttm_filename, 'r') as fh:
        content = fh.readlines()
    for line in content:
        line = line.strip('\n')
        line_split = line.split()
        uttname_line = line_split[1]
        if uttname != uttname_line:
            continue
        else:
            utt_rttm_file.write(line + "\n")
        start_time = int(float(line_split[3]) * 100)
        duration_time = int(float(line_split[4]) * 100)
        end_time = start_time + duration_time
        spkname = line_split[7]
        if spkname not in speaker_dict.keys():
            spk_idx = num_spk + 2
            speaker_dict[spkname] = spk_idx
            num_spk += 1

        for i in range(start_time, end_time):
            if i < 0:
                raise ValueError(line)
            elif i >= num_frames:
                print("{} EXCEED NUM_FRAMES".format(line))
                break
            else:
                if ref[i] == 0:
                    ref[i] = speaker_dict[spkname]
                else:
                    ref[i] = 1 # The overlapping speech is marked as 1.
    ref = ref.astype(int)

    print("{} SPEAKERS IN {}".format(num_spk, uttname))
    print("{} TOTAL, {} SILENCE({:.0f}%), {} OVERLAPPING({:.0f}%)".format(len(ref), np.sum(ref == 0), 100.0 * np.sum(ref == 0) / len(ref), np.sum(ref == 1), 100.0 * np.sum(ref == 1) / len(ref)))

    duration_list = []
    for i in range(num_spk):
        duration_list.append(1.0 * np.sum(ref == (i + 2)) / len(ref))
    duration_list.sort()
    duration_list = map(lambda x: '{0:.2f}'.format(x), duration_list)
    print("DISTRIBUTION OF SPEAKER {}".format(" ".join(duration_list)))
    print("")
    sys.stdout.flush()
    utt_rttm_file.close()
    return ref

def create_rttm_output(uttname, predicted_label, output_dir, channel):
    num_frames = len(predicted_label)

    start_idx = 0
    idx_list = []

    last_label = predicted_label[0]
    for i in range(num_frames):
        if predicted_label[i] == last_label: # The speaker label remains the same.
            continue
        else: # The speaker label is different.
            if last_label != 0: # Ignore the silence.
                idx_list.append([start_idx, i, last_label])
            start_idx = i
            last_label = predicted_label[i]
    if last_label != 0:
        idx_list.append([start_idx, num_frames, last_label])

    with open("{}/{}_predict.rttm".format(output_dir, uttname), 'w') as fh:
        for i in range(len(idx_list)):
            start_frame = (idx_list[i])[0]
            end_frame = (idx_list[i])[1]
            label = (idx_list[i])[2]
            duration = end_frame - start_frame
            fh.write("SPEAKER {} {} {:.2f} {:.2f} <NA> <NA> {} <NA> <NA>\n".format(uttname, channel, start_frame / 100.0, duration / 100.0, label))
    return 0

def match_DER(string):
    string_split = string.split('\n')
    for line in string_split:
        if "OVERALL SPEAKER DIARIZATION ERROR" in line:
            return line
    return 0

def main():
    parser = argparse.ArgumentParser(description='VB Resegmentation')
    parser.add_argument('data_dir', type=str, help='Subset data directory')
    parser.add_argument('init_rttm_filename', type=str,
                        help='The rttm file to initialize the VB system, usually the AHC cluster result')
    parser.add_argument('output_dir', type=str, help='Output directory')
    parser.add_argument('dubm_model', type=str, help='Path of the diagonal UBM model')
    parser.add_argument('ie_model', type=str, help='Path of the ivector extractor model')

    parser.add_argument('--true-rttm-filename', type=str, default="None",
                        help='The true rttm label file')
    parser.add_argument('--max-speakers', type=int, default=10,
                        help='Maximum number of speakers expected in the utterance (default: 10)')
    parser.add_argument('--max-iters', type=int, default=10,
                        help='Maximum number of algorithm iterations (default: 10)')
    parser.add_argument('--downsample', type=int, default=25,
                        help='Perform diarization on input downsampled by this factor (default: 25)')
    parser.add_argument('--alphaQInit', type=float, default=100.0,
                        help='Dirichlet concentration parameter for initializing q')
    parser.add_argument('--sparsityThr', type=float, default=0.001,
                        help='Set occupations smaller than this threshold to 0.0 (saves memory as \
                        the posteriors are represented by sparse matrix)')
    parser.add_argument('--epsilon', type=float, default=1e-6,
                        help='Stop iterating, if obj. fun. improvement is less than epsilon')
    parser.add_argument('--minDur', type=int, default=1,
                        help='Minimum number of frames between speaker turns imposed by linear \
                        chains of HMM states corresponding to each speaker. All the states \
                        in a chain share the same output distribution')
    parser.add_argument('--loopProb', type=float, default=0.9,
                        help='Probability of not switching speakers between frames')
    parser.add_argument('--statScale', type=float, default=0.2,
                        help='Scale sufficient statistics collected using UBM')
    parser.add_argument('--llScale', type=float, default=1.0,
                        help='Scale UBM likelihood (i.e. llScale < 1.0 makes attribution of \
                        frames to UBM components more uncertain)')
    parser.add_argument('--channel', type=int, default=0,
                        help='Channel information in the rttm file')
    parser.add_argument('--initialize', type=int, default=1,
                        help='Whether to initialize the speaker posterior')

    args = parser.parse_args()
    print(args)
    data_dir = args.data_dir
    init_rttm_filename = args.init_rttm_filename

    # The data directory should contain wav.scp, spk2utt, utt2spk,
    # utt2num_frames and feats.scp
    utt2spk_filename = "{}/utt2spk".format(data_dir)
    utt2num_frames_filename = "{}/utt2num_frames".format(data_dir)
    feats_scp_filename = "{}/feats.scp".format(data_dir)
    temp_dir = "{}/tmp".format(args.output_dir)
    rttm_dir = "{}/rttm".format(args.output_dir)

    utt_list = get_utt_list(utt2spk_filename)
    utt2num_frames = utt_num_frames_mapping(utt2num_frames_filename)
    print("------------------------------------------------------------------------")
    print("")
    sys.stdout.flush()

    # Load the diagonal UBM and i-vector extractor
    with open(args.dubm_model, 'rb') as fh:
        dubm_para = pickle.load(fh)
    with open(args.ie_model, 'rb') as fh:
        ie_para = pickle.load(fh)

    DUBM_WEIGHTS = None
    DUBM_MEANS_INVVARS = None
    DUBM_INV_VARS = None
    IE_M = None

    for key in dubm_para.keys():
        if key == "<WEIGHTS>":
            DUBM_WEIGHTS = dubm_para[key]
        elif key == "<MEANS_INVVARS>":
            DUBM_MEANS_INVVARS = dubm_para[key]
        elif key == "<INV_VARS>":
            DUBM_INV_VARS = dubm_para[key]
        else:
            continue

    for key in ie_para.keys():
        if key == "M":
            IE_M = np.transpose(ie_para[key], (2, 0, 1))
    m = DUBM_MEANS_INVVARS / DUBM_INV_VARS
    iE = DUBM_INV_VARS
    w = DUBM_WEIGHTS
    V = IE_M

    # Load the MFCC features
    feats_dict = {}
    for key,mat in kaldi_io.read_mat_scp(feats_scp_filename):
        feats_dict[key] = mat

    for utt in utt_list:
        # Get the alignments from the clustering result.
        # In init_ref, 0 denotes the silence frames,
        # 1 denotes the overlapping speech frames, and the speaker
        # labels start from 2.
        init_ref = create_ref_file(utt, utt2num_frames, init_rttm_filename, temp_dir, "{}.rttm".format(utt))
        # Ground truth of the diarization.
        if args.true_rttm_filename != "None":
            true_ref = create_ref_file(utt, utt2num_frames, args.true_rttm_filename, temp_dir, "{}_true.rttm".format(utt))
        else:
            true_ref = None

        X = feats_dict[utt]
        X = X.astype(np.float64)

        # Keep only the voiced frames (0 denotes the silence
        # frames, 1 denotes the overlapping speech frames). Since
        # our method predicts a single speaker label for each frame,
        # the init_ref doesn't contain 1.
        mask = (init_ref >= 2)
        X_voiced = X[mask]
        init_ref_voiced = init_ref[mask] - 2

        if args.true_rttm_filename != "None":
            true_ref_voiced = true_ref[mask] - 2
            if np.sum(true_ref) == 0:
                print("Warning: {} has no voiced frames in the label file".format(utt))
                continue
        if X_voiced.shape[0] == 0:
            print("Warning: {} has no voiced frames in the initialization file".format(utt))
            continue

        # Initialize the posterior of each speaker based on the clustering result.
        if args.initialize:
            q = VB_diarization.frame_labels2posterior_mx(init_ref_voiced, args.max_speakers)
            if args.true_rttm_filename != "None":
                cmd = "md-eval.pl -1 -c 0.25 -r {}/{}_true.rttm -s {}/{}.rttm 2".format(temp_dir, utt, temp_dir, utt)
                status, output = commands.getstatusoutput(cmd)
                assert status == 0
                DER_info = match_DER(output)
                print("BEFORE RUNNING VB RESEGMENTATION")
                print(DER_info + "\n")
        else:
            q = None
            print("RANDOM INITIALIZATION\n")

        # VB resegmentation

        # q  - S x T matrix of posteriors attributing each frame to one of S possible
        #      speakers, where S is given by opts.maxSpeakers
        # sp - S dimensional column vector of ML learned speaker priors. Ideally, these
        #      should allow estimating the number of speakers in the utterance, as the
        #      probabilities of the redundant speakers should converge to zero.
        # Li - values of auxiliary function (and DER and frame cross-entropy between q
        #      and reference if 'ref' is provided) over iterations.
        q_out, sp_out, L_out = VB_diarization.VB_diarization(X_voiced, m, iE, w, V, sp=None, q=q, maxSpeakers=args.max_speakers, maxIters=args.max_iters, VtiEV=None,
                                                             downsample=args.downsample, alphaQInit=args.alphaQInit, sparsityThr=args.sparsityThr, epsilon=args.epsilon, minDur=args.minDur,
                                                             loopProb=args.loopProb, statScale=args.statScale, llScale=args.llScale, ref=None, plot=False)

        predicted_label_voiced = np.argmax(q_out, 1) + 2
        predicted_label = (np.zeros(len(mask))).astype(int)
        predicted_label[mask] = predicted_label_voiced

        duration_list = []
        for i in range(args.max_speakers):
            num_frames = np.sum(predicted_label == (i + 2))
            if num_frames == 0:
                continue
            else:
                duration_list.append(1.0 * num_frames / len(predicted_label))
        duration_list.sort()
        duration_list = map(lambda x: '{0:.2f}'.format(x), duration_list)
        print("PREDICTED {} SPEAKERS".format(len(duration_list)))
        print("DISTRIBUTION {}".format(" ".join(duration_list)))
        print("sp_out", sp_out)
        print("L_out", L_out)

        # Create the output rttm file and compute the DER after re-segmentation
        create_rttm_output(utt, predicted_label, rttm_dir, args.channel)
        if args.true_rttm_filename != "None":
            cmd = "md-eval.pl -1 -c 0.25 -r {}/{}_true.rttm -s {}/{}_predict.rttm 2".format(temp_dir, utt, rttm_dir, utt)
            status, output = commands.getstatusoutput(cmd)
            assert status == 0
            DER_info = match_DER(output)
            print("")
            print("AFTER RUNNING VB RESEGMENTATION")
            print(DER_info)
        print("")
        print("------------------------------------------------------------------------")
        print("")
        sys.stdout.flush()
    return 0

if __name__ == "__main__":
    main()
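
Note on the two conversion helpers in this file: `create_ref_file` expands RTTM segments into per-frame labels, and `create_rttm_output` collapses per-frame labels back into segments. The round trip may be easier to follow in a small, dependency-free sketch. The names `segments_to_labels` and `labels_to_segments` and the tuple layouts are illustrative assumptions and are not part of this PR. The label convention (0 for silence, 1 for overlap, speakers from 2) and the factor of 100 frames per second follow the script.

```python
FRAMES_PER_SECOND = 100  # the script multiplies times by 100, i.e. 10 ms frames


def segments_to_labels(segments, num_frames):
    """Expand (start_sec, duration_sec, speaker) tuples into per-frame labels.

    0 = silence, 1 = overlapping speech, speakers are numbered from 2
    in order of first appearance.
    """
    labels = [0] * num_frames
    speaker_ids = {}
    for start_sec, duration_sec, speaker in segments:
        if speaker not in speaker_ids:
            speaker_ids[speaker] = len(speaker_ids) + 2
        start = int(start_sec * FRAMES_PER_SECOND)
        end = start + int(duration_sec * FRAMES_PER_SECOND)
        if start < 0:
            raise ValueError("negative start time: {}".format(start_sec))
        # Frames past the end of the recording are dropped.
        for i in range(start, min(end, num_frames)):
            labels[i] = speaker_ids[speaker] if labels[i] == 0 else 1
    return labels


def labels_to_segments(labels):
    """Run-length encode labels into (start_frame, end_frame, label) tuples.

    Silence runs (label 0) are skipped; end_frame is exclusive.
    """
    segments = []
    start = 0
    for i in range(1, len(labels) + 1):
        # Close the current run at the end of input or when the label changes.
        if i == len(labels) or labels[i] != labels[start]:
            if labels[start] != 0:
                segments.append((start, i, labels[start]))
            start = i
    return segments


# Two speakers overlapping between 0.25 s and 0.5 s in a 1 s recording.
example = segments_to_labels([(0.0, 0.5, "A"), (0.25, 0.5, "B")], 100)
print(labels_to_segments(example))  # [(0, 25, 2), (25, 50, 1), (50, 75, 3)]
```

Unlike `create_rttm_output`, this sketch returns an empty list for an empty label sequence instead of indexing element 0. Whether that case can occur in the recipe is not clear from the diff.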
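
Note on portability: the script relies on `import commands`, which exists only in Python 2. In Python 3 the `subprocess` module covers the same need. Below is a minimal sketch of how the "run the scorer, then pick out the DER line" step might look there. The helper names `run_command` and `find_der_line` are hypothetical, and the child process is a stand-in that prints made-up text, not real `md-eval.pl` output. Only the marker string is taken from the script.

```python
import subprocess
import sys

DER_MARKER = "OVERALL SPEAKER DIARIZATION ERROR"  # same marker as match_DER


def run_command(argv):
    """Run argv (a list, no shell) and return (exit_status, combined output)."""
    proc = subprocess.run(argv, stdout=subprocess.PIPE,
                          stderr=subprocess.STDOUT, universal_newlines=True)
    return proc.returncode, proc.stdout


def find_der_line(output):
    """Return the first line containing the DER marker, or None if absent."""
    for line in output.split("\n"):
        if DER_MARKER in line:
            return line
    return None


# Stand-in for the scoring tool: a child Python process printing made-up text.
fake_scorer = [
    sys.executable, "-c",
    "print('header'); print('OVERALL SPEAKER DIARIZATION ERROR = (made-up value)')",
]
status, output = run_command(fake_scorer)
der_line = find_der_line(output)
if status == 0 and der_line is not None:
    print(der_line)
```

Returning `None` here is a design choice of the sketch. `match_DER` in the diff returns `0` when no line matches, and the later `print(DER_info + "\n")` would then raise a `TypeError`, so an explicit check before printing may be worth considering.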