[1] C.-C. Chiu, T. N. Sainath, Y. Wu, R. Prabhavalkar, P. Nguyen, Z. Chen, A. Kannan, R. J. Weiss, K. Rao, K. Gonina, N. Jaitly, B. Li, J. Chorowski, and M. Bacchiani, “State-of-the-art speech recognition with sequence-to-sequence models,” in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2018.

[2] S. Toshniwal, T. N. Sainath, R. J. Weiss, B. Li, P. Moreno, E. Weinstein, and K. Rao, “Multilingual speech recognition with a single end-to-end model,” in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2018.

[3] B. Li, T. N. Sainath, K. C. Sim, M. Bacchiani, E. Weinstein, P. Nguyen, Z. Chen, Y. Wu, and K. Rao, “Multi-dialect speech recognition with a single sequence-to-sequence model,” in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2018.

[4] T. N. Sainath, R. Prabhavalkar, S. Kumar, S. Lee, A. Kannan, D. Rybach, V. Schogol, P. Nguyen, B. Li, Y. Wu, Z. Chen, and C.-C. Chiu, “No need for a lexicon? Evaluating the value of the pronunciation lexica in end-to-end models,” in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2018.

[5] D. Lawson, C.-C. Chiu, G. Tucker, C. Raffel, K. Swersky, and N. Jaitly, “Learning hard alignments with variational inference,” in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2018.

[6] A. Kannan, Y. Wu, P. Nguyen, T. N. Sainath, Z. Chen, and R. Prabhavalkar, “An analysis of incorporating an external language model into a sequence-to-sequence model,” in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2018.

[7] R. Prabhavalkar, T. N. Sainath, Y. Wu, P. Nguyen, Z. Chen, C.-C. Chiu, and A. Kannan, “Minimum word error rate training for attention-based sequence-to-sequence models,” in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2018.

[8] T. N. Sainath, C.-C. Chiu, R. Prabhavalkar, A. Kannan, Y. Wu, P. Nguyen, and Z. Chen, “Improving the performance of online neural transducer models,” in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2018.

[9] C.-C. Chiu and C. Raffel, “Monotonic chunkwise attention,” in Proc. International Conference on Learning Representations (ICLR), 2018.

[10] I. Williams, A. Kannan, P. Aleksic, D. Rybach, and T. N. Sainath, “Contextual speech recognition in end-to-end neural network systems using beam search,” in Proc. Interspeech, 2018.

[11] C.-C. Chiu, A. Tripathi, K. Chou, C. Co, N. Jaitly, D. Jaunzeikare, A. Kannan, P. Nguyen, H. Sak, A. Sankar, J. Tansuwan, N. Wan, Y. Wu, and X. Zhang, “Speech recognition for medical conversations,” in Proc. Interspeech, 2018.

[12] R. Pang, T. N. Sainath, R. Prabhavalkar, S. Gupta, Y. Wu, S. Zhang, and C.-C. Chiu, “Compression of end-to-end models,” in Proc. Interspeech, 2018.

[13] S. Toshniwal, A. Kannan, C.-C. Chiu, Y. Wu, T. N. Sainath, and K. Livescu, “A comparison of techniques for language model integration in encoder-decoder speech recognition,” in Proc. IEEE Spoken Language Technology Workshop (SLT), 2018.

[14] G. Pundak, T. N. Sainath, R. Prabhavalkar, A. Kannan, and D. Zhao, “Deep context: End-to-end contextual speech recognition,” in Proc. IEEE Spoken Language Technology Workshop (SLT), 2018.

[15] B. Li, Y. Zhang, T. N. Sainath, Y. Wu, and W. Chan, “Bytes are all you need: End-to-end multilingual speech recognition and synthesis with bytes,” in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2019.

[16] J. Guo, T. N. Sainath, and R. J. Weiss, “A spelling correction model for end-to-end speech recognition,” in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2019.

[17] U. Alon, G. Pundak, and T. N. Sainath, “Contextual speech recognition with difficult negative training examples,” in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2019.

[18] Y. Qin, N. Carlini, I. Goodfellow, G. Cottrell, and C. Raffel, “Imperceptible, robust, and targeted adversarial examples for automatic speech recognition,” in Proc. International Conference on Machine Learning (ICML), 2019.

[19] D. S. Park, W. Chan, Y. Zhang, C.-C. Chiu, B. Zoph, E. D. Cubuk, and Q. V. Le, “SpecAugment: A simple data augmentation method for automatic speech recognition,” arXiv preprint, 2019.

[20] B. Li, T. N. Sainath, R. Pang, and Z. Wu, “Semi-supervised training for end-to-end models via weak distillation,” in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2019.

[21] S.-Y. Chang, R. Prabhavalkar, Y. He, T. N. Sainath, and G. Simko, “Joint endpointing and decoding with end-to-end models,” in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2019.

[22] J. Heymann, K. C. Sim, and B. Li, “Improving CTC using stimulated learning for sequence modeling,” in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2019.

[23] A. Bruguier, R. Prabhavalkar, G. Pundak, and T. N. Sainath, “Phoebe: Pronunciation-aware contextualization for end-to-end speech recognition,” in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2019.

[24] Y. He, T. N. Sainath, R. Prabhavalkar, I. McGraw, R. Alvarez, D. Zhao, D. Rybach, A. Kannan, Y. Wu, R. Pang, Q. Liang, D. Bhatia, Y. Shangguan, B. Li, G. Pundak, K. C. Sim, T. Bagby, S.-Y. Chang, K. Rao, and A. Gruenstein, “Streaming end-to-end speech recognition for mobile devices,” in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2019.

[25] K. Irie, R. Prabhavalkar, A. Kannan, A. Bruguier, D. Rybach, and P. Nguyen, “On the choice of modeling unit for sequence-to-sequence speech recognition,” in Proc. Interspeech, 2019.

[26] C. Peyser, H. Zhang, T. N. Sainath, and Z. Wu, “Improving performance of end-to-end ASR on numeric sequences,” in Proc. Interspeech, 2019.

[27] D. Zhao, T. N. Sainath, D. Rybach, D. Bhatia, B. Li, and R. Pang, “Shallow-fusion end-to-end contextual biasing,” in Proc. Interspeech, 2019.

[28] T. N. Sainath, R. Pang, D. Rybach, Y. He, R. Prabhavalkar, W. Li, M. Visontai, Q. Liang, T. Strohman, Y. Wu, I. McGraw, and C.-C. Chiu, “Two-pass end-to-end speech recognition,” in Proc. Interspeech, 2019.

[29] C.-C. Chiu, W. Han, Y. Zhang, R. Pang, S. Kishchenko, P. Nguyen, A. Narayanan, H. Liao, S. Zhang, A. Kannan, R. Prabhavalkar, Z. Chen, T. N. Sainath, and Y. Wu, “A comparison of end-to-end models for long-form speech recognition,” in Proc. IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), 2019.

[30] A. Narayanan, R. Prabhavalkar, C.-C. Chiu, D. Rybach, T. N. Sainath, and T. Strohman, “Recognizing long-form speech using streaming end-to-end models,” in Proc. IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), 2019.

[31] T. N. Sainath, R. Pang, R. J. Weiss, Y. He, C.-C. Chiu, and T. Strohman, “An attention-based joint acoustic and text on-device end-to-end model,” in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2020.

[32] Z. Lu, L. Cao, Y. Zhang, C.-C. Chiu, and J. Fan, “Speech sentiment analysis via pre-trained features from end-to-end ASR models,” in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2020.

[33] D. S. Park, Y. Zhang, C.-C. Chiu, Y. Chen, B. Li, W. Chan, Q. V. Le, and Y. Wu, “SpecAugment on large scale datasets,” in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2020.

[34] T. N. Sainath, Y. He, B. Li, A. Narayanan, R. Pang, A. Bruguier, S.-Y. Chang, W. Li, R. Alvarez, Z. Chen, C.-C. Chiu, D. Garcia, A. Gruenstein, K. Hu, M. Jin, A. Kannan, Q. Liang, I. McGraw, C. Peyser, R. Prabhavalkar, G. Pundak, D. Rybach, Y. Shangguan, Y. Sheth, T. Strohman, M. Visontai, Y. Wu, Y. Zhang, and D. Zhao, “A streaming on-device end-to-end model surpassing server-side conventional model quality and latency,” in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2020.

[35] A. Gulati, J. Qin, C.-C. Chiu, N. Parmar, Y. Zhang, J. Yu, W. Han, S. Wang, Z. Zhang, Y. Wu, and R. Pang, “Conformer: Convolution-augmented transformer for speech recognition,” in Proc. Interspeech, 2020.

[36] W. Han, Z. Zhang, Y. Zhang, J. Yu, C.-C. Chiu, J. Qin, A. Gulati, R. Pang, and Y. Wu, “ContextNet: Improving convolutional neural networks for automatic speech recognition with global context,” in Proc. Interspeech, 2020.

[37] W. Li, J. Qin, C.-C. Chiu, R. Pang, and Y. He, “Parallel rescoring with transformer for streaming on-device speech recognition,” in Proc. Interspeech, 2020.

[38] D. S. Park, Y. Zhang, Y. Jia, W. Han, C.-C. Chiu, B. Li, Y. Wu, and Q. V. Le, “Improved noisy student training for automatic speech recognition,” in Proc. Interspeech, 2020.

[39] Y. Zhang, J. Qin, D. S. Park, W. Han, C.-C. Chiu, R. Pang, Q. V. Le, and Y. Wu, “Pushing the limits of semi-supervised learning for automatic speech recognition,” in NeurIPS 2020 Workshop on Self-Supervised Learning for Speech and Audio Processing, 2020.

[40] C.-C. Chiu, A. Narayanan, W. Han, R. Prabhavalkar, Y. Zhang, N. Jaitly, R. Pang, T. N. Sainath, P. Nguyen, L. Cao, and Y. Wu, “RNN-T models fail to generalize to out-of-domain audio: Causes and solutions,” in Proc. IEEE Spoken Language Technology Workshop (SLT), 2020.

[41] S. Panchapagesan, D. S. Park, C.-C. Chiu, Y. Shangguan, Q. Liang, and A. Gruenstein, “Efficient knowledge distillation for RNN-transducer models,” in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2021.

[42] A. Narayanan, T. N. Sainath, R. Pang, J. Yu, C.-C. Chiu, R. Prabhavalkar, E. Variani, and T. Strohman, “Cascaded encoders for unifying streaming and non-streaming ASR,” in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2021.

[43] B. Li, A. Gulati, J. Yu, T. N. Sainath, C.-C. Chiu, A. Narayanan, S.-Y. Chang, R. Pang, Y. He, J. Qin, W. Han, Q. Liang, Y. Zhang, T. Strohman, and Y. Wu, “A better and faster end-to-end model for streaming ASR,” in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2021.

[44] T. Doutre, W. Han, M. Ma, Z. Lu, C.-C. Chiu, R. Pang, A. Narayanan, A. Misra, Y. Zhang, and L. Cao, “Improving streaming automatic speech recognition with non-streaming model distillation on unsupervised data,” in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2021.

[45] T. Doutre, W. Han, C.-C. Chiu, R. Pang, O. Siohan, and L. Cao, “Bridging the gap between streaming and non-streaming ASR systems by distilling ensembles of CTC and RNN-T models,” in Proc. Interspeech, 2021.

[46] J. Yu, C.-C. Chiu, B. Li, S.-Y. Chang, T. N. Sainath, Y. He, A. Narayanan, W. Han, A. Gulati, Y. Wu, and R. Pang, “FastEmit: Low-latency streaming ASR with sequence-level emission regularization,” in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2021.

[47] Z. Lu, W. Han, Y. Zhang, and L. Cao, “Exploring targeted universal adversarial perturbations to end-to-end ASR models,” in Proc. Interspeech, 2021.

[48] Q. Li, Y. Zhang, B. Li, L. Cao, and P. C. Woodland, “Residual energy-based models for end-to-end speech recognition,” in Proc. Interspeech, 2021.

[49] Q. Li, D. Qiu, Y. Zhang, B. Li, Y. He, P. C. Woodland, L. Cao, and T. Strohman, “Confidence estimation for attention-based sequence-to-sequence models for speech recognition,” in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2021.