-
-
Notifications
You must be signed in to change notification settings - Fork 828
Labs feature: Early implementation of voice messages #5769
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just had a quick look so far but I didn't realise this would involve bundling a wasm-ed opus encoder :( How big does it end up being? I wonder if we should be backing the webm horse instead of ogg, especially if it allows us to avoid shipping a wasm-ed libopus (which I think it does).
src/voice/VoiceRecorder.ts
Outdated
mediaTrackConstraints: <MediaTrackConstraints>{ | ||
deviceId: CallMediaHandler.getAudioInput(), | ||
}, | ||
encoderSampleRate: 16000, // we could go down to 12khz, but we lose quality |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
12kHz would be pretty weird. Honestly I would probably just go up to 48Khz: it's what browsers will use over WebRTC.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also I'd expect to see a mono / stereo param here maybe (unless that's implicit from the source?) And more importantly a target bitrate / quality setting?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
apaprently it defaults to mono, and picks a quality/bitrate for us. I'll play around with some options that seem to fit well for us and hardcode them for good measure.
For the sample rate: 16khz didn't distort my voice at all - it was indistinguishable to 48khz, and ultimately lowered file size. The file size gains aren't super large though (at 16khz it's about 34kb for 5s of audio, at 48khz it's 45kb for 5s).
src/voice/VoiceRecorder.ts
Outdated
deviceId: CallMediaHandler.getAudioInput(), | ||
}, | ||
encoderSampleRate: 16000, // we could go down to 12khz, but we lose quality | ||
encoderApplication: 2048, // voice (default is "audio") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could tweak this but iirc this is a, "you can set this but you probably don't need to" thing (I guess they are voice messages...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yea, it seemed to be working fine without this but hey: spec compliance or something? idk, it's an option I found while finding other options so I set it.
The encoder is only 332kb of JS/wasm, so not too bad. Olm is sitting at 150kb (legacy module at 436kb), so it doesn't seem controversially large to me. |
This leads to more reliable frequency/timing information, and involves a whole lot less decoding. We still maintain ongoing encoded frames to avoid having to do one giant encode at the end, as that could take long enough to be disruptive.
@dbkr so I ended up using the browser's API. Apologies for the unexpected complexity in this PR review :s |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
generally looking good. would be good to get a final sanity check on ogg vs webm and if ogg is really the way we want to go, since once this is merged, we're a lot closer to being stuck with it.
}); | ||
|
||
let tooltip = _t("Record a voice message"); | ||
if (!!this.state.recorder) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (!!this.state.recorder) { | |
if (Boolean(this.state.recorder)) { |
I think this is our nominally preferred way of doing this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it? I don't recall this, and haven't done this conversion in years
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fair enough - I think this might be from kegan days. I have no preference personally.
} | ||
|
||
public get isSupported(): boolean { | ||
return !!Recorder.isRecordingSupported(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return !!Recorder.isRecordingSupported(); | |
return Boolean(Recorder.isRecordingSupported()); |
Is it possible to use the |
not if we're using ogg as a container format 'cos chrome doesn't support it - hence wondering if webm would be the way to go |
Ah yeah, in that case would be great to use |
The MSC says it must be ogg opus https://github.com/matrix-org/matrix-doc/pull/2516/files#diff-c926d801e68078ea5aaf2ac264459e8ce5fd830b2f46b11aa853bf45a11a1069R25 |
I see... However, I don't see any discussion of why Ogg was chosen in the MSC. Perhaps we should advocate for WebM at the MSC level and try to move to that then. |
There were a number of resources saying the media recorder API is unreliable and doesn't produce a consistent format, so I steered clear of it. We chose opus after great debate for its compression and versatility, knowing that encoders would be needed in clients. The point is to test something and we can always change it later. This is a labs feature of an evolving MSC: there's no reason why opus will be set in stone with this, and we're trialing formats before the mobile team gets to this point in development. I'm not interested in rehashing formats here, basically. |
Where did this debate occur? I don't see it on the MSC at least... |
#matrix-spec - if it's not there then it's a failure of the MSC process. |
Thanks for recording the thinking on the MSC. Discussion in #matrix-spec is fine for ad-hoc thoughts, but it seems important to record the thinking behind big choices (like the container format here) somewhere in the MSC document or PR discussion (which you've now done). As you say, it's an experiment, so I won't try to block on the extra encoder... Does it already download separately from the main bundle at least? Ideally we are not forcing a large download for everyone for an experimental feature. |
As far as I can tell it's not even being requested, which is curious. The field is required (otherwise it explodes), but the request doesn't show up in the network log. Because the error is happening on creation, I'm inclined to believe that it only pulls it down when it needs it. I'm still not sure I'd qualify a few hundred kilobytes as "large" though... |
We made an effort to isolate |
ah, the actual bundle (with the not-encoder bit) will have grown a bit... Using a simple
In particular, the However, these bundle files are massive - we should look into that separately. |
also having tested the build output with a server that logs requests, I can see the encoderWorker being requested only once the user clicks the button (ie: the recorder starts). It also looks like it gets cached somewhere in the browser as refreshing with cache disabled doesn't cause it to be re-requested. |
I've added element-hq/element-web#8356 to the platform backlog to eventually take a look. They definitely are large, and I'm sure a variety of changes can help soften the impact of such a large size. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since other apps seem to have converged on ogg as a container format, I think I've convinced myself that it is probably the way to go. Arguably we could be using the browser's encoder and just re-muxing, but that's probably premature optimisation, so lgtm.
This is deliberately behind a labs flag because it is not finished.
Requires element-hq/element-web#16705
For element-hq/element-web#1358
Note: There's a bunch of
@@ TravisR
TODOs in here. These are annotated so I can find them and fix them for a later PR.State of affairs screenshots:
Voice message rendering:
Unsupported (older) element webs will see this (the rendering is intentionally not behind the labs flag at the moment):
(other client behaviour undefined and not checked - we're aware of the risks)
Composer states (not final):