Summary
Certain higher-order characters sent from a user's handset to a Restcomm-connected bot are corrupted at the Restcomm level. Not all higher-order characters exhibit this problem.
Related Tickets
There are many tickets related to double-byte messages.
Scope of Impact
Every Restcomm-connected bot that can accept arbitrary natural-language input will be affected. Non-US users will be affected more than US users, since their languages more often use accented letters such as é and ñ.
There is no reliable workaround. As discussed in #2607, the recipient can reliably distinguish between encodings only if a byte-order mark (BOM, U+FEFF) is present. Otherwise only heuristics are possible, and in many cases the original text simply cannot be recovered, even if the sequence of decoding errors is known.
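A minimal Python sketch of why the BOM matters. The helper `decode_with_bom` is a hypothetical illustration, not Restcomm or carrier code: with a BOM the decoder knows the encoding, while without one a wrong guess decodes without raising any error and silently produces mojibake, so the recipient gets no signal that anything went wrong.

```python
import codecs

# Byte-order marks (U+FEFF) as they appear in each Unicode encoding form.
# UTF-32 entries come first so that the UTF-32-LE mark (FF FE 00 00) is not
# misread as the shorter UTF-16-LE mark (FF FE).
BOMS = [
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
]


def decode_with_bom(payload: bytes) -> str:
    """Decode payload using its leading BOM; refuse to guess when there is none."""
    for bom, encoding in BOMS:
        if payload.startswith(bom):
            return payload[len(bom):].decode(encoding)
    raise ValueError("no BOM present: the encoding can only be guessed")


# With a BOM, the encoding is unambiguous.
marked = codecs.BOM_UTF16_BE + "José".encode("utf-16-be")
print(decode_with_bom(marked))  # José

# Without a BOM, a wrong guess still "succeeds" and yields mojibake.
unmarked = "José".encode("utf-8")
print(unmarked.decode("latin-1"))  # JosÃ©
```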
Isolated to Restcomm
I've changed every variable outside of Restcomm, and the behavior is identical:
The same thing happens when the message is sent from my handset (on T-Mobile) through a Restcomm instance, whether that Restcomm instance is tied to Teli (tom+rcteli@lumin.ai) or to Hook (tom+rchook@lumin.ai).
The same thing happens when the message is sent from my Google Voice line through a Restcomm instance, whether that Restcomm instance is tied to Teli (tom+rcteli@lumin.ai) or to Hook (tom+rchook@lumin.ai).
A message carrying an identical string arrives intact if sent from my handset to my Google Voice line without going through Restcomm, or vice versa.
The corruption is already visible in the Restcomm logs, i.e., before the message reaches our platform.
Affected Characters
Here are some characters that are affected:
é (U+00E9): Latin Small Letter E with Acute
ñ (U+00F1): Latin Small Letter N with Tilde
[ (U+005B): Left Square Bracket
] (U+005D): Right Square Bracket
@ (U+0040): Commercial At
😀 (U+1F600): Grinning Face
Here are some characters that are not affected:
e (U+0065): Latin Small Letter E
n (U+006E): Latin Small Letter N
‘ (U+2018): Left Single Quotation Mark
’ (U+2019): Right Single Quotation Mark
“ (U+201C): Left Double Quotation Mark
” (U+201D): Right Double Quotation Mark
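Strangely, some characters that are not affected have higher code points than some that are affected: the curly quotation marks (U+2018 to U+201D) come through intact, while é (U+00E9), ñ (U+00F1), and even the ASCII characters [, ], and @ are corrupted.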
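One hypothesis that may explain this ordering (an assumption to be checked, not something the logs confirm): among the characters listed, every affected one except the emoji has a GSM 03.38 (SMS 7-bit default alphabet) code that differs from its Unicode code point, e and n have identical codes in both, and the curly quotes are not in the GSM alphabet at all, so a message containing them has to be sent in a 16-bit encoding instead. If some hop treats GSM 7-bit codes as ASCII/Latin-1 (or the reverse), these are exactly the characters one would expect to break. The Python sketch below classifies each listed character; the helper names `gsm_codes` and `classify` are hypothetical.

```python
# GSM 03.38 default alphabet, indexed by 7-bit code (0x00-0x7F).
# These are unpacked code values; over the air, septets are packed into octets.
GSM_DEFAULT = (
    "@£$¥èéùìòÇ\nØø\rÅå"
    "Δ_ΦΓΛΩΠΨΣΘΞ\x1bÆæßÉ"
    " !\"#¤%&'()*+,-./"
    "0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNO"
    "PQRSTUVWXYZÄÖÑÜ§"
    "¿abcdefghijklmno"
    "pqrstuvwxyzäöñüà"
)
assert len(GSM_DEFAULT) == 128

ESC = 0x1B  # escape code: the next code is looked up in the extension table

# Printable characters of the GSM 03.38 extension table (sent as ESC + code).
GSM_EXTENSION = {
    "^": 0x14, "{": 0x28, "}": 0x29, "\\": 0x2F,
    "[": 0x3C, "~": 0x3D, "]": 0x3E, "|": 0x40, "€": 0x65,
}


def gsm_codes(ch: str):
    """Return the GSM 03.38 code(s) for ch, or None if ch is not representable."""
    if ch in GSM_EXTENSION:
        return [ESC, GSM_EXTENSION[ch]]
    index = GSM_DEFAULT.find(ch)
    if index >= 0 and index != ESC:
        return [index]
    return None


def classify(ch: str) -> str:
    codes = gsm_codes(ch)
    if codes is None:
        return "not in GSM 03.38 (needs UCS-2/UTF-16)"
    if codes == [ord(ch)]:
        return "GSM code equals Unicode code point"
    return "GSM code differs from Unicode code point"


AFFECTED = ["é", "ñ", "[", "]", "@", "\U0001F600"]
UNAFFECTED = ["e", "n", "\u2018", "\u2019", "\u201C", "\u201D"]

for label, chars in (("affected", AFFECTED), ("not affected", UNAFFECTED)):
    for ch in chars:
        print(f"{label:12} U+{ord(ch):04X} {ch}: {classify(ch)}")
```

The emoji does not fit this pattern: U+1F600 lies outside the Basic Multilingual Plane, so it cannot be represented in UCS-2 at all and needs a UTF-16 surrogate pair, which may be a separate failure point.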
Examples
My name is José Peña.
Restcomm via Hook: SmsSid SMa117bca5a48843ada30f545c8964134a (from T-Mobile) and SM8ab768363a8d44178fc8ff7a642d24be (from Google Voice)
Restcomm via Teli: SmsSid SMd933d6eff7a946c4adef1932db99debf (from T-Mobile) and SM02389664fedb4aa3afff6a79d28aa7d1 (from Google Voice)
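To tie the corrupted text in the Restcomm logs for these SmsSids to a specific mistake, one could compare it against what common encoding mismatches would produce for the test message. A minimal Python sketch; the pairs in `CANDIDATE_MISMATCHES` are illustrative guesses, not a list of encodings Restcomm is known to use.

```python
MESSAGE = "My name is José Peña."

# (encoding the text was written in, encoding a later hop assumed when reading it)
# Hypothetical candidates for illustration only.
CANDIDATE_MISMATCHES = [
    ("utf-8", "latin-1"),
    ("latin-1", "utf-8"),
    ("utf-16-be", "latin-1"),
]


def candidate_corruptions(text: str) -> dict:
    """Map each (written-as, read-as) pair to the text a reader would see."""
    results = {}
    for written_as, read_as in CANDIDATE_MISMATCHES:
        # errors="replace" turns undecodable bytes into U+FFFD instead of raising.
        results[(written_as, read_as)] = text.encode(written_as).decode(read_as, errors="replace")
    return results


for (written_as, read_as), seen in candidate_corruptions(MESSAGE).items():
    print(f"{written_as} read as {read_as}: {seen!r}")
```

For example, UTF-8 bytes read as Latin-1 turn the message into `My name is JosÃ© PeÃ±a.`. If the logged body matches none of the candidates, the fault may lie in a different layer, such as GSM 7-bit handling.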