Byte length of UTF-16 encoded strings is assumed to be 2 * (number of characters) #1876

alexisbalbachan · 2025-01-23T03:20:34Z

As mentioned in #1859, there are many places in our code that encode strings into UTF-16 and then set their byte length to 2 * (number of characters).

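This is NOT always true. UTF-16 encodes every character in the Basic Multilingual Plane (U+0000 to U+FFFF) as 2 bytes, but anything from U+10000 to U+10FFFF is encoded as a surrogate pair, i.e. 4 bytes. Since Python's len() counts code points, len(s) * 2 undercounts the encoded size of any string that contains such characters.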
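To make the 4-byte case concrete, here is a minimal Python sketch of how UTF-16 splits a code point above U+FFFF into a surrogate pair (two 16-bit code units). The helper name `to_surrogate_pair` is hypothetical; its result is checked against Python's built-in `utf-16le` codec.

```python
import struct


def to_surrogate_pair(code_point: int) -> tuple[int, int]:
    """Split a supplementary code point (U+10000..U+10FFFF) into a UTF-16 surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points above U+FFFF need a surrogate pair")
    offset = code_point - 0x10000        # 20-bit value
    high = 0xD800 + (offset >> 10)       # top 10 bits -> high (lead) surrogate
    low = 0xDC00 + (offset & 0x3FF)      # bottom 10 bits -> low (trail) surrogate
    return high, low


# U+1F600 (GRINNING FACE, from the Emoticons block)
high, low = to_surrogate_pair(0x1F600)
manual = struct.pack("<HH", high, low)   # two little-endian 16-bit code units
builtin = "\U0001F600".encode("utf-16le")

print(hex(high), hex(low))  # 0xd83d 0xde00
print(len(builtin))         # 4
assert manual == builtin
```

One character in Python (`len("\U0001F600") == 1`) therefore occupies 4 bytes on the wire, not 2.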

Some noteworthy blocks in the higher range are:

CJK Unified Ideographs Extensions B-I (U+20000-U+2EE5F for B-F and I; U+30000-U+323AF for G and H)
Emoticons (U+1F600-U+1F64F)
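The byte length can also be computed without encoding: one 16-bit code unit per BMP character and two per character above U+FFFF. A minimal sketch (the function name `utf16_code_units` is hypothetical) that agrees with `len(s.encode('utf-16le')) // 2`, using characters from the blocks above:

```python
def utf16_code_units(text: str) -> int:
    """Number of 16-bit UTF-16 code units needed for `text` (no BOM)."""
    # Characters above U+FFFF become a surrogate pair (2 units); everything else is 1 unit.
    return sum(2 if ord(ch) > 0xFFFF else 1 for ch in text)


samples = {
    "ascii": "Administrator",
    "cjk_ext_b": "\U00020000\U00020001",  # CJK Unified Ideographs Extension B
    "emoticon": "pass\U0001F600",          # Emoticons block
}

for name, text in samples.items():
    byte_len = len(text.encode("utf-16le"))
    print(f"{name}: len()={len(text)} naive={len(text) * 2} actual={byte_len}")
    assert utf16_code_units(text) * 2 == byte_len
```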

We should identify every place in the code where we assume the byte length of a UTF-16 encoded string and replace it with len(utf16_encoded_bytes).
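A minimal sketch of that replacement pattern: encode once and derive every length from the encoded bytes. The helper `encode_counted_utf16` is hypothetical (not an existing impacket function); the `Length`/`MaximumLength` keys only mirror the field names set in dtypes.py.

```python
def encode_counted_utf16(value: str) -> dict:
    """Encode `value` as UTF-16LE and take the byte counts from the encoded bytes."""
    data = value.encode("utf-16le")
    return {
        "Data": data,
        # Previously len(value) * 2, which counts code points, not bytes.
        "Length": len(data),
        "MaximumLength": len(data),
    }


fields = encode_counted_utf16("caf\u00e9\U0001F600")
assert fields["Length"] == len(fields["Data"]) == 12   # len(value) * 2 would give 10
```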

Here's an example of why we shouldn't assume the length of a UTF-16 encoded string:
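For instance, with a password that ends in an emoji (a made-up value for illustration), `len(s) * 2` undercounts the bytes that `s.encode('utf-16le')` produces, and anything that trusts that length cuts off the end of the string:

```python
password = "P@ssw0rd\U0001F600"        # 8 ASCII characters + 1 emoji

encoded = password.encode("utf-16le")
assumed = len(password) * 2            # what the code currently assumes
actual = len(encoded)                  # real byte length

print(len(password), assumed, actual)  # 9 18 20

# A receiver that trusts the assumed length reads a truncated string:
# the last 2 bytes (the low surrogate of the emoji) are cut off.
truncated = encoded[:assumed]
print(truncated.decode("utf-16le", errors="replace"))
```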

This is an incomplete list of places where we assume the length of UTF-16 encoded strings (credits to @rtpt-romankarwacik for most of them):

impacket/dcerpc/v5/samr.py:2838:        samUser['Buffer'] = b'A'*(512-len(newPassword)*2) + newPassword.encode('utf-16le')
impacket/dcerpc/v5/samr.py:2841:        samUser['Buffer'] = b'A'*(512-len(newPassword)*2) + newPassword.decode(sys.getfilesystemencoding()).encode('utf-16le')
impacket/dcerpc/v5/samr.py:2843:    samUser['Length'] = len(newPassword)*2
impacket/dcerpc/v5/dcom/oaut.py:276:            self['cBytes'] = len(value)*2
impacket/dcerpc/v5/dtypes.py:384:            self['Length'] = len(value)*2
impacket/dcerpc/v5/dtypes.py:385:            self['MaximumLength'] = len(value)*2
impacket/smb.py:927:        ('FileNameLength','<L-FileName','len(FileName)*2'),
impacket/smb.py:956:        ('FileNameLength','<L-FileName','len(FileName)*2'),
impacket/smb.py:987:        ('FileNameLength','<L-FileName','len(FileName)*2'),
impacket/smb.py:1015:        ('FileNameLength','<L-FileName','len(FileName)*2'),
impacket/smb.py:1030:        ('FileNameLength','<L-FileName','len(FileName)*2'),
impacket/smb.py:1053:        ('FileNameLength','<L-FileName','len(FileName)*2'),
impacket/smb.py:1077:        ('FileNameLength','<B-FileName','len(FileName)*2'),
examples/karmaSMB.py:389:        ntCreateRequest['NameLength'] = len(targetFile)*2
tests/dcerpc/test_samr.py:326:        #entry.fields['MaximumLength'] = len('Administrator\x00')*2
tests/dcerpc/test_samr.py:2288:        samUser['Buffer'] = b'A'*(512-len(newPwd)*2) + newPwd.encode('utf-16le')
tests/dcerpc/test_samr.py:2289:        samUser['Length'] = len(newPwd)*2
tests/dcerpc/test_scmr.py:127:           self.assertEqual(arrayData[offset:][:len(changeDone)*2].decode('utf-16le'), changeDone)
tests/dcerpc/test_rrp.py:216:        request['cbData'] = len(self.test_value_data)*2
tests/misc/test_structure.py:81:            ('code1', '>L=len(arr1)*2+0x1000'),  # Not sure
tests/misc/test_structure.py:191:            ('leni', '<L=len(uno)*2'),  # Not sure
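For the samr.py and test_samr.py entries, a corrected version could take both the padding and the `Length` from the encoded password. This is a standalone sketch rather than a patch against impacket: the function name `build_password_buffer` is hypothetical, and it only reproduces the 512-byte `Buffer` / `Length` computation shown in those lines.

```python
BUFFER_SIZE = 512  # size of the password area used by the samr.py lines above


def build_password_buffer(new_password: str) -> tuple[bytes, int]:
    """Return (Buffer, Length) with the password right-aligned in a 512-byte buffer."""
    encoded = new_password.encode("utf-16le")
    if len(encoded) > BUFFER_SIZE:
        raise ValueError("password does not fit in the buffer")
    # Pad with b'A' as the original code does, but by *byte* length.
    buffer = b"A" * (BUFFER_SIZE - len(encoded)) + encoded
    return buffer, len(encoded)


buf, length = build_password_buffer("Secret\U0001F600")
assert len(buf) == BUFFER_SIZE  # the len()*2 version would produce 514 bytes here
assert length == 16
```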

examples/ntfs-read.py:239:                     ('_FileName','_-FileName','self["FileNameLen"]*2'),
examples/ntfs-read.py:321:                     self.AttributeName = data[self.AttributeHeader['NameOffset']:][:self.AttributeHeader['NameLength']*2].decode('utf-16le')
impacket/dcerpc/v5/dtypes.py:172:             return self["ActualCount"]*2    # Not sure
impacket/dcerpc/v5/rrp.py:829:                   request.fields['lpValueNameIn'].fields['MaximumLength'] = dataLen*2  # Not sure
impacket/dcerpc/v5/srvs.py:1790:               return self["ActualCount"]*2    # Not sure
Multiple matches in tds.py
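Several of the "Not sure" entries go the other way: they multiply a count read from the data rather than `len()` of a Python string. Assuming that count is expressed in 16-bit code units (as WCHAR-based length fields usually are), `count * 2` is the correct byte length even when surrogate pairs are present; the mismatch only appears when the count comes from `len(str)`, which counts code points. A minimal sketch with a hypothetical `decode_counted_name` helper and a made-up record layout:

```python
import struct


def decode_counted_name(data: bytes, offset: int, code_units: int) -> str:
    """Decode `code_units` UTF-16LE code units starting at `offset`."""
    # Each code unit is exactly 2 bytes, so this multiplication is safe.
    return data[offset:offset + code_units * 2].decode("utf-16le")


name = "report\U0001F600.txt"
encoded = name.encode("utf-16le")

# Hypothetical record: 1-byte length in code units, followed by the name.
record = struct.pack("<B", len(encoded) // 2) + encoded + b"trailing"

count = record[0]
assert decode_counted_name(record, 1, count) == name

# Deriving the count from len(str) instead drops the last code unit.
assert decode_counted_name(record, 1, len(name)) != name
```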

anadrianmanrique added the labels bug (Unexpected problem or unintended behavior) and high (High priority item) Jan 23, 2025

anadrianmanrique assigned alexisbalbachan Jan 23, 2025
