JA4H.md documentation contains inconsistent details and problematic delimiter #80

Open
eugeneturk opened this issue Feb 28, 2024 · 0 comments

In the technical_details document describing the JA4H specification, there are minor inconsistencies and a problematic delimiter. This document was previously found at technical_details/JA4H.md and subsequently removed in this commit:

b6f3ff4#diff-aeca2ef7c4beaff2ccd0f42a618a6c85d23ba0e625fa735fda15332bf4d629c6

Issue 1, line 54: "99 = anything > than 100 headers".
This should likely be either "99 = anything > than 99 headers" or "99 = anything >= than 100 headers".
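The gap in the documented wording is easier to see with a capped two-digit field. This is a minimal Python sketch, assuming the count is simply clamped to 99; the function name `header_count_field` is hypothetical and not taken from the JA4H specification.

```python
def header_count_field(num_headers: int) -> str:
    """Format a header count as a two-digit field, clamped to 99 (assumed behaviour)."""
    if num_headers < 0:
        raise ValueError("header count cannot be negative")
    return f"{min(num_headers, 99):02d}"


for n in (7, 99, 100, 250):
    print(n, "->", header_count_field(n))
```

Under this assumption, counts of 99, 100, and 250 all map to `99`, whereas the wording "anything > than 100" leaves a count of exactly 100 undefined.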

Issue 2, lines 138-180, under "## JA4H Example:".
The example lists 13 headers including Cookie and Referer. Those two are not counted, which leaves 11, so the first section of the JA4H should be ge20cr11enus, not ge20cr13enus as it appears on line 179.
The hash of the sorted, delimited cookie names for the third section is listed as b66fa821d02c. It should be 0f2659b474bf. The incorrect value appears on lines 175 and 179.
The hash of the sorted, delimited cookie name=value pairs for the fourth section is listed as e97928733c74. It should be 161698816dab. The incorrect value appears on lines 177 and 179.
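The header count can be checked with a short sketch. It assumes that Cookie and Referer are excluded from the count and reuses the 11 header names from the script in this issue. The position of Cookie and Referer in the list is illustrative only, and the `ge20cr` and `enus` parts are copied from the string quoted above rather than derived.

```python
# The 11 header names used in this issue, plus the two that are not counted.
# Where Cookie and Referer sit in the list is an assumption for illustration.
example_headers = [
    "Host", "Sec-Ch-Ua", "Sec-Ch-Ua-Mobile", "User-Agent",
    "Sec-Ch-Ua-Platform", "Accept", "Sec-Fetch-Site", "Sec-Fetch-Mode",
    "Sec-Fetch-Dest", "Accept-Encoding", "Accept-Language",
    "Cookie", "Referer",
]

EXCLUDED = {"cookie", "referer"}

# Count every header except Cookie and Referer (case-insensitive match).
counted = [h for h in example_headers if h.lower() not in EXCLUDED]
count_field = f"{min(len(counted), 99):02d}"

# Prefix and suffix are taken verbatim from the documented example string.
first_section = f"ge20cr{count_field}enus"

print(len(example_headers), len(counted), first_section)
```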

Here is a script showing the calculation of correct values using the strings copied directly from the technical document:

#!/usr/bin/perl

use 5.16.0;
use warnings;

use Digest::SHA qw(sha256_hex);

my $headers = 'Host,Sec-Ch-Ua,Sec-Ch-Ua-Mobile,User-Agent,Sec-Ch-Ua-Platform,Accept,Sec-Fetch-Site,Sec-Fetch-Mode,Sec-Fetch-Dest,Accept-Encoding,Accept-Language';
my $cookies = 'FastAB,_dd_s,countryCode,geoData,sato,stateCode,umto,usprivacy';
my $cookies_values = 'FastAB=0=6859,1=8174,2=4183,3=3319,4=3917,5=2557,6=4259,7=6070,8=0804,9=6453,10=1942,11=4435,12=4143,13=9445,14=6957,15=8682,16=1885,17=1825,18=3760,19=0929,_dd_s=logs=1&id=b5c2d770-eaba-4847-8202-390c4552ff9a&created=1686159462724&expire=1686160422726,countryCode=US,geoData=purcellville|VA|20132|US|NA|-400|broadband|39.160|-77.700|511,sato=1,stateCode=VA,umto=1,usprivacy=1---';

# Each section hash is the first 12 hex characters of the SHA-256 digest.
my $headers_hash = substr(sha256_hex($headers), 0, 12);
my $cookies_hash = substr(sha256_hex($cookies), 0, 12);
my $cv_hash = substr(sha256_hex($cookies_values), 0, 12);

say "headers [$headers_hash]";
say "cookies [$cookies_hash]";
say "cookies_values [$cv_hash]";

Output:

headers [974ebe531c03]
cookies [0f2659b474bf]
cookies_values [161698816dab]

Issue 3: The JA4H sections are delimited by '_', but underscores may also appear in header names, cookie names, and cookie values. The truncated sha256 hashes used for sections 2, 3, and 4 are hexadecimal, so they never contain an underscore, but both the raw (JA4H_r) and original (JA4H_o) versions of those sections may. This makes it hard to split JA4H_r and JA4H_o strings back into their sections. If that is not a design goal for these versions of the JA4H, it is not a problem. If extracting the original sections from JA4H_r and JA4H_o is useful, though, it may help to consider a different delimiter, or to escape the delimiter wherever it appears inside a section.
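The parsing problem and one possible workaround can be sketched as follows. The section values are shortened, illustrative stand-ins built around the `_dd_s` cookie name from the example. The percent-style escaping in `escape` and `unescape` is a hypothetical option, not part of the JA4H specification.

```python
DELIM = "_"


def escape(section: str) -> str:
    # Escape '%' first so the encoding stays reversible, then the delimiter.
    return section.replace("%", "%25").replace("_", "%5F")


def unescape(section: str) -> str:
    # Reverse order of escape(): restore delimiters, then literal '%'.
    return section.replace("%5F", "_").replace("%25", "%")


# Shortened, illustrative raw sections; '_dd_s' contains the delimiter.
sections = [
    "ge20cr11enus",
    "Host,Sec-Ch-Ua",
    "FastAB,_dd_s",
    "FastAB=0,_dd_s=logs=1",
]

# Naive join and split: the four sections cannot be recovered.
naive_parts = DELIM.join(sections).split(DELIM)
print(len(naive_parts))

# Escaped join and split: the four sections round-trip intact.
escaped = DELIM.join(escape(s) for s in sections)
recovered = [unescape(p) for p in escaped.split(DELIM)]
print(recovered == sections)
```

Splitting the naive string yields eight fragments instead of four, while the escaped form splits cleanly. Choosing a delimiter that cannot occur in header or cookie text would avoid the need for escaping altogether.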

@john-althouse john-althouse self-assigned this Mar 1, 2024
@john-althouse john-althouse added the documentation Improvements or additions to documentation label Mar 1, 2024