CFG extraction timeout not working #106
Testing
To test whether it was slowly making progress or actually stuck, I let the extraction run for about ten hours, but no progress was made after the first 25 minutes. To check where it got stuck, I interrupted it with Ctrl+C and got the following traceback:
It seems that angr deadlocks when it runs out of memory pages (?).
Resolve
So maybe the aforementioned idea of incrementally saving the extracted features to disk in a temporary folder could free up RAM and allow the complete extraction process to finish.
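The idea of incrementally saving features to a temporary folder could look roughly like this minimal sketch. It assumes one JSON file per executable is an acceptable format; `save_features` and `load_all_features` are hypothetical helpers, not part of the tool. Writing to disk only frees RAM if the caller drops its in-memory copy (and the angr project) after each save.

```python
import json
import os
import tempfile
from pathlib import Path


def save_features(checkpoint_dir: Path, sample_name: str, features: dict) -> Path:
    """Write one sample's features to its own JSON file and return its path."""
    target = checkpoint_dir / f"{sample_name}.json"
    # Write to a temporary file in the same folder, then rename it into place,
    # so an interrupted run never leaves a half-written checkpoint behind.
    fd, tmp_path = tempfile.mkstemp(dir=checkpoint_dir, suffix=".tmp")
    with os.fdopen(fd, "w") as fh:
        json.dump(features, fh)
    os.replace(tmp_path, target)
    return target


def load_all_features(checkpoint_dir: Path) -> dict:
    """Merge every per-sample checkpoint into one {sample_name: features} dict."""
    return {
        path.stem: json.loads(path.read_text())
        for path in sorted(checkpoint_dir.glob("*.json"))
    }
```

Merging happens only once at the end, so peak memory during extraction is bounded by a single executable's features rather than the whole dataset's.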
Testing
Using the tool in PR #108, I split the dataset into two equal-sized datasets of 200 executables each.
@AlexVanMechelen Please try. I'm not sure this will fix the issue, but it's worth a try.
@dhondta This indeed fixes the issue of angr getting into a deadlock. The broader issue, that the CFG extraction timeout sometimes doesn't work, still remains, so a small percentage of samples take significantly longer to extract features from than the others. PS:
@AlexVanMechelen OK, here is the explanation:
Issue
Sometimes the CFG extraction continues even after the timeout is hit here. The line "Timeout reached when extracting CFG" gets printed to the screen, but angr keeps extracting the CFG, which significantly delays the CFG-based feature computation for that executable.
Reproduce
It's hard to reproduce because there is some randomness to it. It sometimes happens for an executable, but when I try again later with the same executable, the extraction stops as expected.
Using the tool from the latest PR #105, I started extracting the CFG-based features for a dataset of 400 samples on 32 CPU cores. The features for the first 300 executables were extracted at a rate of approximately 3 seconds per executable. For the last few executables, however, the extraction time skyrockets because the CFG extraction continues past the timeout. At the time of writing, after 1h40, the features of the last 30 executables are still being extracted.
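The thread does not establish why angr keeps running after the timeout message is printed. One way to make a timeout unconditional, offered here as an assumption rather than as the tool's current design, is to run each executable's extraction in its own child process and kill it when the limit is hit. `run_with_hard_timeout` is a hypothetical helper, and the per-executable extraction command it would run is left to the caller; only `subprocess.run` and its `timeout` behaviour are standard library.

```python
import subprocess
import sys
from typing import List, Optional


def run_with_hard_timeout(cmd: List[str], timeout: float) -> Optional[str]:
    """Run cmd in a child process; return its stdout, or None on timeout.

    A non-zero exit status raises subprocess.CalledProcessError (check=True).
    """
    try:
        completed = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout, check=True
        )
    except subprocess.TimeoutExpired:
        # subprocess.run() has already killed and reaped the child here,
        # so the extraction cannot keep running in the background.
        return None
    return completed.stdout


if __name__ == "__main__":
    # Stand-in for a hypothetical per-executable extraction script that hangs.
    hanging = [sys.executable, "-c", "import time; time.sleep(10)"]
    print(run_with_hard_timeout(hanging, timeout=1.0))  # None
```

The trade-off is that a killed child returns no partial CFG, so the features for that executable would have to be skipped or computed without it, and each child pays the cost of starting a new interpreter and loading angr.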
Resolve
If this issue cannot be resolved directly, it might be worth adding an option to save progress in the "dataset convert" command, so that the user can halt it early when some executables take a very long time to extract. This would let the user continue with the majority of executables whose features were already extracted, or relaunch the conversion in the hope that the CFG extraction halts correctly at the timeout this time.
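The save-and-resume behaviour proposed here could be sketched as below, assuming one JSON checkpoint per executable. `convert_resumable` and its `extract` callback are hypothetical names, not the actual interface of "dataset convert". Executables that already have a checkpoint are skipped, and Ctrl+C ends the loop without discarding finished work.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable, Iterable, List


def convert_resumable(
    samples: Iterable[Path],
    extract: Callable[[Path], dict],
    checkpoint_dir: Path,
) -> List[str]:
    """Extract features for each sample that has no checkpoint yet.

    Returns the names of the samples processed in this run. Ctrl+C stops the
    loop but keeps every checkpoint already written, so relaunching resumes.
    """
    checkpoint_dir.mkdir(parents=True, exist_ok=True)
    processed: List[str] = []
    try:
        for sample in samples:
            target = checkpoint_dir / f"{sample.name}.json"
            if target.exists():
                continue  # extracted in an earlier run
            tmp = checkpoint_dir / f"{sample.name}.json.tmp"
            tmp.write_text(json.dumps(extract(sample)))
            tmp.replace(target)  # only complete files count as "done"
            processed.append(sample.name)
    except KeyboardInterrupt:
        print(f"Interrupted: {len(processed)} new sample(s) saved in {checkpoint_dir}")
    return processed


if __name__ == "__main__":
    def fake_extract(path: Path) -> dict:
        # Placeholder for the real CFG-based feature extraction.
        return {"name_length": len(path.name)}

    with tempfile.TemporaryDirectory() as tmp_dir:
        ckpt = Path(tmp_dir)
        samples = [Path("a.exe"), Path("b.exe")]
        print(convert_resumable(samples, fake_extract, ckpt))  # ['a.exe', 'b.exe']
        print(convert_resumable(samples, fake_extract, ckpt))  # []
```

Renaming the temporary file into place means an executable interrupted mid-write is simply retried on the next launch instead of being mistaken for a finished one.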