-
Notifications
You must be signed in to change notification settings - Fork 58
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: OLE CF and VBA modules implemented #285
base: main
Are you sure you want to change the base?
Conversation
- Added support for parsing OLE CF and VBA (macro-enabled Office) files.
- Addressed infinite loop issue in OLE CF parser.
lib/src/modules/olecf/parser.rs
Outdated
while current < MAX_REGULAR_SECTOR { | ||
chain.push(current); | ||
let next = match self.get_fat_entry(current) { | ||
Ok(n) => n, | ||
Err(_) => break, | ||
}; | ||
if next >= MAX_REGULAR_SECTOR || next == FREESECT || next == ENDOFCHAIN { | ||
break; | ||
} | ||
current = next; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The fuzzer found an out-of-memory (OOM) issue caused by this loop. The loop can be infinite while parsing some files, making the chain
vector to grow forever.
I'm attaching a file that reproduces the issue (it must be unzipped).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great catch, fixed with this change that's committed (confirmed it correctly processes the attached file, whereas previously it didn't):
while current < MAX_REGULAR_SECTOR {
// Prevent cycles by keeping track of visited sectors
if chain.contains(¤t) {
// We've seen this sector before - it's a cycle
break;
}
chain.push(current);
let next = match self.get_fat_entry(current) {
Ok(n) => n,
Err(_) => break,
};
// Check validity of next sector
if next >= MAX_REGULAR_SECTOR || next == FREESECT || next == ENDOFCHAIN {
break;
}
current = next;
}
Thank you for addressing the other changes, too.
} | ||
|
||
fn get_regular_stream_data(&self, start_sector: u32, size: u64) -> Result<Vec<u8>, &'static str> { | ||
let mut data = Vec::with_capacity(size as usize); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The fuzzer found another case in which the parser consumes too much memory due to unsanitized input. Particularly, the size passed to this call of Vec::with_capacity
comes directly from the file, without any sanitaztion. It can be as large as 4GB.
This is a file that can reproduce the issue.
oom-7c7ec1f9736aed84501f8152250a8431e55917c7.zip
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I highly recommend you to run the fuzzer in your side for a while to find issues in the parser. The fuzzer usually does a great a job at finding bugs. You can follow these steps:
rustup default nightly
cd lib/fuzz
cargo fuzz run vba_parser
OLE CF (Object Linking and Embedding Compound File) format is a file format used for legacy Microsoft Office files, such as documents, workbooks, presentations, and others. It's also used with Visual Basic for Applications (VBA) which is known more commonly as Office macros.
I've implemented two modules for parsing OLE CF files and VBA. I've also expanded the
dump
command to allow dumping of file metadata such as stream metadata (OLE CF) and macros (VBA).An example of how you could use this to identify the use of auto-execute macro method names like "Document_New":