This issue was moved to a discussion.
Comments for "https://os.phil-opp.com/double-fault-exceptions/" #449
Awesome stuff, waiting for more :)
@pbn4 Thank you :)
This is awesome, thank you for the amazing work @phil-opp! I have a newbie question: I understand why these handlers are useful in preventing everything from ending up in a restart loop, but what differences would an actual implementation have? For example, if a user runs a program in their shell that causes a page fault, the handler must be reporting that back to the application so it can surface it to the user, right?
@mtn Thank you!
It depends on the OS implementation and the fault type. For example, if the exception is caused because a userspace process tried to execute a privileged instruction, the kernel would simply kill the process (and the shell would report to the user that the process was killed). For a page fault, the kernel can react in multiple ways. If it's just an out-of-bounds access to unmapped memory (like we do in the blog post), the kernel would kill the user program with a segmentation fault. However, most operating systems have a mechanism called swapping, where parts of the memory are moved to disk when the main memory becomes too full. Then a legitimate memory access could cause a page fault because the accessed data is no longer in memory. The OS can handle this page fault by loading the contents of the memory page from disk and continuing the interrupted process. This technique is called demand paging and allows running programs that wouldn't fit completely into memory.
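To make the two page fault outcomes described above concrete, here is a toy user-space model. It is only a sketch under heavy simplifying assumptions: `AddressSpace`, `handle_page_fault`, and `read` are hypothetical names, pages are plain numbers, and "disk" is just a second map. A real kernel works with page tables and reads the faulting address from the CR2 register.

```rust
use std::collections::HashMap;

/// What the toy kernel decides to do about a page fault.
#[derive(Debug, PartialEq)]
enum FaultOutcome {
    /// The page was swapped out: load it back and retry the access.
    SwappedIn,
    /// The page was never mapped: kill the process ("segmentation fault").
    KillProcess,
}

/// Toy model of one process's memory. Keys are virtual page numbers,
/// values stand in for the page contents.
struct AddressSpace {
    resident: HashMap<u64, u8>, // pages currently in main memory
    swapped: HashMap<u64, u8>,  // pages that were moved to "disk"
}

impl AddressSpace {
    /// Hypothetical page fault handler for an access to `page`.
    fn handle_page_fault(&mut self, page: u64) -> FaultOutcome {
        match self.swapped.remove(&page) {
            Some(contents) => {
                // Demand paging: bring the page back into memory.
                self.resident.insert(page, contents);
                FaultOutcome::SwappedIn
            }
            None => FaultOutcome::KillProcess,
        }
    }

    /// A memory read. On a miss it "faults", lets the handler run, and
    /// retries, mimicking how returning from a fault re-executes the access.
    fn read(&mut self, page: u64) -> Result<u8, FaultOutcome> {
        loop {
            if let Some(&contents) = self.resident.get(&page) {
                return Ok(contents);
            }
            match self.handle_page_fault(page) {
                FaultOutcome::SwappedIn => continue,
                outcome => return Err(outcome),
            }
        }
    }
}
```

Reading page 2 below succeeds after one simulated fault, while page 3 was never mapped, so the "process" is killed.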
Yeah! What you showed us is what we learned in our OS course.
Thanks for the post, Phil. I am going to catch up on your posts now that I have completed a B.Sc. in technology. I am going to continue with a master's in computer science, and your material is very helpful for understanding operating systems. Small typo maybe: You spell it …
Where should I go next after completing your tutorial? I am neither a beginner nor an expert in Rust, but I am interested in OS development and have followed your tutorial thoroughly. I would like to go further. Thanks for making this series :)
@siddharthsymphony The OSDev Wiki is one of the best online resources for OS development. If you want more theoretical knowledge, take a look at Modern Operating Systems by Andrew Tanenbaum.
@siddharthsymphony I'm very interested in systems development. Are there any resources you recommend besides The OSDev Wiki and Modern Operating Systems by Andrew Tanenbaum?
@phil-opp I've really enjoyed your blog! The way you express the concepts makes it easy for me to follow. Are you still planning on handling interrupts from external devices in the next post?
Awesome stuff. What was your source for learning this? If I wanted to contribute to the next posts while I'm having a go at them, do you have any recommended reading for that?
@Ben-PH Thanks! I don't have a single source. It's a mix of what I learned at university, the OSDev wiki, Wikipedia, the Intel/AMD manuals, and various other resources. If you're looking for a book about the fundamentals of operating systems, I can recommend the free Operating Systems: Three Easy Pieces.
Typo: becaues
The link to the AMD64 manual PDF at https://os.phil-opp.com/double-fault-exceptions/#causes-of-double-faults leads to http://developer.amd.com/wordpress/media/2012/10/24593_APM_v21.pdf, which currently returns 'Page not found'. There's a copy on the Wayback Machine at: https://web.archive.org/web/20180327184319/http://developer.amd.com/wordpress/media/2012/10/24593_APM_v21.pdf
General question here about integration testing. There's a lot of code to set up the test fixture that is replicated from non-test code (gdt.rs and interrupts.rs). This seems to reduce the usefulness of the test, because it would only catch problems introduced by changes to both the non-test code /and/ the fixture. Is there any way to reduce replicated code in integration tests like this?
@gerowam The problem is that we want to do something completely different in our double fault handler (…). However, we don't replicate any code from gdt.rs, but directly use …
Why …
@krsoninikhil For "fault" exceptions (…) … As an example of where this behavior is useful, consider a page fault exception that occurs because a memory page was swapped out to disk. When the exception occurs, the page fault handler swaps the page back in. It then returns, which automatically restarts the faulting instruction that accessed the page. Since the page is now present, the instruction succeeds and the program can continue as if no error had occurred in between. I hope this helps!
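The fault/trap/abort distinction can be written down as a small lookup table. This sketch covers only a handful of vectors, and `classify` and `resume_address` are hypothetical helpers; the classifications themselves follow the exception table in the Intel and AMD manuals.

```rust
/// How the CPU reports an exception, which determines where execution
/// resumes if the handler simply returns.
#[derive(Debug, PartialEq, Clone, Copy)]
enum ExceptionClass {
    /// Saved RIP points at the faulting instruction, so it is re-executed.
    Fault,
    /// Saved RIP points at the instruction after the trapping one.
    Trap,
    /// No reliable restart point; the handler should not return.
    Abort,
}

/// Classification of a few x86_64 exception vectors (subset only).
fn classify(vector: u8) -> Option<ExceptionClass> {
    use ExceptionClass::*;
    match vector {
        0 => Some(Fault),  // #DE divide error
        3 => Some(Trap),   // #BP breakpoint (int3)
        6 => Some(Fault),  // #UD invalid opcode
        8 => Some(Abort),  // #DF double fault
        11 => Some(Fault), // #NP segment not present
        13 => Some(Fault), // #GP general protection fault
        14 => Some(Fault), // #PF page fault
        18 => Some(Abort), // #MC machine check
        _ => None,         // not covered by this sketch
    }
}

/// Address at which execution continues after the handler returns.
/// `insn_addr` is the excepting instruction, `insn_len` its length in bytes.
fn resume_address(class: ExceptionClass, insn_addr: u64, insn_len: u64) -> Option<u64> {
    match class {
        ExceptionClass::Fault => Some(insn_addr),
        ExceptionClass::Trap => Some(insn_addr + insn_len),
        ExceptionClass::Abort => None,
    }
}
```

A page fault (vector 14) is a fault, so returning from the handler re-executes the instruction that touched the swapped-out page, while a breakpoint (vector 3) is a trap and execution resumes after the `int3`.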
Ah, that makes sense. Thanks Philipp.
Are there any details from the original fault pushed on the stack or otherwise available to the double fault handler? I'm looking for a way to provide more details in the double fault error message (e.g., IDT[123] was not present).
Not really. The error code is always zero and even the saved instruction pointer is undefined. If you want more information about the original fault, just add a handler function for it. For example, you can get the "IDT[123] was not present" message by adding a handler for the segment not present exception. This exception pushes a selector error code that tells you which table entry caused the issue. Note that issue rust-lang/rust#57270, which leads to wrong error codes in debug mode, is still open.
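For reference, the selector error code has a simple bit layout (described in the Intel manuals): bit 0 flags an external event, bits 1 and 2 select the descriptor table, and bits 3 to 15 hold the descriptor index. A minimal decoder might look like the following; `decode_selector_error`, `SelectorError`, and `DescriptorTable` are hypothetical names for this sketch.

```rust
/// Which descriptor table a selector error code refers to.
#[derive(Debug, PartialEq)]
enum DescriptorTable {
    Gdt,
    Idt,
    Ldt,
}

/// Decoded form of a selector error code (as pushed by, e.g., #NP or #GP).
#[derive(Debug, PartialEq)]
struct SelectorError {
    /// Set if the exception was caused by an event external to the program.
    external: bool,
    table: DescriptorTable,
    /// Index of the descriptor within `table`.
    index: u16,
}

fn decode_selector_error(code: u64) -> SelectorError {
    let external = code & 0b1 != 0;
    // Bit 1 set: the index refers to the IDT and bit 2 is ignored.
    // Otherwise bit 2 (TI) selects between the GDT (0) and the LDT (1).
    let table = if code & 0b10 != 0 {
        DescriptorTable::Idt
    } else if code & 0b100 != 0 {
        DescriptorTable::Ldt
    } else {
        DescriptorTable::Gdt
    };
    let index = ((code >> 3) & 0x1FFF) as u16; // bits 3..=15
    SelectorError { external, table, index }
}
```

For the "IDT[123] was not present" case from the question, the error code would be `(123 << 3) | 0b10`, which decodes to table `Idt`, index 123.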
This is my interrupts module:

```rust
use lazy_static::lazy_static;
use x86_64::structures::idt::{InterruptDescriptorTable, InterruptStackFrame};

use crate::println;

// ...

lazy_static! {
    static ref IDT: InterruptDescriptorTable = {
        let mut idt = InterruptDescriptorTable::new();
        idt.breakpoint.set_handler_fn(breakpoint_handler);
        // idt.double_fault.set_handler_fn(double_fault_handler);
        idt
    };
}

pub fn init_idt() {
    IDT.load();
}

extern "x86-interrupt" fn breakpoint_handler(stack_frame: &mut InterruptStackFrame) {
    println!("EXCEPTION: BREAKPOINT\n{:#?}", stack_frame);
}

// ...
```

… and I call the … EDIT: So I had to increase the size of the stack and it worked! I feel very stupid right now haha
@phil-opp Regarding the duplicated code for the IDT integration test: Couldn't you "just" use the panic handler defined in the integration test here … to exit with a success code instead of invoking the standard test panic handler? Since the regular double fault handler panics, that would be caught by this handler, right?
@phil-opp: With SSE enabled, there are nasty stack alignment issues in interrupt handlers that push an error code. I am trying to understand what is going on, but it looks like a compiler bug. I have seen that you authored https://reviews.llvm.org/D30049, which tried to solve it, but apparently it doesn't fix the issue entirely: I get an explosion when calling panic in my double fault handler, which shows that the stack is no longer properly aligned when the call to panic occurs. What is the most appropriate place to report the bug and try to understand what is going on?
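One way to see where such a misalignment can come from is to do the stack arithmetic. This is a worked example, not a diagnosis of the specific bug: it assumes the handler is entered directly from the CPU with no assembly wrapper, and the function and constant names are made up. In 64-bit mode the CPU aligns RSP to 16 bytes before pushing the exception frame, so whether an error code is pushed decides where RSP ends up.

```rust
/// Bytes the CPU pushes when it delivers an exception in 64-bit mode:
/// SS, RSP, RFLAGS, CS and RIP, each 8 bytes wide.
const FRAME_BYTES: u64 = 5 * 8;
/// Size of the optional error code.
const ERROR_CODE_BYTES: u64 = 8;

/// A normal `call` leaves RSP ≡ 8 (mod 16) at function entry under the
/// System V ABI, because it pushes an 8-byte return address onto a
/// 16-byte aligned stack.
const SYSV_ENTRY_RSP_MOD_16: u64 = 8;

/// RSP modulo 16 when the handler's first instruction runs, given that the
/// CPU aligned RSP to 16 bytes before pushing the frame.
fn rsp_mod_16_at_handler_entry(pushes_error_code: bool) -> u64 {
    let aligned_rsp: u64 = 0x7fff_0000; // any 16-byte aligned value works
    let pushed = FRAME_BYTES + if pushes_error_code { ERROR_CODE_BYTES } else { 0 };
    (aligned_rsp - pushed) % 16
}
```

Without an error code, the handler starts with the same alignment as a normally called function. With an error code (the double fault always pushes one), RSP is off by 8 bytes, so unless the compiler's `x86-interrupt` support realigns the stack, SSE instructions such as `movaps` that require 16-byte aligned operands will fault.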
TBH, for a kernel, you're much better off following the Linux/Unix route and disallowing FPU/SSE use in general. It's not useful for the majority of kernel logic, and maintaining the function call ABI for it is unnecessary overhead. There are specific areas where SSE/AVX-optimised algorithms are a benefit (hashing/crypto code in particular), and that code is best placed to know whether it can operate by spilling just one register, or whether the operation is so long that using xsave/xrstor is the sensible approach.
I agree! I prefer a microkernel-like design where almost everything happens in userspace anyway, so the kernel should not do any complex calculations that would profit from SSE/AVX.
The segmentation fault test fails when built in release mode.
This combined makes LLVM emit a …

```rust
#[allow(unconditional_recursion)]
pub fn stack_overflow() {
    stack_overflow(); // for each recursion, the return address is pushed
}
```

```asm
playground::stack_overflow:
    retq
```

Inserting a …

```rust
#![feature(core_intrinsics)]
#[allow(unconditional_recursion)]
pub fn stack_overflow() {
    stack_overflow();
    unsafe { std::intrinsics::volatile_load(&0); }
}
```

```asm
playground::stack_overflow: # @playground::stack_overflow
# %bb.0:
    pushq   %rax
    callq   *playground::stack_overflow@GOTPCREL(%rip)
    movl    .L__unnamed_1(%rip), %eax
    popq    %rax
    retq
    # -- End function
.L__unnamed_1:
    .zero   4
```
Thanks a lot! I was only thinking of the calls being removed completely and not of tail-call optimization (I really should have known better as a Lisp fan), so my version that printed the iteration number was still tail-call optimised and didn't grow the stack. This is an awesome series, so polished. I am having a lot of fun with it, thank you so much for all the work you put into it.
@boris-tschirschwitz It is @phil-opp who created this series, not me. :)
Great to hear that, thanks a lot :).
With x86_64 version … I downgraded x86_64 version to … I also tried …
EDIT1: …
@themontem Only a single change happened in version 0.11.1: https://github.com/rust-osdev/x86_64/blob/master/Changelog.md#0111 . It only exports two error types in the API, so there were no behavior-related changes. The printing problem is probably #831. I'm not sure about the triple fault, but normally this indicates that your exception handler itself causes another exception. Can you reproduce the triple fault with the post-06 branch?
@phil-opp I followed your tutorial, which is amazing, but at this step I found a weird problem. If I write the double fault handler like this, the machine doesn't get the triple fault error: …
But if I write it like this: …
I get a triple fault. Any idea why that might happen? From a basic look, I see that the panic macro calls …
@nicolae536 Phew, that was not easy to debug! I was able to reproduce the error and tracked it using GDB through the complete formatting code of the … The problem was that the double fault stack overflowed. Since this stack is defined as a normal … The fix is simple: Increase the stack size of the double fault stack by adjusting the … I will update the post to use a larger stack size. Thanks a lot for reporting this!
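As a rough illustration of why an undersized double fault stack goes wrong, here is the stack layout as plain Rust. It assumes the approach from the post (a static byte array whose end address becomes the stack pointer); the names and the `4096 * 5` size are illustrative, and a kernel would store the top address in the TSS's interrupt stack table instead of returning it. Because the array is ordinary memory with no guard page below it, an overflow silently writes below its start address.

```rust
use std::ptr::addr_of;

/// Size of the dedicated double fault stack (value assumed for illustration).
const STACK_SIZE: usize = 4096 * 5;

/// Backing memory for the stack. It is a plain static array, so nothing
/// stops a deep call chain from writing past its lower end.
static mut DOUBLE_FAULT_STACK: [u8; STACK_SIZE] = [0; STACK_SIZE];

/// Returns (lowest address, initial stack pointer) of the stack.
/// x86_64 stacks grow downwards, so the initial stack pointer is the
/// *end* of the array, not its start.
fn double_fault_stack_bounds() -> (usize, usize) {
    // Take the address without creating a reference to the `static mut`.
    let start = unsafe { addr_of!(DOUBLE_FAULT_STACK) } as usize;
    (start, start + STACK_SIZE)
}
```

Every push moves the stack pointer from the returned top towards the bottom; once a handler plus its formatting code need more than `STACK_SIZE` bytes, the writes land in whatever data precedes the array.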
@phil-opp Thanks a lot! I also tried to implement what you mentioned here, but I'm kind of stuck because I cannot have a global mut ref to …
@themontem @phil-opp I just got the same error he had (on my screen there was just a "panicked at" without a newline, very weird). Since it wasn't a triple fault, I thought it could be a problem with the stack, so I increased the size of the stack and it worked just fine. 4096 * 5 seems to work, but how can the panic/print functions take more than 10 kB? PS: Very good posts, I'm enjoying your work! Thank you!
Are you building in release mode? Also, the formatting machinery is optimized for code size as opposed to runtime performance and stack size. For example, it tries hard to prevent inlining.
@bjorn3 I wasn't building in release mode, nor in optimized mode, which may explain why I only got the "panicked at" ahah, thanks for the explanation!
@Bari0th Stack overflows are undefined behavior, so anything can happen when they occur. Depending on the memory that is overwritten and the values on the stack, it can result in a triple fault, some other exception, wrong behavior, silent memory corruption, etc. In your case, the stack overflow happened when it tried to print the file and line information. Apparently it broke the formatting code without causing a triple fault, but even a slight change in your code might change this behavior. As bjorn3 said, the code is much more optimized when compiling in release mode, so a smaller stack might suffice.
Thanks!
Hey @phil-opp, thanks again for the great tutorial. About the tests: I understand that stack overflows are undefined behavior and that we should run them in release mode, but I think we can set the optimization level in the test profile and run the tests without using release mode. I did some tests on my end and I realized that if we add the … I added this change to my repo, you can check it out if you want: https://github.com/ferbass/gat_os/pull/1/files#diff-80398c5faae3c069e4e6aa2ed11b28c0R27 Do you think this is a valid setup to use, or should we avoid …? Thank you in advance!
Hi @phil-opp, I have trouble understanding why a kernel code segment is needed. I think adding a little footnote to the blog would be nice. Thanks in advance, and keep up the great work! :-)
In 64-bit mode, segmentation is mostly deactivated, apart from the privilege level bits. In 16-bit and 32-bit mode, however, segmentation is mandatory, and correct CS, SS, and DS segments with the correct bits are usually needed (though often with a base address of 0 anyway).
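To make "mostly deactivated" a bit more concrete: in 64-bit mode, CS must still reference a valid 64-bit code segment descriptor (which is why the GDT needs one), but only a handful of its bits still matter, plus the privilege level. The sketch below encodes a selector and a bare-minimum code segment descriptor using the bit positions from the Intel manual's descriptor layout. The helper names are hypothetical, and the bit set is deliberately minimal rather than what any particular crate emits.

```rust
/// Builds a segment selector: descriptor index in bits 3..=15, table
/// indicator in bit 2 (left at 0 here, i.e. the GDT), and the requested
/// privilege level (RPL) in the low two bits.
fn selector(index: u16, rpl: u16) -> u16 {
    assert!(rpl <= 3, "privilege levels are 0..=3");
    (index << 3) | rpl
}

// Descriptor bits that still matter for a 64-bit code segment.
const EXECUTABLE: u64 = 1 << 43; // code segment (type field)
const USER_SEGMENT: u64 = 1 << 44; // S bit: code/data, not a system segment
const PRESENT: u64 = 1 << 47;
const LONG_MODE: u64 = 1 << 53; // L bit: 64-bit code segment

/// Minimal code segment descriptor with the given descriptor privilege
/// level (bits 45..=46). Base and limit stay zero because the CPU ignores
/// them for code segments in 64-bit mode.
fn code_segment(dpl: u64) -> u64 {
    assert!(dpl <= 3, "privilege levels are 0..=3");
    EXECUTABLE | USER_SEGMENT | PRESENT | LONG_MODE | (dpl << 45)
}
```

With the kernel code segment at GDT index 1, `selector(1, 0)` yields `0x08`, the value that ends up in CS; the privilege level of the running code comes from the low bits of that register.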
Let me clarify my comments above a bit: Stack overflows on the main kernel stack are not undefined behavior, because the bootloader creates a special unmapped page called a guard page at the bottom of this stack. Thus, a stack overflow results in a page fault and no memory is corrupted. The problem is/was that the double fault stack that we create in this post doesn't have such a guard page yet (we will improve this in a future post). Thus, a stack overflow on it is undefined behavior, as it overwrites other data that might still be needed. While compiling with optimizations reduces stack usage and can thus avoid these stack overflows in some cases, this is merely a workaround and not a valid solution to the problem. Instead, the double fault stack should be large enough to work in debug mode too. For this reason, I increased the size of the double fault stack, so that stack overflows should no longer occur even in debug mode, provided that you keep the double fault handler minimal. It's important to note that this problem is not exclusive to tests. It can also occur during normal execution, e.g. if we accidentally write a function with endless recursion. Since we don't want any undefined behavior in this case, even when running in debug mode, the double fault stack should be large enough for this. So changing the optimization level for tests is not a good solution for this problem, because if a test fails in debug mode, a normal …
In general, I don't think that changing the test optimization level is problematic. For example, it might be a valid way to speed up a test suite in some cases. However, the program/kernel/etc. should still work in debug mode, so optimizing the tests only to avoid some runtime problems is not a good idea.
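With a guard page in place, a fault handler can tell a stack overflow apart from other invalid accesses by checking the faulting address (read from CR2 on x86_64). A minimal sketch, assuming a single 4 KiB guard page directly below the stack; `hit_guard_page` is a hypothetical helper.

```rust
/// Size of the guard page (4 KiB pages assumed).
const PAGE_SIZE: u64 = 4096;

/// Returns true if a page fault at `fault_addr` hit the unmapped guard page
/// directly below a stack whose lowest mapped address is `stack_bottom`.
fn hit_guard_page(fault_addr: u64, stack_bottom: u64) -> bool {
    // saturating_sub avoids an underflow panic for stacks in the first page.
    let guard_start = stack_bottom.saturating_sub(PAGE_SIZE);
    (guard_start..stack_bottom).contains(&fault_addr)
}
```

This check is only useful in code that runs on a different stack: on the overflowed stack itself, the CPU has no room to push the exception frame, which is exactly why the post switches to a dedicated double fault stack.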
@luis-hebendanz As @GuillaumeDIDIER said, segmentation is mostly deactivated in 64-bit mode. The …
Hi @phil-opp, thanks for this amazing guide! I was wondering why the stack size is fixed to …
I think it would be nice if you could add a comment about that stack size in the post 😄
I think …
I followed the post up to the point where the basic double fault handler is implemented. Adding the … I saw that you pushed some changes a few days ago. Might this be related? I am using the version … PS: Thanks for this great blog. I've learned a lot so far!
This is a general purpose comment thread for the “Double Faults” post.