Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixed the native crash happening on Android. #116

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

MohitMaliFtechiz
Copy link
Collaborator

See issue kiwix/kiwix-android#3956

  • The issue on the Android side was a null pointer dereference, meaning that when accessing the getEntryByPath method, the archive object had already been disposed of, which caused the crash.
  • We are correctly handling and disposing of the previous archive and web views for the previously set archive. However, web views load content on their own thread. When we change the ZIM file, we dispose of the previous archive and stop the ongoing web view processes. In certain cases, the web view internally tries to get content via the getEntryByPath method because it takes some time to cancel all the previous calls on the background thread. As a result, the web view attempts to call the method on a disposed archive, which causes the native crash.
  • To fix this, I refactored the getPtr method to return nullptr if the archive has already been disposed of. I also handled this in the utils.h class by checking if the object is null (meaning it has already been disposed). If so, an error is thrown (which should be handled by the caller, in this case, the Kiwix app), instead of crashing the program.
  • Handling this scenario on the Android side is not feasible since the WebView operates on a separate thread over which we have no control. However, addressing this issue on the java-libkiwix side is a better approach. If an object is already disposed, it should throw an error that can be caught by the caller, instead of crashing the entire program. Handling such errors is the responsibility of the user of java-libkiwix (e.g., Kiwix Android). We are already handling all errors thrown by java-libkiwix in the Kiwix Android app.
  • Currently, I have created the SAFE_THIS macro to gather feedback on this approach. Changing the existing THIS macro to incorporate the new behavior would require updating all the places where it is being used. To validate this fix before making those changes, I implemented the new function separately. I explicitly disposed of the archive and attempted to access its content via the getEntryByPath method. Now, instead of crashing the entire program, it correctly throws an error, which is handled by Kiwix.

The issue sometimes occurs on the CI see https://github.com/kiwix/kiwix-android/actions/runs/12809338048/job/35713951704?pr=4177#step:8:9560

* The issue on the Android side was a `null pointer dereference`, meaning that when accessing the getEntryByPath method, the archive object had already been disposed of, which caused the crash.
* We are correctly handling and disposing of the previous archive and web views for the previously set archive. However, web views load content on their own thread. When we change the ZIM file, we dispose of the previous archive and stop the ongoing web view processes. In certain cases, the web view internally tries to get content via the getEntryByPath method because it takes some time to cancel all the previous calls on the background thread. As a result, the web view attempts to call the method on a disposed archive, which causes the native crash.
* To fix this, I refactored the getPtr method to return nullptr if the archive has already been disposed of. I also handled this in the utils.h class by checking if the object is null (meaning it has already been disposed). If so, an error is thrown (which should be handled by the caller, in this case, the Kiwix app), instead of crashing the program.
* Handling this scenario on the Android side is not feasible since the WebView operates on a separate thread over which we have no control. However, addressing this issue on the java-libkiwix side is a better approach. If an object is already disposed, it should throw an error that can be caught by the caller, instead of crashing the entire program. Handling such errors is the responsibility of the user of `java-libkiwix` (e.g., Kiwix Android). We are already handling all errors thrown by `java-libkiwix` in the Kiwix Android app.
* Currently, I have created the `SAFE_THIS` macro to gather feedback on this approach. Changing the existing `THIS` macro to incorporate the new behavior would require updating all the places where it is being used. To validate this fix before making those changes, I implemented the new function separately. I explicitly disposed of the archive and attempted to access its content via the getEntryByPath method. Now, instead of crashing the entire program, it correctly throws an error, which is handled by Kiwix.
@MohitMaliFtechiz
Copy link
Collaborator Author

@kelson42, @mgautierfr Can you please review the PR and validate the idea? then I will refactore the remaining code.

Copy link

codecov bot commented Jan 17, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 93.23%. Comparing base (4c6dadb) to head (5fd942d).

Additional details and impacted files
@@            Coverage Diff            @@
##               main     #116   +/-   ##
=========================================
  Coverage     93.23%   93.23%           
  Complexity      237      237           
=========================================
  Files            47       47           
  Lines           325      325           
  Branches          3        3           
=========================================
  Hits            303      303           
  Misses           19       19           
  Partials          3        3           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@kelson42 kelson42 requested review from veloman-yunkan and mgautierfr and removed request for mgautierfr January 17, 2025 17:37
Comment on lines +31 to +38
({ \
auto ptr = getPtr<NATIVE_TYPE>(env, thisObj); \
if (!ptr) { \
throwException(env, "java/lang/IllegalStateException", "The native object has already been disposed."); \
return nullptr; \
} \
ptr; \
})
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't see how this could work in the pre-C++-14 language standard that I am more or less familiar with. SAFE_THIS has to be an expression but it doesn't seem to expand to code that can be considered an expression according to the language specification. I tried to quickly skim through https://en.cppreference.com/w/cpp/language/expressions but couldn't find anything that allows to consider this code valid in modern c++.

In other words, I am puzzled how this code compiles (in the context of the change in lib/src/main/cpp/libzim/archive.cpp).

Copy link
Collaborator Author

@MohitMaliFtechiz MohitMaliFtechiz Jan 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@veloman-yunkan This code is made by @mgautierfr, he can better explain this, and i am not very much familiar with C++.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that @veloman-yunkan understand how it works :)

#define FOO BAR make every instance of FOO literally replaced by BAR.

So something like THIS->getEntryByPath(...) is replaced by getPtr<NATIVE_TYPE>(env, thisObj)->getEntryByPath(...). And it compiles because getPtr is a function returning a pointer.

With your define of SAFE_THIS we have SAFE_THIS->getEntryByPath(...) replaced by:

 ({ 
        auto ptr = getPtr<NATIVE_TYPE>(env, thisObj); 
        if (!ptr) { 
            throwException(env, "java/lang/IllegalStateException", "The native object has already been disposed."); 
            return nullptr; 
        } 
        ptr; 
    })->getEntryByPath(...)

And, as @veloman-yunkan, I don't know how this code can be compiled.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With your define of SAFE_THIS we have SAFE_THIS->getEntryByPath(...) replaced by:

@mgautierfr SAFE_THIS also uses the getPtr<NATIVE_TYPE>(env, thisObj) to get the pointer, and if this method returns the nullptr it throws the error which will be caught in the archive.cpp.

METHOD(jobject, getEntryByPath__Ljava_lang_String_2, jstring path) {
  return BUILD_WRAPPER("org/kiwix/libzim/Entry", SAFE_THIS->getEntryByPath(TO_C(path)));
} CATCH_EXCEPTION(nullptr)

Also, in the archive.cpp SAFE_THIS returns the archive.

Screenshot from 2025-01-21 17-41-04

I think that's why this code was compiled without any error.

lib/src/main/cpp/utils.h Outdated Show resolved Hide resolved
Copy link
Member

@mgautierfr mgautierfr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are correctly handling and disposing of the previous archive and web views for the previously set archive.

I would say that you are not correctly disposing the previous archive if you are disposing it while something is still using it.

However, I understand it can be difficult to do such memory management in a java world where everything is expected to be managed by GC.

I have a clean solution for this problem and it is pr #25. The PR remove the need to call dispose on libzim object (and even remove this method) and make resource automatically clean up when GC destroy the object.

@MohitMaliFtechiz
Copy link
Collaborator Author

I would say that you are not correctly disposing the previous archive if you are disposing it while something is still using it.

@mgautierfr You are right, and I mentioned this thing in the PR description please see point 4(We have done all possible things to remove loading of webView at the Android level).

I have a clean solution for this problem and it is pr #25. The PR remove the need to call dispose on libzim object (and even remove this method) and make resource automatically clean up when GC destroy the object.

Could we complete that PR so we get rid of this error?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants