Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Long long fix #191

Merged
merged 2 commits into from
Dec 9, 2024
Merged

Long long fix #191

merged 2 commits into from
Dec 9, 2024

Conversation

TomMelt
Copy link
Member

@TomMelt TomMelt commented Nov 18, 2024

closes #183

Problem

This issue also affected windows and is also fixed by this PR.

Tldr; FTorch expects a long int* whilst the libtorch binary seems to be returning a long long int*.

Example output below.

[ 25%] Building CXX object CMakeFiles/ftorch.dir/ctorch.cpp.o
/Users/user/FTorch/src/ctorch.cpp: In function 'const long int* torch_tensor_get_sizes(torch_tensor_t)':
/Users/user/FTorch/src/ctorch.cpp:240:25: error: invalid conversion from 'const long long int*' to 'const long int*' [-fpermissive]
  240 |   return t->sizes().data();
      |          ~~~~~~~~~~~~~~~^~
      |                         |
      |                         const long long int*

Solution

The solution is to check which system we are running on (OS and architecture) and create a preprocess macro -DUNIX that we can use to switch between long long int and long int versions of the libtorch library.

if(UNIX)
message(STATUS "CMAKE_SYSTEM_PROCESSOR = ${CMAKE_SYSTEM_PROCESSOR}")
if(CMAKE_SYSTEM_PROCESSOR STREQUAL "x86_64")
target_compile_definitions(${LIB_NAME} PRIVATE UNIX)
endif()
endif()

Then in the source code we check for that macro e.g.,

FTorch/src/ctorch.h

Lines 128 to 132 in 5fb0887

#ifdef UNIX
EXPORT_C const long int* torch_tensor_get_sizes(const torch_tensor_t tensor);
#else
EXPORT_C const long long int* torch_tensor_get_sizes(const torch_tensor_t tensor);
#endif

TODO

  • works on windows
  • works on linux
  • works on mac (M1/M2)
  • create windows CI (this will be a separate PR) Windows CI build #168

@TomMelt TomMelt self-assigned this Nov 19, 2024
@TomMelt TomMelt force-pushed the long-long-fix branch 3 times, most recently from e491c33 to 5fb0887 Compare November 19, 2024 10:04
@TomMelt TomMelt changed the base branch from main to melt-windows-fix November 19, 2024 10:05
@TomMelt TomMelt added bug Something isn't working enhancement New feature or request hackathon labels Nov 19, 2024
@TomMelt TomMelt requested a review from jwallwork23 November 19, 2024 10:12
@TomMelt TomMelt marked this pull request as ready for review November 19, 2024 10:12
@TomMelt
Copy link
Member Author

TomMelt commented Nov 19, 2024

@jwallwork23 @jatkinson1000 , to make reviewing easier this PR should be merged before #137

add_library(${LIB_NAME} SHARED ctorch.cpp ftorch.F90 ftorch_test_utils.f90)

if(UNIX)
message(STATUS "CMAKE_SYSTEM_PROCESSOR = ${CMAKE_SYSTEM_PROCESSOR}")
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
message(STATUS "CMAKE_SYSTEM_PROCESSOR = ${CMAKE_SYSTEM_PROCESSOR}")

Just remembered I left this print statement in for debugging. I should take this out

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you remove this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not yet. I will look at fixing the mac build before I take it out completely

Copy link
Contributor

@jwallwork23 jwallwork23 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks sensible to me, @TomMelt. I haven't tested it on my laptop running Windows but I can do if needed.

@TomMelt
Copy link
Member Author

TomMelt commented Nov 21, 2024

This looks sensible to me, @TomMelt. I haven't tested it on my laptop running Windows but I can do if needed.

Thanks @jwallwork23. Wouldn't hurt if you get time but I did test on windows vm so it should be fine 👌

I do plan on setting up a windows CI soon so that should also catch any issues.

@jatkinson1000
Copy link
Member

It seems that this incompatibility still exists on Mac 12 and 13 - see #192

So the CMake/#ifdef might need some refinement.
IIRC 12 and 13 are X86 images rather than arm, but I'd need to check that.

src/ctorch.h Outdated
Comment on lines 134 to 135
EXPORT_C const long long int *
torch_tensor_get_sizes(const torch_tensor_t tensor);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm surprised the linter likes these not being on the same line.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

honestly... it makes me sad. I had them on the same line, but the linter forces it to split :(

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can maybe use a config file to extend the column limit?
https://clang.llvm.org/docs/ClangFormatStyleOptions.html

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little confused, when I run clang-format locally it moves them to be placed on one line...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if this is to do with the default settings being different for different clang-format versions? We observed this in nextSIM-DG and so pinned the version used by the CI.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have just opened #199 to here which should resolve this by fixing Cpp Column Limit to 88.

Copy link
Contributor

@jwallwork23 jwallwork23 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved once, will approve again. Thanks @TomMelt!

Copy link
Member

@jatkinson1000 jatkinson1000 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed in-person, I think this can go in now and the issues with mac can be resolved in a separate set of issues.

Thanks @TomMelt!

fixes [#183](#183)

There is an issue when building on mac (arm_64) or windows. The version
of `libtorch` exposes a torch tensors shape (`t->sizes().data()`) as a
`const long long int*` instead of just a `const long int*` like on linux
and mac (x86).

This commit adds preprocessor macro to switch between implementations
automatically detecting the correct version at CMake build stage.
@jatkinson1000
Copy link
Member

Have rebased #199 on this if you want to merge.

@TomMelt TomMelt merged commit fad331e into melt-windows-fix Dec 9, 2024
4 checks passed
@TomMelt TomMelt deleted the long-long-fix branch December 9, 2024 15:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working enhancement New feature or request hackathon
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Incompatibility with MacOS arm libtorch
3 participants