Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add mwe for fypp preprocessor #60

Merged
merged 20 commits into from
Nov 21, 2023
Merged

add mwe for fypp preprocessor #60

merged 20 commits into from
Nov 21, 2023

Conversation

TomMelt
Copy link
Member

@TomMelt TomMelt commented Nov 3, 2023

This PR introduces a fortran only version of ftorch interface that uses interface function torch_tensor_from_array to create a tensor object from any fortran array.

Instructions to generate pre-processed file:

  1. install fypp
  2. run fypp - fypp ftorch.fypp ftorch.f90

This will replace the existing ftorch.f90. Choose a different name if you want to compare both.

For production ready version we can use the suggestions here to integrate as part of our code base.

  • update docs
  • expand fortran only interface to all examples in repo (maybe copy so we can show both approaches)
  • expand fortran only interface to all examples in ftorch benchmarks?
  • check timings don't get worse
  • check compatibility with gpu version?
  • decide where to put .fypp and .f90 (generated) files (My preference is to keep both in src folder)
  • add example cgdrag showing how to use c-only features (Example for multiple inputs #13)

When this is finished it will close #58

this example generates multiple variants of a function
`torch_tensor_from_array_c_*` depending on data type and rank of input
data.

We should discuss how we actually want to implement this.
@TomMelt TomMelt marked this pull request as draft November 3, 2023 17:08
@TomMelt TomMelt added the enhancement New feature or request label Nov 3, 2023
@TomMelt
Copy link
Member Author

TomMelt commented Nov 6, 2023

I need to make a fortran-only interface with a mwe that removes the need for the end user to import iso_c_bindings

@ElliottKasoar
Copy link
Contributor

(Following earlier discussion with @jatkinson1000 and @TomMelt where I think we concluded that we want to store the processed .f90 files, as well as the .fypp files)

I'm not sure if you had something like this in mind already, but we could potentially do something similar to cp2k, which seems to place the compiled .f90 files in an /obj directory, potentially making the /src directory less cluttered.

(I haven't seen anywhere actually store the pre-processed files with git, but I think the idea still works.)

@jatkinson1000
Copy link
Member

jatkinson1000 commented Nov 6, 2023

Yes, #61 opened to log this proposal.
Since it'll be only one file (at present) I think we will be OK for clutter.

@TomMelt
Copy link
Member Author

TomMelt commented Nov 17, 2023

@jatkinson1000 @ElliottKasoar , ok I think I have done it. I have a fortran only version of ftorch interface that uses interface function torch_tensor_from_array to create a tensor object from any fortran array.

Can you please both take a look? And feel free to try it out on your machines too.

Caveats

  • iso_fortran_env only supports the following types: int8, int16, int32, int64, real32, real64 not the full list of torch enums e.g, torch_kUInt8 and torch_kFloat16. I have therefore not generated overloaded functions for these data types.
  • I have only converted and tested the resnet_infer_fortran.f90 example. We should look at converting the others and testing before I merge this.
  • Also I would like to make sure this doesn't make timings worse 🤞

@TomMelt TomMelt self-assigned this Nov 17, 2023
Copy link
Member

@jatkinson1000 jatkinson1000 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this @TomMelt.

This is only an interim review as there's a lot to unpack here.
I thought I'd voice things in stages as I work through them.

So far I have:

  • Compiled and built this version of FTorch on M2 Mac (gfortran, Apple clang)
  • Build and run exercise 2 (Resnet) using your updated code and this version of the library
  • Generated FORD docs for this updated version of the libaray

Observations:

  • Documentation is now very messy - I'm looking if FORD can generate interface docs.
  • I would suggest we rename the functions from e.g. torch_tensor_from_array_int16_1 to torch_tensor_from_array_int16_1d as I think it makes it clearer what the suffix is.
  • Documentation needs updating with fypp install/run instructions - moved to Add Long-form API documentation #53 (see below)
  • We discussed a git hook to generate the ftorch

Other comments:

  • I agree with not supporting non-Fortran types, but we should note this in longer docs
  • I agree with fypp and f90 being alongside one another in src/
  • Perhaps not for this PR, but we may consider a short explanation of the sp precision syntax.

I'll start a closer look at the code now and make a few additions as suitable.

@jatkinson1000
Copy link
Member

Having been through and generated a pre-commit hook to check fypp is generated accordingly I realised I forgot that pre-commit hooks cannot be shared between users or stored as part of the git repo. Therefore I will add an action to check instead, and instructions for developers to add the pre-commit hook manually.

@jatkinson1000 jatkinson1000 linked an issue Nov 18, 2023 that may be closed by this pull request
@jatkinson1000
Copy link
Member

Now happy with the workflow.
Slightly watered down from the original proposal, but the best we can do with GitHub I think.
It compares the result of fypp generated code to the existing ftorch.f90, and fails if they do not match.

This does not stop people from pushing mismatching code or modifying ftorch.f90 directly, but such a situation should become apparent when conducting a code review and this check will ensure the pair of files match as a first line of defense.

I have included a more powerful pre-commit hook in the main repo (could move elsewhere?) that will enforce these things for developers. I am happy (with someone else checking) to say this now closes #61

@jatkinson1000
Copy link
Member

Updated the README such that merging this will also close #57
Would appreciate a second pair of eyes on this.

This will need amending further if we make a decision about simplifying the indexing as well as per #56

@jatkinson1000 jatkinson1000 linked an issue Nov 18, 2023 that may be closed by this pull request
@jatkinson1000
Copy link
Member

Since fypp is not a required dependency to run I suggest we move description of how to install this (and use the git hook) to a 'developers'' section of the documentation in #53

@jatkinson1000
Copy link
Member

jatkinson1000 commented Nov 18, 2023

I'm going to pause here for now, but it would be great if we can get this merged in time for the talk!
Doing so I think requires:

  • Updating examples (I agree with preserving one to demonstrate using blob - perhaps use of of the MiMA examples where the striding is actually beneficial?)
  • Updating benchmarks and timing checks (@ElliottKasoar)
  • Checking GPU functionality (@ElliottKasoar)

We should also consider if we want to make the strides an optional argument (#56), but given the size of this PR this should perhaps be done as another feature. Changing them to be optional should be possible in a backwards compatible way I'd hope.

Once this is clear I'd like to complete #53 in some form, maybe #48, and try to get some MiMA numbers.
And sort some slides I guess...

Copy link
Contributor

@ElliottKasoar ElliottKasoar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't entirely newly introduced by these changes (e.g. torch_tensor_from_blob vs torch_from_blob_C) but something to consider is adding a bit more consistency between the order of parameters.

For example, layout and device are switched changing from torch_tensor_from_blob(data, ndims, tensor_shape, dtype, device, layout) to torch_tensor_from_array(data_in, layout, c_device).

Particularly for integer arguments, this could be easy to miss.

@jatkinson1000
Copy link
Member

Great point, and I agree @ElliottKasoar

@ElliottKasoar
Copy link
Contributor

ElliottKasoar commented Nov 21, 2023

I'm going to pause here for now, but it would be great if we can get this merged in time for the talk! Doing so I think requires:

* [ ]  Updating benchmarks and timing checks (@ElliottKasoar)

* [ ]  Checking GPU functionality (@ElliottKasoar)

We should also consider if we want to make the strides an optional argument (#56), but given the size of this PR this should perhaps be done as another feature. Changing them to be optional should be possible in a backwards compatible way I'd hope.

Once this is clear I'd like to complete #53 in some form, maybe #48, and try to get some MiMA numbers. And sort some slides I guess...

Changes tested on CSD3 GPUs, and no apparent change in timings seen on CPU or GPU.

My benchmarking PR will also hopefully have an example of cgdrag with both from_array (explicit reshaping) and from_blob (implicit reshaping), so that can potentially be adapted/used to inform new examples.

@jatkinson1000 jatkinson1000 self-requested a review November 21, 2023 12:43
Copy link
Member

@jatkinson1000 jatkinson1000 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm now happy with this, but agree we should standardise inputs to functions.
@ElliottKasoar @TomMelt How do we feel about the following precedence order:

  • input data
  • device
  • layout
  • ndims
  • tensor_shape
  • dtype

This keeps required first, and anticipates future development when layout may become optional.

I'd also suggest the 'c-only-example' becomes its own pull request and we merge this once the above is done.

@ElliottKasoar
Copy link
Contributor

ElliottKasoar commented Nov 21, 2023

I'm now happy with this, but agree we should standardise inputs to functions. @ElliottKasoar @TomMelt How do we feel about the following precedence order:

  • input data
  • device
  • layout
  • ndims
  • tensor_shape
  • dtype

This keeps required first, and anticipates future development when layout may become optional.

I'd also suggest the 'c-only-example' becomes its own pull request and we merge this once the above is done.

I think @TomMelt mentioned to me earlier deliberately putting device at the end, as it's a bit more detached from specification of the tensor (please correct me if I'm misquoting), but I think I agree with @jatkinson1000 that the more optional the argument, the lower the precedence, probably makes more sense.

Also agree with having the new example as its own PR.

@jatkinson1000
Copy link
Member

OK, that makes sense.
I have standardised, so how does that look now?
Kept device at the end which I think makes sense (this is also what Torch does).

Copy link
Contributor

@ElliottKasoar ElliottKasoar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we discussed this, but is there a reason to keep test_tensor.f90 still?

I would maybe either update the (two) instances of torch_tensor_from_blob or delete it, but otherwise looks great!

pre-commit Outdated Show resolved Hide resolved
pre-commit Outdated Show resolved Hide resolved
@jatkinson1000 jatkinson1000 marked this pull request as ready for review November 21, 2023 17:15
Copy link
Contributor

@ElliottKasoar ElliottKasoar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure the .githooks directory was added in your last commit @jatkinson1000.

@jatkinson1000
Copy link
Member

I'm not sure the .githooks directory was added in your last commit @jatkinson1000.

Uuurrgghhhhhh I just used mv instead of git mv.
Thank you.
Should be sorted now.

@TomMelt
Copy link
Member Author

TomMelt commented Nov 21, 2023

OK, that makes sense. I have standardised, so how does that look now? Kept device at the end which I think makes sense (this is also what Torch does).

I think we need the following order for torch_tensor_from_blob:

  • input data
  • ndims
  • tensor_shape
  • layout/strides
  • dtype
  • device

This will then match the order of args in torch_from_blob_c

function torch_from_blob_c(data, ndims, tensor_shape, strides, dtype, device) result(tensor_p)

and those in torch_tensor_from_array

function torch_tensor_from_array(data_in, layout, c_device) result(tensor)

What do you think @jatkinson1000 ? If you are happy I have a commit waiting to be pushed that fixes it

@jatkinson1000
Copy link
Member

That sounds sensible to me!

@TomMelt TomMelt merged commit 530ab25 into main Nov 21, 2023
2 checks passed
@TomMelt TomMelt deleted the mwe-fypp-preprocessor branch November 21, 2023 18:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add action and hook for fypp Triage torch_tensor_from_array Incorrect example in README
3 participants