Commit

PWR057: Add Fortran code to README.md
alvrogd committed Oct 15, 2024
1 parent d08f015 commit c380e3f
Showing 2 changed files with 61 additions and 12 deletions.
64 changes: 57 additions & 7 deletions Checks/PWR057/README.md
@@ -36,6 +36,8 @@ code using accelerators.
### Code example

#### C

Have a look at the following code snippet:

```c
@@ -46,16 +48,17 @@ void example(double *A, int *nodes, int n) {
}
```
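
The body of this first snippet is collapsed in the diff above. Judging from the
Fortran version added further down, the loop presumably looks along the lines of
the following sketch (the exact C body is an assumption, not the file's verbatim
contents):

```c
// Sketch of the collapsed sparse reduction loop, assumed to mirror the
// Fortran example below: each iteration accumulates into A[nodes[nel]],
// an index that is only known at runtime.
void example(double *A, int *nodes, int n) {
  for (int nel = 0; nel < n; ++nel) {
    A[nodes[nel]] += nel * 1;
  }
}
```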

The loop body has a `sparse reduction` pattern, meaning that each iteration of
the loop *reduces* its computational result to a value, but the place where the
value is stored is known at runtime only. Thus, any two iterations of the loop
executing concurrently can potentially update the same element of the array `A`
at the same time. This creates a potential race condition that must be handled
through appropriate synchronization.

The code snippet below shows an implementation that uses the OpenACC compiler
directives to offload the loop to an accelerator. Note the synchronization
added to avoid race conditions, while the data transfer clauses manage the data
movement between the host memory and the accelerator memory:

```c
void example(double *A, int *nodes, int n) {
@@ -69,6 +72,53 @@ void example(double *A, int *nodes, int n) {
}
```
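
The OpenACC directives themselves are also hidden behind the collapsed hunk.
Assuming they mirror the Fortran version added below, the offloaded C code
would look roughly as follows (directive placement, clauses, and array sections
are assumptions):

```c
// Sketch assumed to mirror the Fortran OpenACC version below: the data
// clauses manage host/accelerator transfers, and the atomic update provides
// the synchronization for the sparse reduction.
void example(double *A, int *nodes, int n) {
  #pragma acc data copyin(nodes[0:n]) copy(A[0:n])
  #pragma acc parallel
  #pragma acc loop
  for (int nel = 0; nel < n; ++nel) {
    #pragma acc atomic update
    A[nodes[nel]] += nel * 1;
  }
}
```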

#### Fortran

Have a look at the following code snippet:

```f90
subroutine example(A, nodes)
  implicit none
  real(kind=8), intent(inout) :: A(:)
  integer, intent(in) :: nodes(:)
  integer :: nel

  do nel = 1, size(nodes, 1)
    A(nodes(nel)) = A(nodes(nel)) + (nel * 1)
  end do
end subroutine example
```

The loop body has a `sparse reduction` pattern, meaning that each iteration of
the loop *reduces* its computational result to a value, but the place where the
value is stored is known at runtime only. Thus, any two iterations of the loop
executing concurrently can potentially update the same element of the array `A`
at the same time. This creates a potential race condition that must be handled
through appropriate synchronization.

The code snippet below shows an implementation that uses the OpenACC compiler
directives to offload the loop to an accelerator. Note the synchronization
added to avoid race conditions, while the data transfer clauses manage the data
movement between the host memory and the accelerator memory:

```f90
subroutine example(A, nodes)
  implicit none
  real(kind=8), intent(inout) :: A(:)
  integer, intent(in) :: nodes(:)
  integer :: nel

  !$acc data copyin(nodes) copy(A)
  !$acc parallel
  !$acc loop
  do nel = 1, size(nodes, 1)
    ! Serialize concurrent updates to the same element of A
    !$acc atomic update
    A(nodes(nel)) = A(nodes(nel)) + (nel * 1)
  end do
  !$acc end parallel
  !$acc end data
end subroutine example
```

### Related resources

* [PWR057 examples](../PWR057)
9 changes: 4 additions & 5 deletions Checks/PWR057/example-sparse.f90
@@ -1,13 +1,12 @@
 ! PWR057: consider applying offloading parallelism to sparse reduction loop
 
-subroutine example(A, nodes, n)
+subroutine example(A, nodes)
   implicit none
-  integer, intent(in) :: n
-  integer, dimension(1:n), intent(in) :: nodes
-  real(kind=8), dimension(1:n), intent(out) :: A
+  real(kind=8), intent(inout) :: A(:)
+  integer, intent(in) :: nodes(:)
   integer :: nel
 
-  do nel = 1, n
+  do nel = 1, size(nodes, 1)
     A(nodes(nel)) = A(nodes(nel)) + (nel * 1)
   end do
 end subroutine example
