Add the option to measure separate timers per thread #3378

Open
wants to merge 14 commits into
base: master
2 changes: 2 additions & 0 deletions CMakeLists.txt
@@ -68,6 +68,7 @@ set( with-models OFF CACHE STRING "The models to include as a semicolon-separate
set( tics_per_ms "1000.0" CACHE STRING "Specify elementary unit of time [default=1000 tics per ms]." )
set( tics_per_step "100" CACHE STRING "Specify resolution [default=100 tics per step]." )
set( with-detailed-timers OFF CACHE STRING "Build with detailed internal time measurements [default=OFF]. Detailed timers can affect the performance." )
set( with-threaded-timers ON CACHE STRING "Build with one internal timer per thread [default=ON]. Multi-threaded timers can affect the performance." )
set( target-bits-split "standard" CACHE STRING "Split of the 64-bit target neuron identifier type [default='standard']. 'standard' is recommended for most users. If running on more than 262144 MPI processes or more than 512 threads, change to 'hpc'." )

# generic build configuration
@@ -143,6 +144,7 @@ nest_process_with_gsl()
nest_process_with_openmp()
nest_process_with_mpi()
nest_process_with_detailed_timers()
nest_process_with_threaded_timers()
nest_process_with_libneurosim()
nest_process_with_music()
nest_process_with_sionlib()
7 changes: 7 additions & 0 deletions cmake/ConfigureSummary.cmake
@@ -150,6 +150,13 @@ function( NEST_PRINT_CONFIG_SUMMARY )
message( "Detailed timers : No" )
endif ()

message( "" )
if ( THREADED_TIMERS )
message( "Threaded timers : Yes" )
else ()
message( "Threaded timers : No" )
endif ()

message( "" )
if ( HAVE_MUSIC )
message( "Use MUSIC : Yes (MUSIC ${MUSIC_VERSION})" )
7 changes: 7 additions & 0 deletions cmake/ProcessOptions.cmake
@@ -462,6 +462,13 @@ function( NEST_PROCESS_WITH_DETAILED_TIMERS )
endif ()
endfunction()

function( NEST_PROCESS_WITH_THREADED_TIMERS )
set( THREADED_TIMERS OFF PARENT_SCOPE )
if ( ${with-threaded-timers} STREQUAL "ON" )
set( THREADED_TIMERS ON PARENT_SCOPE )
endif ()
endfunction()

function( NEST_PROCESS_WITH_LIBNEUROSIM )
# Find libneurosim
set( HAVE_LIBNEUROSIM OFF PARENT_SCOPE )
7 changes: 5 additions & 2 deletions doc/htmldoc/installation/cmake_options.rst
@@ -102,7 +102,7 @@ For more details, see the :ref:`Python binding <compile_with_python>` section be
.. _performance_cmake:

Maximize performance, reduce energy consumption
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The following options help to optimize NEST for maximal performance and thus reduced energy consumption.

@@ -126,7 +126,7 @@ The following options help to optimize NEST for maximal performance and thus red
in place.
* Using ``-march=native`` requires that you build NEST on the same CPU architecture as you will use to run it.
* For the technically minded: Even just using ``-O3`` removes some ``assert()`` statements from NEST since we
have wrapped some of them in functions, which get eliminated due to interprocedural optimization.
have wrapped some of them in functions, which get eliminated due to interprocedural optimization.



@@ -197,6 +197,9 @@ NEST properties
+-----------------------------------------------+----------------------------------------------------------------+
| ``-Dtics_per_step=[number]`` | Specify resolution [default=100 tics per step]. |
+-----------------------------------------------+----------------------------------------------------------------+
| ``-Dwith-threaded-timers=[OFF|ON]`` | Build with one internal timer per thread [default=ON]. |
| | Multi-threaded timers can affect the performance. |
+-----------------------------------------------+----------------------------------------------------------------+
| ``-Dwith-detailed-timers=[OFF|ON]`` | Build with detailed internal time measurements [default=OFF]. |
| | Detailed timers can affect the performance. |
+-----------------------------------------------+----------------------------------------------------------------+
2 changes: 1 addition & 1 deletion doc/htmldoc/nest_behavior/built-in_timers.rst
@@ -86,7 +86,7 @@ available as kernel attributes:
+--------------------------------+----------------------------------+----------------------------------+
|``time_communicate_target_data``|Cumulative time for core MPI |``time_gather_target_data`` |
| |communication when gathering | |
| |target data | |
| |target data | |
+--------------------------------+----------------------------------+----------------------------------+
|``time_update`` |Time for neuron update |``time_simulate`` |
+--------------------------------+----------------------------------+----------------------------------+
1 change: 0 additions & 1 deletion libnestutil/CMakeLists.txt
@@ -31,7 +31,6 @@ set( nestutil_sources
numerics.h numerics.cpp
regula_falsi.h
sort.h
stopwatch.h stopwatch.cpp
string_utils.h
vector_util.h
)
3 changes: 3 additions & 0 deletions libnestutil/config.h.in
@@ -182,6 +182,9 @@
/* Whether to enable detailed NEST internal timers */
#cmakedefine TIMER_DETAILED 1

/* Whether to use one NEST internal timer per thread */
#cmakedefine THREADED_TIMERS 1

/* Whether to do full logging */
#cmakedefine ENABLE_FULL_LOGGING 1
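
CMake's `configure_file` turns `#cmakedefine THREADED_TIMERS 1` into `#define THREADED_TIMERS 1` when the variable is set, and into a commented-out `#undef` otherwise, so C++ code can branch on the macro at compile time. A minimal sketch of that pattern follows. The helper `timer_slots()` is hypothetical and not part of this PR, and the macro is defined by hand only to keep the example self-contained.

```cpp
#include <cassert>
#include <cstddef>

// Assumption: stands in for the generated config.h, where CMake would emit
// this define only if THREADED_TIMERS is ON.
#define THREADED_TIMERS 1

// Hypothetical helper (not in the PR): how many timer slots to allocate.
inline std::size_t
timer_slots( const std::size_t num_threads )
{
  assert( num_threads > 0 );
#ifdef THREADED_TIMERS
  return num_threads; // one timer per thread
#else
  return 1; // a single timer for the whole process
#endif
}
```

With the define removed, the same call would return 1 regardless of the thread count.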

1 change: 1 addition & 0 deletions nestkernel/CMakeLists.txt
@@ -116,6 +116,7 @@ set ( nestkernel_sources
stimulation_backend.h
buffer_resize_log.h buffer_resize_log.cpp
nest_extension_interface.h
stopwatch.h stopwatch.cpp
)


2 changes: 2 additions & 0 deletions nestkernel/connection_manager.cpp
@@ -805,7 +805,7 @@
}

void
nest::ConnectionManager::connect_sonata( const DictionaryDatum& graph_specs, const long hyberslab_size )

Check warning on line 808 in nestkernel/connection_manager.cpp (GitHub Actions / build_linux (ubuntu-22.04, gcc, openmp, python, gsl, ltdl, boost, optimize, warning)): unused parameter ‘graph_specs’ [-Wunused-parameter]
Check warning on line 808 in nestkernel/connection_manager.cpp (GitHub Actions / build_linux (ubuntu-22.04, gcc, openmp, python, gsl, ltdl, boost, optimize, warning)): unused parameter ‘hyberslab_size’ [-Wunused-parameter]
{
#ifdef HAVE_HDF5
SonataConnector sonata_connector( graph_specs, hyberslab_size );
@@ -1800,7 +1800,9 @@
} // of omp single; implicit barrier

source_table_.collect_compressible_sources( tid );
DETAILED_TIMER_START( kernel().simulation_manager.get_idle_stopwatch(), tid );
#pragma omp barrier
DETAILED_TIMER_STOP( kernel().simulation_manager.get_idle_stopwatch(), tid );
#pragma omp single
{
source_table_.fill_compressed_spike_data( compressed_spike_data_ );
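
The hunk above brackets an OpenMP barrier with `DETAILED_TIMER_START` / `DETAILED_TIMER_STOP`, so the time a thread spends waiting for the others is charged to an idle stopwatch. The macro definitions are not visible in this part of the diff. The sketch below is one way such macros could work, assuming a hypothetical `PerThreadStopwatch` class and a hand-set `TIMER_DETAILED` define; neither is taken from the PR.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Assumption: normally set by the build system (-Dwith-detailed-timers=ON).
#define TIMER_DETAILED 1

// Hypothetical per-thread stopwatch; each thread only touches its own slot.
class PerThreadStopwatch
{
public:
  explicit PerThreadStopwatch( const std::size_t num_threads )
    : begin_( num_threads )
    , elapsed_( num_threads, 0.0 )
  {
  }

  void
  start( const std::size_t tid )
  {
    assert( tid < begin_.size() );
    begin_[ tid ] = clock_type::now();
  }

  void
  stop( const std::size_t tid )
  {
    assert( tid < begin_.size() );
    elapsed_[ tid ] += std::chrono::duration< double >( clock_type::now() - begin_[ tid ] ).count();
  }

  double
  elapsed_seconds( const std::size_t tid ) const
  {
    return elapsed_.at( tid );
  }

private:
  using clock_type = std::chrono::steady_clock;
  std::vector< clock_type::time_point > begin_;
  std::vector< double > elapsed_;
};

// Assumed macro shape: the calls vanish entirely unless TIMER_DETAILED is set.
#ifdef TIMER_DETAILED
#define DETAILED_TIMER_START( sw, tid ) ( sw ).start( tid )
#define DETAILED_TIMER_STOP( sw, tid ) ( sw ).stop( tid )
#else
#define DETAILED_TIMER_START( sw, tid )
#define DETAILED_TIMER_STOP( sw, tid )
#endif

// Time spent waiting at the barrier is accumulated in this thread's slot.
// Without OpenMP the pragma is ignored and the measured wait is close to zero.
inline void
wait_at_barrier( PerThreadStopwatch& idle, const std::size_t tid )
{
  DETAILED_TIMER_START( idle, tid );
#pragma omp barrier
  DETAILED_TIMER_STOP( idle, tid );
}
```

Inside a parallel region each thread would call `wait_at_barrier( idle, tid )` with its own thread id, so slots never overlap and no locking is needed.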
2 changes: 1 addition & 1 deletion nestkernel/connection_manager.h
@@ -461,7 +461,7 @@ class ConnectionManager : public ManagerInterface

// public stop watch for benchmarking purposes
// start and stop in high-level connect functions in nestmodule.cpp and nest.cpp
Stopwatch sw_construction_connect;
SingleStopwatch sw_construction_connect;

const std::vector< SpikeData >& get_compressed_spike_data( const synindex syn_id, const size_t idx );

19 changes: 19 additions & 0 deletions nestkernel/event_delivery_manager.cpp
@@ -429,6 +429,9 @@ EventDeliveryManager::gather_spike_data_( std::vector< SpikeDataT >& send_buffer
#ifdef TIMER_DETAILED
{
sw_collocate_spike_data_.stop();
#ifdef HAVE_MPI
MPI_Barrier( kernel().mpi_manager.get_communicator() );
#endif
sw_communicate_spike_data_.start();
}
#endif
@@ -811,7 +814,9 @@ EventDeliveryManager::gather_target_data( const size_t tid )
resize_send_recv_buffers_target_data();
}
} // of omp master; (no barrier)
DETAILED_TIMER_START( kernel().simulation_manager.get_idle_stopwatch(), tid );
#pragma omp barrier
DETAILED_TIMER_STOP( kernel().simulation_manager.get_idle_stopwatch(), tid );

kernel().connection_manager.restore_source_table_entry_point( tid );

@@ -826,12 +831,17 @@ EventDeliveryManager::gather_target_data( const size_t tid )
set_complete_marker_target_data_( assigned_ranks, send_buffer_position );
}
kernel().connection_manager.save_source_table_entry_point( tid );
DETAILED_TIMER_START( kernel().simulation_manager.get_idle_stopwatch(), tid );
#pragma omp barrier
DETAILED_TIMER_STOP( kernel().simulation_manager.get_idle_stopwatch(), tid );
kernel().connection_manager.clean_source_table( tid );

#pragma omp master
{
#ifdef TIMER_DETAILED
#ifdef HAVE_MPI
MPI_Barrier( kernel().mpi_manager.get_communicator() );
#endif
sw_communicate_target_data_.start();
#endif
kernel().mpi_manager.communicate_target_data_Alltoall( send_buffer_target_data_, recv_buffer_target_data_ );
@@ -883,7 +893,9 @@ EventDeliveryManager::gather_target_data_compressed( const size_t tid )
resize_send_recv_buffers_target_data();
}
} // of omp master; no barrier
DETAILED_TIMER_START( kernel().simulation_manager.get_idle_stopwatch(), tid );
#pragma omp barrier
DETAILED_TIMER_STOP( kernel().simulation_manager.get_idle_stopwatch(), tid );

TargetSendBufferPosition send_buffer_position(
assigned_ranks, kernel().mpi_manager.get_send_recv_count_target_data_per_rank() );
@@ -898,11 +910,16 @@ EventDeliveryManager::gather_target_data_compressed( const size_t tid )
set_complete_marker_target_data_( assigned_ranks, send_buffer_position );
}

DETAILED_TIMER_START( kernel().simulation_manager.get_idle_stopwatch(), tid );
#pragma omp barrier
DETAILED_TIMER_STOP( kernel().simulation_manager.get_idle_stopwatch(), tid );

#pragma omp master
{
#ifdef TIMER_DETAILED
#ifdef HAVE_MPI
MPI_Barrier( kernel().mpi_manager.get_communicator() );
#endif
sw_communicate_target_data_.start();
#endif
kernel().mpi_manager.communicate_target_data_Alltoall( send_buffer_target_data_, recv_buffer_target_data_ );
@@ -925,7 +942,9 @@ EventDeliveryManager::gather_target_data_compressed( const size_t tid )
{
buffer_size_target_data_has_changed_ = kernel().mpi_manager.increase_buffer_size_target_data();
} // of omp master (no barrier)
DETAILED_TIMER_START( kernel().simulation_manager.get_idle_stopwatch(), tid );
#pragma omp barrier
DETAILED_TIMER_STOP( kernel().simulation_manager.get_idle_stopwatch(), tid );
}

} // of while
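
The diff does not say why `MPI_Barrier` is called just before the communication stopwatches start. One plausible reading is that the barrier keeps the time spent waiting for the slowest rank out of the communication timer. The toy model below illustrates that reading with plain arithmetic and no MPI. The function name, its parameters, and the assumption that a collective cannot finish before the last rank arrives are all illustrative, not taken from NEST.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Toy model (assumption, not NEST code): time each rank would attribute to
// its communication timer. `arrival` holds the time at which each rank
// reaches the collective; `transfer` is the pure data-exchange time.
inline std::vector< double >
measured_communication_time( const std::vector< double >& arrival,
  const double transfer,
  const bool barrier_before_timer )
{
  assert( !arrival.empty() );
  const double last = *std::max_element( arrival.begin(), arrival.end() );

  std::vector< double > measured;
  measured.reserve( arrival.size() );
  for ( const double t : arrival )
  {
    // Without the barrier the timer starts on arrival, so it also counts
    // the wait for the slowest rank.
    const double waiting = barrier_before_timer ? 0.0 : last - t;
    measured.push_back( waiting + transfer );
  }
  return measured;
}
```

For arrival times of 1.0 and 3.0 and a transfer time of 0.5, the early rank is charged 2.5 without the barrier and 0.5 with it. In this model the barrier makes the timer report transfer time only.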
6 changes: 3 additions & 3 deletions nestkernel/event_delivery_manager.h
@@ -469,9 +469,9 @@ class EventDeliveryManager : public ManagerInterface
#ifdef TIMER_DETAILED
// private stop watches for benchmarking purposes
// (intended for internal core developers, not for use in the public API)
Stopwatch sw_collocate_spike_data_;
Stopwatch sw_communicate_spike_data_;
Stopwatch sw_communicate_target_data_;
SingleStopwatch sw_collocate_spike_data_;
SingleStopwatch sw_communicate_spike_data_;
SingleStopwatch sw_communicate_target_data_;
#endif
};

6 changes: 6 additions & 0 deletions nestkernel/kernel_manager.cpp
@@ -162,6 +162,12 @@ nest::KernelManager::change_number_of_threads( size_t new_num_threads )
// is in place again, we can tell modules to re-register the components
// they provide.
module_manager.reinitialize_dynamic_modules();

// Prepare timers
kernel().simulation_manager.reset_timers_for_preparation();
kernel().simulation_manager.reset_timers_for_dynamics();
kernel().event_delivery_manager.reset_timers_for_preparation();
kernel().event_delivery_manager.reset_timers_for_dynamics();
}

void
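
Resetting the timers in `change_number_of_threads` makes sense if per-thread timers store one slot per thread, because the storage has to match the new thread count. The sketch below shows that idea with a hypothetical `ThreadedTimes` class. It assumes that a reset both resizes and clears the slots, which this part of the diff does not confirm.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical container of per-thread accumulated times (not the PR's Stopwatch).
class ThreadedTimes
{
public:
  // Resize to the current thread count and clear all accumulated values.
  void
  reset( const std::size_t num_threads )
  {
    elapsed_.assign( num_threads, 0.0 );
  }

  // Add `seconds` to the slot of thread `tid`.
  void
  add( const std::size_t tid, const double seconds )
  {
    assert( tid < elapsed_.size() );
    elapsed_[ tid ] += seconds;
  }

  double
  get( const std::size_t tid ) const
  {
    return elapsed_.at( tid );
  }

  std::size_t
  num_slots() const
  {
    return elapsed_.size();
  }

private:
  std::vector< double > elapsed_;
};
```

Without the reset, a thread id beyond the old thread count would index past the end of the storage.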
10 changes: 5 additions & 5 deletions nestkernel/mpi_manager.cpp
@@ -834,7 +834,7 @@ nest::MPIManager::time_communicate( int num_bytes, int samples )
std::vector< unsigned int > test_send_buffer( packet_length );
std::vector< unsigned int > test_recv_buffer( packet_length * get_num_processes() );
// start time measurement here
Stopwatch foo;
SingleStopwatch foo;
foo.start();
for ( int i = 0; i < samples; ++i )
{
@@ -870,7 +870,7 @@ nest::MPIManager::time_communicatev( int num_bytes, int samples )
}

// start time measurement here
Stopwatch foo;
SingleStopwatch foo;
foo.start();
for ( int i = 0; i < samples; ++i )
{
@@ -898,7 +898,7 @@ nest::MPIManager::time_communicate_offgrid( int num_bytes, int samples )
std::vector< OffGridSpike > test_send_buffer( packet_length );
std::vector< OffGridSpike > test_recv_buffer( packet_length * get_num_processes() );
// start time measurement here
Stopwatch foo;
SingleStopwatch foo;
foo.start();
for ( int i = 0; i < samples; ++i )
{
@@ -932,7 +932,7 @@ nest::MPIManager::time_communicate_alltoall( int num_bytes, int samples )
std::vector< unsigned int > test_send_buffer( total_packet_length );
std::vector< unsigned int > test_recv_buffer( total_packet_length );
// start time measurement here
Stopwatch foo;
SingleStopwatch foo;
foo.start();
for ( int i = 0; i < samples; ++i )
{
@@ -969,7 +969,7 @@ nest::MPIManager::time_communicate_alltoallv( int num_bytes, int samples )
}

// start time measurement here
Stopwatch foo;
SingleStopwatch foo;
foo.start();
for ( int i = 0; i < samples; ++i )
{
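
The `time_communicate*` functions above share one measurement pattern: start a stopwatch, repeat the operation `samples` times, then stop. A minimal sketch of that pattern follows. `SimpleStopwatch` and `time_per_sample()` are hypothetical and are not NEST's `SingleStopwatch`. The sketch also assumes that the result is averaged over the samples, which the visible hunks do not show.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Hypothetical minimal stand-in for a single (non-threaded) stopwatch.
class SimpleStopwatch
{
public:
  void
  start()
  {
    if ( !running_ )
    {
      begin_ = clock_type::now();
      running_ = true;
    }
  }

  void
  stop()
  {
    if ( running_ )
    {
      elapsed_ += clock_type::now() - begin_;
      running_ = false;
    }
  }

  double
  elapsed_seconds() const
  {
    return std::chrono::duration< double >( elapsed_ ).count();
  }

private:
  using clock_type = std::chrono::steady_clock;
  clock_type::time_point begin_{};
  clock_type::duration elapsed_{ clock_type::duration::zero() };
  bool running_ = false;
};

// Run `operation` `samples` times and return the mean wall-clock time per run.
inline double
time_per_sample( const std::function< void() >& operation, const int samples )
{
  assert( samples > 0 );
  SimpleStopwatch sw;
  sw.start();
  for ( int i = 0; i < samples; ++i )
  {
    operation();
  }
  sw.stop();
  return sw.elapsed_seconds() / samples;
}
```

Timing the whole loop rather than each iteration keeps the stopwatch overhead out of the per-sample figure.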
1 change: 1 addition & 0 deletions nestkernel/nest_names.cpp
@@ -606,6 +606,7 @@ const Name time_deliver_spike_data( "time_deliver_spike_data" );
const Name time_gather_secondary_data( "time_gather_secondary_data" );
const Name time_gather_spike_data( "time_gather_spike_data" );
const Name time_gather_target_data( "time_gather_target_data" );
const Name time_idle( "time_idle" );
const Name time_in_steps( "time_in_steps" );
const Name time_simulate( "time_simulate" );
const Name time_update( "time_update" );
1 change: 1 addition & 0 deletions nestkernel/nest_names.h
@@ -634,6 +634,7 @@ extern const Name time_deliver_spike_data;
extern const Name time_gather_secondary_data;
extern const Name time_gather_spike_data;
extern const Name time_gather_target_data;
extern const Name time_idle;
extern const Name time_in_steps;
extern const Name time_simulate;
extern const Name time_update;
2 changes: 0 additions & 2 deletions nestkernel/node_manager.cpp
@@ -31,8 +31,6 @@
#include "logging.h"

// Includes from nestkernel:
#include "event_delivery_manager.h"
#include "genericmodel.h"
#include "kernel_manager.h"
#include "model.h"
#include "model_manager_impl.h"
2 changes: 1 addition & 1 deletion nestkernel/node_manager.h
@@ -356,7 +356,7 @@ class NodeManager : public ManagerInterface
std::vector< std::shared_ptr< WrappedThreadException > > exceptions_raised_;

// private stop watch for benchmarking purposes
Stopwatch sw_construction_create_;
SingleStopwatch sw_construction_create_;
};

inline size_t
11 changes: 11 additions & 0 deletions nestkernel/per_thread_bool_indicator.cpp
@@ -64,40 +64,51 @@ PerThreadBoolIndicator::initialize( const size_t num_threads, const bool status
bool
PerThreadBoolIndicator::all_false() const
{
DETAILED_TIMER_START( kernel().simulation_manager.get_idle_stopwatch(), kernel().vp_manager.get_thread_id() );
// We need two barriers here to ensure that no thread can continue and change the result
// before all threads have determined the result.
#pragma omp barrier
// We need two barriers here to ensure that no thread can continue and change the result
// before all threads have determined the result.
bool ret = ( are_true_ == 0 );
#pragma omp barrier

DETAILED_TIMER_STOP( kernel().simulation_manager.get_idle_stopwatch(), kernel().vp_manager.get_thread_id() );
return ret;
}

bool
PerThreadBoolIndicator::all_true() const
{
DETAILED_TIMER_START( kernel().simulation_manager.get_idle_stopwatch(), kernel().vp_manager.get_thread_id() );
#pragma omp barrier
bool ret = ( are_true_ == size_ );
#pragma omp barrier
DETAILED_TIMER_STOP( kernel().simulation_manager.get_idle_stopwatch(), kernel().vp_manager.get_thread_id() );
return ret;
}

bool
PerThreadBoolIndicator::any_false() const
{
DETAILED_TIMER_START( kernel().simulation_manager.get_idle_stopwatch(), kernel().vp_manager.get_thread_id() );
#pragma omp barrier
bool ret = ( are_true_ < size_ );
#pragma omp barrier

DETAILED_TIMER_STOP( kernel().simulation_manager.get_idle_stopwatch(), kernel().vp_manager.get_thread_id() );
return ret;
}

bool
PerThreadBoolIndicator::any_true() const
{
DETAILED_TIMER_START( kernel().simulation_manager.get_idle_stopwatch(), kernel().vp_manager.get_thread_id() );
#pragma omp barrier
bool ret = ( are_true_ > 0 );
#pragma omp barrier

DETAILED_TIMER_STOP( kernel().simulation_manager.get_idle_stopwatch(), kernel().vp_manager.get_thread_id() );
return ret;
}
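
Each query above reads the shared counter between two barriers. As the source comment says, this keeps any thread from continuing and changing the result before all threads have determined it. The sketch below isolates that pattern in a hypothetical `TrueCounter` class, a simplified stand-in rather than NEST's `PerThreadBoolIndicator`. Without OpenMP the pragmas are ignored and the code runs single-threaded.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical, simplified counter of "true" flags (not the NEST class).
class TrueCounter
{
public:
  explicit TrueCounter( const std::size_t size )
    : size_( size )
    , are_true_( 0 )
  {
  }

  void
  set_true()
  {
    assert( are_true_.load() < size_ );
    ++are_true_;
  }

  void
  set_false()
  {
    assert( are_true_.load() > 0 );
    --are_true_;
  }

  bool
  all_false() const
  {
    // First barrier: every thread has finished updating the counter.
#pragma omp barrier
    const bool ret = ( are_true_.load() == 0 );
    // Second barrier: every thread has read the counter before any thread
    // is allowed to continue and modify it again.
#pragma omp barrier
    return ret;
  }

  bool
  all_true() const
  {
#pragma omp barrier
    const bool ret = ( are_true_.load() == size_ );
#pragma omp barrier
    return ret;
  }

private:
  const std::size_t size_;
  std::atomic< std::size_t > are_true_;
};
```

Because both barriers are pure waiting time, they are a natural place to bracket with an idle stopwatch, which is what the diff does.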
