
Out of memory issue while reading a large number of source files #1752

Open

padesh opened this issue Oct 17, 2024 · 6 comments

padesh commented Oct 17, 2024

Hi Team,

I am running noise cross-correlation simulations for coupled acoustic-elastic media. When I have a large number of external source files (60k files, each with 60k time steps), the simulation breaks with an out-of-memory error at the step where the solver reads the source files. My question is: are these source files read on a single node, or are they read in parallel? If they are read by a single node, increasing the number of nodes would not help in this case.

Since almost half of the values in the later part of each STF file are zeros, could the program inject zeros for the extra steps when NSTEP is greater than the number of lines in the STF file? That would help trim down the total number of lines in each STF file and its size.

Or do you have any other suggestions?

The STF files are in binary format.

Thanks.

danielpeter (Member) commented

For external source time functions, all MPI processes allocate the same array containing all sources and all time steps.

In your case, the size of this array becomes ~13 GB:

60000 * 60000 * 4 / 1024. / 1024. / 1024. = 13.411 GB
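
The same back-of-the-envelope estimate as a tiny script, for plugging in other numbers (the helper name is just for illustration; it assumes single-precision values and one full copy of the array per MPI rank):

```python
# Size of the external-STF array that each MPI process allocates,
# assuming 4-byte (single-precision) values and a full NSOURCES x NSTEP copy.
def stf_array_gib(n_sources: int, n_steps: int, bytes_per_value: int = 4) -> float:
    """Return the array size in GiB for one MPI process."""
    return n_sources * n_steps * bytes_per_value / 1024.0**3

print(f"{stf_array_gib(60_000, 60_000):.3f} GiB per process")  # -> 13.411 GiB per process
```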

The number of time steps in the external STF file must be at least the number of time steps of the simulation. There is no fallback to zeros if the file is shorter; instead, the simulation would break.

The only help provided is that the solver can read binary files. You could store those files in binary format (***.bin file) to speed up the reading.

padesh (Author) commented Oct 28, 2024

Thanks Peter.
I am already using the .bin format for the source files.

So there is a limit on the number of time steps and sources that can be used, irrespective of the number of nodes.

danielpeter (Member) commented

Since you already run multiple MPI processes on a single node, what if you spread the processes out onto more compute nodes and use fewer MPI processes per node? 13 GB doesn't sound like an awful lot for a single compute node's memory.

padesh (Author) commented Oct 28, 2024

You mean I should allocate more nodes (I have 36 cores per node), set tasks_per_node < 36 (say 30), and then call srun xspecfem3D for the solver?

danielpeter (Member) commented

Yes. Find out how much memory you have per node and then estimate how many MPI processes you can run on a single node, taking into account that each process will also require additional memory for other arrays (mesh, seismograms, etc.).
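
For illustration, a minimal sketch of that estimate (the 192 GiB of node memory and the 2 GiB allowance for the other arrays are placeholder assumptions, not measured values):

```python
# Rough estimate of how many MPI ranks fit into one node's memory when each
# rank holds the full external-STF array plus its other arrays (mesh,
# seismograms, etc.). All numbers below are placeholders.
def max_ranks_per_node(node_mem_gib: float, stf_gib: float, other_gib: float) -> int:
    """Return how many MPI processes fit into one node's memory."""
    return int(node_mem_gib // (stf_gib + other_gib))

n = max_ranks_per_node(node_mem_gib=192.0, stf_gib=13.411, other_gib=2.0)
print(f"at most ~{n} ranks per node")  # -> at most ~12 ranks per node, well below 36 cores
```

With SLURM, that number (or fewer) would then go into the tasks-per-node setting for the srun xspecfem3D step, keeping in mind that the total number of MPI processes across all nodes still has to match the number of mesh slices the run was set up for.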

padesh (Author) commented Oct 29, 2024

Thanks, let me try this and come back with an update.
