Installing Cardinal on Sawtooth #819
Hi @delcmo - I have very recently built Cardinal on Sawtooth without issue, so I'm confident we can get to the bottom of this :) What modules are you using for MPI? The recommended modules on the Cardinal website you link are for OpenMPI, but it looks like the error you are seeing is from MVAPICH. |
You are correct, and I made sure to use the same modules as listed there. The modules I load are the same ones I used to load.
Marco |
Yes, that's certainly possible - I'd try also cleaning out the MOOSE submodule just to be sure we get everything:
|
Ok, thanks for the suggestion. I was able to compile Cardinal and run the unit tests. Some tests are being skipped and some others fail with this error message:
runWorker Exception: Traceback (most recent call last):
  File "/home/delcmarc/cardinal/contrib/moose/python/TestHarness/schedulers/Scheduler.py", line 456, in runJob
    self.queueJobs(jobs, j_lock)
  File "/home/delcmarc/cardinal/contrib/moose/python/TestHarness/schedulers/Scheduler.py", line 257, in queueJobs
    self.status_pool.apply_async(self.jobStatus, (job, jobs, j_lock))
  File "/apps/moose/stack/moose-tools-2023.10.19/lib/python3.10/multiprocessing/pool.py", line 458, in apply_async
    self._check_running()
  File "/apps/moose/stack/moose-tools-2023.10.19/lib/python3.10/multiprocessing/pool.py", line 353, in _check_running
    raise ValueError("Pool not running")
ValueError: Pool not running
Is that expected? Some of the unit tests were skipped the first time I compiled Cardinal, but I cannot remember how many of them are supposed to fail.
Marco
…On Mon, Dec 4, 2023 at 4:50 PM April Novak ***@***.***> wrote:
Yes, that's certainly possible - I'd try also cleaning out the MOOSE
submodule just to be sure we get everything:
cd cardinal
rm -rf build/ install/
cd contrib/moose
git clean -xfd
cd ../../
make
--
Marc-Olivier Delchini
|
Would you please attach the whole console output?
|
I ran the unit tests on the login node and also on the queue. The output is attached.
…On Tue, Dec 5, 2023 at 9:38 AM April Novak ***@***.***> wrote:
Would you please attach the whole console output?
./run_tests > out.txt
|
@delcmo I think the attachment did not go through properly - can you please attach it on GitHub instead of via an email reply? Or you can email it to me directly. |
Here it is. |
Thanks - it looks like some tests are failing due to MPI-related reasons (not normal - something is definitely wrong). Here's one case which fails; it looks like all of them fail in the same way.
Perhaps @loganharbour has an idea? |
Any idea on why I get these odd error messages when running the unit tests? |
Is there some old state in your install from the previous build? Or did you forget to load the relevant modules? That error comes from MPI no longer being available in your environment. |
Thanks @loganharbour. In that case, @delcmo I'd suggest wiping out Cardinal entirely and rebuilding from scratch with the relevant modules loaded. |
@aprilnovak I followed your suggestions and was able to recompile Cardinal and run the unit tests. 5 of them failed:
which seems like more reasonable behavior. |
That looks better! Those are normal - we have a few tests (on the order of 5) which take a long time to run. Depending on the parallel settings you used to launch the test suite, those may time out. NekRS has a very slow JIT process the first time you run a test case. If you re-run the test suite, you should (hopefully) see everything pass because NekRS will be able to use the JIT cache produced on the first test run, saving lots of time on each individual test. |
I re-ran it. Only 4 tests failed, and one of them hit TIMEOUT. For the other three, I get a CODE 1 error because of
I updated the
I was able to run the tests that are in the documentation: https://cardinal.cels.anl.gov/hpc.html. |
REVISED: I would just try the following: |
Bug Description
I am trying to compile Cardinal on Sawtooth and get the following error message with make -j8:
I did follow the installation instructions on the Cardinal webpage and loaded all modules as instructed. PETSc and libMesh compiled fine as far as I can tell. I also checked that the paths of the files mentioned in the libtool: warning messages are all valid.
Steps to Reproduce
On Sawtooth using the installation instructions from here.
Impact
I need Cardinal installed on Sawtooth for a project with the NEAMS Workbench.