CreateOutput failed in test case of Xmipp-AngularGraphConsistency #406

Open
jianyingzhu opened this issue Jan 31, 2023 · 8 comments

@jianyingzhu

Hi,

I am trying to run a test case of Xmipp-AngularGraphConsistency by running the following command:

./scipion3 tests xmipp3.tests.test_protocol_angular_graph_consistency.TestAngularGraphConsistency

It reports an error in the final step of create output:

run.stdout:

00410:   correlation with projection in Graph max direction: 0.9489219285714285 
00411:   correlation with assigned projection: 0.9584980000000001 
00412:   angular distance to maxGraph: 63.12582661341961
00413:   to be disabled: 70
00414:   FAILED: createOutput, step 5, time 2023-01-11 19:49:47.676809
00415:   *** Last status is failed 
00416:   ------------------- PROTOCOL FAILED (DONE 5/5)

run.stderr:

Traceback (most recent call last):
00826:     File "/Share/THUDATA/Softwares/anaconda3/envs/scipion3_2022/lib/python3.8/site-packages/pyworkflow/protocol/protocol.py", line 202, in run
00827:       self._run()
00828:     File "/Share/THUDATA/Softwares/anaconda3/envs/scipion3_2022/lib/python3.8/site-packages/pyworkflow/protocol/protocol.py", line 253, in _run
00829:       resultFiles = self._runFunc()
00830:     File "/Share/THUDATA/Softwares/anaconda3/envs/scipion3_2022/lib/python3.8/site-packages/pyworkflow/protocol/protocol.py", line 249, in _runFunc
00831:       return self._func(*self._args)
00832:     File "/Share/THUDATA/Softwares/anaconda3/envs/scipion3_2022/lib/python3.8/site-packages/xmipp3/protocols/protocol_angular_graph_consistency.py", line 199, in createOutput
00833:       readSetOfParticles(fnOutParticles, self.subsets[i])
00834:     File "/Share/THUDATA/Softwares/anaconda3/envs/scipion3_2022/lib/python3.8/site-packages/xmipp3/convert/convert.py", line 1081, in readSetOfParticles
00835:       readSetOfImages(filename, partSet, rowToParticle, **kwargs)
00836:     File "/Share/THUDATA/Softwares/anaconda3/envs/scipion3_2022/lib/python3.8/site-packages/xmipp3/convert/convert.py", line 1014, in readSetOfImages
00837:       imgSet.append(img)
00838:     File "/Share/THUDATA/Softwares/anaconda3/envs/scipion3_2022/lib/python3.8/site-packages/pwem/objects/data.py", line 1160, in append
00839:       EMSet.append(self, image)
00840:     File "/Share/THUDATA/Softwares/anaconda3/envs/scipion3_2022/lib/python3.8/site-packages/pyworkflow/object.py", line 1245, in append
00841:       self._insertItem(item)
00842:     File "/Share/THUDATA/Softwares/anaconda3/envs/scipion3_2022/lib/python3.8/site-packages/pyworkflow/object.py", line 1249, in _insertItem
00843:       self._getMapper().insert(item)
00844:     File "/Share/THUDATA/Softwares/anaconda3/envs/scipion3_2022/lib/python3.8/site-packages/pyworkflow/mapper/sqlite.py", line 772, in insert
00845:       self.db.insertObject(obj.getObjId(), obj.isEnabled(), obj.getObjLabel(), obj.getObjComment(),
00846:     File "/Share/THUDATA/Softwares/anaconda3/envs/scipion3_2022/lib/python3.8/site-packages/pyworkflow/mapper/sqlite.py", line 1247, in insertObject
00847:       self.executeCommand(self.INSERT_OBJECT, args)
00848:   sqlite3.IntegrityError: UNIQUE constraint failed: Objects.id
00849:   Protocol failed: UNIQUE constraint failed: Objects.id

It seems that all the calculation is done properly, but the program fails to write the output to a sqlite file.

Our databases are stored on a Lustre file system. I wonder whether this is similar to the SQLite-related I/O errors that occur when working on an NFS share.

Thank you all for your development efforts and help!

@jianyingzhu
Author

BTW, the version of scipion-pyworkflow is 3.0.25, the version of scipion-em is 3.0.22, and the version of scipion-app is 3.0.11. None of them is the latest:

The package scipion-pyworkflow is out of date. Your version is 3.0.25, the latest is 3.0.29.
The package scipion-em is out of date. Your version is 3.0.22, the latest is 3.0.24.
The package scipion-app is out of date. Your version is 3.0.11, the latest is 3.0.12

@azazellochg
Member

Hi @jianyingzhu, thanks for reporting. I'm not familiar with this protocol but I think the error is very simple - duplicated objids. Nothing wrong with your setup.
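
To illustrate what a duplicated object id looks like at the SQLite level, here is a minimal sketch using Python's standard sqlite3 module. The table layout is a simplified, hypothetical stand-in for the real Objects table (only the table name and the id column are taken from the traceback), and the function name is made up for this example. It is not the pyworkflow mapper code.

```python
import sqlite3


def insert_with_duplicate_id():
    """Insert two rows with the same primary key and return the error text."""
    conn = sqlite3.connect(":memory:")
    try:
        # Hypothetical, simplified stand-in for the real Objects table.
        conn.execute("CREATE TABLE Objects (id INTEGER PRIMARY KEY, label TEXT)")
        conn.execute("INSERT INTO Objects (id, label) VALUES (?, ?)", (1, "first"))
        try:
            # Reusing id 1 violates the uniqueness of the primary key.
            conn.execute("INSERT INTO Objects (id, label) VALUES (?, ?)", (1, "second"))
        except sqlite3.IntegrityError as exc:
            return str(exc)
        return None
    finally:
        conn.close()


if __name__ == "__main__":
    print(insert_with_duplicate_id())
```

The second insert is rejected because the id already exists, which is the same class of error (sqlite3.IntegrityError) reported in run.stderr. It depends only on the ids being inserted, not on the file system holding the database.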

@pconesa
Contributor

pconesa commented Jan 31, 2023

I've checked the automatic testing server and this test seems to be passing --> http://scipion-test.cnb.csic.es:9980/#/builders/19/builds/253/steps/104/logs/stdio

I've also run it locally and it passed.

As Grigory said, the error does not seem to be related to your setup. Maybe the test uses a random seed that causes it to fail in your case?

Does it fail always?

What happens with other tests? Do they run fine?

@jianyingzhu
Author

Hi,

I suspected that the sqlite3 package used by pyworkflow on our platform was too old and caused the error, so I updated scipion-pyworkflow, scipion-em and scipion-app to the latest versions with the command scipion3 update. After that the test case passed, and I can also run the protocol successfully on our real data.

I noticed that your test run (http://scipion-test.cnb.csic.es:9980/#/builders/19/builds/253/steps/104/logs/stdio) took 2236.512 secs (~37 min), but my test run took 2 h 53 min with Threads 1 MPI 4, and another identical run created with 'copy' took 2 days 6 h 54 min with Threads 1 MPI 30.

Our platform is a single GPU node with 4 V100 cards, so these run times seem a little weird to me.
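
For scale, the three run times quoted above can be put on a common footing. The short sketch below converts them to seconds and divides by the 2236.512 s reference run. The helper names are made up for this example, and the only inputs are the durations reported in this thread.

```python
def to_seconds(days=0, hours=0, minutes=0, seconds=0.0):
    """Convert a duration given in days/hours/minutes/seconds to seconds."""
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


# Run time of the automatic testing server, as reported in its log.
REFERENCE_SECONDS = 2236.512

# Run times reported in this thread.
RUNS = {
    "Threads 1, MPI 4": to_seconds(hours=2, minutes=53),
    "Threads 1, MPI 30 (copy)": to_seconds(days=2, hours=6, minutes=54),
}


def slowdown(run_seconds, reference_seconds=REFERENCE_SECONDS):
    """Return how many times longer a run took than the reference run."""
    return run_seconds / reference_seconds


if __name__ == "__main__":
    for label, secs in RUNS.items():
        print(f"{label}: {secs:.0f} s, {slowdown(secs):.1f}x the reference")
```

That works out to roughly 4.6 times the reference for the MPI 4 run and roughly 88 times for the MPI 30 run.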

@pconesa
Contributor

pconesa commented Feb 4, 2023

Yes, that seems too slow. I'm not sure whether there is a way to get a faster sqlite3 on your system.

If there is, note that sqlite3 lives inside the scipion3 conda environment, in case it can be replaced there.
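
One way to see which SQLite library that environment actually uses is to query Python's standard sqlite3 module from the environment's own interpreter. The sketch below assumes it is run with the Python of the scipion3 conda environment. The function name is made up for this example.

```python
import sqlite3
import sys


def sqlite_report():
    """Collect version details of the SQLite library seen by this interpreter."""
    return {
        "python": sys.version.split()[0],
        # Version of the underlying SQLite C library, not of the Python module.
        "sqlite_library": sqlite3.sqlite_version,
        # Location of the sqlite3 package, to confirm which environment is in use.
        "sqlite_module_file": sqlite3.__file__,
    }


if __name__ == "__main__":
    for key, value in sqlite_report().items():
        print(f"{key}: {value}")
```

If the reported path does not point inside the conda environment, the script was run with a different interpreter than the one Scipion uses.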

@pconesa pconesa closed this as completed Feb 4, 2023
@pconesa
Contributor

pconesa commented Feb 4, 2023

What is the case for other tests? Is this test something you are interested in?

@pconesa pconesa reopened this Feb 4, 2023
@jianyingzhu
Author

Other tests pass successfully.

I am interested in this test because the job failed on my real data with the same error. After upgrading, my job now runs successfully. The run time is a little long, but within an acceptable range.

Thank you very much!

@azazellochg
Member

I'm running this test now with xmipp 3.22.04. The speed-limiting factor is xmipp_mpi_angular_assignment_mag, which runs only on CPU and allocates all available CPU cores (divided between 4 MPI processes in this test protocol), despite the description saying it only uses 4 threads by default:

00388:   approx. memory to allocate: 2532 MB
00389:   simultaneous MPI processes: 4
00390:   total available system memory: 128668 MB
00391:   4412 reference images of 60 x 60
00392:   105 exp images of 60 x 60 in this group
00393:   Sampling: 7.08
00394:   Angular step: 3
00395:   Maximum shift: 6
00396:   threads: 4
00397:   ref vol size: 60 x 60 x 60

On my machine the test finished successfully in 1214 sec, no sqlite errors.

I think this one is for the xmipp team.
