
Exception in thread TaskFileWriter-Thread: #9

Open
zhujack opened this issue Jan 19, 2021 · 7 comments

Comments

@zhujack

zhujack commented Jan 19, 2021

Dear author,

Thanks for developing this software. I re-aligned my BAM files to the provided reference genome, then ran the Accucopy Docker image with Singularity following the instructions, but I always get the following errors:

Exception in thread TaskFileWriter-Thread:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
    self.run()
  File "/usr/local/lib/python2.7/dist-packages/pyflow/pyflow.py", line 1650, in run
    self._writeIfSet()
  File "/usr/local/lib/python2.7/dist-packages/pyflow/pyflow.py", line 1660, in _writeIfSet
    self.writeFunc()
  File "/usr/local/lib/python2.7/dist-packages/pyflow/pyflow.py", line 537, in wrapped
    return f(self, *args, **kw)
  File "/usr/local/lib/python2.7/dist-packages/pyflow/pyflow.py", line 2717, in writeTaskInfo
    fp = open(self.taskInfoFile, "a")
IOError: [Errno 2] No such file or directory: 'output/pyflow.data/state/pyflow_tasks_info.txt'
Exception in thread TaskFileWriter-Thread:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
    self.run()
  File "/usr/local/lib/python2.7/dist-packages/pyflow/pyflow.py", line 1650, in run
    self._writeIfSet()
  File "/usr/local/lib/python2.7/dist-packages/pyflow/pyflow.py", line 1660, in _writeIfSet
    self.writeFunc()
  File "/usr/local/lib/python2.7/dist-packages/pyflow/pyflow.py", line 537, in wrapped
    return f(self, *args, **kw)
  File "/usr/local/lib/python2.7/dist-packages/pyflow/pyflow.py", line 2618, in writeTaskStatus
    tmpFp = open(tmpFile, "w")
IOError: [Errno 2] No such file or directory: 'output/pyflow.data/state/pyflow_tasks_runstate.txt.update.incomplete'

I might have missed something. Any suggestions? Thanks.

Jack
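Both errors come from `open()` on a file inside `output/pyflow.data/state`. In Python, opening a file in `"a"` or `"w"` mode creates a missing file but never a missing directory, so `[Errno 2]` here points at the `state` directory itself being absent. A minimal sketch of that behavior follows; `append_line` and `reproduce_enoent` are hypothetical helpers, not part of pyflow or Accucopy.

```python
import errno
import os
import tempfile


def append_line(path, text):
    """Append a line to `path`, creating the parent directory first if needed."""
    parent = os.path.dirname(path)
    if parent and not os.path.isdir(parent):
        os.makedirs(parent)
    with open(path, "a") as fp:
        fp.write(text + "\n")


def reproduce_enoent(root):
    """Return the errno raised when appending to state/pyflow_tasks_info.txt under root, or None."""
    path = os.path.join(root, "pyflow.data", "state", "pyflow_tasks_info.txt")
    try:
        with open(path, "a"):
            pass
    except (IOError, OSError) as exc:
        return exc.errno
    return None


if __name__ == "__main__":
    root = tempfile.mkdtemp()
    print(reproduce_enoent(root) == errno.ENOENT)  # True: the state directory is missing
    append_line(os.path.join(root, "pyflow.data", "state", "pyflow_tasks_info.txt"), "ok")
    print(reproduce_enoent(root))  # None: the directory now exists
```

Note that appending succeeds as soon as the directory exists, even if the file does not, so the failing piece is the directory.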

@polyactis
Owner

polyactis commented Apr 19, 2021 via email

@pblaney

pblaney commented Jul 11, 2022

Hello Yu,

Thanks for the work put into this tool; I was very happy to come across a somatic CNV tool that works well with low-coverage samples.
I recently started using Accucopy, and while my tests were successful (i.e., complete output files and a successful log conclusion), after integrating the tool into a pipeline I received the same error as @zhujack:


Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
    self.run()
  File "/usr/local/lib/python2.7/dist-packages/pyflow/pyflow.py", line 1650, in run
    self._writeIfSet()
  File "/usr/local/lib/python2.7/dist-packages/pyflow/pyflow.py", line 1660, in _writeIfSet
    self.writeFunc()
  File "/usr/local/lib/python2.7/dist-packages/pyflow/pyflow.py", line 537, in wrapped
    return f(self, *args, **kw)
  File "/usr/local/lib/python2.7/dist-packages/pyflow/pyflow.py", line 2717, in writeTaskInfo
    fp = open(self.taskInfoFile, "a")
IOError: [Errno 2] No such file or directory: '/gpfs/scratch/blanep01/productionDir/somatic/work/7f/d771b3523042fca16365dbfb4b8894/MMRF_1049_1_BM_CD138pos_T2_KHWGL_L10520_vs_MMRF_1049_1_PB_Whole_C1_KHWGL_L10519_results/pyflow.data/state/pyflow_tasks_info.txt'

I've done my best to trace the bug, and the issue seems to be that pyflow does not create the {OUTPUT_DIR}/pyflow.data/state directory or any of the files it should contain, e.g. pyflow_tasks_info.txt as referenced in the error message.
Other errors also refer to missing files in this directory.
This is confusing, because the same block of code in pyflow.py that creates the state directory also creates the {OUTPUT_DIR}/pyflow.data/logs directory, which does exist and contains the expected files.

# This is from https://github.com/Illumina/pyflow/blob/master/pyflow/src/pyflow.py
def setupNewRun(self, param) :
        self.param = param

        # setup log file-handle first, then run the rest of parameter validation:
        # (hold this file open so that we can still log if pyflow runs out of filehandles)
        self.param.dataDir = os.path.abspath(self.param.dataDir)
        self.param.dataDir = os.path.join(self.param.dataDir, "pyflow.data")
        logDir = os.path.join(self.param.dataDir, "logs")
        ensureDir(logDir)
        self.flowLogFile = os.path.join(logDir, "pyflow_log.txt")
        self.flowLogFp = open(self.flowLogFile, "a")

        # run remaining validation
        self._validateFixParam(self.param)

        #  initial per-run data
        self.taskErrors = set()  # this set actually contains every task that failed -- tasks contain all of their own error info
        self.isTaskManagerException = False

        # create data directory if it does not exist
        ensureDir(self.param.dataDir)

        # check whether a process already exists:
        self.markFile = os.path.join(self.param.dataDir, "active_pyflow_process.txt")
        if os.path.exists(self.markFile) :
            # Non-conventional logging situation -- another pyflow process is possibly using this same data directory, so we want
            # to log to stderr (even if the user has set isQuiet) and not interfere with the other process's log
            self.flowLogFp = None
            self.param.isQuiet = False
            msg = [ "Can't initialize pyflow run because the data directory appears to be in use by another process.",
                    "\tData directory: '%s'" % (self.param.dataDir),
                    "\tIt is possible that a previous process was abruptly interrupted and did not clean up properly. To determine if this is",
                    "\tthe case, please refer to the file '%s'" % (self.markFile),
                    "\tIf this file refers to a non-running process, delete the file and relaunch pyflow,",
                    "\totherwise, specify a new data directory. At the API-level this can be done with the dataDirRoot option." ]
            self.markFile = None  # this keeps pyflow from deleting this file, as it normally would on exit
            raise DataDirException(msg)
        else :
            mfp = open(self.markFile, "w")
            msg = """
This file provides details of the pyflow instance currently using this data directory.
During normal pyflow run termination (due to job completion, error, SIGINT, etc...),
this file should be deleted. If this file is present it should mean either:
(1) the data directory is still in use by a running workflow
(2) a sudden job failure occurred that prevented normal run termination
The associated pyflow job details are as follows:
"""
            mfp.write(msg + "\n")
            for line in self.getInfoMsg() :
                mfp.write(line + "\n")
            mfp.write("\n")
            mfp.close()

        stateDir = os.path.join(self.param.dataDir, "state")
        ensureDir(stateDir)
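In the excerpt, `logs/` is created first, the run then raises `DataDirException` if `active_pyflow_process.txt` already exists, and `state/` is created only after that check. The simplified model below (hypothetical names; it mirrors only the directory steps of the excerpt and omits `_validateFixParam` and logging) shows that this ordering can leave `logs/` present without `state/`. It is one possible path, not a confirmed diagnosis of this issue.

```python
import os


class DataDirInUse(Exception):
    """Hypothetical stand-in for pyflow's DataDirException."""


def ensure_dir(path):
    # Assumed to match pyflow's ensureDir: create the directory (and parents) if absent.
    if not os.path.isdir(path):
        os.makedirs(path)


def setup_data_dir(output_dir):
    """Model of the directory steps in setupNewRun; not pyflow's own code."""
    data_dir = os.path.join(os.path.abspath(output_dir), "pyflow.data")
    # Step 1: logs/ (and pyflow.data/ itself) is created before any check.
    ensure_dir(os.path.join(data_dir, "logs"))
    # Step 2: an existing marker file means "data directory in use", so the
    # model raises here and never reaches step 3.
    mark_file = os.path.join(data_dir, "active_pyflow_process.txt")
    if os.path.exists(mark_file):
        raise DataDirInUse(data_dir)
    with open(mark_file, "w") as mfp:
        mfp.write("model run\n")
    # Step 3: state/ is created last.
    ensure_dir(os.path.join(data_dir, "state"))
    return data_dir
```

Unlike real pyflow, this model never deletes its marker file, so calling it twice on the same directory raises on the second call.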

Additionally, the issue does not seem to occur in Strelka2's output; I found that Strelka2 uses a separate pyflow installation.
Perhaps we could point Accucopy to that installation of pyflow?
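Before pointing Accucopy at another pyflow, it may help to confirm which pyflow the container's interpreter actually imports; the tracebacks show /usr/local/lib/python2.7/dist-packages/pyflow/pyflow.py. A minimal sketch, assuming pyflow is importable as a top-level package named `pyflow` (as those paths suggest); `module_location` is a hypothetical helper built only on `importlib.import_module`, which exists in both Python 2.7 and Python 3.

```python
import importlib


def module_location(name):
    """Return the file a module is loaded from, or None if it cannot be imported."""
    try:
        module = importlib.import_module(name)
    except ImportError:
        return None
    # Built-in modules have no __file__, so fall back to None for them.
    return getattr(module, "__file__", None)
```

Running `print(module_location("pyflow"))` with the container's python2.7, and again with the interpreter Strelka2 uses, would show whether the two actually load different files.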

I can provide more log and output files to expand on the error; just let me know which ones you need.
I will also follow up on this issue by email.

Best,
Patrick

@fanxinping
Collaborator

@pblaney
Hello,
Can you pack and upload the directory {OUTPUT_DIR}/pyflow.data, or send it to my email (xinpingfan@gmail.com)?

Which container did you use, Docker or Singularity?

How did you integrate Accucopy into a pipeline? Did you wrap it in a Makefile or shell script and submit it to SGE?

Best,
Xinping

@pblaney

pblaney commented Jul 14, 2022

Hi @fanxinping,

Thank you for getting back to me on this. I have sent you a zipped copy of {OUTPUT_DIR}/pyflow.data.
I am using a Singularity container, created by converting the Docker image polyactis/accucopy:latest to a Singularity SIF file.
I have integrated Accucopy into a pipeline that runs it from a shell script and uses SLURM as the job scheduler.

Hope this helps shed some light,
Patrick

@fanxinping
Collaborator

Hi @pblaney

I have checked the zipped copy of {OUTPUT_DIR}/pyflow.data. The file pyflow.data/logs/pyflow_tasks_stderr_log.txt doesn't contain any error messages and shows that Accucopy ran successfully. The file pyflow.data/logs/pyflow_tasks_stdout_log.txt contains the complete output of Accucopy, which indicates that Accucopy wrote its results.
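The same log check can be repeated on other runs by scanning the task stderr log for error markers. A minimal sketch: `find_error_lines` and the regular expression are assumptions, not an Accucopy or pyflow convention, and an empty result only means none of those words appear in the log.

```python
import os
import re

# Words treated as signs of failure; an assumption chosen for this sketch.
ERROR_PATTERN = re.compile(r"Traceback|Error|Exception")


def find_error_lines(data_dir, log_name="pyflow_tasks_stderr_log.txt"):
    """Return (line_number, text) pairs from data_dir/logs/log_name that match ERROR_PATTERN."""
    path = os.path.join(data_dir, "logs", log_name)
    hits = []
    with open(path) as fp:
        for number, line in enumerate(fp, 1):
            if ERROR_PATTERN.search(line):
                hits.append((number, line.rstrip("\n")))
    return hits
```

Here `data_dir` is the pyflow.data directory itself, e.g. `find_error_lines("{OUTPUT_DIR}/pyflow.data")` with the placeholder filled in.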

Maybe you sent me the wrong file. Can you send the log file that contains the following message?

Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
    self.run()
  File "/usr/local/lib/python2.7/dist-packages/pyflow/pyflow.py", line 1650, in run
    self._writeIfSet()
  File "/usr/local/lib/python2.7/dist-packages/pyflow/pyflow.py", line 1660, in _writeIfSet
    self.writeFunc()
  File "/usr/local/lib/python2.7/dist-packages/pyflow/pyflow.py", line 537, in wrapped
    return f(self, *args, **kw)
  File "/usr/local/lib/python2.7/dist-packages/pyflow/pyflow.py", line 2717, in writeTaskInfo
    fp = open(self.taskInfoFile, "a")
IOError: [Errno 2] No such file or directory: '/gpfs/scratch/blanep01/productionDir/somatic/work/7f/d771b3523042fca16365dbfb4b8894/MMRF_1049_1_BM_CD138pos_T2_KHWGL_L10520_vs_MMRF_1049_1_PB_Whole_C1_KHWGL_L10519_results/pyflow.data/state/pyflow_tasks_info.txt'

@pblaney

pblaney commented Jul 14, 2022

@fanxinping,

I just sent you the log file that captured those errors.
As you note, the Accucopy output is intact and nothing appears to be wrong with its execution. However, this error causes the job to be reported as failed, so the output is not propagated to the next process.
As you can see in the zipped directory, there is no state directory, and that is the cause of the errors.
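Since the missing state directory is what triggers the errors, a pipeline step could inspect pyflow.data after Accucopy exits and report what is absent, separately from the exit status. A minimal sketch: `missing_pyflow_paths` is a hypothetical helper, and its default list is limited to the entries the setupNewRun excerpt earlier in this thread creates.

```python
import os

# Entries the setupNewRun excerpt creates under {OUTPUT_DIR}/pyflow.data.
EXPECTED_ENTRIES = ("logs", os.path.join("logs", "pyflow_log.txt"), "state")


def missing_pyflow_paths(output_dir, expected=EXPECTED_ENTRIES):
    """Return the expected entries that do not exist under output_dir/pyflow.data."""
    data_dir = os.path.join(output_dir, "pyflow.data")
    return [rel for rel in expected if not os.path.exists(os.path.join(data_dir, rel))]
```

For the run described here, this would be expected to return `["state"]`; printing the list to the job log keeps that evidence next to the failure.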

@fanxinping
Collaborator

@pblaney

It's very strange. I have tested Accucopy in my SLURM environment, and it worked well.
The shell script, test.sh, is:

#!/bin/sh

singularity exec --bind /y accucopy_latest.sif /usr/local/Accucopy/main.py -c configure_hg38 -t tumor.bam -n normal.bam -o result --nCores 15

and use this command to submit:

sbatch --cpus-per-task 15 -o test.output test.sh

I think you should check the SLURM environment or the subsequent steps in the shell script. Perhaps some operation deletes the state directory accidentally.

If possible, please share your shell script and submit command.
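To test the idea that a later step deletes the state directory, one could poll that directory in the background while the job runs. A minimal sketch: `watch_state_dir` and its return labels are hypothetical, and a directory that appears and disappears between two polls would be missed, so `interval` should be short relative to the run.

```python
import os
import time


def watch_state_dir(path, interval=5.0, timeout=3600.0):
    """Poll `path` every `interval` seconds for up to `timeout` seconds.

    Returns "removed" as soon as the directory disappears after having been
    seen, otherwise "still-present" or "never-created" once the timeout ends.
    """
    deadline = time.time() + timeout
    seen = False
    while time.time() < deadline:
        if os.path.isdir(path):
            seen = True
        elif seen:
            return "removed"
        time.sleep(interval)
    return "still-present" if seen else "never-created"
```

Started from the same SLURM script on {OUTPUT_DIR}/pyflow.data/state with a timeout longer than the Accucopy run, a "removed" result would point at a step that deletes the directory, while "never-created" would point back at pyflow's setup.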


4 participants