missing output files #70

Open
WeiWei060512 opened this issue May 28, 2023 · 5 comments

Comments

@WeiWei060512

Hi team,

Thanks for sharing this very useful tool. I ran mgatk tenx on my own scATAC-seq data:
mgatk tenx -i possorted_bam.bam -n mytest -o mytest_mgatk -c 8 -bt CB -b barcodes.tsv

I could only generate 9 files in the final folder (*.A.txt.gz, *.C.txt.gz, *.G.txt.gz, *.T.txt.gz, *.coverage.txt.gz, *.depthTable.txt, *_refAllele.txt, *.rds, *.signac.rds); 3 files were missing (*.variant_stats.tsv.gz, *.cell_heteroplasmic_df.tsv.gz, *.vmr_strand_plot.png). The code ran successfully without any error message. The variant and heteroplasmy files are the ones I'm most interested in.

Could you let me know how to fix it and generate the full set of outputs, please?

Many thanks,
Wei
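
A minimal Python sketch for listing which expected outputs are absent from the final folder. It assumes the layout `<output dir>/final/<name><suffix>` that appears in the snakemake error later in this thread; the suffix list is taken from the file patterns named above, and `missing_outputs` is a hypothetical helper, not part of mgatk.

```python
from pathlib import Path

# Suffixes taken from the file patterns reported in this issue (assumption:
# the reference-allele file is "<name>_refAllele.txt", with an underscore).
EXPECTED_SUFFIXES = [
    ".A.txt.gz", ".C.txt.gz", ".G.txt.gz", ".T.txt.gz",
    ".coverage.txt.gz", ".depthTable.txt", "_refAllele.txt",
    ".rds", ".signac.rds",
    ".variant_stats.tsv.gz", ".cell_heteroplasmic_df.tsv.gz",
    ".vmr_strand_plot.png",
]


def missing_outputs(out_dir, name):
    """Return expected file names not present in <out_dir>/final (hypothetical helper)."""
    final_dir = Path(out_dir) / "final"
    return [
        name + suffix
        for suffix in EXPECTED_SUFFIXES
        if not (final_dir / (name + suffix)).exists()
    ]


if __name__ == "__main__":
    # Values from the command above: -n mytest -o mytest_mgatk
    for fname in missing_outputs("mytest_mgatk", "mytest"):
        print("missing:", fname)
```
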

@caleblareau
Owner

Can you share the version of the software being used? Does it work on the test data in the repository?

@WeiWei060512
Author

Hi Caleb,

Thanks for your reply.
Unfortunately, it didn't work on the test data either. I got the following error message when I ran the test data:

"pysam.utils.SamtoolsError: 'samtools returned with error 1: stdout=, stderr=samtools idxstats: failed to read header for "test_barcode.bam"\n'"

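A `failed to read header` error means samtools could not parse the file as BAM. As a first sanity check, here is a minimal stdlib-only sketch: BAM files are BGZF-compressed, which Python's `gzip` module can read because BGZF is a series of gzip members, and the decompressed stream begins with the magic bytes `BAM\1`. The helper name `looks_like_bam` is hypothetical. A file that fails this check (for example, a truncated or non-BAM file) would be consistent with the error above; passing it does not prove the rest of the header is valid.

```python
import gzip


def looks_like_bam(path):
    """Return True if the decompressed file starts with the BAM magic bytes.

    Hypothetical helper, not part of mgatk or pysam.
    """
    try:
        with gzip.open(path, "rb") as fh:
            return fh.read(4) == b"BAM\x01"
    except (OSError, EOFError):
        # Missing file, not gzip/BGZF-compressed, or truncated mid-stream.
        return False


if __name__ == "__main__":
    print(looks_like_bam("test_barcode.bam"))
```
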
Below are the package versions in the conda environment I used to run mgatk:

_libgcc_mutex 0.1 main
_openmp_mutex 5.1 1_gnu
_r-mutex 1.0.0 anacondar_1
_sysroot_linux-64_curr_repodata_hack 3 haa98f57_10
appdirs 1.4.4 pypi_0 pypi
attrs 23.1.0 pypi_0 pypi
binutils_impl_linux-64 2.38 h2a08ee3_1
binutils_linux-64 2.38.0 hc2dff05_0
biopython 1.81 pypi_0 pypi
blas 1.0 openblas
bwidget 1.9.11 1
bzip2 1.0.8 h7b6447c_0
c-ares 1.19.0 h5eee18b_0
ca-certificates 2023.5.7 hbcca054_0 conda-forge
cairo 1.16.0 hb05425b_4
certifi 2023.5.7 pypi_0 pypi
charset-normalizer 3.1.0 pypi_0 pypi
click 8.1.3 pypi_0 pypi
configargparse 1.5.3 pypi_0 pypi
connection-pool 0.0.3 pypi_0 pypi
curl 7.88.1 h5eee18b_0
cython 0.29.35 pypi_0 pypi
datrie 0.8.2 pypi_0 pypi
docutils 0.20.1 pypi_0 pypi
dpath 2.1.6 pypi_0 pypi
exceptiongroup 1.1.1 pypi_0 pypi
expat 2.4.9 h6a678d5_0
fastjsonschema 2.17.1 pypi_0 pypi
fontconfig 2.14.1 h4c34cd2_2
freetype 2.12.1 h4a9f257_0
fribidi 1.0.10 h7b6447c_0
gcc_impl_linux-64 11.2.0 h1234567_1
gcc_linux-64 11.2.0 h5c386dc_0
gfortran_impl_linux-64 11.2.0 h1234567_1
gfortran_linux-64 11.2.0 hc2dff05_0
gitdb 4.0.10 pypi_0 pypi
gitpython 3.1.31 pypi_0 pypi
glib 2.69.1 he621ea3_2
graphite2 1.3.14 h295c915_1
gxx_impl_linux-64 11.2.0 h1234567_1
gxx_linux-64 11.2.0 hc2dff05_0
harfbuzz 4.3.0 hf52aaf7_1
humanfriendly 10.0 pypi_0 pypi
icu 58.2 he6710b0_3
idna 3.4 pypi_0 pypi
iniconfig 2.0.0 pypi_0 pypi
jinja2 3.1.2 pypi_0 pypi
jpeg 9e h5eee18b_1
jsonschema 4.17.3 pypi_0 pypi
jupyter-core 5.3.0 pypi_0 pypi
kernel-headers_linux-64 3.10.0 h57e8cba_10
krb5 1.19.4 h568e23c_0
ld_impl_linux-64 2.38 h1181459_1
lerc 3.0 h295c915_0
libcurl 7.88.1 h91b91d3_0
libdeflate 1.17 h5eee18b_0
libedit 3.1.20221030 h5eee18b_0
libev 4.33 h7f8727e_1
libffi 3.4.4 h6a678d5_0
libgcc-devel_linux-64 11.2.0 h1234567_1
libgcc-ng 11.2.0 h1234567_1
libgfortran-ng 11.2.0 h00389a5_1
libgfortran5 11.2.0 h1234567_1
libgomp 11.2.0 h1234567_1
libnghttp2 1.46.0 hce63b2e_0
libopenblas 0.3.21 h043d6bf_0
libpng 1.6.39 h5eee18b_0
libssh2 1.10.0 h8f2d780_0
libstdcxx-devel_linux-64 11.2.0 h1234567_1
libstdcxx-ng 11.2.0 h1234567_1
libtiff 4.5.0 h6a678d5_2
libuuid 1.41.5 h5eee18b_0
libwebp-base 1.2.4 h5eee18b_1
libxcb 1.15 h7f8727e_0
libxml2 2.10.3 hcbfbd50_0
lz4-c 1.9.4 h6a678d5_0
make 4.2.1 h1bed415_1
markupsafe 2.1.2 pypi_0 pypi
mgatk 0.6.7 pypi_0 pypi
nbformat 5.8.0 pypi_0 pypi
ncurses 6.4 h6a678d5_0
numpy 1.24.3 pypi_0 pypi
openssl 1.1.1t h7f8727e_0
optparse-pretty 0.1.1 pypi_0 pypi
packaging 23.1 pypi_0 pypi
pandas 2.0.1 pypi_0 pypi
pango 1.50.7 h05da053_0
pcre 8.45 h295c915_0
pcre2 10.37 he7ceb23_1
pip 23.0.1 py39h06a4308_0
pixman 0.40.0 h7f8727e_1
plac 1.3.5 pypi_0 pypi
platformdirs 3.5.1 pypi_0 pypi
pluggy 1.0.0 pypi_0 pypi
psutil 5.9.5 pypi_0 pypi
pulp 2.7.0 pypi_0 pypi
pyrsistent 0.19.3 pypi_0 pypi
pysam 0.21.0 pypi_0 pypi
pytest 7.3.1 pypi_0 pypi
python 3.9.16 h7a1cb2a_2
python-dateutil 2.8.2 pypi_0 pypi
pytz 2023.3 pypi_0 pypi
pyyaml 6.0 pypi_0 pypi
r-base 4.2.0 h1ae530e_0
r-data.table 1.14.2 r42h76d94ec_0
r-lattice 0.20_45 r42h76d94ec_0
r-matrix 1.4_1 r42h76d94ec_0
readline 8.2 h5eee18b_0
regex 2023.5.5 pypi_0 pypi
requests 2.31.0 pypi_0 pypi
reretry 0.11.8 pypi_0 pypi
ruamel-yaml 0.17.28 pypi_0 pypi
ruamel-yaml-clib 0.2.7 pypi_0 pypi
setuptools 66.0.0 py39h06a4308_0
six 1.16.0 pypi_0 pypi
smart-open 6.3.0 pypi_0 pypi
smmap 5.0.0 pypi_0 pypi
snakemake 7.26.0 pypi_0 pypi
sqlite 3.41.2 h5eee18b_0
stopit 1.1.2 pypi_0 pypi
sysroot_linux-64 2.17 h57e8cba_10
tabulate 0.9.0 pypi_0 pypi
throttler 1.2.2 pypi_0 pypi
tk 8.6.12 h1ccaba5_0
tktable 2.10 h14c3975_0
tomli 2.0.1 pypi_0 pypi
toposort 1.10 pypi_0 pypi
traitlets 5.9.0 pypi_0 pypi
tzdata 2023.3 pypi_0 pypi
urllib3 2.0.2 pypi_0 pypi
wheel 0.38.4 py39h06a4308_0
wrapt 1.15.0 pypi_0 pypi
xz 5.4.2 h5eee18b_0
yte 1.5.1 pypi_0 pypi
zlib 1.2.13 h5eee18b_0
zstd 1.5.5 hc292b87_0

Many thanks for your help,
Wei
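
A small sketch for scanning a `conda list` dump like the one above for specific packages. It assumes the whitespace-separated `name version build [channel]` layout shown, and `parse_conda_list` is a hypothetical helper. Applied to this listing, it would report that `matplotlib` is not installed and that numpy is 1.24.3.

```python
def parse_conda_list(text):
    """Map package name -> version from `conda list`-style lines (hypothetical helper)."""
    packages = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines and comment/header lines such as "# Name Version Build Channel".
        if len(fields) < 2 or fields[0].startswith("#"):
            continue
        packages[fields[0]] = fields[1]
    return packages


if __name__ == "__main__":
    listing = """\
numpy 1.24.3 pypi_0 pypi
pandas 2.0.1 pypi_0 pypi
mgatk 0.6.7 pypi_0 pypi
"""
    pkgs = parse_conda_list(listing)
    for name in ("matplotlib", "numpy", "pandas"):
        print(name, pkgs.get(name, "NOT INSTALLED"))
```
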

@caleblareau
Owner

Hi, just curious whether this issue resolved itself? I'm guessing it was caused by a missing R package?

@TomSmithCGAT

TomSmithCGAT commented Mar 1, 2024

I am also missing the same three files with mgatk tenx: *.variant_stats.tsv.gz, *.cell_heteroplasmic_df.tsv.gz, *.vmr_strand_plot.png using mgatk v0.7.0.

I can reproduce this behaviour when I run with the test files:
mgatk tenx -i barcode/test_barcode.bam -n bc1 -o bc1dmem -bt CB -b barcode/test_barcodes.txt -c 2

I found the following line in the logfile:
bc1dmem/logs/bc1.snakemake_tenx.log:ModuleNotFoundError: No module named 'matplotlib'
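
Because the run finishes without an error on the console, the real failure only shows up in the log. A minimal sketch that searches every `*.log` file under `<output dir>/logs/` for lines mentioning an error or exception; the log location is assumed from the path shown above, and both `find_log_errors` and the keyword pattern are hypothetical choices.

```python
import re
from pathlib import Path

# Assumed keywords; matches e.g. "ModuleNotFoundError", "AttributeError",
# "MissingOutputException".
PATTERN = re.compile(r"Error|Exception")


def find_log_errors(out_dir):
    """Yield (log file, line number, line) for matching lines in <out_dir>/logs/*.log."""
    for log in sorted(Path(out_dir, "logs").glob("*.log")):
        with open(log, errors="replace") as fh:
            for lineno, line in enumerate(fh, start=1):
                if PATTERN.search(line):
                    yield log, lineno, line.rstrip()


if __name__ == "__main__":
    for log, lineno, line in find_log_errors("bc1dmem"):
        print(f"{log}:{lineno}: {line}")
```
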

Since I'd installed the dependencies for mgatk with mamba and then used pip install mgatk, my Python version was pinned and I couldn't update matplotlib from the bioconda channel. Using the conda-forge channel instead did work, though. I note that the requirements don't specify matplotlib:

dependencies = ['click', 'pysam', 'pytest', 'snakemake', 'biopython', 'numpy', 'pandas', 'optparse-pretty', 'regex', 'ruamel.yaml']
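
Since matplotlib is imported during the run but is not in `dependencies`, a quick pre-flight check can confirm that the active environment can import everything needed. A sketch using `importlib.util.find_spec`; the module list is an assumption drawn from this thread (matplotlib from the log above, the rest from `dependencies`, using import names such as `Bio` for biopython), not an official mgatk manifest, and `missing_modules` is hypothetical.

```python
import importlib.util

# Import names assumed from this thread; "Bio" is biopython's import name.
MODULES = ["click", "pysam", "snakemake", "Bio", "numpy", "pandas",
           "regex", "ruamel.yaml", "matplotlib"]


def missing_modules(names):
    """Return the module names that cannot be found in the current environment."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except ModuleNotFoundError:
            # Raised for dotted names whose parent package is absent (e.g. "ruamel.yaml").
            found = False
        if not found:
            missing.append(name)
    return missing


if __name__ == "__main__":
    print("missing:", missing_modules(MODULES) or "none")
```
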

After getting past the import error, I hit:

AttributeError: module 'numpy' has no attribute 'float'. I'm using numpy v1.26.4. It looks like this was deprecated in v1.20.
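
For anyone patching the source locally in the meantime: NumPy's own message for this alias recommends using the builtin `float` in its place (or `np.float64` if a NumPy scalar type is specifically wanted). A stdlib-only sketch that rewrites `np.float` / `numpy.float` to `float` in a piece of source text while leaving names such as `np.float64` and `np.floating` untouched; `patch_np_float` is a hypothetical helper, and which mgatk files actually use the alias is not established here.

```python
import re

# Matches np.float / numpy.float but not np.float64, np.float32, np.floating:
# the trailing \b fails when "float" is followed by another word character.
NP_FLOAT = re.compile(r"\b(?:np|numpy)\.float\b")


def patch_np_float(source):
    """Replace the alias with the builtin float; return (new_source, n_replacements)."""
    return NP_FLOAT.subn("float", source)


if __name__ == "__main__":
    before = "x = df['af'].astype(np.float)\ny = np.float64(1)\n"
    after, n = patch_np_float(before)
    print(n)  # 1
    print(after)
```

Working on text rather than writing files back keeps each replacement easy to review before it is applied.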

Before I try to work my way around this by updating the source code, can I check whether this has already been resolved in another branch, or has been noted and will be resolved soon, so I don't duplicate work?

@sierranishizaki

Thanks @caleblareau for the very helpful tool~

I am also missing these 3 output files referenced in this issue using mgatk v0.7.0 and the command:
mgatk tenx -i ./*.bam -b ./cellrangerarc_out/outs/filtered_feature_bc_matrix/barcodes.tsv -n mgatk_MT -o mgatk_MT_outs -ub UB -bt CB --keep-duplicates --mito-genome ./fasta/genome_MT.fa

This is the error message I am getting in the snakemake log:
MissingOutputException in rule call_variants in file /beegfs/home/snishiz/miniconda3/envs/ssn_mgatk/lib/python3.9/site-packages/mgatk/bin/snake/Snakefile.tenx, line 162: Job 0 completed successfully, but some output files are missing. Missing files after 5 seconds. This might be due to filesystem latency. If that is the case, consider to increase the wait time with --latency-wait: mgatk_MT_outs/final/mgatk_MT.variant_stats.tsv.gz mgatk_MT_outs/final/mgatk_MT.cell_heteroplasmic_df.tsv.gz mgatk_MT_outs/final/mgatk_MT.vmr_strand_plot.png

If anyone has a recommendation for getting past this error I would sincerely appreciate it.

Additionally, if someone could share the first few lines of these elusive final 3 files, it would help me determine how important it is for our analysis to continue troubleshooting for these missing files.
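
For previewing the first lines of any of the gzipped tables once they exist, a minimal standard-library sketch; `head_gz` is a hypothetical helper.

```python
import gzip
from itertools import islice


def head_gz(path, n=5):
    """Return the first n lines of a gzip-compressed text file, without trailing newlines."""
    with gzip.open(path, "rt") as fh:
        return [line.rstrip("\n") for line in islice(fh, n)]
```

For example, `head_gz("mgatk_MT_outs/final/mgatk_MT.cell_heteroplasmic_df.tsv.gz")` would return the first five lines of that file, assuming the run produced it.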

Finally, like TomSmithCGAT, I've had to mess with dependency versions and currently have pandas v2.2.3 and numpy v1.20.0. If there are recommended versions for these two packages (or a fix for the float error), I would love to know that as well!
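
On the float error specifically: the `np.float` alias was deprecated in NumPy 1.20 and removed in 1.24, so numpy 1.20.0 still provides it (with a DeprecationWarning), whereas 1.24.x and later, including the 1.24.3 and 1.26.4 mentioned in this thread, raise the AttributeError. A sketch of that version check; `has_np_float_alias` is a hypothetical helper, and it says nothing about which versions mgatk itself was tested against.

```python
import re


def has_np_float_alias(numpy_version):
    """True if this NumPy version still exposes the deprecated np.float alias (removed in 1.24)."""
    match = re.match(r"(\d+)\.(\d+)", numpy_version)
    if match is None:
        raise ValueError(f"unrecognised version string: {numpy_version!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    return (major, minor) < (1, 24)


if __name__ == "__main__":
    for v in ("1.20.0", "1.24.3", "1.26.4"):
        print(v, has_np_float_alias(v))
```

In a live environment, the version string can be read with `importlib.metadata.version("numpy")` (Python 3.8+).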

Labels
None yet
Projects
None yet

4 participants