fix Dataset importer problems #342
Open
firefoxci-taskcluster / cefilter-ru-en
succeeded
Jan 29, 2024 in 53m 48s
FirefoxCI (pull_request)
cefilter for ru-en
[taskcluster 2024-01-29 18:35:38.272Z] Task ID: SLgqHJvjQGKbQoGdp706kw
[taskcluster 2024-01-29 18:35:38.272Z] Worker ID: 3035306623046930812
[taskcluster 2024-01-29 18:35:38.272Z] Worker Group: us-central1
[taskcluster 2024-01-29 18:35:38.272Z] Worker Node Type: projects/887720501152/machineTypes/n2-highmem-32
[taskcluster 2024-01-29 18:35:38.272Z] Worker Pool: translations-1/b-linux-large-gcp-300gb
[taskcluster 2024-01-29 18:35:38.272Z] Worker Version: 38.0.5
[taskcluster 2024-01-29 18:35:38.272Z] Public IP: 104.197.197.32
[taskcluster 2024-01-29 18:35:38.273Z] Hostname: translations-1-b-linux-large-gcp-300gb-obdqlfadqwkq2dhyrr9utg
[taskcluster 2024-01-29 18:35:38.273Z] using cache "translations-level-1-checkouts-v3-58974d7dcf0417b3fe53-f9v6Z1KkRZuzZXoqK-Q4rg" -> /builds/worker/checkouts
[taskcluster 2024-01-29 18:35:40.896Z] Downloading artifact "public/image.tar.zst" from task ID: f9v6Z1KkRZuzZXoqK-Q4rg.
[taskcluster 2024-01-29 18:35:44.706Z] Downloaded artifact successfully.
[taskcluster 2024-01-29 18:35:44.706Z] Downloaded 264.083 mb
[taskcluster 2024-01-29 18:35:44.707Z] Decompressing downloaded image
[taskcluster 2024-01-29 18:35:46.154Z] Loading docker image from downloaded archive.
[taskcluster 2024-01-29 18:35:56.153Z] Image 'public/image.tar.zst' from task 'f9v6Z1KkRZuzZXoqK-Q4rg' loaded. Using image ID sha256:377e5cf268945040e51f09b9261679365b3a09539429ff28b42c40a173e820e8.
[taskcluster 2024-01-29 18:35:56.302Z] === Task Starting ===
[setup 2024-01-29T18:36:07.560Z] run-task started in /builds/worker
[setup 2024-01-29T18:36:07.560Z] Invoked by command: --firefox_translations_training-checkout=/builds/worker/checkouts/vcs/ -- bash -c $VCS_PATH/pipeline/cefilter/ce-filter.sh $MOZ_FETCHES_DIR/corpus /builds/worker/artifacts/corpus $MOZ_FETCHES_DIR/scores.txt
[setup 2024-01-29T18:36:07.560Z] Python version: 3.10.12
[cache 2024-01-29T18:36:07.562Z] cache /builds/worker/checkouts is empty; writing requirements: gid=1000 uid=1000 version=1
[volume 2024-01-29T18:36:07.562Z] volume /builds/worker/checkouts is a cache
[setup 2024-01-29T18:36:07.562Z] running as worker:worker
[vcs 2024-01-29T18:36:07.562Z] executing ['git', 'config', '--global', '--add', 'safe.directory', '/builds/worker/checkouts/vcs']
[vcs 2024-01-29T18:36:07.564Z] executing ['git', 'clone', 'https://github.com/mozilla/firefox-translations-training', '/builds/worker/checkouts/vcs']
[vcs 2024-01-29T18:36:07.565Z] Cloning into '/builds/worker/checkouts/vcs'...
[vcs 2024-01-29T18:36:08.209Z] executing ['git', 'fetch', '--no-tags', 'https://github.com/AmitMY/firefox-translations-training', 'patch-11']
[vcs 2024-01-29T18:36:08.479Z] From https://github.com/AmitMY/firefox-translations-training
[vcs 2024-01-29T18:36:08.479Z] * branch patch-11 -> FETCH_HEAD
[vcs 2024-01-29T18:36:08.482Z] executing ['git', 'checkout', '-f', '-B', 'patch-11', '11268076b51421847e9698a5bd754f8a39de9f2c']
[vcs 2024-01-29T18:36:08.532Z] Switched to a new branch 'patch-11'
[vcs 2024-01-29T18:36:08.533Z] executing ['git', 'submodule', 'init']
[vcs 2024-01-29T18:36:08.552Z] Submodule '3rd_party/browsermt-marian-dev' (https://github.com/browsermt/marian-dev) registered for path '3rd_party/browsermt-marian-dev'
[vcs 2024-01-29T18:36:08.553Z] Submodule 'extract-lex' (https://github.com/marian-nmt/extract-lex) registered for path '3rd_party/extract-lex'
[vcs 2024-01-29T18:36:08.553Z] Submodule 'fast_align' (https://github.com/clab/fast_align) registered for path '3rd_party/fast_align'
[vcs 2024-01-29T18:36:08.554Z] Submodule '3rd_party/kenlm' (https://github.com/kpu/kenlm) registered for path '3rd_party/kenlm'
[vcs 2024-01-29T18:36:08.554Z] Submodule '3rd_party/marian-dev' (https://github.com/marian-nmt/marian-dev) registered for path '3rd_party/marian-dev'
[vcs 2024-01-29T18:36:08.555Z] Submodule '3rd_party/preprocess' (https://github.com/kpu/preprocess.git) registered for path '3rd_party/preprocess'
[vcs 2024-01-29T18:36:08.556Z] executing ['git', 'submodule', 'update', '--force']
[vcs 2024-01-29T18:36:08.575Z] Cloning into '/builds/worker/checkouts/vcs/3rd_party/browsermt-marian-dev'...
[vcs 2024-01-29T18:36:09.735Z] Cloning into '/builds/worker/checkouts/vcs/3rd_party/extract-lex'...
[vcs 2024-01-29T18:36:10.032Z] Cloning into '/builds/worker/checkouts/vcs/3rd_party/fast_align'...
[vcs 2024-01-29T18:36:10.348Z] Cloning into '/builds/worker/checkouts/vcs/3rd_party/kenlm'...
[vcs 2024-01-29T18:36:10.996Z] Cloning into '/builds/worker/checkouts/vcs/3rd_party/marian-dev'...
[vcs 2024-01-29T18:36:12.306Z] Cloning into '/builds/worker/checkouts/vcs/3rd_party/preprocess'...
[vcs 2024-01-29T18:36:12.804Z] Submodule path '3rd_party/browsermt-marian-dev': checked out '11c6ae7c46be21ef96ed10c60f28022fa968939f'
[vcs 2024-01-29T18:36:12.813Z] Submodule path '3rd_party/extract-lex': checked out '42fa605b53f32eaf6c6e0b5677255c21c91b3d49'
[vcs 2024-01-29T18:36:12.822Z] Submodule path '3rd_party/fast_align': checked out 'cab1e9aac8d3bb02ff5ae58218d8d225a039fa11'
[vcs 2024-01-29T18:36:12.845Z] Submodule path '3rd_party/kenlm': checked out 'bbf4fc511266c5d4515047055d7bdec659a6e158'
[vcs 2024-01-29T18:36:12.945Z] Submodule path '3rd_party/marian-dev': checked out 'e8a1a2530fb84cbff7383302ebca393e5875c441'
[vcs 2024-01-29T18:36:12.962Z] Submodule path '3rd_party/preprocess': checked out '64307314b4d5a9a0bd529b5c1036b0710d995eec'
[vcs 2024-01-29T18:36:12.963Z] cleaning git checkout...
[vcs 2024-01-29T18:36:12.963Z] executing ['git', 'clean', '-nxdff']
[vcs 2024-01-29T18:36:12.965Z] removing []
[vcs 2024-01-29T18:36:12.965Z] successfully cleaned git checkout!
[vcs 2024-01-29T18:36:12.967Z] TinderboxPrint:<a href='https://github.com/AmitMY/firefox-translations-training/commit/11268076b51421847e9698a5bd754f8a39de9f2c' title='Built from firefox-translations-training commit 11268076b51421847e9698a5bd754f8a39de9f2c'>11268076b51421847e9698a5bd754f8a39de9f2c</a>
[setup 2024-01-29T18:36:12.967Z] MOZ_FETCHES_DIR is /builds/worker/fetches
[fetches 2024-01-29T18:36:12.967Z] fetching artifacts
[fetches 2024-01-29T18:36:12.967Z] executing ['/usr/bin/python3', '-u', '/usr/local/bin/fetch-content', 'task-artifacts']
attempt 1/5
attempt 1/5
attempt 1/5
Downloading https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/fmnwcj_oREqXgdrmsjjGQQ/artifacts/public/build/corpus.en.zst to /builds/worker/fetches/corpus.en.zst
Downloading https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/TFyB0-MWTnmzis6voVMexw/artifacts/public/build/scores.txt to /builds/worker/fetches/scores.txt
Downloading https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/fmnwcj_oREqXgdrmsjjGQQ/artifacts/public/build/corpus.ru.zst to /builds/worker/fetches/corpus.ru.zst
Downloading https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/fmnwcj_oREqXgdrmsjjGQQ/artifacts/public/build/corpus.en.zst
Downloading https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/fmnwcj_oREqXgdrmsjjGQQ/artifacts/public/build/corpus.ru.zst
Downloading https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/TFyB0-MWTnmzis6voVMexw/artifacts/public/build/scores.txt
https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/TFyB0-MWTnmzis6voVMexw/artifacts/public/build/scores.txt resolved to 169930 bytes with sha256 f32bb862dc8817406bf26b6a4b08ae3b733e50972c7ad1db077c0e1540ad78e6 in 0.150s
https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/fmnwcj_oREqXgdrmsjjGQQ/artifacts/public/build/corpus.en.zst resolved to 99214 bytes with sha256 21be7d6a0a523e213bcf9a38c11a79514d760bd042cd21505252dd7f68fd0c38 in 0.161s
https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/fmnwcj_oREqXgdrmsjjGQQ/artifacts/public/build/corpus.ru.zst resolved to 1026256 bytes with sha256 38b136942b59d0619e3a4428cc71828ac6c27f19dccea9867650cc6f0a21c9ca in 0.239s
PERFHERDER_DATA: {"framework": {"name": "build_metrics"}, "suites": [{"name": "fetch_content", "value": 0.2423400010000023, "lowerIsBetter": true, "shouldAlert": false, "subtests": []}]}
[fetches 2024-01-29T18:36:13.288Z] finished fetching artifacts
[task 2024-01-29T18:36:13.288Z] executing ['bash', '-c', '$VCS_PATH/pipeline/cefilter/ce-filter.sh $MOZ_FETCHES_DIR/corpus /builds/worker/artifacts/corpus $MOZ_FETCHES_DIR/scores.txt']
[task 2024-01-29T18:36:13.290Z] + set -euo pipefail
[task 2024-01-29T18:36:13.290Z] + echo '###### Cross entropy filtering'
[task 2024-01-29T18:36:13.290Z] ###### Cross entropy filtering
[task 2024-01-29T18:36:13.290Z] + test -v SRC
[task 2024-01-29T18:36:13.290Z] + test -v TRG
[task 2024-01-29T18:36:13.290Z] + corpus_prefix=/builds/worker/fetches/corpus
[task 2024-01-29T18:36:13.290Z] + output_prefix=/builds/worker/artifacts/corpus
[task 2024-01-29T18:36:13.290Z] + scores=/builds/worker/fetches/scores.txt
[task 2024-01-29T18:36:13.290Z] + COMPRESSION_CMD=zstdmt
[task 2024-01-29T18:36:13.290Z] + ARTIFACT_EXT=zst
[task 2024-01-29T18:36:13.291Z] ++ dirname /builds/worker/checkouts/vcs/pipeline/cefilter/ce-filter.sh
[task 2024-01-29T18:36:13.291Z] + cd /builds/worker/checkouts/vcs/pipeline/cefilter
[task 2024-01-29T18:36:13.291Z] + remove=0.05
[task 2024-01-29T18:36:13.292Z] ++ dirname /builds/worker/artifacts/corpus
[task 2024-01-29T18:36:13.292Z] + output_dir=/builds/worker/artifacts
[task 2024-01-29T18:36:13.292Z] + tmp=/builds/worker/artifacts/tmp
[task 2024-01-29T18:36:13.292Z] + mkdir -p /builds/worker/artifacts/tmp
[task 2024-01-29T18:36:13.293Z] + echo '### Sorting scores'
[task 2024-01-29T18:36:13.293Z] ### Sorting scores
[task 2024-01-29T18:36:13.293Z] + '[' '!' -s /builds/worker/artifacts/tmp/sorted.zst ']'
[task 2024-01-29T18:36:13.294Z] ++ bc
[task 2024-01-29T18:36:13.294Z] ++ cut -f1 -d.
[task 2024-01-29T18:36:13.294Z] +++ grep MemTotal /proc/meminfo
[task 2024-01-29T18:36:13.294Z] +++ awk '{print $2}'
[task 2024-01-29T18:36:13.295Z] ++ echo '263936880*0.9'
[task 2024-01-29T18:36:13.296Z] + buffer_size=237543192
[task 2024-01-29T18:36:13.296Z] + LC_ALL=C
[task 2024-01-29T18:36:13.296Z] + sort -n -k1,1 -S 237543192K -T /builds/worker/artifacts/tmp
[task 2024-01-29T18:36:13.296Z] + zstdmt
[task 2024-01-29T18:36:13.296Z] + paste /builds/worker/fetches/scores.txt /dev/fd/63 /dev/fd/62
[task 2024-01-29T18:36:13.296Z] ++ zstdmt -dc /builds/worker/fetches/corpus.ru.zst
[task 2024-01-29T18:36:13.297Z] ++ zstdmt -dc /builds/worker/fetches/corpus.en.zst
[task 2024-01-29T18:36:13.424Z] + echo '### Cutting the best scored corpus'
[task 2024-01-29T18:36:13.424Z] ### Cutting the best scored corpus
[task 2024-01-29T18:36:13.424Z] + '[' '!' -s /builds/worker/artifacts/tmp/best.zst ']'
[task 2024-01-29T18:36:13.424Z] ++ zstdmt -dc /builds/worker/artifacts/tmp/sorted.zst
[task 2024-01-29T18:36:13.424Z] ++ wc -l
[task 2024-01-29T18:36:13.438Z] + lines=16993
[task 2024-01-29T18:36:13.438Z] ++ echo '16993*0.05'
[task 2024-01-29T18:36:13.438Z] ++ bc
[task 2024-01-29T18:36:13.438Z] ++ cut -f1 -d.
[task 2024-01-29T18:36:13.439Z] + startline=849
[task 2024-01-29T18:36:13.440Z] + zstdmt -dc /builds/worker/artifacts/tmp/sorted.zst
[task 2024-01-29T18:36:13.440Z] + tail -n +849
[task 2024-01-29T18:36:13.440Z] + cut -f2,3
[task 2024-01-29T18:36:13.440Z] + zstdmt
[task 2024-01-29T18:36:13.538Z] + echo '### Writing output corpus'
[task 2024-01-29T18:36:13.538Z] ### Writing output corpus
[task 2024-01-29T18:36:13.538Z] + zstdmt -dc /builds/worker/artifacts/tmp/best.zst
[task 2024-01-29T18:36:13.538Z] + cut -f2
[task 2024-01-29T18:36:13.538Z] + tee /dev/fd/63
[task 2024-01-29T18:36:13.538Z] + zstdmt
[task 2024-01-29T18:36:13.539Z] ++ cut -f1
[task 2024-01-29T18:36:13.539Z] ++ zstdmt
[task 2024-01-29T18:36:13.624Z] + echo '### Deleting tmp dir'
[task 2024-01-29T18:36:13.624Z] ### Deleting tmp dir
[task 2024-01-29T18:36:13.624Z] + rm -rf /builds/worker/artifacts/tmp
[task 2024-01-29T18:36:13.625Z] + echo '###### Done: Cross entropy filtering'
[task 2024-01-29T18:36:13.625Z] ###### Done: Cross entropy filtering
[fetches 2024-01-29T18:36:13.626Z] removing /builds/worker/fetches
[fetches 2024-01-29T18:36:13.626Z] finished
[taskcluster 2024-01-29 18:36:14.202Z] === Task Finished ===
[taskcluster 2024-01-29 18:36:14.655Z] Successful task run with exit code: 0 completed in 36.384 seconds
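
The cut-off arithmetic in the trace above (16993 lines, remove=0.05, startline=849) can be reproduced with a minimal sketch. This is a reconstruction from the xtrace output, not the contents of ce-filter.sh. The helper names `start_line` and `keep_best` are hypothetical, `awk` stands in for the `bc | cut -f1 -d.` truncation seen in the log, and the zstd compression steps are left out so the example runs on plain text.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: given a line count and a removal fraction, print the
# truncated product, mirroring `echo "N*R" | bc | cut -f1 -d.` from the trace.
# awk is used here instead of bc (an assumption made for portability).
start_line() {
  awk -v n="$1" -v r="$2" 'BEGIN { printf "%d\n", int(n * r) }'
}

# Hypothetical helper: read "score<TAB>source<TAB>target" rows on stdin,
# sort them ascending by score, and print source<TAB>target starting at
# line $1 of the sorted stream (the same sort | tail | cut shape as the log).
keep_best() {
  LC_ALL=C sort -n -k1,1 | tail -n +"$1" | cut -f2,3
}

# Values taken from the log: lines=16993, remove=0.05.
start=$(start_line 16993 0.05)
echo "startline=$start"

# Tiny made-up sample: three scored pairs, output starts at sorted line 2.
sample=$(printf '%s\t%s\t%s\n' \
  -3.2 src_c trg_c \
  -0.1 src_a trg_a \
  -1.5 src_b trg_b | keep_best 2)
printf '%s\n' "$sample"
```

Because `tail -n +N` begins output at line N, the first N-1 lines of the sorted stream are dropped. With startline=849 that is 848 of the 16993 lowest-scored pairs.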