fix Dataset importer problems #342

Open
wants to merge 4 commits into
base: main
from

Merge branch 'main' into patch-11

1126807
firefoxci-taskcluster / cefilter-ru-en succeeded Jan 29, 2024 in 53m 48s

FirefoxCI (pull_request)

cefilter for ru-en


[taskcluster 2024-01-29 18:35:38.272Z] Task ID: SLgqHJvjQGKbQoGdp706kw
[taskcluster 2024-01-29 18:35:38.272Z] Worker ID: 3035306623046930812
[taskcluster 2024-01-29 18:35:38.272Z] Worker Group: us-central1
[taskcluster 2024-01-29 18:35:38.272Z] Worker Node Type: projects/887720501152/machineTypes/n2-highmem-32
[taskcluster 2024-01-29 18:35:38.272Z] Worker Pool: translations-1/b-linux-large-gcp-300gb
[taskcluster 2024-01-29 18:35:38.272Z] Worker Version: 38.0.5
[taskcluster 2024-01-29 18:35:38.272Z] Public IP: 104.197.197.32
[taskcluster 2024-01-29 18:35:38.273Z] Hostname: translations-1-b-linux-large-gcp-300gb-obdqlfadqwkq2dhyrr9utg
[taskcluster 2024-01-29 18:35:38.273Z] using cache "translations-level-1-checkouts-v3-58974d7dcf0417b3fe53-f9v6Z1KkRZuzZXoqK-Q4rg" -> /builds/worker/checkouts

[taskcluster 2024-01-29 18:35:40.896Z] Downloading artifact "public/image.tar.zst" from task ID: f9v6Z1KkRZuzZXoqK-Q4rg.
[taskcluster 2024-01-29 18:35:44.706Z] Downloaded artifact successfully.
[taskcluster 2024-01-29 18:35:44.706Z] Downloaded 264.083 mb
[taskcluster 2024-01-29 18:35:44.707Z] Decompressing downloaded image
[taskcluster 2024-01-29 18:35:46.154Z] Loading docker image from downloaded archive.
[taskcluster 2024-01-29 18:35:56.153Z] Image 'public/image.tar.zst' from task 'f9v6Z1KkRZuzZXoqK-Q4rg' loaded.  Using image ID sha256:377e5cf268945040e51f09b9261679365b3a09539429ff28b42c40a173e820e8.
[taskcluster 2024-01-29 18:35:56.302Z] === Task Starting ===
[setup 2024-01-29T18:36:07.560Z] run-task started in /builds/worker
[setup 2024-01-29T18:36:07.560Z] Invoked by command: --firefox_translations_training-checkout=/builds/worker/checkouts/vcs/ -- bash -c $VCS_PATH/pipeline/cefilter/ce-filter.sh $MOZ_FETCHES_DIR/corpus /builds/worker/artifacts/corpus $MOZ_FETCHES_DIR/scores.txt
[setup 2024-01-29T18:36:07.560Z] Python version: 3.10.12
[cache 2024-01-29T18:36:07.562Z] cache /builds/worker/checkouts is empty; writing requirements: gid=1000 uid=1000 version=1
[volume 2024-01-29T18:36:07.562Z] volume /builds/worker/checkouts is a cache
[setup 2024-01-29T18:36:07.562Z] running as worker:worker
[vcs 2024-01-29T18:36:07.562Z] executing ['git', 'config', '--global', '--add', 'safe.directory', '/builds/worker/checkouts/vcs']
[vcs 2024-01-29T18:36:07.564Z] executing ['git', 'clone', 'https://github.com/mozilla/firefox-translations-training', '/builds/worker/checkouts/vcs']
[vcs 2024-01-29T18:36:07.565Z] Cloning into '/builds/worker/checkouts/vcs'...
[vcs 2024-01-29T18:36:08.209Z] executing ['git', 'fetch', '--no-tags', 'https://github.com/AmitMY/firefox-translations-training', 'patch-11']
[vcs 2024-01-29T18:36:08.479Z] From https://github.com/AmitMY/firefox-translations-training
[vcs 2024-01-29T18:36:08.479Z]  * branch            patch-11   -> FETCH_HEAD
[vcs 2024-01-29T18:36:08.482Z] executing ['git', 'checkout', '-f', '-B', 'patch-11', '11268076b51421847e9698a5bd754f8a39de9f2c']
[vcs 2024-01-29T18:36:08.532Z] Switched to a new branch 'patch-11'
[vcs 2024-01-29T18:36:08.533Z] executing ['git', 'submodule', 'init']
[vcs 2024-01-29T18:36:08.552Z] Submodule '3rd_party/browsermt-marian-dev' (https://github.com/browsermt/marian-dev) registered for path '3rd_party/browsermt-marian-dev'
[vcs 2024-01-29T18:36:08.553Z] Submodule 'extract-lex' (https://github.com/marian-nmt/extract-lex) registered for path '3rd_party/extract-lex'
[vcs 2024-01-29T18:36:08.553Z] Submodule 'fast_align' (https://github.com/clab/fast_align) registered for path '3rd_party/fast_align'
[vcs 2024-01-29T18:36:08.554Z] Submodule '3rd_party/kenlm' (https://github.com/kpu/kenlm) registered for path '3rd_party/kenlm'
[vcs 2024-01-29T18:36:08.554Z] Submodule '3rd_party/marian-dev' (https://github.com/marian-nmt/marian-dev) registered for path '3rd_party/marian-dev'
[vcs 2024-01-29T18:36:08.555Z] Submodule '3rd_party/preprocess' (https://github.com/kpu/preprocess.git) registered for path '3rd_party/preprocess'
[vcs 2024-01-29T18:36:08.556Z] executing ['git', 'submodule', 'update', '--force']
[vcs 2024-01-29T18:36:08.575Z] Cloning into '/builds/worker/checkouts/vcs/3rd_party/browsermt-marian-dev'...
[vcs 2024-01-29T18:36:09.735Z] Cloning into '/builds/worker/checkouts/vcs/3rd_party/extract-lex'...
[vcs 2024-01-29T18:36:10.032Z] Cloning into '/builds/worker/checkouts/vcs/3rd_party/fast_align'...
[vcs 2024-01-29T18:36:10.348Z] Cloning into '/builds/worker/checkouts/vcs/3rd_party/kenlm'...
[vcs 2024-01-29T18:36:10.996Z] Cloning into '/builds/worker/checkouts/vcs/3rd_party/marian-dev'...
[vcs 2024-01-29T18:36:12.306Z] Cloning into '/builds/worker/checkouts/vcs/3rd_party/preprocess'...
[vcs 2024-01-29T18:36:12.804Z] Submodule path '3rd_party/browsermt-marian-dev': checked out '11c6ae7c46be21ef96ed10c60f28022fa968939f'
[vcs 2024-01-29T18:36:12.813Z] Submodule path '3rd_party/extract-lex': checked out '42fa605b53f32eaf6c6e0b5677255c21c91b3d49'
[vcs 2024-01-29T18:36:12.822Z] Submodule path '3rd_party/fast_align': checked out 'cab1e9aac8d3bb02ff5ae58218d8d225a039fa11'
[vcs 2024-01-29T18:36:12.845Z] Submodule path '3rd_party/kenlm': checked out 'bbf4fc511266c5d4515047055d7bdec659a6e158'
[vcs 2024-01-29T18:36:12.945Z] Submodule path '3rd_party/marian-dev': checked out 'e8a1a2530fb84cbff7383302ebca393e5875c441'
[vcs 2024-01-29T18:36:12.962Z] Submodule path '3rd_party/preprocess': checked out '64307314b4d5a9a0bd529b5c1036b0710d995eec'
[vcs 2024-01-29T18:36:12.963Z] cleaning git checkout...
[vcs 2024-01-29T18:36:12.963Z] executing ['git', 'clean', '-nxdff']
[vcs 2024-01-29T18:36:12.965Z] removing []
[vcs 2024-01-29T18:36:12.965Z] successfully cleaned git checkout!
[vcs 2024-01-29T18:36:12.967Z] TinderboxPrint:<a href='https://github.com/AmitMY/firefox-translations-training/commit/11268076b51421847e9698a5bd754f8a39de9f2c' title='Built from firefox-translations-training commit 11268076b51421847e9698a5bd754f8a39de9f2c'>11268076b51421847e9698a5bd754f8a39de9f2c</a>
[setup 2024-01-29T18:36:12.967Z] MOZ_FETCHES_DIR is /builds/worker/fetches
[fetches 2024-01-29T18:36:12.967Z] fetching artifacts
[fetches 2024-01-29T18:36:12.967Z] executing ['/usr/bin/python3', '-u', '/usr/local/bin/fetch-content', 'task-artifacts']
attempt 1/5
attempt 1/5
Downloading https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/fmnwcj_oREqXgdrmsjjGQQ/artifacts/public/build/corpus.en.zst to /builds/worker/fetches/corpus.en.zst
attempt 1/5
Downloading https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/TFyB0-MWTnmzis6voVMexw/artifacts/public/build/scores.txt to /builds/worker/fetches/scores.txt
Downloading https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/fmnwcj_oREqXgdrmsjjGQQ/artifacts/public/build/corpus.ru.zst to /builds/worker/fetches/corpus.ru.zst
Downloading https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/fmnwcj_oREqXgdrmsjjGQQ/artifacts/public/build/corpus.en.zst
Downloading https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/fmnwcj_oREqXgdrmsjjGQQ/artifacts/public/build/corpus.ru.zst
Downloading https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/TFyB0-MWTnmzis6voVMexw/artifacts/public/build/scores.txt
https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/TFyB0-MWTnmzis6voVMexw/artifacts/public/build/scores.txt resolved to 169930 bytes with sha256 f32bb862dc8817406bf26b6a4b08ae3b733e50972c7ad1db077c0e1540ad78e6 in 0.150s
https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/fmnwcj_oREqXgdrmsjjGQQ/artifacts/public/build/corpus.en.zst resolved to 99214 bytes with sha256 21be7d6a0a523e213bcf9a38c11a79514d760bd042cd21505252dd7f68fd0c38 in 0.161s
https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/fmnwcj_oREqXgdrmsjjGQQ/artifacts/public/build/corpus.ru.zst resolved to 1026256 bytes with sha256 38b136942b59d0619e3a4428cc71828ac6c27f19dccea9867650cc6f0a21c9ca in 0.239s
PERFHERDER_DATA: {"framework": {"name": "build_metrics"}, "suites": [{"name": "fetch_content", "value": 0.2423400010000023, "lowerIsBetter": true, "shouldAlert": false, "subtests": []}]}
[fetches 2024-01-29T18:36:13.288Z] finished fetching artifacts
[task 2024-01-29T18:36:13.288Z] executing ['bash', '-c', '$VCS_PATH/pipeline/cefilter/ce-filter.sh $MOZ_FETCHES_DIR/corpus /builds/worker/artifacts/corpus $MOZ_FETCHES_DIR/scores.txt']
[task 2024-01-29T18:36:13.290Z] + set -euo pipefail
[task 2024-01-29T18:36:13.290Z] + echo '###### Cross entropy filtering'
[task 2024-01-29T18:36:13.290Z] ###### Cross entropy filtering
[task 2024-01-29T18:36:13.290Z] + test -v SRC
[task 2024-01-29T18:36:13.290Z] + test -v TRG
[task 2024-01-29T18:36:13.290Z] + corpus_prefix=/builds/worker/fetches/corpus
[task 2024-01-29T18:36:13.290Z] + output_prefix=/builds/worker/artifacts/corpus
[task 2024-01-29T18:36:13.290Z] + scores=/builds/worker/fetches/scores.txt
[task 2024-01-29T18:36:13.290Z] + COMPRESSION_CMD=zstdmt
[task 2024-01-29T18:36:13.290Z] + ARTIFACT_EXT=zst
[task 2024-01-29T18:36:13.291Z] ++ dirname /builds/worker/checkouts/vcs/pipeline/cefilter/ce-filter.sh
[task 2024-01-29T18:36:13.291Z] + cd /builds/worker/checkouts/vcs/pipeline/cefilter
[task 2024-01-29T18:36:13.291Z] + remove=0.05
[task 2024-01-29T18:36:13.292Z] ++ dirname /builds/worker/artifacts/corpus
[task 2024-01-29T18:36:13.292Z] + output_dir=/builds/worker/artifacts
[task 2024-01-29T18:36:13.292Z] + tmp=/builds/worker/artifacts/tmp
[task 2024-01-29T18:36:13.292Z] + mkdir -p /builds/worker/artifacts/tmp
[task 2024-01-29T18:36:13.293Z] + echo '### Sorting scores'
[task 2024-01-29T18:36:13.293Z] ### Sorting scores
[task 2024-01-29T18:36:13.293Z] + '[' '!' -s /builds/worker/artifacts/tmp/sorted.zst ']'
[task 2024-01-29T18:36:13.294Z] ++ bc
[task 2024-01-29T18:36:13.294Z] ++ cut -f1 -d.
[task 2024-01-29T18:36:13.294Z] +++ grep MemTotal /proc/meminfo
[task 2024-01-29T18:36:13.294Z] +++ awk '{print $2}'
[task 2024-01-29T18:36:13.295Z] ++ echo '263936880*0.9'
[task 2024-01-29T18:36:13.296Z] + buffer_size=237543192
[task 2024-01-29T18:36:13.296Z] + LC_ALL=C
[task 2024-01-29T18:36:13.296Z] + sort -n -k1,1 -S 237543192K -T /builds/worker/artifacts/tmp
[task 2024-01-29T18:36:13.296Z] + zstdmt
[task 2024-01-29T18:36:13.296Z] + paste /builds/worker/fetches/scores.txt /dev/fd/63 /dev/fd/62
[task 2024-01-29T18:36:13.296Z] ++ zstdmt -dc /builds/worker/fetches/corpus.ru.zst
[task 2024-01-29T18:36:13.297Z] ++ zstdmt -dc /builds/worker/fetches/corpus.en.zst
[task 2024-01-29T18:36:13.424Z] + echo '### Cutting the best scored corpus'
[task 2024-01-29T18:36:13.424Z] ### Cutting the best scored corpus
[task 2024-01-29T18:36:13.424Z] + '[' '!' -s /builds/worker/artifacts/tmp/best.zst ']'
[task 2024-01-29T18:36:13.424Z] ++ zstdmt -dc /builds/worker/artifacts/tmp/sorted.zst
[task 2024-01-29T18:36:13.424Z] ++ wc -l
[task 2024-01-29T18:36:13.438Z] + lines=16993
[task 2024-01-29T18:36:13.438Z] ++ echo '16993*0.05'
[task 2024-01-29T18:36:13.438Z] ++ bc
[task 2024-01-29T18:36:13.438Z] ++ cut -f1 -d.
[task 2024-01-29T18:36:13.439Z] + startline=849
[task 2024-01-29T18:36:13.440Z] + zstdmt -dc /builds/worker/artifacts/tmp/sorted.zst
[task 2024-01-29T18:36:13.440Z] + tail -n +849
[task 2024-01-29T18:36:13.440Z] + cut -f2,3
[task 2024-01-29T18:36:13.440Z] + zstdmt
[task 2024-01-29T18:36:13.538Z] + echo '### Writing output corpus'
[task 2024-01-29T18:36:13.538Z] ### Writing output corpus
[task 2024-01-29T18:36:13.538Z] + zstdmt -dc /builds/worker/artifacts/tmp/best.zst
[task 2024-01-29T18:36:13.538Z] + cut -f2
[task 2024-01-29T18:36:13.538Z] + tee /dev/fd/63
[task 2024-01-29T18:36:13.538Z] + zstdmt
[task 2024-01-29T18:36:13.539Z] ++ cut -f1
[task 2024-01-29T18:36:13.539Z] ++ zstdmt
[task 2024-01-29T18:36:13.624Z] + echo '### Deleting tmp dir'
[task 2024-01-29T18:36:13.624Z] ### Deleting tmp dir
[task 2024-01-29T18:36:13.624Z] + rm -rf /builds/worker/artifacts/tmp
[task 2024-01-29T18:36:13.625Z] + echo '###### Done: Cross entropy filtering'
[task 2024-01-29T18:36:13.625Z] ###### Done: Cross entropy filtering
[fetches 2024-01-29T18:36:13.626Z] removing /builds/worker/fetches
[fetches 2024-01-29T18:36:13.626Z] finished
[taskcluster 2024-01-29 18:36:14.202Z] === Task Finished ===
[taskcluster 2024-01-29 18:36:14.655Z] Successful task run with exit code: 0 completed in 36.384 seconds
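
For readers checking the numbers in the trace above, here is a minimal Python sketch of what the logged commands appear to compute. It is not the project's `ce-filter.sh`, only a re-expression of the traced steps: multiply and truncate the way `bc | cut -f1 -d.` does, sort rows by score in ascending order, then keep everything from the computed start line onward the way `tail -n +N` does. The function names `truncated_product` and `cross_entropy_filter` and the column names `first` and `second` are hypothetical; which language sits in which column is not asserted here.

```python
from decimal import Decimal


def truncated_product(value: int, factor: str) -> int:
    """Mirror `echo "value*factor" | bc | cut -f1 -d.`:
    multiply exactly, then drop everything after the decimal point."""
    return int(Decimal(value) * Decimal(factor))


def cross_entropy_filter(scores, first, second, remove="0.05"):
    """Sketch of the traced pipeline (hypothetical helper, not the real script).

    scores, first, second are parallel sequences: one score and two text
    columns per sentence pair, as produced by `paste scores col1 col2`.
    """
    # `sort -n -k1,1`: ascending by score, so the lowest scores come first.
    # Tie-breaking may differ from GNU sort; Python's sort is stable.
    rows = sorted(zip(scores, first, second), key=lambda row: row[0])

    # `startline=$(echo "$lines*$remove" | bc | cut -f1 -d.)`
    start = truncated_product(len(rows), remove)

    # `tail -n +N` starts output AT line N (1-based), so N-1 lines are dropped.
    kept = rows[max(start - 1, 0):]

    # `cut -f2,3` and the final split into two output files.
    return [row[1] for row in kept], [row[2] for row in kept]


if __name__ == "__main__":
    # Values taken from the log above.
    print(truncated_product(263936880, "0.9"))  # sort buffer in KiB: 237543192
    print(truncated_product(16993, "0.05"))     # startline: 849

    n = 16993
    kept_first, kept_second = cross_entropy_filter(
        [float(i) for i in range(n)],
        [f"a{i}" for i in range(n)],
        [f"b{i}" for i in range(n)],
    )
    print(len(kept_first))                      # 16145 pairs kept
```

Under these assumptions, the run above drops 848 of 16993 pairs, because `tail -n +849` begins output at line 849 rather than skipping 849 lines. That is marginally fewer than the nominal 5%.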