fails to convert #113

Open
derhuerst opened this issue Feb 22, 2021 · 9 comments

Comments

@derhuerst
Contributor

I tried to convert the 2021-02-12 VBB GTFS feed.

npm init --yes
npm i gtfs2lc -D
wget -r --no-parent --no-directories -P gtfs -N 'https://vbb-gtfs.jannisr.de/2021-02-12/'
# rename all .csv to .txt …
env NODE_ENV=production gtfs2lc gtfs -f jsonld | head -n 3
# GTFS to linked connections converter use --help to discover more functions
# Indexing of stops, services, routes and trips completed successfully!
# Created worker thread (PID 1)
# Created worker thread (PID 2)
# Created worker thread (PID 3)
# Created worker thread (PID 4)
# [Error: ENOENT: no such file or directory, open 'gtfs/connections_0.txt'] {
#   errno: -2,
#   code: 'ENOENT',
#   syscall: 'open',
#   path: 'gtfs/connections_0.txt'
# }
# Error: Worker stopped with exit code 1
#     at Worker.<anonymous> (/Users/j/playground/vbb-gtfs-lc/node_modules/gtfs2lc/lib/gtfs2connections.js:145:27)
#     at Worker.emit (node:events:378:20)
#     at Worker.[kOnExit] (node:internal/worker:260:10)
#     at Worker.<computed>.onexit (node:internal/worker:187:20)
# [Error: ENOENT: no such file or directory, open 'gtfs/connections_1.txt'] {
#   errno: -2,
#   code: 'ENOENT',
#   syscall: 'open',
#   path: 'gtfs/connections_1.txt'
# }
# Error: Worker stopped with exit code 1
#     at Worker.<anonymous> (/Users/j/playground/vbb-gtfs-lc/node_modules/gtfs2lc/lib/gtfs2connections.js:145:27)
#     at Worker.emit (node:events:378:20)
#     at Worker.[kOnExit] (node:internal/worker:260:10)
#     at Worker.<computed>.onexit (node:internal/worker:187:20)
# [Error: ENOENT: no such file or directory, open 'gtfs/connections_2.txt'] {
#   errno: -2,
#   code: 'ENOENT',
#   syscall: 'open',
#   path: 'gtfs/connections_2.txt'
# }
# Error: Worker stopped with exit code 1
#     at Worker.<anonymous> (/Users/j/playground/vbb-gtfs-lc/node_modules/gtfs2lc/lib/gtfs2connections.js:145:27)
#     at Worker.emit (node:events:378:20)
#     at Worker.[kOnExit] (node:internal/worker:260:10)
#     at Worker.<computed>.onexit (node:internal/worker:187:20)
# [Error: ENOENT: no such file or directory, open 'gtfs/connections_3.txt'] {
#   errno: -2,
#   code: 'ENOENT',
#   syscall: 'open',
#   path: 'gtfs/connections_3.txt'
# }
# Error: Worker stopped with exit code 1
#     at Worker.<anonymous> (/Users/j/playground/vbb-gtfs-lc/node_modules/gtfs2lc/lib/gtfs2connections.js:145:27)
#     at Worker.emit (node:events:378:20)
#     at Worker.[kOnExit] (node:internal/worker:260:10)
#     at Worker.<computed>.onexit (node:internal/worker:187:20)

It has also created 4 files inside gtfs:

ls -l gtfs
# -rw-r--r--@  1 j  staff       3537 Feb 22 01:10 agency.txt
# -rw-r--r--   1 j  staff      79382 Feb 22 01:14 calendar.txt
# -rw-r--r--   1 j  staff     859354 Feb 22 01:14 calendar_dates.txt
# -rw-r--r--@  1 j  staff         64 Feb 22 01:10 frequencies.txt
# -rw-r--r--@  1 j  staff        140 Feb 22 01:10 pathways.txt
# -rw-r--r--   1 j  staff          0 Feb 22 01:23 raw_0.json
# -rw-r--r--   1 j  staff          0 Feb 22 01:23 raw_1.json
# -rw-r--r--   1 j  staff          0 Feb 22 01:23 raw_2.json
# -rw-r--r--   1 j  staff          0 Feb 22 01:23 raw_3.json
# -rw-r--r--@  1 j  staff      48812 Feb 22 01:10 routes.txt
# -rw-r--r--@  1 j  staff  143590907 Feb 22 01:10 shapes.txt
# -rw-r--r--   1 j  staff  269753688 Feb 22 01:14 stop_times.txt
# -rw-r--r--@  1 j  staff    4723089 Feb 22 01:10 stops.txt
# -rw-r--r--@  1 j  staff    4200935 Feb 22 01:10 transfers.txt
# -rw-r--r--@  1 j  staff   14019736 Feb 22 01:10 trips.txt
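
The rename step in the commands above (.csv to .txt) is only hinted at in a comment. A minimal sketch of one way to do it, assuming a flat feed directory; the helper name `rename_csv_to_txt` is made up for illustration and is not part of gtfs2lc:

```shell
#!/bin/sh
# Hypothetical helper (not part of gtfs2lc): rename every *.csv in a
# directory to *.txt.
rename_csv_to_txt() {
  dir=$1
  for f in "$dir"/*.csv; do
    # Skip the unexpanded pattern when the directory has no .csv files.
    [ -e "$f" ] || continue
    mv -- "$f" "${f%.csv}.txt"
  done
}

# Demo on a throwaway directory instead of a real feed.
demo=$(mktemp -d)
touch "$demo/stops.csv" "$demo/trips.csv"
rename_csv_to_txt "$demo"
ls "$demo"
```

For the feed in this report the call would be `rename_csv_to_txt gtfs`.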
@derhuerst
Contributor Author

Also, I noticed that the test npm script is failing silently. ./bin/gtfs2lc.js -s -f jsonld test/sample-feed writes just 4 newlines into test/sample-feed/linkedConnections.json.

@derhuerst derhuerst mentioned this issue Feb 23, 2021
@julianrojas87
Contributor

I think you missed step 3. This is a critical step since the rest of the process depends on the connections.txt file(s) that are created here and the sorting applied to the other files.

I was able to convert the data source you link above following these steps:

  1. Clone gtfs2lc and run npm i. You may also opt for installing it globally.
  2. Get the data source using wget -r --no-parent --no-directories -P gtfs -N 'https://vbb-gtfs.jannisr.de/2021-02-12/'
  3. Rename all .csv files to .txt (we should also allow it to work with csv).
  4. Important! Run the sorting pre-processing step: ./bin/gtfs2lc-sort.sh /path/to/source/gtfs (run inside the gtfs2lc folder)
  5. Execute gtfs2lc main process. I used the following command:
node bin/gtfs2lc.js -f jsonld -o path/to/output/folder /path/to/source/gtfs
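
Before step 5, it may help to confirm that the sorting step actually produced its output. The sketch below assumes the `connections_N.txt` naming seen in the ENOENT errors above and treats empty files as missing; `has_connections` is a hypothetical helper, not something gtfs2lc ships:

```shell
#!/bin/sh
# Hypothetical pre-flight check (not part of gtfs2lc): succeed only if
# the feed directory holds at least one non-empty connections_*.txt.
has_connections() {
  dir=$1
  for f in "$dir"/connections_*.txt; do
    # -s is false for missing files, empty files and the unexpanded glob.
    [ -s "$f" ] && return 0
  done
  return 1
}

# Demo on a throwaway directory: first without, then with such a file.
demo=$(mktemp -d)
has_connections "$demo" && before=yes || before=no
printf 'placeholder\n' > "$demo/connections_0.txt"
has_connections "$demo" && after=yes || after=no
echo "before: $before, after: $after"
```

In practice this would be called as `has_connections /path/to/source/gtfs || echo "run gtfs2lc-sort.sh first"`.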

Note: The period covered by this GTFS source is quite long (2021-02-11 to 2021-12-11), and given its large number of trips, it will produce a very large Linked Connections file. I didn't have enough disk space to process it completely, so for testing purposes I shortened its coverage period (in calendar.txt) to end on 2021-02-20.

As for the failing tests you mentioned, I think the cause is the same: the sorting pre-processing step is missing there too. If you check the test command in package.json, you will see that it performs this step first. Running the tests with npm test completes without errors.

Please let me know if this solves the issue :)

@derhuerst
Contributor Author

Thanks for trying this out!

I think you missed step 3.

I didn't pay enough attention to the log output. It fails (without aborting!) on the sort command, then subsequent steps don't work.

$ ../gtfs2lc/bin/gtfs2lc-sort.sh gtfs
# Converting newlines dos2unix
# Removing UTF-8 artifacts in directory gtfs
# sort: -k d,: Invalid argument
# Creating connection files according to the number of CPU processors available
# Converted 0 stop_times to connections
# Sorting files in directory gtfs
$ ls -l $(which sort)
# -rwxr-xr-x  1 root  wheel  74912 Sep 22 02:30 /usr/bin/sort
$ man sort
#
# SORT(1)                   BSD General Commands Manual                  SORT(1)
#
# NAME
#      sort -- sort or merge records (lines) of text and binary files
#
# SYNOPSIS
#      sort [-bcCdfghiRMmnrsuVz] [-k field1[,field2]] [-S memsize] [-T dir] [-t char] [-o output] [file ...]
#      sort --help
#      sort --version
#
# DESCRIPTION

Could it be that your sort executable is different?

Also, in #115, I have added a set -e, so that the script aborts on errors.
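
To illustrate what `set -e` changes, here is a small self-contained sketch. It does not reproduce the gtfs2lc-sort.sh script; `false` merely stands in for a failing step such as the `sort` call above:

```shell
#!/bin/sh
# Without `set -e`, a failing step is ignored and later steps still run.
without=$(sh -c 'false; echo "next step ran"')

# With `set -e`, the shell exits at the first failing command, so the
# echo is never reached. `|| true` keeps this demo script itself going.
with=$(sh -c 'set -e; false; echo "next step ran"' || true)

echo "without set -e: ${without:-<nothing>}"
echo "with set -e:    ${with:-<nothing>}"
```

Note that `set -e` alone does not catch a failure in the middle of a pipeline (only the status of the last command counts) or in a job started in the background with `&`.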

@julianrojas87
Contributor

That is very strange. This is the version of sort I am using:

$ sort --version
sort (GNU coreutils) 8.30
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <https://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Written by Mike Haertel and Paul Eggert.

@derhuerst
Contributor Author

you have GNU sort, i have the BSD sort. almost always, an up-to-date GNU CLI has more features than the macOS BSD one. same with time, find, grep, etc.

if you're on a gnu/linux system, it's because then the gnu variants are canonical. if you're on macOS, it's probably because you have installed sort via homebrew.

can you run ls -l $(which sort)?
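
A more direct way to tell the two variants apart is to inspect the `--version` output, since GNU sort identifies itself as "GNU coreutils" (as in the output quoted above). A minimal sketch; `sort_flavor` is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: report whether a given sort binary is the GNU
# coreutils one. Anything else (BSD sort, missing binary) is "other".
sort_flavor() {
  bin=${1:-sort}
  if "$bin" --version 2>/dev/null | grep -q 'GNU coreutils'; then
    echo gnu
  else
    echo other
  fi
}

flavor=$(sort_flavor sort)
echo "sort is: $flavor"
```

`sort_flavor gsort` would check the Homebrew-installed GNU variant in the same way.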

@julianrojas87
Contributor

So it is a problem with the macOS sort command. Not sure what this tells you, but here it is:

$ ls -l $(which sort)
-rwxr-xr-x 1 root root 117376 Sep  5  2019 /usr/bin/sort

@derhuerst
Contributor Author

I'll run with set -x to find out what the actual sort calls are.
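
For reference, a small sketch of what `set -x` tracing looks like. The pipeline here is a stand-in, not the actual commands from gtfs2lc-sort.sh:

```shell
#!/bin/sh
# Run a tiny pipeline with tracing enabled. The shell writes each
# command it executes to stderr, so capture stderr and drop stdout.
trace=$(sh -c 'set -x; printf "b\na\n" | sort -k 1,1' 2>&1 >/dev/null)

# Show the captured trace: one line per executed command.
echo "$trace"
```

Adding `set -x` near the top of the sorting script should reveal the exact arguments that `sort` receives when it reports `Invalid argument`.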

@derhuerst
Contributor Author

Unfortunately, the current sort-based CSV sorting messes things up silently.

$ head -n 1 gtfs/stop_times.txt
# "tip_id","arrival_time","departure_time","stop_id","stop_sequence","pickup_type","drop_off_type","stop_headsign"
$ head -n 1 gtfs/trips.txt
# "oute_id","service_id","trip_id","trip_headsign","trip_short_name","direction_id","block_id","shape_id","wheelchair_accessible","bikes_allowed"

You could use Miller & sponge to sort the CSV files:

mlr --csv sort -f trip_id trips.csv | sponge trips.csv
mlr --csv sort -f trip_id -n stop_sequence stop_times.csv | sponge stop_times.csv
mlr --csv sort -f service_id calendar.csv | sponge calendar.csv
mlr --csv sort -f service_id,date calendar_dates.csv | sponge calendar_dates.csv

It would also take care of the UTF-8 Byte Order Mark (BOM) & non-Unix-newlines in one pass, and prevent the current race conditions with mv &.
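
If Miller is not available, a plain-`sort` fallback needs to keep the header line out of the sort. The sketch below is one way to do that, not what gtfs2lc-sort.sh currently does; `sort_csv_keep_header` is a hypothetical helper, and it assumes that no field contains an embedded comma or newline (which Miller would handle properly):

```shell
#!/bin/sh
# Hypothetical helper: sort a CSV file by one column (1-based index),
# keeping the header as the first line.
# Assumption: no field contains an embedded comma or newline.
sort_csv_keep_header() {
  file=$1
  col=$2
  tmp=$(mktemp)
  {
    head -n 1 "$file"                                  # header, untouched
    tail -n +2 "$file" | LC_ALL=C sort -t , -k "$col,$col"
  } > "$tmp" && mv "$tmp" "$file"                      # replace only on success
}

# Demo: sort a tiny trips-like file by its third column (trip_id).
demo=$(mktemp)
printf '%s\n' 'route_id,service_id,trip_id' 'r1,s1,t2' 'r2,s1,t1' > "$demo"
sort_csv_keep_header "$demo" 3
cat "$demo"
```

Because the `mv` runs in the foreground and only after the sort succeeded, this also avoids racing with later steps.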

@derhuerst
Contributor Author

derhuerst commented Dec 31, 2021

I also tried running with gsort (the Homebrew-installed GNU sort variant), and it doesn't work either, using the 2021-12-17 VBB GTFS:

gsort --version
# sort (GNU coreutils) 9.0
# Copyright (C) 2021 Free Software Foundation, Inc.
# License GPLv3+: GNU GPL version 3 or later <https://gnu.org/licenses/gpl.html>.
# This is free software: you are free to change and redistribute it.
# There is NO WARRANTY, to the extent permitted by law.

# Written by Mike Haertel and Paul Eggert.

/bin/gtfs2lc-sort.sh gtfs
# Converting newlines dos2unix
# Removing UTF-8 artifacts in directory gtfs
# Trimming EOLs and removing continuous double quotes
# sort: -k d,: Invalid argument
# Creating connection files according to the number of CPU processors available
# Converted 0 stop_times to connections
# Sorting files in directory gtfs
