Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

questions about .fst operation #53

Open
MattCowgill opened this issue Nov 4, 2019 · 4 comments
Open

questions about .fst operation #53

MattCowgill opened this issue Nov 4, 2019 · 4 comments

Comments

@MattCowgill
Copy link
Owner

I don't really like this - if a user supplies a 'tables' argument, read_abs() should return only that table, even if it entails downloading it afresh:

In read_abs(cat_no = "6401.0", tables = 1, retain_files = retain_files,  :
  `tables` was provided, yet `check_local = TRUE` and fst files are available so `tables` will be ignored.
@HughParsonage
Copy link
Collaborator

HughParsonage commented Nov 4, 2019

Agreed that that is too annoying (not to mention faulty!). Sorry!

I remember an important reason for only supporting tables = "all" -- it was because it looked quite difficult to determine, at the time fst::write_fst was executed in read_abs. whether it would be idempotent or not. (For example, consider multiple series_id and multiple tables on first run then a subset of both next time -- is it safe to read?)

Thoughts in the form of design principles:

  1. tables is a core user requirement (rather than for speed or because other functions currently support it). In this scenario, users who run read_abs("6401.0", tables = c(1, 5)) are only interested in those tables (and don't care about tables = 2:4. If this is the case, readabs should store the tables as separate .fst files.

  2. tables is an implementation detail. In this scenario, users need to use tables only to workaround other issues. In this case, we keep the current .fst files structure (one per cat_no) and, as a workaround for this issue, we:
    a. add mutate(sheet, tmp_table_index = ) before fst::write_fst in read_abs.
    b. when reading the .fst file on subsequent runs, we query this field to obtain the requested tables.

At the minimum, I should document the consequences of using tables with check_local = TRUE.

@MattCowgill
Copy link
Owner Author

I think it is more like 1 than 2

@MattCowgill
Copy link
Owner Author

Hi @HughParsonage,
Are you still interested in working on this? No worries if not.

HughParsonage added a commit to HughParsonage/readabs that referenced this issue Jan 16, 2020
HughParsonage added a commit that referenced this issue Jan 16, 2020
@HughParsonage
Copy link
Collaborator

Sorry -- wrote to the wrong repo.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants