Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Disambig tables' structure changes for each update (persistent_assignee_disambig and persistent_inventor_disambig) #99

Open
scottlyden opened this issue Jan 13, 2021 · 2 comments

Comments

@scottlyden
Copy link

Every time the data is updated the persistent_assignee_disambig and persistent_inventor_disambig tables grow an additional column which seems like an unnecessary burden on users and updaters.

In problem domains where you have the luxury of being able to assign permanent IDs to entities, this problem is fairly easy to solve, but I can understand why this problem domain could make that tricky (impossible?).

Maybe the structure of these tables could be made invariant by using columns that could capture the history without changing the schema: some version of current_id, previous_id, start_date, end_date.

Below I sketch a couple of versions that occur to me. Some require greater/lesser code changes, but the code changes could be weighed against greater ease of updating the data and less burden on users.

Thanks for considering.

Probably easiest from end-users' perspective

Requires update of all records in the history of given id

Sacrifices space efficiency and update ease for end-user ease

create table if not exists patentsview_post_grant.toy_table2(
current_id int,
previous_id int NULL,
start_date date,
end_date date NULL
);

insert into
patentsview_post_grant.toy_table3(current_id, previous_id, start_date, end_date)
values
(4, 1, '2000-01-01', '2000-01-02'),
(4, 2, '2000-01-02', '2000-01-03'),
(4, 3, '2000-01-03', '2000-01-04'),
(4, 3, '2000-01-04', null);

Seems fairly easy for end-user

Requires only to update the previous record (and current record)

create table if not exists patentsview_post_grant.toy_table3(
id int NOT NULL,
from_id int NULL,
to_id int NULL,
as_of date NOT NULL
);

insert into
patentsview_post_grant.toy_table3(id, from_id, to_id, as_of)
values
(1, null, 2, '2000-01-01'),
(2, 1, 3, '2000-01-02'),
(3, 2, 4, '2000-01-03'),
(4, 3, null, '2000-01-04');

More space-efficient

Only requires update for current record

Drawback is that you can't tell by inspection that the

last record is the final record

Makes me think the last record should (also) go in a separate table:

ie, have a current table and a history table, although that does

involve the additional step of deleting the previous record from

the 'current' table

create table if not exists patentsview_post_grant.toy_table4(
id int NOT NULL,
from_id int NULL,
as_of date NOT NULL
);

insert into
patentsview_post_grant.toy_table4(id, to_id, as_of)
values
(1, null, '2000-01-01'),
(2, 1, '2000-01-02'),
(3, 2, '2000-01-03'),
(4, 3, '2000-01-04');

More space-efficient

Requires update for current record and previous

Can tell by inspection which record is the final record

create table if not exists patentsview_post_grant.toy_table5(
id int NOT NULL,
to_id int NULL,
as_of date NOT NULL
);

insert into
patentsview_post_grant.toy_table5(id, to_id, as_of)
values
(1, 2, '2000-01-01'),
(2, 3, '2000-01-02'),
(3, 4, '2000-01-03'),
(4, null, '2000-01-04');

@emelluso
Copy link

Hi Scott! Thank you for the suggestion. We are currently reviewing the table set up for ALL of our bulk download files and we will consider your ideas as feedback into that review. When/if we do push a redesign, it will likely be sometime closer to this coming summer.

@scottlyden
Copy link
Author

scottlyden commented Jan 21, 2021 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants