Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bibliography filters display empty rows #104

Open
Maanin00 opened this issue Mar 28, 2023 · 3 comments
Open

Bibliography filters display empty rows #104

Maanin00 opened this issue Mar 28, 2023 · 3 comments
Assignees
Labels
data-error Root cause is error in data help wanted

Comments

@Maanin00
Copy link

Bibligraphy modern, Bibligraphy sites and Bibligraphy sites/samplegroups all contains rows with no text, nothing in the text column just values in the number column. If selected, the filter output shows a error message or displays the spinning search indicator indefinately. Should the rows contain text or are there faulty rows in db?

bild

@roger-mahler
Copy link
Collaborator

Problem is caused by authors and/or title being null. Most of the references are MAL reports.

There are in total 389 references in SEAD associated to MAL data. Only sample group or site references are considered, as given by this query:

with sample_groups as (
	select distinct sg.sample_group_id, sg.site_id
	from tbl_datasets d
	join tbl_dataset_masters dm using (master_set_id)
	join tbl_analysis_entities ae using (dataset_id)
	join tbl_physical_samples ps using (physical_sample_id)
	join tbl_sample_groups sg using (sample_group_id)
	where master_set_id = 2
),
mal_references as (
	select distinct biblio_id
	from sample_groups
	join tbl_sites s using (site_id)
	join tbl_site_references sr using (site_id)
	union
	select distinct biblio_id
	from sample_groups
	join tbl_sample_group_references sgr using (sample_group_id)	
)
	select *
	from tbl_biblio
	join mal_references using (biblio_id)

References in the following tables are ignored:

Ignored:

  • tbl_aggregate_datasets (empty table)
  • tbl_dataset_masters (all biblio_id is null)
  • tbl_datasets (biblio_id is all null for MAL data)
  • tbl_geochron_refs (empty table)
  • tbl_site_other_records (all biblio_id is null)
  • tbl_taxa_synonyms (MAL reference shouldn't occur in taxonomy tree)
  • tbl_taxonomic_order_biblio (MAL reference shouldn't occur in taxonomy tree)
  • tbl_taxonomy_notes (MAL reference shouldn't occur in taxonomy tree)
  • tbl_tephra_refs (empty table)
  • tbl_species_associations
  • tbl_text_biology
  • tbl_text_distribution
  • tbl_text_identification_keys
  • tbl_rdb_systems
  • tbl_relative_age_refs (has no MAL reports)
  • tbl_ecocode_systems (has no MAL reports)
  • tbl_methods (only two methods has a reference, no MAL)

@roger-mahler roger-mahler self-assigned this Jun 3, 2024
@roger-mahler
Copy link
Collaborator

roger-mahler commented Jun 3, 2024

The MAL report register (dated 20240116) contains 1096 entries. Only 28 of these reports exist in SEAD when matching match on full title:

select *
from tbl_biblio
join temp_mal_report_register using (title)

The MAL report register has been imported to sead_staging@staging_cluster.

@roger-mahler roger-mahler added help wanted data-error Root cause is error in data labels Jun 3, 2024
@roger-mahler
Copy link
Collaborator

With some cleaning we get 126 matches:

with sample_groups as (
	select distinct sg.sample_group_id, sg.site_id
	from tbl_datasets d
	join tbl_dataset_masters dm using (master_set_id)
	join tbl_analysis_entities ae using (dataset_id)
	join tbl_physical_samples ps using (physical_sample_id)
	join tbl_sample_groups sg using (sample_group_id)
	where master_set_id = 2
),
sead_mal_references as (
	select distinct biblio_id
	from sample_groups
	join tbl_sites s using (site_id)
	join tbl_site_references sr using (site_id)
	union
	select distinct biblio_id
	from sample_groups
	join tbl_sample_group_references sgr using (sample_group_id)	
)
	select b.biblio_id, b.year, b.title, r.title, regexp_replace(lower(b.title), '\W+', '', 'g') as compressed
	from tbl_biblio b
	join sead_mal_references smr using (biblio_id)
	left join temp_mal_report_register r
	  on regexp_replace(lower(r.title), '\W', '', 'g') = regexp_replace(lower(b.title), '\W', '', 'g')
	where r.title is not null
	order by b.year

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
data-error Root cause is error in data help wanted
Projects
None yet
Development

No branches or pull requests

3 participants