Add quoted search functionality on browse sources page #1737

Open
wants to merge 8 commits into
base: develop

Conversation

lucasmarchd01
Contributor

Fixed some bugs in the search functionality on the Browse Sources page by adding support for quoted phrase matching, allowing exact matches for phrases (e.g., "North Vancouver").

Enabled unaccented search terms to match fields with or without accents, applying both partial (icontains) and exact (iexact) matching logic across key fields like institution name, city, shelfmark, and more.
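As a rough illustration of the accent-insensitive matching described above, here is a pure-Python sketch. It is only an analogy: the PR itself relies on the database-side `unaccent` lookup, and the helper names `strip_accents` and `matches_unaccented` are hypothetical, not part of the codebase.

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Remove combining marks after NFKD decomposition (approximation only)."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def matches_unaccented(field_value: str, term: str, exact: bool = False) -> bool:
    """Compare a field and a search term with accents and case ignored.

    exact=False approximates a partial (icontains-style) match;
    exact=True approximates a whole-field (iexact-style) match.
    """
    field_norm = strip_accents(field_value).casefold()
    term_norm = strip_accents(term).casefold()
    if exact:
        return field_norm == term_norm
    return term_norm in field_norm


# Hypothetical field values, for illustration only.
print(matches_unaccented("Bibliothèque nationale de France", "bibliotheque"))
print(matches_unaccented("Montréal", "montreal", exact=True))
```

The database's own unaccent rules may differ from Unicode decomposition for some characters, so this should be read as a sketch of the intent rather than of the actual query behaviour.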

Updated the query logic to handle quoted and unquoted terms separately using regex for better granularity and consistency.
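A minimal, standalone sketch of that separation, reusing the two regular expressions that appear in the diff. The function name `split_search_terms` is hypothetical and only wraps the patterns for illustration.

```python
import re

# Patterns taken from the diff under review.
QUOTED = re.compile(r'"(.*?)"')
UNQUOTED = re.compile(r"\b[\w,-.]+\b")


def split_search_terms(general_str: str) -> tuple[list[str], list[str]]:
    """Return (quoted_terms, unquoted_terms) for a raw search string."""
    # Phrases inside double quotes are kept whole.
    quoted_terms = QUOTED.findall(general_str)
    # Remove the quoted phrases, then pick up the remaining single terms.
    remainder = QUOTED.sub("", general_str)
    unquoted_terms = UNQUOTED.findall(remainder)
    return quoted_terms, unquoted_terms


quoted, unquoted = split_search_terms('church "North Vancouver" st-paul')
print(quoted)    # ['North Vancouver']
print(unquoted)  # ['church', 'st-paul']
```

Note that `,-.` inside the character class is read as a range from `,` to `.`, which happens to cover exactly the comma, hyphen, and period.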

Added tests for these changes.

Resolves #1632. We should proceed with #435 as part of a more robust fix for these types of issues.

    r'"(.*?)"', general_str
)  # Extract terms in quotes
unquoted_terms = re.findall(
    r"\b[\w,-.]+\b", re.sub(r'"(.*?)"', "", general_str)
Contributor


You could also do this with one regex with look-aheads/look-behinds:

(?<!\")\b[\w,-.]+\b(?!\")

Contributor


Probably clearer what's happening here with the way you have it now...
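For reference, a small standalone comparison of the two approaches discussed in this thread. The sample search strings are made up for illustration, and the helper names are hypothetical. On these inputs the look-around pattern agrees with the two-step version for a two-word quoted phrase but lets a middle word through for a three-word phrase, since only the words directly touching a quote are excluded.

```python
import re

# Single-regex variant suggested in the review comment.
LOOKAROUND = re.compile(r'(?<!\")\b[\w,-.]+\b(?!\")')


def unquoted_two_step(s: str) -> list[str]:
    """Approach from the diff: drop quoted phrases, then find the remaining terms."""
    return re.findall(r"\b[\w,-.]+\b", re.sub(r'"(.*?)"', "", s))


def unquoted_lookaround(s: str) -> list[str]:
    """Single pass: skip words immediately preceded or followed by a quote."""
    return LOOKAROUND.findall(s)


two_word = 'church "North Vancouver"'
three_word = 'church "North Vancouver Island"'

print(unquoted_two_step(two_word))      # ['church']
print(unquoted_lookaround(two_word))    # ['church']
print(unquoted_two_step(three_word))    # ['church']
print(unquoted_lookaround(three_word))  # ['church', 'Vancouver']
```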

# Add quoted terms to the Q object with exact matching (iexact)
for term in quoted_terms:
    holding_institution_q |= Q(
        holding_institution__name__unaccent__iexact=term
Contributor


I don't think iexact is correct here...

If someone searches for "North Vancouver", don't I want any results where "North Vancouver" shows up somewhere in one of these fields, not just results where a field is exactly "North Vancouver"?

I think contains is probably still the right method; it's just that we're not always searching single words, but sometimes groups of words, depending on the quotes.
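To make the distinction concrete, here is a pure-Python analogy of the two lookups. It does not touch the ORM; `iexact` and `icontains` below are hypothetical stand-in functions, and the field values are invented for illustration.

```python
def iexact(field_value: str, term: str) -> bool:
    """Stand-in for an iexact lookup: the whole field must equal the term."""
    return field_value.casefold() == term.casefold()


def icontains(field_value: str, term: str) -> bool:
    """Stand-in for an icontains lookup: the term may appear anywhere in the field."""
    return term.casefold() in field_value.casefold()


# Hypothetical field values.
cities = ["North Vancouver", "District of North Vancouver", "Vancouver"]
phrase = "north vancouver"

exact_hits = [c for c in cities if iexact(c, phrase)]
contains_hits = [c for c in cities if icontains(c, phrase)]

print(exact_hits)     # ['North Vancouver']
print(contains_hits)  # ['North Vancouver', 'District of North Vancouver']
```

With the whole phrase passed as a single term, the contains-style match still requires the words to appear together and in order, but it also finds fields where the phrase is only part of the value.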

Successfully merging this pull request may close these issues.

Source search could be better
2 participants