Rewrote core search routines from the old C in Python and moved the word index to LMDB
Count all co-occurrences (old behavior only counted one co-occurrence per sentence)
Integrate spaCy in parser for part-of-speech tagging, lemmatization, and NER
Lemma searching enabled in all reports if lemmas are present in the TEI or if a lemma mapping file is provided
Word attribute searching
Completely rewritten collocations with better performance (1.5x) and a LOT of new functionality:
- Search within n words in collocations
- Search for lemmas
- Filter by word attribute (e.g. by part-of-speech)
- Compare collocation distributions based on count and z-score difference
- Find most similar collocation distribution based on metadata (e.g. finding authors with most similar collocation distribution)
- Collocations over time
Speed-up time series (5x or more)
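
The change in co-occurrence counting (every co-occurrence rather than one per sentence) and the "within n words" option can be illustrated with a minimal sketch. This is not PhiloLogic's implementation: the function name `count_collocates`, its parameters, and the tokenized-sentence input are all assumptions made for illustration.

```python
from collections import Counter


def count_collocates(sentences, target, window=None, once_per_sentence=False):
    """Count words co-occurring with `target` (hypothetical helper).

    sentences: list of token lists, one list per sentence
    window: max distance in words on either side (None = whole sentence)
    once_per_sentence: if True, count each collocate at most once per
        sentence, mimicking the old behavior described above
    """
    counts = Counter()
    for tokens in sentences:
        found = []
        for i, token in enumerate(tokens):
            if token != target:
                continue
            # Restrict to the window around this occurrence of the target
            lo = 0 if window is None else max(0, i - window)
            hi = len(tokens) if window is None else min(len(tokens), i + window + 1)
            found.extend(tokens[j] for j in range(lo, hi) if j != i)
        if once_per_sentence:
            counts.update(set(found))  # one count per distinct word
        else:
            counts.update(found)  # every co-occurrence is counted
    return counts
```

With the sentence `["the", "dog", "and", "the", "cat"]` and target `"cat"`, the word `"the"` is counted twice by default and once with `once_per_sentence=True`.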

New aggregation report
New metadata stats in search results
Results bibliography in concordance and KWIC results
Databases should be 50% to 80% (or more) smaller
Significant speed-ups for:
- Collocations: in some cases 3-4X
- Sorted KWICs: between 6X and 25X (or more) depending on use case, with no more limits on the size of the sort as a result.
- Faceted browsing (frequencies): anywhere from 3X to 100X (or more)
- Landing page browsing: 10X faster or more on large corpora
Export results to CSV
Web config has been simplified with the use of global variables for citations
Some breaking changes to web config: you should not use a 4.6 config
Revamped Web UI: move to VueJS and Bootstrap 5.
Cleaner URLs for queries
Faster database loads
New generic dictionary lookup code
Support for date and integer types for metadata fields

Port PhiloLogic4 codebase to Python3
Switch load time compression from Gzip to LZ4: big speed-up in loading large databases
Lib reorganization

Completely rewritten parser: can now parse broken XML
Massive lib reorg
A new system wide config
Loading process completely revamped: use philoload4 command
Completely rewritten collocations: faster and more accurate
Added relative frequencies to frequencies in facets
Added sorted KWIC
Added support for regexes in quoted term searches (aka exact matches)
Added ability to filter out words in query expansion through a popup using the NOT syntax
Added configurable citations for all reports
Added concordance results sorting by metadata
Added approximate word searches using Levenshtein distance
Redesigned facets and time series
Bug fixes and optimizations everywhere...
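
For readers unfamiliar with Levenshtein distance, the idea behind approximate word searches can be sketched as follows. This is a generic textbook implementation, not the code used in PhiloLogic; the names `levenshtein` and `approximate_matches` and the `max_distance` parameter are hypothetical.

```python
def levenshtein(a, b):
    """Return the edit distance (insertions, deletions, substitutions)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def approximate_matches(term, vocabulary, max_distance=1):
    """Return vocabulary words within max_distance edits of term, sorted."""
    return sorted(w for w in vocabulary if levenshtein(term, w) <= max_distance)
```

For example, `approximate_matches("color", ["colour", "colon", "dolor", "collar"])` keeps the three words that are one edit away and drops `"collar"`, which is two edits away.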
