- Added addArtifacts REST endpoint to allow bulk import of artifacts from (WARC) archives.
- Significantly improved temporary WARC garbage collection performance, which allows it to be performed far more frequently.
- Support for "bulk storage mode" for migration from the classic LOCKSS daemon (1.76).
- Improvements to `SolrArtifactIndex` (and its clients) to be more judicious about when and where Solr commits are performed and about the number of queries performed.
- Numerous internal improvements and bug fixes. E.g., removed the use of `DeferredTempFileOutputStream` in the serialization of artifacts to WARC records in `WarcArtifactDataStore`.
- Support for filesystems larger than 8 EiB.
- Upgraded Hadoop client libraries to version 3.3.1.
- Upgraded Solr client libraries and configuration to version 8.9.0. Merged LOCKSS-specific configuration changes into Solr 8.9.0 configuration.
- `BaseLockssRepository` implementations now have a repository state directory. Its subsystems (e.g., the data store) may organize their state into subdirectories of the repository state directory. It is currently used for artifact reindex signaling and state.
- Reindex trigger and logic were implemented in `BaseLockssRepository`. The reindex is triggered by the presence of its state file (`.../state/index/reindex`). After a successful reindex, the reindex state file is renamed out of the way. The reindex state file is a CSV log of WARCs that were reindexed. It may be useful for debugging or auditing the reindex.
- `WarcArtifactDataStore` and its subclasses were refactored so that they channel their access to subsystems (e.g., index) through their `BaseLockssRepository` context.
- Introduced the `LockssRepositorySubsystem` interface for subsystems (currently index and data store) of `BaseLockssRepository`.
- Numerous performance improvements to `SolrArtifactIndex` through control over when Solr performs soft and hard commits. Introduced `SolrCommitJournal` to record Solr updates and replay them if necessary.
- Re-enabled the resumption and processing of artifacts from temporary WARCs upon `WarcArtifactDataStore` startup.
- Adjusted (and removed unnecessary) synchronization through the Repository service to improve performance and avoid a deadlock.
- Switched to a 3-part version numbering scheme.
- Added support for uncompressed and GZIP-compressed WARCs.
- Added the ability to supply and use Solr credentials in SolrJ calls made by the RestLockssRepository client.
- Improved error handling and forwarding in data store, repository, and client layers.
- Major performance and scaling improvements in WARC data store operations that previously relied on the artifact state journal.
- Refactored and simplified artifact lifecycle within WARC data store implementations.
- Fixed several race conditions, e.g., through the introduction of SemaphoreMap and artifact version locking support in artifact index implementations.
- Numerous other small bug fixes and improvements.
- ...
- ...
- Move reference Solr version from 6.6.5 to 7.2.1.
- Infrastructural work to support upgrading Solr config sets.
- `Artifact` and `ArtifactData` caching improves performance.
- Paginating iterators improve performance.
- Fix incorrect URL enumeration order in `SolrArtifactIndex`.
- Clean up large deferred temporary files after use.
- Remove file size limit.
- Increase URL length limit.
- Allow fetching uncommitted artifacts.
- Optionally include uncommitted artifacts in iterators.
- Bug fixes and improved unit tests.
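
The reindex trigger described in the notes above (a state file whose presence requests a reindex, which doubles as a CSV log and is renamed once the reindex succeeds) can be illustrated with a minimal sketch. This is not the `BaseLockssRepository` implementation: the class `ReindexTriggerSketch`, the `WarcIndexer` callback, the CSV columns, and the name given to the renamed file are all assumptions made for illustration. Only the `index/reindex` location under the state directory comes from the notes.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;
import java.util.List;

/** Hypothetical sketch of a state-file-triggered reindex; not the LOCKSS code. */
class ReindexTriggerSketch {

  /** Hypothetical callback that reindexes the artifacts of one WARC file. */
  interface WarcIndexer {
    void reindex(Path warc) throws IOException;
  }

  /**
   * Runs a reindex only if the state file exists under the given state directory.
   * Returns true if a reindex ran, false if no reindex was requested.
   */
  static boolean reindexIfRequested(Path stateDir, List<Path> warcs, WarcIndexer indexer)
      throws IOException {
    Path stateFile = stateDir.resolve("index").resolve("reindex");
    if (!Files.exists(stateFile)) {
      return false; // no trigger file present, nothing to do
    }
    for (Path warc : warcs) {
      indexer.reindex(warc);
      // Append one CSV row per reindexed WARC. The columns (path, timestamp)
      // are an assumption; paths containing commas would need quoting.
      String row = warc + "," + System.currentTimeMillis() + System.lineSeparator();
      Files.write(stateFile, row.getBytes(StandardCharsets.UTF_8),
          StandardOpenOption.APPEND);
    }
    // Rename the state file out of the way so the next startup does not
    // reindex again. The ".done" naming is hypothetical.
    Path done = stateFile.resolveSibling(
        "reindex." + System.currentTimeMillis() + ".done");
    Files.move(stateFile, done, StandardCopyOption.ATOMIC_MOVE);
    return true;
  }
}
```

In this sketch an operator would request a reindex by creating an empty `reindex` file before startup. The renamed file remains available afterwards as a record of which WARCs were processed.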