The library implement fast (highly optimized) vector algebra for Golang.
Hierarchical Navigable Small World (HNSW) algorithms are state-of-the-art techniques employed for approximate nearest neighbor searches. Vectors and the concept of distance are essential components of the algorithm. The algorithm requires large quantity of vector comparisons on the write path.
The specific count of vector operations performed by HNSW can vary, contingent on factors such as implementation details and the chosen parameters for the algorithm. We have observed a logarithmic progression, starting from 2.2K distance computations per write operation, with subsequent growth reaching 8K and beyond.
In handling such extensive data, the significance of nanoseconds cannot be overstated!
The latest version of the library is available at main
branch of this repository. All development, including new features and bug fixes, take place on the main
branch using forking and pull requests as described in contribution guidelines. The stable version is available via Golang modules.
import "github.com/kshard/vector"
// Our vectors
a := []float32{60.1699, 24.9384, 24.9384, 60.1699}
b := []float32{59.9311, 30.3609, 30.3609, 59.9311}
// instantiate Euclidean distance function
euclidean := vector.Euclidean()
euclidean.Distance(a, b) // 58.921078
// instantiate Cosine distance function
cosine := vector.Cosine()
cosine.Distance(a, b) // 0.001443
Note: One known limitation, the library requires vectors aligned to 4.
The library provides two versions of vector algebra functions. The library automatically selects appropriate version depending on the architecture.
- A Golang native version optimized for instruction pipelining.
- A pure assembly implementation leveraging the SIMD ("Single Instruction Multiple Data") instruction set.
Internally each version is implemented by the package
- internal/noasm - optimized Golang version.
- internal/simd - pure assembly implementation.
- internal/pure - reference implementation using idiomatic Golang.
The library provides Info
function to determine configuration runtime
config = vector.Info()
config[vector.CONFIG_XXX]
// XXX_WITH_PURE
// XXX_WITH_NOASM
// XXX_WITH_SIMD
Instruction pipeline is 3.3x faster and SIMD is 3.8x faster (due to bounds check elimination)
go test -run=^$ -bench=. -cpu=1
BenchmarkPureEqualF32 12029072 99.30 ns/op
BenchmarkNoAsmEqualF32 13555147 87.51 ns/op
BenchmarkPureEuclideanF32 4425328 269.50 ns/op
BenchmarkNoAsmEuclideanF32 14897370 79.70 ns/op
BenchmarkSIMDEuclideanF32 16708311 70.73 ns/op
BenchmarkContraMapEuclidean 16567862 71.07 ns/op
BenchmarkPureCosineF32 3995820 299.70 ns/op
BenchmarkNoAsmCosineF32 11836580 100.80 ns/op
ContraMap turn morphisms around f: B ⟼ A
.
type Node struct {
ID int
Vector vector.F32
}
vector.ContraMap[vector.F32, Node]{
Surface: vector.Euclidean(),
ContraMap: func(n Node) []float32 { return n.Vector },
}
The library is MIT licensed and accepts contributions via GitHub pull requests:
- Fork it
- Create your feature branch (
git checkout -b my-new-feature
) - Commit your changes (
git commit -am 'Added some feature'
) - Push to the branch (
git push origin my-new-feature
) - Create new Pull Request
The build and testing process requires Go version 1.13 or later.
build and test library.
git clone https://github.com/kshard/vector
cd vector
go test
Checklist:
- No new bounds check introduced:
go build -gcflags="-d=ssa/check_bce"
- Check inlining:
go build -gcflags "-m"
- No performance degradations:
go test -run=^$ -bench=. -cpu=1
The commit message helps us to write a good release note, speed-up review process. The message should address two question what changed and why. The project follows the template defined by chapter Contributing to a Project of Git book.
If you experience any issues with the library, please let us know via GitHub issues. We appreciate detailed and accurate reports that help us to identity and replicate the issue.