| Title: | 'Rcpp' Bindings for 'hnswlib', a Library for Approximate Nearest Neighbors |
|---|---|
| Description: | 'Hnswlib' is a C++ library for Approximate Nearest Neighbors. This package provides a minimal R interface by relying on the 'Rcpp' package. See <https://github.com/nmslib/hnswlib> for more on 'hnswlib'. 'hnswlib' is released under Version 2.0 of the Apache License. |
| Authors: | James Melville [aut, cre, cph], Aaron Lun [ctb], Samuel Granjeaud [ctb], Dmitriy Selivanov [ctb], Yuxing Liao [ctb] |
| Maintainer: | James Melville <[email protected]> |
| License: | GPL (>= 3) |
| Version: | 0.7.0.9000 |
| Built: | 2026-05-26 14:58:42 UTC |
| Source: | https://github.com/jlmelville/rcpphnsw |
hnswlib is a library implementing the Hierarchical Navigable Small World method for approximate nearest neighbor search.
Details about hnswlib are available at the reference listed below.
James Melville for the R interface; Yury Malkov for hnswlib itself.
Maintainer: James Melville [email protected]
https://github.com/nmslib/hnswlib
Malkov, Y. A., & Yashunin, D. A. (2016). Efficient and robust approximate nearest neighbor search using Hierarchical Navigable Small World graphs. arXiv preprint arXiv:1603.09320.
Useful links:
Build an hnswlib nearest neighbor index
hnsw_build( X, distance = "euclidean", M = 16, ef = 200, verbose = FALSE, progress = "bar", n_threads = 0, grain_size = 1, byrow = TRUE, random_seed = 100 )hnsw_build( X, distance = "euclidean", M = 16, ef = 200, verbose = FALSE, progress = "bar", n_threads = 0, grain_size = 1, byrow = TRUE, random_seed = 100 )
X |
A numeric matrix of data to search for neighbors. If |
distance |
Type of distance to calculate. One of:
|
M |
Controls the number of bi-directional links created for each element
during index construction. Higher values lead to better results at the
expense of memory consumption. Typical values are |
ef |
Size of the dynamic list used during construction. A larger value means a better quality index, but increases build time. Should be an integer value between 1 and the size of the dataset. |
verbose |
If |
progress |
defunct and has no effect. |
n_threads |
Maximum number of threads to use. The exact number is
determined by |
grain_size |
Minimum amount of work to do (rows in |
byrow |
if |
random_seed |
Seed passed to hnswlib for index construction. The
default, |
an instance of an HnswEuclidean, HnswL2, HnswCosine or
HnswIp class.
irism <- as.matrix(iris[, -5]) ann <- hnsw_build(irism) iris_nn <- hnsw_search(irism, ann, k = 5)irism <- as.matrix(iris[, -5]) ann <- hnsw_build(irism) iris_nn <- hnsw_search(irism, ann, k = 5)
A k-nearest neighbor algorithm using the hnswlib library (https://github.com/nmslib/hnswlib).
hnsw_knn( X, k = 10, distance = "euclidean", M = 16, ef_construction = 200, ef = 10, verbose = FALSE, progress = "bar", n_threads = 0, grain_size = 1, byrow = TRUE, random_seed = 100 )hnsw_knn( X, k = 10, distance = "euclidean", M = 16, ef_construction = 200, ef = 10, verbose = FALSE, progress = "bar", n_threads = 0, grain_size = 1, byrow = TRUE, random_seed = 100 )
X |
A numeric matrix of |
k |
Number of neighbors to return. |
distance |
Type of distance to calculate. One of:
|
M |
Controls the number of bi-directional links created for each element
during index construction. Higher values lead to better results at the
expense of memory consumption. Typical values are |
ef_construction |
Size of the dynamic list used during construction. A larger value means a better quality index, but increases build time. Should be an integer value between 1 and the size of the dataset. |
ef |
Size of the dynamic list used during search. Higher values lead
to improved recall at the expense of longer search time. Can take values
between |
verbose |
If |
progress |
defunct and has no effect. |
n_threads |
Maximum number of threads to use. The exact number is
determined by |
grain_size |
Minimum amount of work to do (rows in |
byrow |
if |
random_seed |
Seed passed to hnswlib for index construction. The
default, |
a list containing:
idx a matrix containing the nearest neighbor indices.
dist a matrix containing the nearest neighbor distances.
The dimensions of the matrices respect the storage (row or column-based) of
X as indicated by the byrow parameter. If byrow = TRUE (the default)
each row of idx and dist contain the neighbor information for the item
passed in the equivalent row of X, i.e. the dimensions are n x k where
n is the number of items in X. If byrow = FALSE, then each column of
idx and dist contain the neighbor information for the item passed in
the equivalent column of X, i.e. the dimensions are k x n.
Every item in the dataset is considered to be a neighbor of itself, so the
first neighbor of item i should always be i itself. If that isn't the
case, then any of M, ef_construction or ef may need increasing.
Some details on the parameters used for index construction and search, based on https://github.com/nmslib/hnswlib/blob/master/ALGO_PARAMS.md:
M Controls the number of bi-directional links created for each
element during index construction. Higher values lead to better results at
the expense of memory consumption, which is around M * 8-10 bytes
per bytes per stored element. High intrinsic dimensionalities will require
higher values of M. A range of 2 - 100 is typical, but
12 - 48 is ok for most use cases.
ef_construction Size of the dynamic list used during
construction. A larger value means a better quality index, but increases
build time. Should be an integer value between 1 and the size of the
dataset. A typical range is 100 - 2000. Beyond a certain point,
increasing ef_construction has no effect. A sufficient value of
ef_construction can be determined by searching with ef = ef_construction, and ensuring that the recall is at least 0.9.
ef Size of the dynamic list used during index search. Can
differ from ef_construction and be any value between k (the
number of neighbors sought) and the number of elements in the index being
searched.
Malkov, Y. A., & Yashunin, D. A. (2016). Efficient and robust approximate nearest neighbor search using Hierarchical Navigable Small World graphs. arXiv preprint arXiv:1603.09320.
iris_nn_data <- hnsw_knn(as.matrix(iris[, -5]), k = 10)iris_nn_data <- hnsw_knn(as.matrix(iris[, -5]), k = 10)
Search an hnswlib nearest neighbor index
hnsw_search( X, ann, k, ef = 10, verbose = FALSE, progress = "bar", n_threads = 0, grain_size = 1, byrow = TRUE )hnsw_search( X, ann, k, ef = 10, verbose = FALSE, progress = "bar", n_threads = 0, grain_size = 1, byrow = TRUE )
X |
A numeric matrix of data to search for neighbors. If |
ann |
an instance of an |
k |
Number of neighbors to return. This can't be larger than the number
of items that were added to the index |
ef |
Size of the dynamic list used during search. Higher values lead
to improved recall at the expense of longer search time. Can take values
between |
verbose |
If |
progress |
defunct and has no effect. |
n_threads |
Maximum number of threads to use. The exact number is
determined by |
grain_size |
Minimum amount of work to do (items in |
byrow |
if |
a list containing:
idx a matrix containing the nearest neighbor indices.
dist a matrix containing the nearest neighbor distances.
The dimensions of the matrices respect the storage (row or column-based) of
X as indicated by the byrow parameter. If byrow = TRUE (the default)
each row of idx and dist contain the neighbor information for the item
passed in the equivalent row of X, i.e. the dimensions are n x k where
n is the number of items in X. If byrow = FALSE, then each column of
idx and dist contain the neighbor information for the item passed in
the equivalent column of X, i.e. the dimensions are k x n.
Every item in the dataset is considered to be a neighbor of itself, so the
first neighbor of item i should always be i itself. If that isn't the
case, then any of M or ef may need increasing.
irism <- as.matrix(iris[, -5]) ann <- hnsw_build(irism) iris_nn <- hnsw_search(irism, ann, k = 5)irism <- as.matrix(iris[, -5]) ann <- hnsw_build(irism) iris_nn <- hnsw_search(irism, ann, k = 5)