New class HnswEuclidean. This uses Euclidean distances internally and is
returned from hnsw_build when distance = "euclidean" is specified. This fixes
an issue where, if you created an index with hnsw_build and
distance = "euclidean" (the default), you could not reload the index after
saving and have it find Euclidean distances. Instead, you had to create it as
an HnswL2 object and take the square root of the distances
yourself (https://github.com/jlmelville/rcpphnsw/issues/21).

Hnsw constructors now expose a random_seed parameter that you can use to set
the random seed used in constructing the HNSW index. Internally, the
hnsw_build and hnsw_knn functions will use a random seed based on R's RNG
state. This means that if you want to reproduce results, you need to set the
random seed in R via set.seed before calling those functions. Based on a
request by Maciej Beręsewicz
(https://github.com/jlmelville/rcpphnsw/issues/23).

New methods addItemsCol, getAllNNsCol and getAllNNsListCol are the
column-based equivalents of addItems, getAllNNs and getAllNNsList,
respectively. Note that the nearest neighbor data returned from getAllNNsCol
and getAllNNsListCol are also stored by column, i.e. the matrices have
dimensions k x n, where k is the number of neighbors and n is
the number of items in the data being searched.

A byrow parameter has been added to hnsw_knn, hnsw_build and hnsw_search. By
default this is set to TRUE and indicates that the items in the input matrix
are found in each row. To pass column-stored items, set byrow = FALSE. Any
matrices returned by hnsw_search and hnsw_knn now follow the convention given
by the value of byrow: i.e. if byrow = FALSE, the matrices contain nearest
neighbor information in each
column.

New method getItems, which returns a matrix of the data vectors in the index
with the specified integer identifiers. From a feature request made by d4tum
(https://github.com/jlmelville/rcpphnsw/issues/18).

The progress parameter in the functional interface no longer does anything.
When verbose = TRUE, a progress bar is no longer shown.

To control the number of threads, use the setNumThreads method if using the
object-based API, and the n_threads parameter in the hnsw_* function API. For
finer control, a setGrainSize method and a grain_size option are also
available in the object and function interfaces respectively. Thank you to
Dmitriy Selivanov for a lot of the work on
this.

verbose = TRUE now incurs substantially less computational overhead
associated with calculating the progress bar. Thank you to Samuel Granjeaud
for spotting the problem and coming
up with various solutions.

New parameter progress. By default this is set to "bar" and will show the
progress bar when verbose = TRUE. If you want terser output, set
progress = NULL. progress = NULL will eventually be the default setting: for
now, verbose = TRUE will get you the progress bar by default for backwards
compatibility.

New method markDeleted, which prevents an item from being retrieved
from the index.

New method resizeIndex, which allows the size of the index to be increased
without
having to save and reload the index.

A size method is available for the index objects and reports the
number of items added to the index.

Previously, hnsw_search would stop if the number of rows in the input matrix
was smaller than k. This check has been removed. Note that the correct
behavior is to ensure that k is smaller than or equal to index$size(), where
index is the index you are searching. Because the size() method is new to
this version, this check hasn't been added to hnsw_search, to preserve
compatibility with old indexes. If this matters to you, manually compare
index$size() with k before running hnsw_search. An error will be thrown if k
neighbors can't be found in the index. Thank you to Yuxing Liao for
spotting this and the pull request to remove the check.

Initial release.
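
The manual comparison of index$size() with k that the notes above suggest doing before hnsw_search could look roughly like the sketch below. The helper name check_k is hypothetical and not part of the package. The usage section assumes that RcppHNSW is installed, that the installed version provides the size() method, and that data is stored one item per row (the default described above).

```r
# Hypothetical helper (not part of RcppHNSW): stop early if k exceeds the
# number of items reported by the index's size() method.
check_k <- function(index, k) {
  n_items <- index$size()
  if (k > n_items) {
    stop("k = ", k, " exceeds the number of items in the index (", n_items, ")")
  }
  invisible(TRUE)
}

# Usage sketch, guarded so that it is skipped when the package is missing.
if (requireNamespace("RcppHNSW", quietly = TRUE)) {
  set.seed(42)                           # reproducible example data
  data <- matrix(rnorm(200), nrow = 20)  # 20 items with 10 dimensions each
  index <- RcppHNSW::hnsw_build(data, distance = "euclidean")
  query <- data[1:3, , drop = FALSE]     # search for the first three items

  check_k(index, k = 5)                  # passes: 5 <= 20 items in the index
  res <- RcppHNSW::hnsw_search(query, index, k = 5)
}
```

Keeping the check outside hnsw_search mirrors the reasoning in the notes: indexes without a size() method are unaffected, and callers who care get a clear message before the search runs.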