New class: HnswEuclidean. This uses Euclidean distances internally and is
returned by hnsw_build when distance = "euclidean" is specified. This fixes
an issue where, if you created an index with hnsw_build and
distance = "euclidean" (the default), then after saving it you could not
reload the index and have it return Euclidean distances. Instead, you had to
load it as an HnswL2 object and take the square root of the distances
yourself (https://github.com/jlmelville/rcpphnsw/issues/21).

Hnsw constructors and the hnsw_build and hnsw_knn functions now
expose a random_seed parameter that you can use to set the random seed used
in constructing the HNSW index. If it is not provided, the hnswlib default of
100 is used, which should preserve previous behavior. Based on a request by
Maciej Beręsewicz
(https://github.com/jlmelville/rcpphnsw/issues/23).

addItemsCol, getAllNNsCol and getAllNNsListCol are the column-based
equivalents of addItems, getAllNNs and getAllNNsList, respectively. Note that
the nearest neighbor data returned by getAllNNsCol and getAllNNsListCol is
also stored by column, i.e. the matrices have dimensions k x n, where k is the
number of neighbors and n is the number of items in the data being searched.

A new parameter, byrow, has been added to hnsw_knn, hnsw_build
and hnsw_search. By default it is set to TRUE, which indicates that each row
of the input matrix is an item. To pass column-stored items, set
byrow = FALSE. Any matrices returned by hnsw_search and hnsw_knn now follow
the convention given by the value of byrow: i.e. if byrow = FALSE, each
column of the returned matrices contains the nearest neighbor information
for one item.

New method: getItems, which returns a matrix of the data vectors in the index
with the specified integer identifiers. From a feature request made by d4tum
(https://github.com/jlmelville/rcpphnsw/issues/18).

The progress parameter in the functional interface no longer does anything.
When verbose = TRUE, a progress bar is no longer shown.

The number of threads can be set with the setNumThreads method if
using the object-based API, and the n_threads parameter in the hnsw_*
function API. For finer control, a setGrainSize method and a grain_size
parameter are also available in the object and function interfaces,
respectively. Thank you to Dmitriy Selivanov for a lot of the work on this.

verbose = TRUE now incurs substantially less computational overhead from
calculating the progress bar. Thank you to Samuel Granjeaud for spotting the
problem and coming up with various solutions.

New parameter: progress. By default this is set to "bar" and will show the
progress bar when verbose = TRUE. For terser output, set progress = NULL.
progress = NULL will eventually be the default setting; for now,
verbose = TRUE shows the progress bar by default for backwards
compatibility.

New method: markDeleted, which marks an item so that it is no longer
retrieved in searches of the index.

New method: resizeIndex, which allows the capacity of the index to be
increased without having to save and reload the index.

New method: size, which reports the number of items added to an index
object.

hnsw_search would previously stop with an error if the number of rows in the input matrix was
smaller than k. This check has been removed. Note that the correct check is
that k is less than or equal to index$size(), where index is the index you
are searching. Because the size() method is new to this version, this check
hasn't been added to hnsw_search, to preserve compatibility with old indexes.
If this matters to you, manually compare index$size() with k before running
hnsw_search. An error will be thrown if k neighbors can't be found in the
index. Thank you to Yuxing Liao for spotting this and for the pull request to
remove the check.

Initial release.
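Several of the options described in these notes can be combined in a short usage sketch. It assumes RcppHNSW is installed and uses only the arguments and methods named above (distance = "euclidean", random_seed, byrow, size() and the HnswEuclidean class), plus the save method and a new(Class, dim, filename) constructor for reloading a saved index; the exact reload signature is an assumption here, so treat this as an illustration rather than a reference.

```r
library(RcppHNSW)

# 150 items with 4 numeric features each, stored one item per row
X <- as.matrix(iris[, -5])
k <- 5

# hnsw_build with distance = "euclidean" returns an HnswEuclidean index;
# a fixed random_seed makes repeated builds reproducible
ann <- hnsw_build(X, distance = "euclidean", random_seed = 42)

# hnsw_search does not compare k with the index size, so check it manually
stopifnot(k <= ann$size())

# Default byrow = TRUE: items are rows, and the returned idx and dist
# matrices have one row per query item (n x k)
res_row <- hnsw_search(X, ann, k = k)

# byrow = FALSE: items are columns, and the results are laid out k x n
res_col <- hnsw_search(t(X), ann, k = k, byrow = FALSE)

# Save the index and reload it as an HnswEuclidean object, so the reloaded
# index still returns Euclidean (not squared) distances
index_file <- tempfile(fileext = ".hnsw")
ann$save(index_file)
ann2 <- new(HnswEuclidean, ncol(X), index_file)
```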