ragnar_store_create() gains a new argument, version, with default 2. Store version 2 adds support for chunk deoverlapping on retrieval and automatic chunk augmentation with headings. To support these features, the internal schema and ingestion requirements are different. See markdown_chunk() and the new S7 classes MarkdownDocument and MarkdownDocumentChunks. Backwards compatibility is maintained with version = 1. (#58, #39, #36)
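Chunk deoverlapping means that when several retrieved chunks cover overlapping stretches of the same document, they are merged into one contiguous excerpt rather than returned as near-duplicates. Below is a minimal, language-neutral sketch of that idea in Python. It is not ragnar's implementation. It assumes each hit records its source document and start/end character offsets, and the Chunk fields and deoverlap() helper are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    origin: str  # hypothetical: identifier of the source document
    start: int   # hypothetical: character offset where the chunk begins
    end: int     # hypothetical: character offset where the chunk ends (exclusive)


def deoverlap(chunks: list[Chunk]) -> list[Chunk]:
    """Merge chunks from the same origin whose [start, end) ranges overlap."""
    merged: list[Chunk] = []
    # Sorting by (origin, start) puts overlapping chunks next to each other.
    for c in sorted(chunks, key=lambda c: (c.origin, c.start)):
        last = merged[-1] if merged else None
        if last is not None and last.origin == c.origin and c.start < last.end:
            # Overlap: extend the previous merged range instead of adding a near-duplicate.
            last.end = max(last.end, c.end)
        else:
            # Copy, so the caller's chunks are never mutated.
            merged.append(Chunk(c.origin, c.start, c.end))
    return merged
```

For example, hits covering a.md 0-100, a.md 50-150, b.md 0-80 and a.md 300-400 collapse to a.md 0-150, a.md 300-400 and b.md 0-80. The merged text can then be re-sliced from the source document using the new offsets.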
ragnar_store_create() now supports Date and POSIXct classes supplied to extra_cols.
ragnar_store_create() now supports remote MotherDuck databases, specified as md:<dbname> in the location argument. (#50)
ragnar_retrieve() and friends gain a filter argument, adding support for efficiently filtering retrieval results.
ragnar_retrieve_bm25() gains arguments b, k, and conjunctive (#56).
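These names match the standard Okapi BM25 parameters. k controls how quickly repeated occurrences of a term stop adding to the score. b controls how strongly long documents are penalized. conjunctive requires every query term to be present. The Python sketch below illustrates the textbook formula only. It is not ragnar's implementation, which presumably delegates to DuckDB's full-text search extension. The whitespace tokenizer, the IDF variant and the default values k = 1.2, b = 0.75 are assumptions; check the ragnar documentation for its actual defaults.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k=1.2, b=0.75, conjunctive=False):
    """Score docs against query with Okapi BM25; returns {doc_index: score}."""
    tokenized = [d.lower().split() for d in docs]  # naive whitespace tokenizer
    n_docs = len(tokenized)
    avgdl = sum(len(toks) for toks in tokenized) / n_docs
    terms = set(query.lower().split())
    # Document frequency: how many docs contain each query term.
    df = {t: sum(t in toks for toks in tokenized) for t in terms}
    scores = {}
    for i, toks in enumerate(tokenized):
        tf = Counter(toks)
        present = [t for t in terms if tf[t] > 0]
        # Skip docs with no matching term. With conjunctive=True, also skip
        # any doc that is missing at least one query term.
        if not present or (conjunctive and len(present) < len(terms)):
            continue
        # b in [0, 1] sets how strongly document length is normalized.
        norm = k * (1 - b + b * len(toks) / avgdl)
        score = 0.0
        for t in present:
            idf = math.log((n_docs - df[t] + 0.5) / (df[t] + 0.5) + 1)
            # k sets how quickly repeated occurrences of a term saturate.
            score += idf * tf[t] * (k + 1) / (tf[t] + norm)
        scores[i] = score
    return scores
```

Consider the query "duckdb search" against the documents "duckdb full text search", "vector search with hnsw" and "duckdb vector search". All three are scored by default, but conjunctive=True keeps only the two that contain both terms. With b = 0 those two score identically. With b = 0.75 the shorter one ranks higher.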
ragnar_retrieve_vss() gains argument query_vector, supporting workflows that preprocess the query string before embedding.
The set of valid method choices for ragnar_retrieve_vss() has been narrowed to ensure that an HNSW index scan is used.
Passing a tbl(store) to ragnar_retrieve() is deprecated.
New chunker markdown_chunk(), with support for chunk heading context generation, semantic boundary selection, overlapping chunks, document segmentation, and more. (#56)
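Heading context generation attaches to each chunk the chain of markdown headings it falls under. A chunk retrieved in isolation therefore still carries its place in the document. The Python sketch below shows one way to compute that chain for a given character offset. It is an illustration, not ragnar's implementation. It assumes ATX-style headings (lines starting with one to six #) and does not skip headings inside fenced code blocks. heading_context() is a hypothetical helper.

```python
import re

# ATX heading: 1-6 '#' characters, whitespace, then the heading text.
HEADING = re.compile(r"^(#{1,6})[ \t]+(.+?)[ \t]*$", re.MULTILINE)


def heading_context(md: str, pos: int) -> list[str]:
    """Return the headings enclosing character offset `pos`, outermost first."""
    stack: list[tuple[int, str]] = []  # (heading level, heading line)
    for m in HEADING.finditer(md):
        if m.start() >= pos:
            break  # only headings that begin before the chunk apply
        level = len(m.group(1))
        # A new heading closes any open heading at the same or a deeper level.
        while stack and stack[-1][0] >= level:
            stack.pop()
        stack.append((level, m.group(0).strip()))
    return [line for _, line in stack]
```

For a document "# Intro / ## Setup / # Usage / ## Retrieval", a chunk starting under "## Retrieval" gets the context ["# Usage", "## Retrieval"]. The "# Usage" heading closes both earlier sections.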
New function embed_google_vertex() (@dfalbel, #49).
New function embed_databricks() (@atheriel, #45).
New function ragnar_chunks_view() for quickly previewing chunks (#42).
ragnar_register_tool_retrieve() gains optional name and title arguments to allow for more descriptive tool registration. These values can also be set in ragnar_store_create() (#43).
ragnar_read() and read_as_markdown() now accept paths that begin with ~ (@topepo, #46, #48).
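Changes to read_as_markdown() HTML conversion (#40, #51): the html_extract_selectors and html_zap_selectors arguments provide a flexible way to control which HTML page elements end up in the converted markdown. html_extract_selectors restricts conversion to the matching elements, and html_zap_selectors removes the matching elements.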