We're currently working on exposing hyperbee access in ~hypercore-fetch-ushin~. This will allow us to store:
Current progress is being made in the hypercore-fetch-ushin kvdb branch.
Supersedes #32.
See `org-roam-db--table-schemata` (source), which defines some indices used in `org-roam`.
See John Kitchin's 2017 SQL schema for indexing Org files. He indexes the following Org metadata: filenames, headlines, tags, properties, (optionally) headline-content, headline-tags, headline-properties, and links. (source)
His example of querying for `cite:` links gives me an idea of how we could use a regexp filter operation when sending queries. For example, to find "headings tagged `:emacs:` which cite any work written by John Kitchin", we could send a hyperbee operation like this:

    ['query', {
      filter: 'cite:.*john.*kitchin',
      gte: ['tag', 'emacs', '\x00'],
      lte: ['tag', 'emacs', '\xff']
    }]
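
To make the intended semantics concrete, here is a minimal sketch of how such a `query` operation might be evaluated. It runs against a plain in-memory array rather than a real hyperbee instance, and everything in it is an assumption for illustration: the names `compareKeys` and `runQuery`, keys of the form `['tag', <tag>, <heading id>]`, values holding the heading text, and `filter` being applied to the value.

```javascript
// Compare two array keys element by element (hypothetical key encoding).
function compareKeys (a, b) {
  const n = Math.min(a.length, b.length)
  for (let i = 0; i < n; i++) {
    if (a[i] < b[i]) return -1
    if (a[i] > b[i]) return 1
  }
  return a.length - b.length
}

// Keep entries whose key lies in [gte, lte] and whose value matches `filter`.
function runQuery (entries, { filter, gte, lte }) {
  const re = new RegExp(filter)
  return entries.filter(([key, value]) =>
    compareKeys(key, gte) >= 0 &&
    compareKeys(key, lte) <= 0 &&
    re.test(value))
}

// Made-up sample data: [key, heading text] pairs.
const sampleEntries = [
  [['tag', 'emacs', 'a.org#1'], 'notes citing cite:john-kitchin-2017'],
  [['tag', 'emacs', 'b.org#2'], 'unrelated emacs heading'],
  [['tag', 'p2p', 'c.org#3'], 'cite:john-kitchin-2017 again']
]

const matches = runQuery(sampleEntries, {
  filter: 'cite:.*john.*kitchin',
  gte: ['tag', 'emacs', '\x00'],
  lte: ['tag', 'emacs', '\xff']
}).map(([key]) => key[2])

console.log(matches) // → [ 'a.org#1' ]
```

The range bounds narrow the scan to the `emacs` tag first, so the regexp only runs over entries that are already candidates.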
`org-db.el` now lives in scimax, and it has grown to include full text search and image search.
We could maybe also avoid tokenizing and indexing heading content by using the `filter` operation. We could transform a query like "tags:emacs deliberation p2p" into:

    ['query', {
      filter: '(emacs.*deliberation)|(deliberation.*emacs)', // Perhaps there's a cleaner way to write this regexp
      gte: ['tag', 'emacs', '\x00'],
      lte: ['tag', 'emacs', '\xff']
    }]
The idea is to take the portion of the query which is intended to match against heading text and convert it into a regular expression which can be passed to the `filter` operation. The effect would be similar to the `orderless` completion style: break the string along whitespace and generate a regex which matches each of those parts in any order.
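
One possible way to generate such a regexp without enumerating every ordering is to emit one lookahead per term. The sketch below is only an illustration: `escapeRegExp` and `orderlessPattern` are hypothetical helper names, it assumes the `filter` string is evaluated by an engine that supports lookahead (such as JavaScript's `RegExp`), and it assumes heading text is a single line (`.` does not match newlines).

```javascript
// Escape characters that have special meaning in a regular expression.
function escapeRegExp (s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&')
}

// Build a pattern that matches when every whitespace-separated term
// occurs somewhere in the text, in any order.
function orderlessPattern (query) {
  const terms = query.trim().split(/\s+/).filter(Boolean)
  return '^' + terms.map(t => `(?=.*${escapeRegExp(t)})`).join('')
}

const pattern = orderlessPattern('emacs deliberation')
console.log(pattern) // → ^(?=.*emacs)(?=.*deliberation)

const re = new RegExp(pattern)
console.log(re.test('deliberation tools for emacs')) // → true
console.log(re.test('emacs only'))                   // → false
```

The pattern grows linearly with the number of terms, whereas spelling out the alternatives by hand needs one branch per permutation.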
Could we reuse `org-ql`'s non-sexp query syntax, but handle certain parts of the query in a special way with the hyperbee API?
When finally displaying the Org file content, we always load the file in its entirety (what would it even look like to display a partially loaded file in an Emacs buffer?). Let's consider having the value of each index entry always be a file, never a heading. A complex query like `tags:emacs,p2p !blockchain ts:on=today` would first query the hyperbee instance for all of the files which contain both an `emacs` and a `p2p` tag, load them into Emacs, then use `org-ql-view-sidebar` to display the search results.
If we index just a few key metadata types to filter out the bulk of the irrelevant hyperdrive files, we can use `org-ql` to do the rest of the work.
The question remains: what metadata should we index?
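
As a rough sketch of that two-stage split, the code below pulls the tag names out of the query string and intersects per-tag file sets, leaving the full query to be re-run by `org-ql` on the loaded files. The helper names `extractTags` and `filesWithAllTags`, and the `Map` from tag name to a `Set` of file paths standing in for the hyperbee index, are assumptions for illustration. It follows the description above in requiring every listed tag; it does not try to settle how `org-ql` itself interprets `tags:`.

```javascript
// Collect the tag names from every "tags:a,b" token in the query string.
function extractTags (query) {
  const tags = []
  for (const token of query.trim().split(/\s+/)) {
    if (token.startsWith('tags:')) {
      tags.push(...token.slice('tags:'.length).split(',').filter(Boolean))
    }
  }
  return tags
}

// Return the files that appear under every requested tag.
// tagIndex: hypothetical Map from tag name to a Set of file paths.
function filesWithAllTags (tagIndex, tags) {
  if (tags.length === 0) return []
  const sets = tags.map(t => tagIndex.get(t) || new Set())
  return [...sets[0]].filter(file => sets.every(s => s.has(file)))
}

// Made-up index contents.
const tagIndex = new Map([
  ['emacs', new Set(['a.org', 'b.org'])],
  ['p2p', new Set(['b.org', 'c.org'])]
])

const tags = extractTags('tags:emacs,p2p !blockchain ts:on=today')
console.log(tags)                             // → [ 'emacs', 'p2p' ]
console.log(filesWithAllTags(tagIndex, tags)) // → [ 'b.org' ]
```

Only the files returned here would need to be fetched and opened in Emacs; the `!blockchain` and `ts:on=today` parts are left untouched for `org-ql`.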
It appears that `org-ql` does not have a syntax for querying by `mtime`, i.e., to search for results only within files which were modified before/after/at a certain time. We may want to perform searches like this in `hyperdrive.el` ("Show me headings tagged `:ushin:` which have been modified in the last month"). How best to handle this with `org-ql`?
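
One option, sketched here under stated assumptions rather than proposed as the answer, is to apply the modification-time cutoff to the candidate files before they are handed to `org-ql` at all. Note that this filters by file `mtime`, not by when an individual heading changed. The function name `modifiedSince`, the `{ path, mtime }` record shape, and the 30-day approximation of "the last month" are all hypothetical.

```javascript
const THIRTY_DAYS_MS = 30 * 24 * 60 * 60 * 1000

// Keep only the paths of files modified at or after `since`.
// files: hypothetical array of { path, mtime } records, mtime being a Date.
function modifiedSince (files, since) {
  return files
    .filter(f => f.mtime.getTime() >= since.getTime())
    .map(f => f.path)
}

// Made-up sample data with a fixed "now" so the result is deterministic.
const now = new Date('2024-06-30T00:00:00Z')
const candidates = [
  { path: 'recent.org', mtime: new Date('2024-06-20T00:00:00Z') },
  { path: 'stale.org', mtime: new Date('2024-01-05T00:00:00Z') }
]

const cutoff = new Date(now.getTime() - THIRTY_DAYS_MS)
console.log(modifiedSince(candidates, cutoff)) // → [ 'recent.org' ]
```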