~ushin/ushin#190: 
Hyperbee access

We're currently working on exposing hyperbee access in hypercore-fetch-ushin. This will allow us to:

  • index hyperdrive content for fast search (primary use case)
  • store user metadata (instead of storing a human-memorable nickname in a well-known location in the drive, put it in a hyperbee sub)
  • store other arbitrary key/value data, e.g., TrustNet network data

Work is in progress on the kvdb branch of hypercore-fetch-ushin.

Supersedes #32.

Status
REPORTED
Submitter
~ushin
Assigned to
No-one
Submitted
8 months ago
Updated
7 months ago
Labels
hypercore-fetch-ushin

~ushin referenced this from #32 8 months ago

~ushin 8 months ago

See org-roam-db--table-schemata, (source), which defines some indices used in org-roam.

~ushin 8 months ago*

See John Kitchin's 2017 SQL schema for indexing Org files. He indexes the following Org metadata: filenames, headlines, tags, properties, (optionally) headline-content, headline-tags, headline-properties, and links. (source)

His example of querying for cite: links gives me an idea of how we could use a regexp filter operation when sending queries. For example, to find "headings tagged :emacs: which cite any work written by John Kitchin", we could send a hyperbee operation like this:

['query', {
  filter: 'cite:.*john.*kitchin',
  gte: ['tag', 'emacs', '\x00'],
  lte: ['tag', 'emacs', '\xff']
}]
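
To make the intended semantics concrete, here is a minimal in-memory sketch of what such a query operation might do: scan the key range, then keep only the entries whose value matches the regexp. Nothing here is the actual hyperbee or hypercore-fetch API; compareKeys, runQuery, the array-shaped keys, and the sample entries are all hypothetical, and case-insensitive matching is an assumption.

```javascript
// Compare two array keys element by element (hypothetical key encoding).
function compareKeys(a, b) {
  const n = Math.min(a.length, b.length);
  for (let i = 0; i < n; i++) {
    if (a[i] < b[i]) return -1;
    if (a[i] > b[i]) return 1;
  }
  return a.length - b.length;
}

// Range scan followed by a regexp filter over {key, value} entries.
// Assumption: the filter is applied to the entry value, ignoring case.
function runQuery(entries, { filter, gte, lte }) {
  const re = new RegExp(filter, 'i');
  return entries
    .filter(e => compareKeys(e.key, gte) >= 0 && compareKeys(e.key, lte) <= 0)
    .filter(e => re.test(e.value));
}

// Made-up sample data, for illustration only.
const sampleEntries = [
  { key: ['tag', 'emacs', 'a.org'], value: 'Scimax notes cite:john-kitchin-2015' },
  { key: ['tag', 'emacs', 'b.org'], value: 'Unrelated heading' },
  { key: ['tag', 'p2p', 'c.org'], value: 'cite:john-kitchin-2015' }
];

const matches = runQuery(sampleEntries, {
  filter: 'cite:.*john.*kitchin',
  gte: ['tag', 'emacs', '\x00'],
  lte: ['tag', 'emacs', '\xff']
});
```

In this sketch only the a.org entry survives: b.org is in range but fails the filter, and c.org matches the filter but lies outside the tag range.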

org-db.el now lives in scimax, and it has grown to include full text search and image search.

~ushin 8 months ago*

The filter operation might also let us avoid tokenizing and indexing heading content. We could transform a query like "tags:emacs deliberation p2p" into:

['query', {
  filter: '(deliberation.*p2p)|(p2p.*deliberation)', // Perhaps there's a cleaner way to write this regexp
  gte: ['tag', 'emacs', '\x00'],
  lte: ['tag', 'emacs', '\xff']
}]

The idea is to take the portion of the query which is intended to match against heading text and convert it into a regular expression which can be passed to the filter operation. The effect would be similar to the orderless completion style: break the string along whitespace and generate a regex which matches each of those parts in any order.
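
One possibly cleaner way to generate the regexp is to emit one lookahead per term, which avoids enumerating every ordering. The sketch below is only an illustration: orderlessPattern and toOperation are hypothetical names, it handles a single "tags:" term, and it assumes the filter operation would accept JavaScript regexp syntax (including lookaheads).

```javascript
// Escape regexp metacharacters so each search term matches literally.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Split free text on whitespace and emit one lookahead per term,
// so the terms may appear in any order.
function orderlessPattern(text) {
  return text.trim().split(/\s+/).filter(Boolean)
    .map(t => `(?=.*${escapeRegExp(t)})`)
    .join('');
}

// Hypothetical translation of a query string into an operation:
// the "tags:" term becomes the key range, the remaining words the filter.
// Assumption: the query contains exactly one "tags:" term with one tag.
function toOperation(query) {
  const words = query.trim().split(/\s+/);
  const tagWord = words.find(w => w.startsWith('tags:'));
  const tag = tagWord.slice('tags:'.length);
  const text = words.filter(w => !w.startsWith('tags:')).join(' ');
  return ['query', {
    filter: orderlessPattern(text),
    gte: ['tag', tag, '\x00'],
    lte: ['tag', tag, '\xff']
  }];
}

const operation = toOperation('tags:emacs deliberation p2p');
```

With lookaheads the pattern grows linearly with the number of terms, whereas the alternation form needs one branch per permutation.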

~ushin 8 months ago*

Could we reuse org-ql's non-sexp query syntax, but handle certain parts of the query in a special way with the hyperbee api?

When finally displaying the org file content, we always load the file in its entirety (what would it even look like to display a partially-loaded file in an Emacs buffer?). Let's consider having the value of each index entry always be a file, never a heading. A complex query like tags:emacs,p2p !blockchain ts:on=today would first query the hyperbee instance for all of the files which contain both an emacs tag and a p2p tag, load them into Emacs, then use org-ql-view-sidebar to display the search results.

If we index just a few key metadata types to filter out the bulk of the irrelevant hyperdrive files, we can use org-ql to do the rest of the work.
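
As a rough sketch of that prefiltering step, suppose the index maps each tag to the files containing it; the files worth loading for tags:emacs,p2p are then the intersection of the per-tag file lists. The tagIndex data and the filesWithAllTags helper below are hypothetical, and the real index layout is still undecided.

```javascript
// Hypothetical tag index: tag name -> files that contain that tag.
const tagIndex = new Map([
  ['emacs', ['a.org', 'b.org', 'c.org']],
  ['p2p', ['b.org', 'c.org', 'd.org']],
  ['blockchain', ['c.org']]
]);

// Return the files that contain every one of the given tags.
function filesWithAllTags(index, tags) {
  if (tags.length === 0) return [];
  const sets = tags.map(t => new Set(index.get(t) || []));
  return [...sets[0]].filter(file => sets.every(s => s.has(file)));
}

// Only these files would be loaded into Emacs and handed to org-ql,
// which would then apply the remaining predicates (!blockchain, ts:on=today).
const candidateFiles = filesWithAllTags(tagIndex, ['emacs', 'p2p']);
```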

The question remains: what metadata should we index?

~ushin 7 months ago

It appears that org-ql does not have a syntax for querying by mtime, i.e., to search for results only within files which were modified before/after/at a certain time. We may want to perform searches like this in hyperdrive.el ("Show me headings tagged :ushin: which have been modified in the last month"). How best to handle this with org-ql?
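
One option might be to handle mtime on the hyperbee side rather than in org-ql: if files were also indexed under keys like ['mtime', timestamp, file], a range query could select recently modified files before org-ql sees them. The sketch below assumes that hypothetical key layout, treats "the last month" as 30 days, and relies on UTC ISO 8601 timestamps sorting lexicographically; modifiedSince is a made-up helper.

```javascript
// Build a hypothetical range operation over keys of the form
// ['mtime', isoTimestamp, file], covering the `days` days before `now`.
function modifiedSince(now, days) {
  const cutoff = new Date(now.getTime() - days * 24 * 60 * 60 * 1000);
  return ['query', {
    // toISOString() yields fixed-width UTC strings, so string order
    // matches chronological order.
    gte: ['mtime', cutoff.toISOString(), '\x00'],
    lte: ['mtime', now.toISOString(), '\xff']
  }];
}

const lastMonth = modifiedSince(new Date('2024-03-31T00:00:00.000Z'), 30);
```

The resulting file list could then be intersected with the files tagged :ushin: before loading anything into Emacs.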
