~l3kn/org-fc#28: Performance Issues

When I restrict my org-fc-directories to 3 files with around 150k total lines split equally, org-fc-awk-index-paths takes around 1.6 seconds to complete. I'm running a Thinkpad 460s with an SSD, so I expect it to be slower than your example in the docs; however, this seems a bit excessive.

File 1 File 2 File 3

Status: REPORTED
Submitter: ~kindablue
Assigned to: No-one
Submitted: 2 years ago
Updated: 2 years ago
Labels: No labels applied.

Leon Rische 2 years ago

I think the benchmarks were using an older version of the awk indexer. I knew that making it do more parsing made it slower, but I didn't expect that big a difference.

Indexing the directory mentioned in the docs still takes ~600ms; for your set of files it takes around 1.5s. I'm not sure why it's so much slower for your files, maybe because the density of flashcards is higher?

There might be some room for optimization in index.awk; beyond that, we'd need to write an optimized indexer in a faster language.

Caching would also help, but if you are using few files with lots of flashcards, most of them will be invalidated during each review.

How long does it take to set up a card in one of those large files for review? I've had to disable some of the org fontification to make reviewing large files reasonably fast.

Btw, did you automatically create those flashcards?

I've written some code to convert the "Petit Robert" French dictionary into a format usable with org-fc, but I've not found a way to get audio files for the cards yet.

In case you didn't know, if the "front" of the card is in the heading, you don't need a "Back" heading; the text under the main heading will be used as the back side of the card.

On 11/11/20 4:03 PM, ~kindablue wrote:

> When I restrict my org-fc-directories to 3 files with around 150k total lines split equally, org-fc-awk-index-paths takes around 1.6 seconds to complete. I'm running a Thinkpad 460s with an SSD, so I expect it to be slower than your example in the docs; however, this seems a bit excessive.
>
> File 1 File 2 File 3

~kindablue 2 years ago

Leon Rische outgoing@sr.ht writes:

> Indexing the directory mentioned in the docs still takes ~600ms; for your set of files it takes around 1.5s.

I'm glad it wasn't just me, but that is unfortunate.

> There might be some room for optimization in index.awk; beyond that, we'd need to write an optimized indexer in a faster language.
>
> Caching would also help, but if you are using few files with lots of flashcards, most of them will be invalidated during each review.

Is the better strategy to have multiple smaller files? Is there any way to cache headlines instead of files?

> How long does it take to set up a card in one of those large files for review? I've had to disable some of the org fontification to make reviewing large files reasonably fast.

Reviewing the cards isn't terrible actually. The initial setup is a slog, but flipping through cards is okay.

> Btw, did you automatically create those flashcards? I've written some code to convert the "Petit Robert" French dictionary into a format usable with org-fc, but I've not found a way to get audio files for the cards yet.

They are decks from Anki. There's a plugin (https://github.com/Stvad/CrowdAnki) that exports the media into a folder along with JSON output, which I initially tried to parse to create the files. However, that proved to be too complicated, so I ended up exporting the decks as 'Notes in Plain Text' and converting them with a Python script (https://0x0.st/inwf.py). It's not too bad.

> In case you didn't know, if the "front" of the card is in the heading, you don't need a "Back" heading; the text under the main heading will be used as the back side of the card.

Ah, I think the "back" heading was a leftover from a previous import. Thanks for the heads up.

-- Trey Peacock

~l3kn 2 years ago

I don't think there is a good way to do caching on a headline level. For files it's easier because we can quickly compute a checksum of each file to see if it has changed.

When caching at a file level, multiple small files would be faster because each review invalidates only a subset of them.
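
To make the trade-off concrete, here is a rough sketch of checksum-based caching at the file level. It is not org-fc's code: it is written in Python for illustration only, and `FileIndexCache`, `file_checksum` and the `index_file` callback (standing in for whatever actually parses a file, e.g. the awk indexer) are hypothetical names.

```python
import hashlib
from pathlib import Path
from typing import Callable, Dict, Iterable, List, Tuple


def file_checksum(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read in chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


class FileIndexCache:
    """Cache per-file index results, keyed by path and content checksum.

    `index_file` is a hypothetical stand-in for the real (slow) indexer.
    """

    def __init__(self, index_file: Callable[[Path], List[str]]) -> None:
        self._index_file = index_file
        self._entries: Dict[Path, Tuple[str, List[str]]] = {}

    def index(self, paths: Iterable[Path]) -> Dict[Path, List[str]]:
        results: Dict[Path, List[str]] = {}
        for path in paths:
            checksum = file_checksum(path)
            cached = self._entries.get(path)
            # Re-index only when the file is new or its contents changed.
            if cached is None or cached[0] != checksum:
                cached = (checksum, self._index_file(path))
                self._entries[path] = cached
            results[path] = cached[1]
        return results
```

With a scheme like this, reviewing a card rewrites its file, so that whole file gets re-indexed next time. With three 50k-line files, a review session likely touches all of them and the cache saves little; with many small files, only the touched ones are re-parsed.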

I've found that a lot of buffer setup time is due to org-mode's LaTeX highlighting. As a workaround, I have this rather blunt override in my config:

  (defun org-do-latex-and-related (_limit)
    "Don't highlight any LaTeX."
    nil)