~cannam

#1 Performance test updates a month ago

Comment by ~cannam on ~cannam/bisquay

Another attempt at using flat vectors for components, this time with each component represented using three reals (value, intensity, and offset) rather than two reals and a RealTime (value, intensity, and time). We can do this now because the component summariser knows the hop rate, so it can work with offsets from the hop time rather than needing absolute times.

Note that this has the advantage of faster streaming for component matrices, something that is sadly not measured in these tests but should make for a significant improvement in the UI. So it may be acceptable for this to be a little slower in these tests, however disappointing that might be.
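To make the three-real layout concrete, here is a minimal Python sketch (not the bisquay SML code) that stores components as flat triples of doubles, with each time held as an offset from the start of the component's hop. The function names, the use of plain float seconds in place of RealTime, and the `hop_rate` parameter (hops per second) are all assumptions made for illustration.

```python
from array import array

FIELDS = 3  # value, intensity, offset

def pack_components(components, hop_index, hop_rate):
    """Flatten (value, intensity, absolute_time) tuples for one hop into a
    single array of doubles, storing each time as an offset from the hop start.
    Hypothetical: hop_rate is hops per second, times are float seconds."""
    hop_time = hop_index / hop_rate
    flat = array("d")
    for value, intensity, t in components:
        flat.extend((value, intensity, t - hop_time))
    return flat

def component_at(flat, i, hop_index, hop_rate):
    """Read back the i-th component, rebuilding its absolute time."""
    value, intensity, offset = flat[FIELDS * i : FIELDS * (i + 1)]
    return value, intensity, hop_index / hop_rate + offset

# Usage: two components in hop 200 at 100 hops per second (hop start 2.0 s)
flat = pack_components([(440.0, 0.5, 2.01), (880.0, 0.25, 2.0)],
                       hop_index=200, hop_rate=100.0)
```

Since every field is a real, a whole column of components is one contiguous buffer, with no per-component RealTime record to box or convert.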

ae305d318fe5 (time-offset-again) tip
Commit date: Mon Dec 20 14:35:18 2021 +0000

                        polyml          mlton_noffi     mlton_release   mlton_ipp       mem/polyml      mem/release
   waveform             0:00.70         0:00.46         0:00.36         0:00.35         242M            162M
   waveform-max         0:04.57         0:03.29         0:02.73         0:02.74         452M            248M
   cqt-dump             0:11.41         0:09.07         0:03.29         0:03.49         996M            215M
   spec-dump            0:01.70         0:02.21         0:01.19         0:02.15         203M            90M
   spec-max             0:07.04         0:03.92         0:03.56         0:03.50         957M            707M
   reassigning          1:39.14         1:01.09         0:27.91         0:27.76         2290M           3382M

The above uses an FFI call to sort component vectors, which looked as if it might be advantageous enough to make them faster overall (apparently not). Without that, using just the flat vector with the same sorting calls as before, the reassigning test gives:

   reassigning          1:38.06         1:01.95         0:33.65         0:33.82         2452M           3529M

#1 Performance test updates a month ago

Comment by ~cannam on ~cannam/bisquay

After introducing the ability to fill the columnar random-access from the middle, to improve responsiveness when scrolling and zooming a slow-to-generate spectrogram. This is the work of the seekable and read-budget branches, which have recently been merged back to default.

e79e7d419a36 tip
Commit date: Wed Dec 15 08:25:28 2021 +0000

                        polyml          mlton_noffi     mlton_release   mlton_ipp       mem/polyml      mem/release
   waveform             0:00.72         0:00.50         0:00.38         0:00.37         241M            154M
   waveform-max         0:04.39         0:03.37         0:02.90         0:02.80         449M            230M
   cqt-dump             0:11.18         0:09.48         0:03.59         0:03.39         957M            209M
   spec-dump            0:01.73         0:02.32         0:01.19         0:01.22         203M            122M
   spec-max             0:06.39         0:04.01         0:03.43         0:03.43         999M            611M
   reassigning          1:23.62         0:57.21         0:24.40         0:23.12         1229M           3313M

That spec-max is slower is a pity, but not surprising given that we now store columnar data with an option for each column, so straight reads of pre-filled data are inevitably slower. I think this is a reasonable tradeoff, though it's also possible that I've missed something and much of the overhead is actually from somewhere else.

Note that (with reference to the earlier flat-everywhere comment) I did retain the use of flat vectors for ComplexVector throughout, which led to a small slowdown in the Poly/ML CQT but made little difference elsewhere. Flat vectors were already explicitly bodged in for the FFI-compatible ComplexVector, so this changed little in release builds.
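A rough Python sketch of the two ideas in this comment: columnar storage where each column is optional until computed, and a fill that starts at an arbitrary column and is capped by a budget. The class name, the wrap-around fill order and the `budget` semantics are hypothetical; the seekable and read-budget branches may well work differently.

```python
from typing import Callable, List, Optional

class ColumnarCache:
    """Columns start as None ("not yet computed") and are filled on demand."""

    def __init__(self, width: int, compute: Callable[[int], List[float]]):
        self.columns: List[Optional[List[float]]] = [None] * width
        self.compute = compute  # hypothetical: produces column i, possibly slowly

    def fill_from(self, start: int, budget: int) -> int:
        """Compute at most `budget` missing columns, beginning at `start`
        (e.g. the first visible column), moving rightwards and wrapping
        round to column 0. Returns how many columns were computed."""
        n = len(self.columns)
        done = 0
        for k in range(n):
            if done >= budget:
                break
            i = (start + k) % n
            if self.columns[i] is None:
                self.columns[i] = self.compute(i)
                done += 1
        return done

    def read(self, i: int) -> Optional[List[float]]:
        # Callers must handle None even once everything has been filled:
        # this per-column option is the extra cost on straight reads.
        return self.columns[i]

# Usage: 10 columns; fill 3 of them starting at column 6
cache = ColumnarCache(10, lambda i: [float(i)])
filled = cache.fill_from(6, 3)
```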

#1 Performance test updates 2 months ago

Comment by ~cannam on ~cannam/bisquay

Some experiments with flat vectors (the flat-everywhere branch).

This introduces a general flat vector of 2- or 3-element values and uses it for complex vectors, arrays, and matrices throughout (including in pure-SML builds).

ca5ac2beefab (flat-everything) tip
Commit date: Wed Oct 27 11:28:31 2021 +0100

                        polyml          mlton_noffi     mlton_release   mem/polyml      mem/release
   waveform             0:00.70         0:00.47         0:00.38         241M            110M
   waveform-max         0:04.61         0:03.45         0:02.87         449M            247M
   cqt-dump             0:12.12         0:09.68         0:03.47         988M            216M
   spec-dump            0:01.70         0:02.32         0:01.18         203M            70M
   spec-max             0:06.96         0:03.07         0:02.56         930M            592M
   reassigning          1:26.83         0:59.79         0:28.97         1134M           3682M

This flat vector could be a little slower than the previous custom FlatComplexVector because it isn't so tightly specialised for 2-element values. We might be better off simply renaming the old complex vector to Flat2Vector, using it for any 2-element value, and adding a Flat3Vector if we ever actually use 3-element flat vectors in future.
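To illustrate the generic-versus-specialised point, here is a small Python sketch of a general flat vector. Elements of any type are stored as `width` consecutive doubles, with conversion functions supplied by the caller, loosely playing the part of a functor argument in SML. The class and parameter names are hypothetical.

```python
from array import array
from typing import Callable, Generic, Sequence, Tuple, TypeVar

T = TypeVar("T")

class FlatVector(Generic[T]):
    def __init__(self, width: int,
                 to_reals: Callable[[T], Tuple[float, ...]],
                 from_reals: Callable[[Sequence[float]], T],
                 items: Sequence[T]):
        self.width = width
        self.from_reals = from_reals
        self.data = array("d")  # one contiguous buffer of doubles
        for x in items:
            reals = to_reals(x)
            if len(reals) != width:
                raise ValueError("element does not pack to `width` reals")
            self.data.extend(reals)

    def __len__(self) -> int:
        return len(self.data) // self.width

    def __getitem__(self, i: int) -> T:
        w = self.width
        # Generic path: slice out `width` reals, then rebuild the element
        return self.from_reals(self.data[w * i : w * (i + 1)])

# A complex vector as the 2-element instance
cv = FlatVector(2, lambda z: (z.real, z.imag),
                lambda r: complex(r[0], r[1]),
                [1 + 2j, 3 - 4j])
```

A version specialised for 2-element values could read `data[2*i]` and `data[2*i+1]` directly and skip the slice and callback. This is the sort of overhead that might make the general vector slightly slower.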

Part of the motivation was to use flat vectors for component vectors as well. I made two attempts at that:

  1. Using Word64 as the storage type and trying to convert components to it via PackReal64Little and friends (plus a new RealTime.pack). I haven't yet found a way to do this conversion at zero cost (the Basis library mechanisms themselves go through arrays that are apparently not optimised out), so this took 3:27 to run the release reassigning test (the one that takes under 30 seconds above). And Poly/ML doesn't support enough Pack structures to build this.

  2. Using Real as the storage type and converting using RealTime.toReal. This can never be "correct", but I wanted to compare timings. It took 2:53 for the same release reassigning test.
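The difference between the two attempts can be shown in Python, standing in for SML. Reinterpreting a double's bits as a 64-bit word round-trips exactly, but here it goes through an intermediate byte buffer, much as the Basis Pack structures go through arrays. Converting an integer time to a double is cheap but loses precision. Treating a RealTime as an integer count of nanoseconds is an assumption made only for this illustration.

```python
import struct

def real_to_word64(x: float) -> int:
    """Reinterpret the IEEE 754 bits of a double as an unsigned 64-bit int.
    Exact, but goes via an intermediate bytes object."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]

def word64_to_real(w: int) -> float:
    """Inverse of real_to_word64."""
    return struct.unpack("<d", struct.pack("<Q", w))[0]

# Attempt 2 in miniature: converting an integer time (hypothetically,
# nanoseconds) to a double is lossy once it exceeds 2**53.
t_ns = 2**53 + 1
lossy = int(float(t_ns))  # 2**53, not t_ns
```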

Unless there is some cost-free way to pack reals and FixedInt.int values into words, I don't think this can go anywhere, and even MLton's PACK_REAL_EXTRA doesn't seem to offer one. An alternative would be to have the flat vector functor argument provide functions that pack directly into a byte array, rather than converting to a representation type. I don't entirely fancy writing that.
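Here is a sketch of that alternative, again in Python rather than SML. The element description, in the role of the functor argument, writes each element straight into a shared byte array and reads it back from there. That lets mixed field types, such as an exact 64-bit integer time, sit alongside reals. `ComponentCodec`, `ByteFlatVector` and the 24-byte layout are hypothetical, and this sketch says nothing about whether an SML compiler could make such packing zero-cost.

```python
import struct

class ComponentCodec:
    """Hypothetical element description: (value, intensity, time_ns), with
    the time kept exact as a signed 64-bit integer."""
    layout = struct.Struct("<ddq")  # two doubles and an int64: 24 bytes
    size = layout.size

    @classmethod
    def write(cls, buf: bytearray, index: int, comp) -> None:
        cls.layout.pack_into(buf, index * cls.size, *comp)

    @classmethod
    def read(cls, buf: bytearray, index: int):
        return cls.layout.unpack_from(buf, index * cls.size)

class ByteFlatVector:
    """Flat vector over a byte array; the codec does all the packing."""

    def __init__(self, codec, items):
        self.codec = codec
        self.buf = bytearray(codec.size * len(items))
        for i, item in enumerate(items):
            codec.write(self.buf, i, item)

    def __len__(self) -> int:
        return len(self.buf) // self.codec.size

    def __getitem__(self, i: int):
        return self.codec.read(self.buf, i)

vec = ByteFlatVector(ComponentCodec, [(440.0, 0.5, 2**53 + 1), (880.0, 0.25, 0)])
```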

#1 Performance test updates 2 months ago

Comment by ~cannam on ~cannam/bisquay

After starting to merge similar components when summarising frequency component streams:

6ae068c16b8e tip
Commit date: Tue Oct 26 11:07:57 2021 +0100

                        polyml          mlton_noffi     mlton_release   mem/polyml      mem/release
   waveform             0:00.64         0:00.47         0:00.38         241M            110M
   waveform-max         0:04.45         0:03.51         0:02.97         462M            247M
   cqt-dump             0:10.37         0:08.72         0:03.39         407M            215M
   spec-dump            0:01.67         0:02.38         0:01.19         202M            70M
   spec-max             0:06.98         0:03.26         0:03.07         902M            594M
   reassigning          1:32.31         0:57.57         0:28.54         1012M           3916M

Baffling: I can see why this should slow down the reassigning test, but I can see no reason for it to slow down anything else. And wow, look at the extra memory allocated in the reassigning test!
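The comment doesn't say how similar components are merged, so the following Python sketch is only a guess at the general shape. Components are sorted by value, neighbours within a tolerance are grouped, and each group becomes one component with the summed intensity and the intensity-weighted mean value. The merge rule, `tolerance` and the function name are all hypothetical.

```python
from typing import List, Tuple

Component = Tuple[float, float]  # (value, intensity)

def merge_similar(components: List[Component], tolerance: float) -> List[Component]:
    """Merge components whose values lie within `tolerance` of their
    neighbour (after sorting by value). Hypothetical merge rule."""
    merged: List[Component] = []
    group: List[Component] = []

    def flush() -> None:
        total = sum(i for _, i in group)
        if total > 0:
            value = sum(v * i for v, i in group) / total
        else:
            value = sum(v for v, _ in group) / len(group)
        merged.append((value, total))
        group.clear()

    for comp in sorted(components):
        if group and comp[0] - group[-1][0] > tolerance:
            flush()
        group.append(comp)
    if group:
        flush()
    return merged
```

Because each comparison is against the previous component rather than the first in the group, a run of closely spaced components can chain into one group. A real implementation might want to limit that.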

#1 Performance test updates 2 months ago

Comment by ~cannam on ~cannam/bisquay

Series of smallish commits made at the weekend and tested only on the orange laptop. I expect the first one or two to be faster, and the rest to make negligible difference or be slower. Let's see. In chronological order, starting again with 5f2b5d49ad78 (the commit tested in the previous comment) because of a system update in between.

5f2b5d49ad78
Commit date: Fri Oct 22 17:42:27 2021 +0100

                        polyml          mlton_noffi     mlton_release   mem/polyml      mem/release
   waveform             0:00.70         0:00.47         0:00.38         241M            131M
   waveform-max         0:04.64         0:03.60         0:02.98         444M            247M
   cqt-dump             0:10.22         0:08.51         0:03.33         407M            214M
   spec-dump            0:01.61         0:02.27         0:01.17         203M            122M
   spec-max             0:06.66         0:02.76         0:02.53         862M            749M
   reassigning          1:16.85         0:41.58         0:15.98         977M            1095M

"Small change to the last commit that slightly speeds things up again under test here":

8fbf5ebd1ef8
Commit date: Sat Oct 23 22:47:06 2021 +0100

                        polyml          mlton_noffi     mlton_release   mem/polyml      mem/release
   waveform             0:00.67         0:00.45         0:00.36         241M            110M
   waveform-max         0:04.46         0:03.31         0:02.82         445M            247M
   cqt-dump             0:09.98         0:08.47         0:03.39         406M            215M
   spec-dump            0:01.58         0:02.21         0:01.13         204M            70M
   spec-max             0:06.59         0:03.05         0:02.62         866M            734M
   reassigning          1:16.60         0:43.14         0:16.14         1039M           1101M

"Add, and use, BqFftSlice":

37e9d2a07b95
Commit date: Sat Oct 23 22:49:51 2021 +0100

                        polyml          mlton_noffi     mlton_release   mem/polyml      mem/release
   waveform             0:00.65         0:00.44         0:00.36         241M            110M
   waveform-max         0:04.38         0:03.32         0:02.79         447M            247M
   cqt-dump             0:09.97         0:08.72         0:03.56         406M            215M
   spec-dump            0:01.61         0:02.25         0:01.18         202M            70M
   spec-max             0:06.72         0:03.00         0:02.87         902M            594M
   reassigning          1:17.81         0:42.25         0:14.96         1292M           1047M

"Experiment with switching random access storage from array to matrix once fill [sic]". It seems highly unlikely that this would ever speed up any of our tests, since all of them fill their random-access storage in order, once, and then stop:

680f4b10f06e
Commit date: Sun Oct 24 13:35:52 2021 +0100

                        polyml          mlton_noffi     mlton_release   mem/polyml      mem/release
   waveform             0:00.81         0:00.46         0:00.38         253M            147M
   waveform-max         0:04.23         0:03.32         0:02.75         474M            217M
   cqt-dump             0:09.93         0:08.37         0:03.32         407M            215M
   spec-dump            0:01.57         0:02.19         0:01.13         202M            70M
   spec-max             0:06.43         0:02.87         0:02.72         940M            825M
   reassigning          1:16.64         0:41.85         0:15.44         1488M           1047M

If I change the reassigning test to call fillAll on its audio-file random access before starting, the last line becomes

   reassigning          1:18.28         0:42.03         0:15.00         923M            1044M

"Try going row-by-row in matrix conversion":

288cd601fc0e tip
Commit date: Sun Oct 24 13:52:39 2021 +0100

                        polyml          mlton_noffi     mlton_release   mem/polyml      mem/release
   waveform             0:00.61         0:00.45         0:00.37         254M            126M
   waveform-max         0:04.32         0:03.28         0:02.77         475M            216M
   cqt-dump             0:10.06         0:08.64         0:03.46         407M            215M
   spec-dump            0:01.61         0:02.28         0:01.20         201M            70M
   spec-max             0:06.30         0:02.96         0:02.64         951M            654M
   reassigning          1:18.72         0:41.84         0:15.02         1449M           1156M

And finally a hybrid in which we keep the latter change (row-by-row in matrix conversion) but lose the previous one (switching random-access to matrix when full):

                        polyml          mlton_noffi     mlton_release   mem/polyml      mem/release
   waveform             0:00.61         0:00.44         0:00.36         241M            110M
   waveform-max         0:04.33         0:03.32         0:02.79         455M            247M
   cqt-dump             0:10.02         0:08.47         0:03.22         407M            215M
   spec-dump            0:01.59         0:02.23         0:01.13         202M            70M
   spec-max             0:06.62         0:03.02         0:02.88         906M            594M
   reassigning          1:18.32         0:42.51         0:14.88         1326M           1047M

I think we should keep that last one. The changes up to and including 37e9d2a07b95 make sense to me and may have given a small speedup for reassignment and lower memory usage. The row-by-row-matrix change is vaguely sensible. The other one (switching random access to matrix once full) I will revert.
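For the row-by-row change, here is a minimal illustration of the loop order. It assumes, hypothetically, that the conversion takes a sequence of columns and produces a row-major flat matrix. Putting rows in the outer loop makes writes to the destination sequential at the cost of strided reads from the source columns. Which order wins depends on the real storage layout, which this sketch doesn't model.

```python
from array import array
from typing import List, Sequence, Tuple

def columns_to_row_major(columns: List[Sequence[float]]) -> Tuple[int, int, array]:
    """Convert a list of equal-length columns into (n_rows, n_cols, flat),
    with `flat` in row-major order. Hypothetical signature."""
    n_cols = len(columns)
    n_rows = len(columns[0]) if columns else 0
    flat = array("d")
    for r in range(n_rows):          # outer loop over rows...
        for c in range(n_cols):      # ...so `flat` is written strictly in order
            flat.append(columns[c][r])
    return n_rows, n_cols, flat
```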

It's striking how far we haven't come: the tests that existed three months ago all perform much as they did then.

#1 Performance test updates 3 months ago

Comment by ~cannam on ~cannam/bisquay

Cache the next and following read stream and matrix in the framing reader. Re-reading is already very cheap, but for heavily overlapped framers caching may be cheaper still.

5f2b5d49ad78 tip
Commit date: Fri Oct 22 17:42:27 2021 +0100

                        polyml          mlton_noffi     mlton_release   mem/polyml      mem/release
   waveform             0:00.69         0:00.49         0:00.37         241M            131M
   waveform-max         0:04.47         0:03.42         0:02.83         443M            247M
   cqt-dump             0:10.27         0:08.85         0:03.47         407M            214M
   spec-dump            0:01.71         0:02.30         0:01.19         203M            121M
   spec-max             0:06.91         0:02.94         0:02.68         936M            749M
   reassigning          1:19.00         0:43.70         0:16.60         1404M           1095M

#1 Performance test updates 3 months ago

Comment by ~cannam on ~cannam/bisquay

Integrate FFI version of SignalWindowPost. Should affect only the reassigning test.

150c30c4bc51 tip
Commit date: Fri Oct 22 09:03:02 2021 +0100

                        polyml          mlton_noffi     mlton_release   mem/polyml      mem/release
   waveform             0:00.68         0:00.49         0:00.38         241M            131M
   waveform-max         0:04.46         0:03.44         0:02.85         468M            247M
   cqt-dump             0:10.20         0:08.86         0:03.43         406M            214M
   spec-dump            0:01.68         0:02.33         0:01.20         203M            122M
   spec-max             0:06.59         0:03.18         0:02.73         978M            899M
   reassigning          1:31.75         0:55.18         0:18.85         1041M           1107M

The non-FFI version is slower (if nicer) but it does help the FFI build. The non-FFI code could be sped up again at the expense of tidiness.

I'm not sure why the spec tests are back to their earlier performance/memory profiles. Possibly it has to do with the (necessary) fix to a Subscript error in bsq-samplestreams commit:fe337c191783, made just after the previous test run.

#1 Performance test updates 3 months ago

Comment by ~cannam on ~cannam/bisquay

Change to framing logic, so as to slice then concat rather than concat then slice. This does appear to reduce memory usage a bit, but doesn't appear to be any faster.

265bd745f760 tip
Commit date: Wed Oct 20 10:51:09 2021 +0100

                        polyml          mlton_noffi     mlton_release   mem/polyml      mem/release
   waveform             0:00.64         0:00.49         0:00.36         240M            110M
   waveform-max         0:04.44         0:03.31         0:02.83         441M            247M
   cqt-dump             0:10.21         0:08.30         0:03.31         406M            214M
   spec-dump            0:01.61         0:02.22         0:01.19         203M            70M
   spec-max             0:06.83         0:02.95         0:02.56         973M            685M
   reassigning          1:16.34         0:45.96         0:20.17         1190M           1119M
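The two framing strategies can be contrasted with a small sketch over a list of buffered blocks. Plain Python lists stand in for the real sample streams, and the function names are hypothetical. Both return the same frame; they differ in how much temporary data is built along the way. That would be consistent with a modest memory saving and no speedup.

```python
def frame_concat_then_slice(blocks, start, length):
    # Joins every buffered block into one new list, then cuts the frame out:
    # the temporary copy is as large as all the buffered data.
    joined = [x for block in blocks for x in block]
    return joined[start:start + length]

def frame_slice_then_concat(blocks, start, length):
    # Takes only the overlapping part of each block, then joins those parts:
    # the temporary copies total at most `length` elements.
    parts = []
    block_start = 0
    end = start + length
    for block in blocks:
        block_end = block_start + len(block)
        lo = max(start, block_start)
        hi = min(end, block_end)
        if lo < hi:
            parts.append(block[lo - block_start : hi - block_start])
        block_start = block_end
    return [x for part in parts for x in part]

blocks = [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9, 10, 11]]
```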

#1 Performance test updates 3 months ago

Comment by ~cannam on ~cannam/bisquay

Small change to overflow handling in RealTime, affecting the time offset calculation:

dcbf3f778c80 tip
Commit date: Wed Oct 20 09:39:58 2021 +0100

                        polyml          mlton_noffi     mlton_release   mem/polyml      mem/release
...
   reassigning          1:14.41         0:45.64         0:20.41         1422M           1112M

#1 Performance test updates 3 months ago

Comment by ~cannam on ~cannam/bisquay

After switching to carrying out the Hann and Hann-derivative windowing on the frequency domain side (to reduce the number of FFT streams):

26f851d9c582 tip
Commit date: Tue Oct 19 17:53:36 2021 +0100

                        polyml          mlton_noffi     mlton_release   mem/polyml      mem/release
...
   reassigning          1:17.63         0:47.02         0:21.33         1288M           1108M
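For reference, frequency-domain Hann windowing relies on a standard identity. Multiplying a frame by a periodic Hann window is equivalent to circularly convolving its DFT with the 3-tap kernel [-1/4, 1/2, -1/4]. The Hann derivative likewise becomes a scaled difference of neighbouring bins. The Python sketch below checks both against direct windowing. It is not the bisquay code, and it assumes a window starting at sample 0 (not centred), a derivative taken with respect to the sample index, and a naive DFT in place of an FFT. The real implementation's phase and scaling conventions may differ.

```python
import cmath
import math

def dft(x):
    """Naive O(n^2) DFT, standing in for an FFT."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def hann_from_spectrum(X):
    """Spectrum of x * w, where w[t] = 0.5 - 0.5*cos(2*pi*t/n) (periodic Hann),
    computed from X = DFT(x) by a 3-tap circular convolution."""
    n = len(X)
    return [0.5 * X[k] - 0.25 * (X[k - 1] + X[(k + 1) % n]) for k in range(n)]

def hann_derivative_from_spectrum(X):
    """Spectrum of x * dw, where dw[t] = (pi/n)*sin(2*pi*t/n) is the derivative
    of the Hann window above with respect to the sample index t."""
    n = len(X)
    return [(math.pi / n) * (X[k - 1] - X[(k + 1) % n]) / 2j for k in range(n)]

# One transform of the unwindowed frame yields both windowed spectra
x = [math.sin(0.3 * t) + 0.1 * t for t in range(16)]
X = dft(x)
Xh = hann_from_spectrum(X)
Xd = hann_derivative_from_spectrum(X)
```

Because both windowed spectra come from a single transform of the unwindowed frame, one FFT stream can serve both. That appears to be the saving this change was after.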