Another attempt at using flat vectors for components. This is with the component represented using three reals (value, intensity, and offset), rather than two reals and a RealTime (value, intensity, and time). We can do this now because the component summariser now knows the hop rate, so can deal with offsets from hop time rather than needing absolute times.
Note that this has the advantage of faster streaming for component matrices, something that is sadly not measured in these tests but should make for a significant improvement in the UI. So it may be acceptable for this to be a little slower in these tests, however disappointing that might be.
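The representation change can be sketched as follows. This is illustrative only: the names `component` and `absoluteTime`, and the exact field types, are assumptions, not the project's actual code. The point is that once the summariser knows the hop duration, a small real-valued offset is enough to recover an absolute time.

```sml
(* Hypothetical sketch: the component is now three plain reals, with
   time stored as an offset from the hop boundary rather than as an
   absolute RealTime. *)
type component = {
    value : real,      (* the summarised value *)
    intensity : real,  (* magnitude *)
    offset : real      (* seconds relative to the hop time *)
}

(* The summariser knows the hop duration, so an absolute time can be
   recovered on demand from the hop index and the stored offset: *)
fun absoluteTime (hopIx : int, hopDuration : real, c : component) : real =
    real hopIx * hopDuration + #offset c
```

Three reals also means the component fits a homogeneous flat vector with no boxed `RealTime` values, which is what makes the faster matrix streaming possible.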
ae305d318fe5 (time-offset-again) tip
Commit date: Mon Dec 20 14:35:18 2021 +0000

              polyml   mlton_noffi  mlton_release  mlton_ipp  mem/polyml  mem/release
waveform      0:00.70  0:00.46      0:00.36        0:00.35    242M        162M
waveform-max  0:04.57  0:03.29      0:02.73        0:02.74    452M        248M
cqt-dump      0:11.41  0:09.07      0:03.29        0:03.49    996M        215M
spec-dump     0:01.70  0:02.21      0:01.19        0:02.15    203M        90M
spec-max      0:07.04  0:03.92      0:03.56        0:03.50    957M        707M
reassigning   1:39.14  1:01.09      0:27.91        0:27.76    2290M       3382M
The above uses an FFI call to sort component vectors, something that looked as if it might be advantageous enough to make them faster overall (but apparently not). Without that - using just the flat vector with the same sorting calls as before - we have
reassigning   1:38.06  1:01.95      0:33.65        0:33.82    2452M       3529M
After introducing the ability to fill from the middle of the columnar random-access, in order to improve responsiveness when scrolling and zooming a slow-to-generate spectrogram. This is the work of the read-budget branches, recently merged back to default.
e79e7d419a36 tip
Commit date: Wed Dec 15 08:25:28 2021 +0000

              polyml   mlton_noffi  mlton_release  mlton_ipp  mem/polyml  mem/release
waveform      0:00.72  0:00.50      0:00.38        0:00.37    241M        154M
waveform-max  0:04.39  0:03.37      0:02.90        0:02.80    449M        230M
cqt-dump      0:11.18  0:09.48      0:03.59        0:03.39    957M        209M
spec-dump     0:01.73  0:02.32      0:01.19        0:01.22    203M        122M
spec-max      0:06.39  0:04.01      0:03.43        0:03.43    999M        611M
reassigning   1:23.62  0:57.21      0:24.40        0:23.12    1229M       3313M
That spec-max is slower is a pity, but not surprising, given that we now store columnar data with an option for each column, so straight reads of pre-filled data are inevitably slower. I think this is a reasonable tradeoff, though it's also possible that I've missed something and much of the overhead actually comes from somewhere else.
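The overhead being traded for can be seen in a minimal sketch (names are illustrative, not the project's actual code): with an option per column, even the fast path of a read must pattern-match on the tag before it can touch the data.

```sml
(* Sketch: columnar random-access storage with an option per column,
   so that columns can be filled from the middle. Every read of a
   pre-filled column now pays a tag check it didn't pay before. *)
type 'a columnStore = 'a option array

fun readColumn (store : 'a columnStore, i : int) : 'a =
    case Array.sub (store, i) of
        SOME col => col          (* the common, pre-filled path *)
      | NONE => raise Subscript  (* in practice: fill on demand here *)
```

With a plain `'a array` the `case` disappears, which is consistent with spec-max being a little slower after this change.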
Note that (with reference to the previous comment above) I did retain the use of flat vectors for ComplexVector throughout, which led to a small slowdown in the Poly/ML CQT but made little difference elsewhere. Flat vectors were already explicitly bodged in for the FFI-compatible ComplexVector, so that didn't do a lot to release builds.
Some experiments with flat vectors (the flat-everything branch). This introduces a general flat vector of 2- or 3-element values and uses it for complex vector, array, and matrix throughout (including in pure-SML builds).
ca5ac2beefab (flat-everything) tip
Commit date: Wed Oct 27 11:28:31 2021 +0100

              polyml   mlton_noffi  mlton_release  mem/polyml  mem/release
waveform      0:00.70  0:00.47      0:00.38        241M        110M
waveform-max  0:04.61  0:03.45      0:02.87        449M        247M
cqt-dump      0:12.12  0:09.68      0:03.47        988M        216M
spec-dump     0:01.70  0:02.32      0:01.18        203M        70M
spec-max      0:06.96  0:03.07      0:02.56        930M        592M
reassigning   1:26.83  0:59.79      0:28.97        1134M       3682M
This flat vector could be a little slower than the previous custom FlatComplexVector because it isn't so tightly specialised for 2-element values. We might just be better off renaming the old complex vector to Flat2Vector for any 2-element value, and adding Flat3Vector if we ever actually use 3-element flat vectors in future.
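The Flat2Vector idea amounts to interleaving pairs in one underlying monomorphic vector. A minimal sketch, assuming hypothetical names (the real structure would presumably expose the full vector-like interface):

```sml
(* Sketch of Flat2Vector: pairs stored interleaved in a single
   RealVector, so logical element i lives at indices 2i and 2i+1.
   Illustrative only, not the project's actual code. *)
structure Flat2Vector = struct
    type vector = RealVector.vector   (* physical length = 2 * logical *)

    fun length (v : vector) = RealVector.length v div 2

    fun sub (v : vector, i : int) : real * real =
        (RealVector.sub (v, 2 * i), RealVector.sub (v, 2 * i + 1))

    fun fromList (xs : (real * real) list) : vector =
        RealVector.fromList
            (List.concat (List.map (fn (re, im) => [re, im]) xs))
end
```

Specialising for exactly two elements lets index arithmetic be constant (`2 * i`, `2 * i + 1`) rather than looping over a per-element width, which is the specialisation the generalised flat vector gives up.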
Part of the motivation was to use flat vectors for component vectors as well. I made two attempts at that:

1. Using Word64 as the storage type and trying to convert components to it via PackReal64Little and friends (plus a new RealTime.pack). I haven't yet found a way to do this conversion at zero cost - the Basis library mechanisms go through arrays themselves, which are apparently not optimised out - so this took 3:27 to run the release reassigning test (the one that is under 30 seconds above). And Poly/ML doesn't support enough Pack structures to build this at all.

2. Using Real as the storage type and converting using RealTime.toReal. This can never be "correct", but I wanted to compare timings. It took 2:53 for the same release reassigning test.
Unless there is some cost-free way to pack reals and FixedInt.int to words, I don't think this can go anywhere - and even MLton's PACK_REAL_EXTRA doesn't seem to have one. An alternative would be to have the flat vector functor argument provide functions that pack directly into a byte array, rather than converting to a rep type. I don't entirely fancy writing that.
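For reference, the Basis route from a real to a word looks roughly like this sketch, and the intermediate `Word8Vector` allocation is exactly the cost that doesn't get optimised out (and, as noted, Poly/ML lacks the PackWord structures entirely):

```sml
(* Sketch: converting a real to a 64-bit word via the Basis Pack
   structures. Two traversals and an 8-byte allocation per value,
   where a bit-cast would be free. *)
fun realToWord64 (r : real) : LargeWord.word =
    let
        val bytes = PackReal64Little.toBytes r  (* allocates a Word8Vector *)
    in
        PackWord64Little.subVec (bytes, 0)      (* reads it back as a word *)
    end
```

Done per component field in an inner loop, that allocation pattern is consistent with the 3:27 figure above.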
After starting to merge similar components when summarising frequency component streams:
6ae068c16b8e tip
Commit date: Tue Oct 26 11:07:57 2021 +0100

              polyml   mlton_noffi  mlton_release  mem/polyml  mem/release
waveform      0:00.64  0:00.47      0:00.38        241M        110M
waveform-max  0:04.45  0:03.51      0:02.97        462M        247M
cqt-dump      0:10.37  0:08.72      0:03.39        407M        215M
spec-dump     0:01.67  0:02.38      0:01.19        202M        70M
spec-max      0:06.98  0:03.26      0:03.07        902M        594M
reassigning   1:32.31  0:57.57      0:28.54        1012M       3916M
Baffling - I can see why this should slow down the reassignment test but I can see no reason it should slow down anything else. And wow, look at the extra memory allocated in the reassigning test!
Series of smallish commits made at the weekend, tested only on the orange laptop. I expect the first one or maybe two to be faster, and the rest to make negligible difference or be slower. Let's see. In chronological order, starting with the same commit as above because of a system update in between.
5f2b5d49ad78
Commit date: Fri Oct 22 17:42:27 2021 +0100

              polyml   mlton_noffi  mlton_release  mem/polyml  mem/release
waveform      0:00.70  0:00.47      0:00.38        241M        131M
waveform-max  0:04.64  0:03.60      0:02.98        444M        247M
cqt-dump      0:10.22  0:08.51      0:03.33        407M        214M
spec-dump     0:01.61  0:02.27      0:01.17        203M        122M
spec-max      0:06.66  0:02.76      0:02.53        862M        749M
reassigning   1:16.85  0:41.58      0:15.98        977M        1095M
"Small change to the last commit that slightly speeds things up again under test here":
8fbf5ebd1ef8
Commit date: Sat Oct 23 22:47:06 2021 +0100

              polyml   mlton_noffi  mlton_release  mem/polyml  mem/release
waveform      0:00.67  0:00.45      0:00.36        241M        110M
waveform-max  0:04.46  0:03.31      0:02.82        445M        247M
cqt-dump      0:09.98  0:08.47      0:03.39        406M        215M
spec-dump     0:01.58  0:02.21      0:01.13        204M        70M
spec-max      0:06.59  0:03.05      0:02.62        866M        734M
reassigning   1:16.60  0:43.14      0:16.14        1039M       1101M
"Add, and use, BqFftSlice":
37e9d2a07b95
Commit date: Sat Oct 23 22:49:51 2021 +0100

              polyml   mlton_noffi  mlton_release  mem/polyml  mem/release
waveform      0:00.65  0:00.44      0:00.36        241M        110M
waveform-max  0:04.38  0:03.32      0:02.79        447M        247M
cqt-dump      0:09.97  0:08.72      0:03.56        406M        215M
spec-dump     0:01.61  0:02.25      0:01.18        202M        70M
spec-max      0:06.72  0:03.00      0:02.87        902M        594M
reassigning   1:17.81  0:42.25      0:14.96        1292M       1047M
"Experiment with switching random access storage from array to matrix once fill [sic]". It seems highly unlikely that this would ever speed up any of our tests, since all of them fill their random access storage through in order once and then stop:
680f4b10f06e
Commit date: Sun Oct 24 13:35:52 2021 +0100

              polyml   mlton_noffi  mlton_release  mem/polyml  mem/release
waveform      0:00.81  0:00.46      0:00.38        253M        147M
waveform-max  0:04.23  0:03.32      0:02.75        474M        217M
cqt-dump      0:09.93  0:08.37      0:03.32        407M        215M
spec-dump     0:01.57  0:02.19      0:01.13        202M        70M
spec-max      0:06.43  0:02.87      0:02.72        940M        825M
reassigning   1:16.64  0:41.85      0:15.44        1488M       1047M
If I change the reassigning test to call fillAll on its audio-file random access before starting, the last line becomes

reassigning   1:18.28  0:42.03      0:15.00        923M        1044M
"Try going row-by-row in matrix conversion":
288cd601fc0e tip
Commit date: Sun Oct 24 13:52:39 2021 +0100

              polyml   mlton_noffi  mlton_release  mem/polyml  mem/release
waveform      0:00.61  0:00.45      0:00.37        254M        126M
waveform-max  0:04.32  0:03.28      0:02.77        475M        216M
cqt-dump      0:10.06  0:08.64      0:03.46        407M        215M
spec-dump     0:01.61  0:02.28      0:01.20        201M        70M
spec-max      0:06.30  0:02.96      0:02.64        951M        654M
reassigning   1:18.72  0:41.84      0:15.02        1449M       1156M
And finally a hybrid in which we keep the latter change (row-by-row in matrix conversion) but lose the previous one (switching random-access to matrix when full):
              polyml   mlton_noffi  mlton_release  mem/polyml  mem/release
waveform      0:00.61  0:00.44      0:00.36        241M        110M
waveform-max  0:04.33  0:03.32      0:02.79        455M        247M
cqt-dump      0:10.02  0:08.47      0:03.22        407M        215M
spec-dump     0:01.59  0:02.23      0:01.13        202M        70M
spec-max      0:06.62  0:03.02      0:02.88        906M        594M
reassigning   1:18.32  0:42.51      0:14.88        1326M       1047M
I think we should keep that last one. The changes up to and including 37e9d2a07b95 make sense to me and did perhaps result in a speedup for reassignment and lower memory usage. The row-by-row-matrix thing is vaguely sensible. The other (switch random access to matrix once full) I will revert.
It's striking how far we haven't come - the tests that existed three months ago are all pretty much the same as they were then.
Cache the next and following read stream and matrix in the framing reader. Re-reading is already very cheap, but for heavily overlapped framers caching may still be cheaper.
5f2b5d49ad78 tip
Commit date: Fri Oct 22 17:42:27 2021 +0100

              polyml   mlton_noffi  mlton_release  mem/polyml  mem/release
waveform      0:00.69  0:00.49      0:00.37        241M        131M
waveform-max  0:04.47  0:03.42      0:02.83        443M        247M
cqt-dump      0:10.27  0:08.85      0:03.47        407M        214M
spec-dump     0:01.71  0:02.30      0:01.19        203M        121M
spec-max      0:06.91  0:02.94      0:02.68        936M        749M
reassigning   1:19.00  0:43.70      0:16.60        1404M       1095M
Integrate FFI version of SignalWindowPost. Should affect only the reassigning test.
150c30c4bc51 tip
Commit date: Fri Oct 22 09:03:02 2021 +0100

              polyml   mlton_noffi  mlton_release  mem/polyml  mem/release
waveform      0:00.68  0:00.49      0:00.38        241M        131M
waveform-max  0:04.46  0:03.44      0:02.85        468M        247M
cqt-dump      0:10.20  0:08.86      0:03.43        406M        214M
spec-dump     0:01.68  0:02.33      0:01.20        203M        122M
spec-max      0:06.59  0:03.18      0:02.73        978M        899M
reassigning   1:31.75  0:55.18      0:18.85        1041M       1107M
The non-FFI version is slower (if nicer) but it does help the FFI build. The non-FFI code could be sped up again at the expense of tidiness.
I'm not sure why the spec tests are back to their earlier performance/memory profiles. Possibly it has to do with the (necessary) fix to a Subscript error in bsq-samplestreams, commit fe337c191783, made just after the previous test run.
Change to framing logic, so as to slice then concat rather than concat then slice. Does appear to reduce memory usage a bit; doesn't appear to be any faster.
265bd745f760 tip
Commit date: Wed Oct 20 10:51:09 2021 +0100

              polyml   mlton_noffi  mlton_release  mem/polyml  mem/release
waveform      0:00.64  0:00.49      0:00.36        240M        110M
waveform-max  0:04.44  0:03.31      0:02.83        441M        247M
cqt-dump      0:10.21  0:08.30      0:03.31        406M        214M
spec-dump     0:01.61  0:02.22      0:01.19        203M        70M
spec-max      0:06.83  0:02.95      0:02.56        973M        685M
reassigning   1:16.34  0:45.96      0:20.17        1190M       1119M
Small change to overflow handling in RealTime, affecting the time offset calculation:
dcbf3f778c80 tip
Commit date: Wed Oct 20 09:39:58 2021 +0100

              polyml   mlton_noffi  mlton_release  mem/polyml  mem/release
...
reassigning   1:14.41  0:45.64      0:20.41        1422M       1112M
After switching to carrying out the Hann and Hann-derivative windowing on the frequency domain side (to reduce the number of FFT streams):
26f851d9c582 tip
Commit date: Tue Oct 19 17:53:36 2021 +0100

              polyml   mlton_noffi  mlton_release  mem/polyml  mem/release
...
reassigning   1:17.63  0:47.02      0:21.33        1288M       1108M
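The frequency-domain windowing above relies on a standard DFT identity: multiplying by the periodic Hann window in the time domain is equivalent to circularly convolving the spectrum with the three-point kernel (-1/4, 1/2, -1/4), so one unwindowed FFT can serve several window shapes. A minimal sketch of that convolution, operating on a spectrum held as (re, im) pairs (illustrative only, not the project's code):

```sml
(* Apply a periodic Hann window in the frequency domain:
   X_w[k] = 0.5 * X[k] - 0.25 * X[k-1] - 0.25 * X[k+1], indices circular. *)
fun hannInFreq (spec : (real * real) array) : (real * real) array =
    let
        val n = Array.length spec
        fun get i = Array.sub (spec, (i + n) mod n)   (* circular index *)
        fun scale (k, (re, im)) = (k * re, k * im)
        fun add ((a, b), (c, d)) = (a + c, b + d)
    in
        Array.tabulate (n, fn i =>
            add (scale (0.5, get i),
                 add (scale (~0.25, get (i - 1)),
                      scale (~0.25, get (i + 1)))))
    end
```

The Hann-derivative window needed for reassignment has a similarly small frequency-domain kernel, which is what makes doing both on the frequency side cheaper than running extra FFT streams.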