~cannam

#1 Performance test updates 14 days ago

Comment by ~cannam on ~cannam/bisquay

Does -flto make any difference?

Build time:

mlton_release  3:55.71

Run times:

82b1ac2f1943+ tip
Commit date: Wed Jul 21 20:35:57 2021 +0100

                        polyml          mlton_noffi     mlton_release
   waveform             0:00.65         0:00.47         0:00.36
   waveform-max         0:04.44         0:03.36         0:02.69
   cqt-dump             0:10.20         0:08.43         0:03.28
   spec-dump            0:01.67         0:02.24         0:01.17
   spec-max             0:08.29         0:03.97         0:03.36

#1 Performance test updates 14 days ago

Comment by ~cannam on ~cannam/bisquay

I added a mlton_profile mode, but the problem is that profile builds take even longer to compile than release ones. Build times:

polyml         0:11.87
mlton_noffi    0:55.80
mlton_profile  6:10.56
mlton_release  4:04.40

Run times:

82b1ac2f1943+ tip
Commit date: Wed Jul 21 20:35:57 2021 +0100

                        polyml          mlton_noffi     mlton_profile   mlton_release
   waveform             0:00.58         0:00.46         0:00.61         0:00.36
   waveform-max         0:04.38         0:03.57         0:07.74         0:02.79
   cqt-dump             0:10.32         0:08.53         0:04.99         0:03.47
   spec-dump            0:01.69         0:02.29         0:01.72         0:01.23
   spec-max             0:08.40         0:03.95         0:04.18         0:03.33

I am not going to commit this, but it's here for future reference.
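For a sense of scale, the build times above can be converted to seconds and compared against the release build. This is only a sketch: `to_seconds` is a hypothetical helper, and it assumes the times are in the m:ss.ss form shown above.

```python
def to_seconds(t):
    """Convert a time in 'm:ss.ss' form to seconds."""
    minutes, seconds = t.split(":")
    return int(minutes) * 60 + float(seconds)


# Build times copied from the listing above.
build_times = {
    "polyml": "0:11.87",
    "mlton_noffi": "0:55.80",
    "mlton_profile": "6:10.56",
    "mlton_release": "4:04.40",
}

release = to_seconds(build_times["mlton_release"])

if __name__ == "__main__":
    for name, t in build_times.items():
        s = to_seconds(t)
        # Show each build time in seconds and as a multiple of the release build.
        print(f"{name:14s} {s:7.2f} s  {s / release:5.2f}x release")
```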

#1 Performance test updates 15 days ago

Comment by ~cannam on ~cannam/bisquay

The effects of IPP. This is without using IPP at all, with USE_BUILTIN_FFT. (Of course, this should affect only the mlton_release build.)

b77deb10aaa4+ tip
Commit date: Tue Jul 20 16:01:08 2021 +0100

                        polyml          mlton_noffi     mlton_release
   waveform             0:00.54         0:00.47         0:00.37
   waveform-max         0:04.25         0:03.38         0:02.88
   cqt-dump             0:10.00         0:08.51         0:03.49
   spec-dump            0:01.88         0:02.40         0:01.21
   spec-max             0:08.45         0:03.92         0:03.33

This is with IPP, but using it only for bqvec and not for the FFT (achieved by defining both HAVE_IPP and USE_BUILTIN_FFT but adding #undef HAVE_IPP at the top of FFT.cpp).


                        polyml          mlton_noffi     mlton_release
   waveform             0:00.59         0:00.46         0:00.36
   waveform-max         0:04.37         0:03.41         0:02.87
   cqt-dump             0:10.15         0:08.44         0:03.36
   spec-dump            0:01.68         0:02.26         0:01.15
   spec-max             0:08.92         0:04.03         0:03.36

It appears all these results are within the margin of error - including cqt-dump, as the first run of that in the first case above (the reported one was the second run) only took 3.30.
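The margin-of-error argument can be checked with a little arithmetic on the mlton_release column of the two tables above. This is a sketch, with `rel_diff` as a hypothetical helper: it compares the largest difference between the two configurations with the difference between two runs of cqt-dump in the same configuration.

```python
# mlton_release timings in seconds, copied from the two tables above.
builtin_fft = {"waveform": 0.37, "waveform-max": 2.88, "cqt-dump": 3.49,
               "spec-dump": 1.21, "spec-max": 3.33}
ipp_bqvec = {"waveform": 0.36, "waveform-max": 2.87, "cqt-dump": 3.36,
             "spec-dump": 1.15, "spec-max": 3.36}

# First (unreported) run of cqt-dump in the no-IPP configuration.
cqt_dump_first_run = 3.30


def rel_diff(a, b):
    """Absolute difference relative to the smaller of the two values."""
    return abs(a - b) / min(a, b)


# Variation between two runs of the same test in the same configuration.
run_to_run = rel_diff(builtin_fft["cqt-dump"], cqt_dump_first_run)

# Variation between the two configurations, per test.
between_configs = {name: rel_diff(builtin_fft[name], ipp_bqvec[name])
                   for name in builtin_fft}

if __name__ == "__main__":
    print(f"run-to-run (cqt-dump): {run_to_run:.1%}")
    for name, d in between_configs.items():
        print(f"{name:14s} {d:.1%}")
```

If the run-to-run figure is at least as large as every per-test figure, the differences between configurations cannot be distinguished from noise on this evidence.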

IPP adds massively to the file size. These are the binaries with IPP (first pair) and without it (second pair):

-rwxr-xr-x 1 cannam cannam 17284696 Jul 20 16:02 bsq_perftest
-rwxr-xr-x 1 cannam cannam 25822464 Jul 20 16:02 bsq_test

-rwxr-xr-x 1 cannam cannam  2919752 Jul 21 09:23 bsq_perftest
-rwxr-xr-x 1 cannam cannam 11412848 Jul 21 09:24 bsq_test

It is possible to split out the IPP static archive file to achieve smaller file size, but is it worth it? I'm inclined to think not.
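To quantify the size cost, the byte counts from the listings above can be compared directly. A minimal sketch follows; `ipp_overhead` is a hypothetical helper name.

```python
# File sizes in bytes, copied from the ls output above: (with IPP, without IPP).
sizes = {
    "bsq_perftest": (17284696, 2919752),
    "bsq_test": (25822464, 11412848),
}


def ipp_overhead(with_ipp, without_ipp):
    """Return (extra bytes, extra MiB, size ratio) for one binary."""
    extra = with_ipp - without_ipp
    return extra, extra / 2**20, with_ipp / without_ipp


if __name__ == "__main__":
    for name, (with_ipp, without_ipp) in sizes.items():
        extra, mib, ratio = ipp_overhead(with_ipp, without_ipp)
        print(f"{name:13s} +{extra} bytes (+{mib:.1f} MiB), {ratio:.2f}x")
```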

And here are the results using FFTW, with neither IPP (at all) nor the built-in FFT:


                        polyml          mlton_noffi     mlton_release
   waveform             0:00.55         0:00.46         0:00.36
   waveform-max         0:04.33         0:03.40         0:02.89
   cqt-dump             0:10.48         0:08.70         0:03.45
   spec-dump            0:01.72         0:02.30         0:01.19
   spec-max             0:08.38         0:03.96         0:03.37

I'm going to commit USE_BUILTIN_FFT for Windows and Linux and drop IPP.

#1 Performance test updates 16 days ago

Comment by ~cannam on ~cannam/bisquay

Thinkpad T60 for comparison (64-bit Core 2 T7600, 3GB, SAMSUNG MZ7TD256 SSD with Linux)

b77deb10aaa4 tip
Commit date: Tue Jul 20 16:01:08 2021 +0100

			polyml		mlton_noffi	mlton_release	
   waveform		0:01.61		0:01.53		0:01.10		
   waveform-max		0:12.58		0:11.89		0:09.66		
   cqt-dump		0:41.60		0:27.91		0:09.86		
   spec-dump		0:05.17		0:06.51		0:02.39		
   spec-max		0:31.04		0:11.46		0:09.69		

#1 Performance test updates 16 days ago

Comment by ~cannam on ~cannam/bisquay

After adding the spec-max test, which uses the peak-tower summarising spectrogram mechanism rather than just dumping out all frames of a spectrogram. This is closer to the display logic.

b77deb10aaa4 tip
Commit date: Tue Jul 20 16:01:08 2021 +0100

                        polyml          mlton_noffi     mlton_release
   waveform             0:00.55         0:00.48         0:00.37
   waveform-max         0:04.45         0:03.49         0:02.90
   cqt-dump             0:10.61         0:08.93         0:03.71
   spec-dump            0:01.75         0:02.50         0:01.34
   spec-max             0:08.92         0:04.12         0:03.47

#1 Performance test updates 17 days ago

Comment by ~cannam on ~cannam/bisquay

Linux on the T540p again. After adjusting the set of waveform peak resolutions (from 32, 256, 16384 to 32, 256, 4096, 65536) based on waveform-max timings.

Incidentally I just noticed that the Mac version of the script dumps all stdout to /dev/null, while the Linux version doesn't. This should make Mac runs generally a bit faster than Linux ones. I'm not going to change either version, because I think the main thing is consistency over time within the platform.

f31591078609 tip
Commit date: Mon Jul 19 09:39:10 2021 +0100

                        polyml          mlton_noffi     mlton_release
   waveform             0:00.57         0:00.47         0:00.37
   waveform-max         0:04.26         0:03.40         0:02.88
   cqt-dump             0:10.13         0:08.49         0:03.58
   spec-dump            0:01.68         0:02.21         0:01.21
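As a rough illustration of the stdout difference noted in this comment, a timing harness could look like the following. This is a hypothetical sketch, not the actual test script: `time_command` and its parameters are invented for illustration. It reports the last of two runs and optionally discards the command's standard output.

```python
import subprocess
import sys
import time


def time_command(argv, discard_stdout, runs=2):
    """Run argv `runs` times and return the wall-clock seconds of the last run."""
    elapsed = None
    for _ in range(runs):
        # Either send stdout to the null device or let it go wherever ours goes.
        out = subprocess.DEVNULL if discard_stdout else None
        start = time.perf_counter()
        subprocess.run(argv, stdout=out, check=True)
        elapsed = time.perf_counter() - start
    return elapsed


if __name__ == "__main__":
    # Stand-in command that writes a lot of text; a real test binary would go here.
    cmd = [sys.executable, "-c", "print('x' * 80 + '\\n', end='') or None"]
    print(f"discarding stdout: {time_command(cmd, True):.3f} s")
```

Whichever choice a platform's script makes, keeping it fixed is what keeps timings comparable over time on that platform.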

#1 Performance test updates 27 days ago

Comment by ~cannam on ~cannam/bisquay

For comparison, Mac timings on the same M1 Mac but using x86_64 binaries from an x86_64 MLton. In both cases MLton's C codegen was used. The Poly/ML build is the same as the previous comment, and the code is the same as the first, "unmodified" test from the previous comment. Second of two runs.

3ce37fcd35f4 tip
Commit date: Fri Jul 09 14:32:55 2021 +0100

			polyml		mlton_noffi	mlton_release	
   waveform		0.48		0.50		0.26		
   waveform-max		3.66		3.77		3.19		
   cqt-dump		7.70		5.46		2.17		
   spec-dump		1.24		1.52		0.70		

#1 Performance test updates 27 days ago

Comment by ~cannam on ~cannam/bisquay

Some Mac tests. MacBook Pro M1 (2020) 8GB on mains power. The MLton builds are arm64 binaries but Poly/ML is Intel.

The purpose here, aside from getting some Mac timings at all, was to try sml-stringinterpolate experimentally using the Ryu float-printer, and to see whether it sped things up enough that we would no longer see any gain from using SvgShorten before dumping out SVG paths. The answer is that Ryu seems to make little difference to SVG output, SvgShorten still does better, and while Ryu does improve the CQ output, we wouldn't normally be dumping that as text in a serious application anyway.

Again these are second of two runs.

1. unmodified

3ce37fcd35f4 tip
Commit date: Fri Jul 09 14:32:55 2021 +0100

			polyml		mlton_noffi	mlton_release	
   waveform		0.52		0.29		0.14		
   waveform-max		3.85		2.28		1.62		
   cqt-dump		7.62		4.61		1.62		
   spec-dump		1.29		1.26		0.51

2. with ryu

3ce37fcd35f4+ tip
Commit date: Fri Jul 09 14:32:55 2021 +0100

			polyml		mlton_noffi	mlton_release	
   waveform		0.48		0.28		0.14		
   waveform-max		3.80		2.31		1.62		
   cqt-dump		7.69		4.63		1.20		
   spec-dump		1.26		1.25		0.32		

3. without ryu and also without svgshorten

3ce37fcd35f4 tip
Commit date: Fri Jul 09 14:32:55 2021 +0100

			polyml		mlton_noffi	mlton_release	
   waveform		0.47		0.25		0.14		
   waveform-max		3.99		2.36		1.79		
   cqt-dump		7.73		4.65		1.64		
   spec-dump		1.24		1.27		0.52		

4. with ryu and without svgshorten

3ce37fcd35f4+ tip
Commit date: Fri Jul 09 14:32:55 2021 +0100

			polyml		mlton_noffi	mlton_release	
   waveform		0.51		0.24		0.13		
   waveform-max		4.18		2.37		1.72		
   cqt-dump		7.85		4.67		1.23		
   spec-dump		1.26		1.27		0.33		
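The four tables above can be summarised as ratios on the mlton_release column. This is a sketch only: `speedup` is a hypothetical helper, and reading waveform-max as the test that exercises SVG path output is an assumption on my part.

```python
# mlton_release timings in seconds from the four tables above.
runs = {
    "unmodified":            {"waveform": 0.14, "waveform-max": 1.62, "cqt-dump": 1.62, "spec-dump": 0.51},
    "ryu":                   {"waveform": 0.14, "waveform-max": 1.62, "cqt-dump": 1.20, "spec-dump": 0.32},
    "no-ryu, no-svgshorten": {"waveform": 0.14, "waveform-max": 1.79, "cqt-dump": 1.64, "spec-dump": 0.52},
    "ryu, no-svgshorten":    {"waveform": 0.13, "waveform-max": 1.72, "cqt-dump": 1.23, "spec-dump": 0.33},
}


def speedup(before, after, test):
    """How many times faster `after` is than `before` on one test."""
    return runs[before][test] / runs[after][test]


if __name__ == "__main__":
    # Effect of Ryu with SvgShorten left in place.
    for test in runs["unmodified"]:
        print(f"ryu vs unmodified, {test:13s} {speedup('unmodified', 'ryu', test):.2f}x")
    # Effect of SvgShorten on waveform-max, without and with Ryu.
    print(f"svgshorten, no ryu: {speedup('no-ryu, no-svgshorten', 'unmodified', 'waveform-max'):.2f}x")
    print(f"svgshorten, ryu:    {speedup('ryu, no-svgshorten', 'ryu', 'waveform-max'):.2f}x")
```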

#1 Performance test updates 27 days ago

Comment by ~cannam on ~cannam/bisquay

After adding waveform-max test (NB reported run is second of two - I think I will continue to do this)

3ce37fcd35f4 tip
Commit date: Fri Jul 09 14:32:55 2021 +0100

                        polyml          mlton_noffi     mlton_release
   waveform             0:00.58         0:00.54         0:00.42
   waveform-max         0:05.68         0:04.01         0:03.30
   cqt-dump             0:10.46         0:08.69         0:03.48
   spec-dump            0:01.68         0:02.22         0:01.21

#18 Assertion in MovingMedian::drop a month ago

Comment by ~cannam on ~breakfastquay/rubberband

I've reviewed the code and run some quite intensive tests, and I am fairly sure that this can only happen if a NaN has found its way into the moving-median filter.

There is a test for NaN in MovingMedian::push, but it won't work if -ffast-math is used to compile the code. In that situation, any NaN found at the input will go straight into the filter and cause this assertion to fail on retrieval shortly afterward.
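To illustrate why a NaN inside the filter would trip an assertion on retrieval, here is a toy model in Python. It is not Rubber Band's code: it assumes a moving median that keeps a sorted copy of its window and finds values by binary search. The `nan_check` flag is a made-up stand-in for the NaN test being compiled away, and skipping the NaN value when the check is active is also an assumption.

```python
import bisect
import math
import sys
from collections import deque


class MovingMedian:
    """Toy moving-median filter: a FIFO window plus a sorted copy of it."""

    def __init__(self, size, nan_check=True):
        self.size = size
        self.nan_check = nan_check  # False models a check the optimiser removed
        self.window = deque()
        self.sorted = []

    def push(self, value):
        if self.nan_check and math.isnan(value):
            print("MovingMedian: NaN input ignored", file=sys.stderr)
            return
        if len(self.window) == self.size:
            self._drop(self.window.popleft())
        self.window.append(value)
        bisect.insort(self.sorted, value)

    def _drop(self, value):
        # Binary search relies on ordered comparisons, which are all
        # false for NaN, so a NaN can never be located again.
        i = bisect.bisect_left(self.sorted, value)
        if i == len(self.sorted) or self.sorted[i] != value:
            raise AssertionError("value to drop not found in sorted window")
        del self.sorted[i]

    def median(self):
        return self.sorted[len(self.sorted) // 2]


def survives(values, size=3, nan_check=True):
    """Return True if every push completes without tripping the assertion."""
    f = MovingMedian(size, nan_check)
    try:
        for v in values:
            f.push(v)
    except AssertionError:
        return False
    return True
```

In this model the failure does not happen when the NaN goes in, but a few pushes later, when the NaN reaches the front of the window and has to be dropped.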

So I'd say the first question is whether you are in fact building with -ffast-math. If you are, try rebuilding without it, and see if the problem goes away or is replaced by stderr output from the line in MovingMedian::push that checks for NaN values. (It should be fine to build Rubber Band with -ffast-math, it's just that some safety checks like this are lost.)

This MovingMedian filter contains spectral summary data from the input, which has been through windowing and a forward FFT but not much else - no phase manipulation and, certainly in the case of the study() call, it has not encountered the complexity of a resampler yet. If there are NaNs in the signal at this point, I think the most likely - and hopefully easiest to test - explanation would be that a NaN value has actually been passed to Rubber Band as input. Can you readily check this?
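One way to check an input buffer for NaNs is to test the bit pattern of each sample, since an integer test does not rely on floating-point comparisons. The following is a sketch in Python rather than in the calling application's language. It assumes the samples are 32-bit floats, and `is_nan_bits` and `first_nan_index` are hypothetical helper names.

```python
import struct


def is_nan_bits(x):
    """True if x, stored as IEEE 754 binary32, has a NaN bit pattern."""
    # Note: struct raises OverflowError for finite values outside float32 range.
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    exponent_all_ones = (bits & 0x7F800000) == 0x7F800000
    mantissa_nonzero = (bits & 0x007FFFFF) != 0
    # Exponent all ones with a zero mantissa is infinity, not NaN.
    return exponent_all_ones and mantissa_nonzero


def first_nan_index(samples):
    """Return the index of the first NaN sample, or None if there is none."""
    for i, x in enumerate(samples):
        if is_nan_bits(x):
            return i
    return None
```

Running a check like this on each block just before it is handed to the library would show whether the NaN originates upstream.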