~emersion/goguma#147: 
Duplicate messages fetched in the background on Android

This has been going on for a while, just opening this for tracking and future reference.

Sometimes goguma will show duplicate messages. This looks like:

21:35 <emersion> foo
21:35 <emersion> foo
21:35 <emersion> foo
21:42 <emersion> bar
21:42 <emersion> bar
21:42 <emersion> bar

The repetition number varies, usually between 2 and 5. Only happens on Android, pretty randomly. For me it's more frequent with a spotty network and when leaving the phone idle for a long time.

I've tried reproducing in the Android emulator, simulating a TCP RST on the soju side, simulating a cellular network disconnection, killing the app in the background, to no avail.

This can't be a chathistory timestamp precision issue (due to second granularity), because the repetition happens with messages with a completely different timestamp fetched in the same batch.

Originally I thought this was a bug in our chathistory logic (timestamp not bumped in commands), or in our reconnection logic (resulting in multiple sockets opened at the same time), but after reading the code over and over again in the past year or so I couldn't find anything amiss.

My more recent theory is that somehow multiple instances of goguma are running in parallel somehow (in separate Dart isolates). I've added a check for the return value of IsolateNameServer.registerPortWithName(). The logs look like:

[05-10 01:57:42.894 14946:19485 W/FlutterJNI]
FlutterJNI.loadLibrary called more than once

[05-10 01:57:42.896 14946:14946 D/FlutterGeolocator]
Flutter engine connected. Connected engine count 2

[05-10 01:57:42.898 14946:19486 W/FlutterJNI]
FlutterJNI.prefetchDefaultFontManager called more than once

[05-10 01:57:42.899 14946:14946 W/FlutterJNI]
FlutterJNI.init called more than once

[05-10 01:57:42.994 14946:19482 I/flutter]
Executing work manager task: sync

[05-10 01:57:42.995 14946:15012 I/flutter]
Starting chat history synchronization

[05-10 01:57:43.041 14946:15012 I/flutter]
Finished chat history synchronization

[05-10 01:57:43.045 14946:14946 D/FlutterGeolocator]
Flutter engine disconnected. Connected engine count 1

[05-10 01:57:43.050 14946:14946 E/FlutterGeolocator]
Geolocator position updates stopped

[05-10 01:57:43.050 14946:14946 E/FlutterGeolocator]
There is still another flutter engine connected, not stopping location service

[05-10 05:15:59.167 14946:14946 D/FlutterGeolocator]
Flutter engine connected. Connected engine count 2

[05-10 05:15:59.226 14946:22144 I/flutter]
MAIN #0      main (package:goguma/main.dart:34)
<a href="/~emersion/goguma/1" title="~emersion/goguma#1: Enable TCP keepalive">#1</a>      main (package:goguma/main_generated.dart:16)
<a href="/~emersion/goguma/2" title="~emersion/goguma#2: Open notifications on highlight or PM">#2</a>      _runMain.<anonymous closure> (dart:ui/hooks.dart:301)
<a href="/~emersion/goguma/3" title="~emersion/goguma#3: Add support for soju.im/bouncer-networks">#3</a>      _delayEntrypointInvocation.<anonymous closure> (dart:isolate-patch/isolate_patch.dart:297)
<a href="/~emersion/goguma/4" title="~emersion/goguma#4: Server status snackbar doesn&#39;t push up UI">#4</a>      _RawReceivePort._handleMessage (dart:isolate-patch/isolate_patch.dart:184)

[05-10 05:15:59.227 14946:22144 I/flutter]
Bad state: IsolateNameServer.registerPortWithName failed!

[05-10 05:15:59.229 14946:22144 I/flutter]
#0      main (package:goguma/main.dart:48)

[05-10 05:15:59.230 14946:22144 I/flutter]
<a href="/~emersion/goguma/1" title="~emersion/goguma#1: Enable TCP keepalive">#1</a>      main (package:goguma/main_generated.dart:16)

[05-10 05:15:59.230 14946:22144 I/flutter]
<a href="/~emersion/goguma/2" title="~emersion/goguma#2: Open notifications on highlight or PM">#2</a>      _runMain.<anonymous closure> (dart:ui/hooks.dart:301)

[05-10 05:15:59.230 14946:22144 I/flutter]
<a href="/~emersion/goguma/3" title="~emersion/goguma#3: Add support for soju.im/bouncer-networks">#3</a>      _delayEntrypointInvocation.<anonymous closure> (dart:isolate-patch/isolate_patch.dart:297)

[05-10 05:15:59.230 14946:22144 I/flutter]
<a href="/~emersion/goguma/4" title="~emersion/goguma#4: Server status snackbar doesn&#39;t push up UI">#4</a>      _RawReceivePort._handleMessage (dart:isolate-patch/isolate_patch.dart:184)

[05-10 06:00:43.479 22473:22509 W/FlutterJNI]
FlutterJNI.loadLibrary called more than once

[05-10 06:00:43.483 22473:22473 D/FlutterGeolocator]
Creating service.

[05-10 06:00:43.483 22473:22473 D/FlutterGeolocator]
Binding to location service.

[05-10 06:00:43.486 22473:22473 W/FlutterJNI]
FlutterJNI.init called more than once

[05-10 06:00:43.489 22473:22473 D/FlutterGeolocator]
Flutter engine connected. Connected engine count 1

[05-10 06:00:43.814 22473:22511 I/flutter]
Executing work manager task: sync

[05-10 06:00:46.175 22473:22511 I/flutter]
Starting chat history synchronization

[05-10 06:00:46.178 22473:22511 I/flutter]
[0] Connecting to irc.emersion.fr...

[05-10 06:00:46.178 22473:22511 I/flutter]
[1] Connecting to irc.emersion.fr...

[05-10 06:00:46.179 22473:22511 I/flutter]
[2] Connecting to irc.emersion.fr...

[05-10 06:00:46.179 22473:22511 I/flutter]
[3] Connecting to irc.emersion.fr...

[05-10 06:00:46.580 22473:22511 I/flutter]
[0] Connection opened

[05-10 06:00:46.583 22473:22511 I/flutter]
[0] Starting SASL PLAIN authentication

[05-10 06:00:46.585 22473:22511 I/flutter]
[2] Connection opened

[05-10 06:00:46.586 22473:22511 I/flutter]
[2] Starting SASL PLAIN authentication

[05-10 06:00:46.587 22473:22511 I/flutter]
[1] Connection opened

[05-10 06:00:46.587 22473:22511 I/flutter]
[1] Starting SASL PLAIN authentication

[05-10 06:00:46.588 22473:22511 I/flutter]
[3] Connection opened

[05-10 06:00:46.588 22473:22511 I/flutter]
[3] Starting SASL PLAIN authentication

[05-10 06:00:46.846 22473:22511 I/flutter]
[3] Registration complete

[05-10 06:00:46.883 22473:22511 I/flutter]
[0] Registration complete

[05-10 06:00:46.890 22473:22511 I/flutter]
[1] Registration complete

[05-10 06:00:46.907 22473:22511 I/flutter]
[2] Registration complete

[05-10 06:00:47.255 22473:22511 I/flutter]
Finished chat history synchronization

[05-10 06:00:47.256 22473:22511 I/flutter]
[0] Destroying client

[05-10 06:00:47.256 22473:22511 I/flutter]
[1] Destroying client

[05-10 06:00:47.256 22473:22511 I/flutter]
[2] Destroying client

[05-10 06:00:47.256 22473:22511 I/flutter]
[3] Destroying client

[05-10 06:00:47.257 22473:22511 I/flutter]
[0] Connection closed

[05-10 06:00:47.257 22473:22511 I/flutter]
[1] Connection closed

[05-10 06:00:47.257 22473:22511 I/flutter]
[2] Connection closed

[05-10 06:00:47.257 22473:22511 I/flutter]
[3] Connection closed

[05-10 06:00:47.260 22473:22473 D/FlutterGeolocator]
Flutter engine disconnected. Connected engine count 0

[05-10 06:00:47.264 22473:22473 E/FlutterGeolocator]
Geolocator position updates stopped

[05-10 06:00:47.264 22473:22473 D/FlutterGeolocator]
Stopping location service.

[05-10 06:00:47.298 22473:22473 D/FlutterGeolocator]
Unbinding from location service.

[05-10 06:00:47.298 22473:22473 D/FlutterGeolocator]
Destroying location service.

[05-10 06:00:47.298 22473:22473 D/FlutterGeolocator]
Stopping location service.

[05-10 06:00:47.299 22473:22473 D/FlutterGeolocator]
Destroyed location service.

This is a bit puzzling: at 05:15:59, main() gets called for seemingly no reason (?) and it turns out that the previous main() function's isolate is still running.

Status
REPORTED
Submitter
~emersion
Assigned to
No-one
Submitted
6 months ago
Updated
5 months ago
Labels
bug

~emersion 6 months ago

Simon Ser referenced this ticket in commit c3f2ebf.

~emersion 6 months ago

Update: the workaround seems to work properly (no duplicate messages in a few days), however I think we might miss some push notifications or workmanager jobs if we abruptly throw an error from main() like this? Not sure. After adding more debugging it seems like main() is invoked a second time when a push message is received. It also seems like the new app instance is correctly started with AppLifecycleState.detached.

Previously in 6dc63b850b80e21b7d9bd184cd3765e4b2b24954 I've tried to make it so the clients stayed disconnected when starting the app in detached state. However this didn't work out because of the connectivity logic: we will reconnect on Connectivity().onConnectivityChanged, and that is triggered on app startup (multiple times it seems). I've fixed it in 67c90bfb374ea24a1fa9c8ac2c695ca5bf8dbb4d, we'll see if it's enough to get rid of the dup messages. If it ends up working, we'll maybe still want to add something extra because it'd be quite fragile…

~emersion 6 months ago

Unfortunately it seems like the above doesn't do the trick. Seems like it can happen that a duplicated main() invocation starts in AppLifecycleState.resumed state.

[05-12 18:59:29.888 19619:19655 I/flutter]
RUN main()

[05-12 18:59:30.050 19619:19655 I/flutter]
Warning: failed to initialize Firebase: [firebase_messaging/unknown] java.io.IOException: MISSING_INSTANCEID_SERVICE

[05-12 18:59:30.050 19619:19655 I/flutter]
Got UnifiedPush message for XXX

[05-12 18:59:30.054 19619:19655 I/flutter]
initializeCallback finished

[05-12 18:59:30.058 19619:19655 I/flutter]
Using UnifiedPush distributor: io.heckel.ntfy

[05-12 18:59:30.103 19619:19655 I/flutter]
Decrypted push message payload: XXX

[05-12 18:59:36.178 19619:19655 I/flutter]
APP INIT STATE AppLifecycleState.detached

[05-12 18:59:36.321 19619:19655 I/flutter]
connectivity changed: false

[05-12 18:59:36.321 19619:19655 I/flutter]
connectivity changed: false

[05-12 18:59:36.321 19619:19655 I/flutter]
connectivity changed: false

[05-12 19:07:41.071 19619:19655 I/flutter]
connectivity changed: false

[05-12 19:07:46.133 19619:19655 I/flutter]
connectivity changed: true

[05-12 19:07:46.137 19619:19655 I/flutter]
connectivity changed: true

[05-12 19:07:46.141 19619:19655 I/flutter]
connectivity changed: true

[05-12 19:07:46.414 19619:19655 I/flutter]
connectivity changed: true

[05-12 19:07:47.814 19619:19655 I/flutter]
connectivity changed: true

[05-12 19:07:51.097 19619:20025 I/flutter]
RUN main()

[05-12 19:07:51.097 19619:20025 I/flutter]
ERROR: IsolateNameServer.registerPortWithName failed! Continuing...

[05-12 19:07:51.136 19619:19619 D/FlutterGeolocator]
Flutter engine connected. Connected engine count 2

[05-12 19:07:51.217 19619:20025 I/flutter]
Warning: failed to initialize Firebase: [firebase_messaging/unknown] java.io.IOException: MISSING_INSTANCEID_SERVICE

[05-12 19:07:51.217 19619:20025 I/flutter]
initializeCallback finished

[05-12 19:07:51.219 19619:20025 I/flutter]
Using UnifiedPush distributor: io.heckel.ntfy

[05-12 19:07:51.453 19619:20025 I/flutter]
APP INIT STATE AppLifecycleState.resumed

~emersion 6 months ago*

Another bad news: today I hit a case where bailing out of duplicate main() invocations caused issues: a push notification is received, main() is invoked in the background, IsolateNameServer.registerPortWithName() works, and then later on when launching the app normally a new main() invocation is triggered. Bailing out results in stuck loading screen (when returning from main()).

Maybe it would be possible to IsolateNameServer.registerPortWithName() conditionally when we're not in the detached state, and unregistering when we go to detached state? Might still result in duplicate messages if the connection is still going in detached state… We'd need to explicitly disconnect when going into detached state. Waiting for connections to be broken wouldn't work because nothing would stop main() from being invoked while in detached state and the connections haven't been broken yet by the OS.

~emersion 5 months ago

Simon Ser referenced this ticket in commit 2b3c87a.

~emersion 5 months ago

Simon Ser referenced this ticket in commit 65de8cd.

~emersion 5 months ago

Simon Ser referenced this ticket in commit 5264484.

~emersion 5 months ago

Simon Ser referenced this ticket in commit 3694407.

~emersion 5 months ago

Simon Ser referenced this ticket in commit d713270.

~emersion 5 months ago

Simon Ser referenced this ticket in commit fc4d4aa.

~emersion 5 months ago

Simon Ser referenced this ticket in commit 851711e.

~emersion 5 months ago

Maybe it would be possible to IsolateNameServer.registerPortWithName() conditionally when we're not in the detached state, and unregistering when we go to detached state?

Done that in my branch, but was still seeing dup messages from time to time.

I've found another possible culprit: the workmanager Isolate may terminate before the lastDeliveredTime field was being synced into the DB: https://git.sr.ht/~emersion/goguma/commit/5439f9215a821ea0bce7fb38d8bc3594c2940316

~emersion 5 months ago

Simon Ser referenced this ticket in commit 58f1025.

Register here or Log in to comment, or comment via email.