Bug 38012: Using Task's under memory pressure leads to unexpected crashes inside the TPL on iOS
Created: 2016-01-25 20:34:39 +0000
Last modified: 2016-03-03 13:47:41 +0000
Classification / Product: Xamarin / iOS
Component: Mono runtime / AOT compiler
Version: XI 9.4 (iOS 9.2)
Platform: Macintosh, Mac OS
Status: VERIFIED FIXED
Severity: normal
Milestone: Untriaged
Reporter: tj
Assigned to: ludovic
CC: kumpera, mono-bugs+monotouch, Rajneeshk

Comment 0 (id 183959), tj, 2016-01-25 20:34:39 +0000:
I have been investigating a crash inside our app which seems to be related to our image loading process. Our image loading process takes heavy advantage of TPL+async/await. In the process of trying to extract out a useful demonstration of the crash, I started building a small sample. It turns out that there is a crash that can be produced without loading any images at all. The requirement seems to be to cause memory pressure while running many parallel tasks.
The test case does two things: (1) it runs a loop that allocates memory and updates the UI with the total memory in use, to show that it's not leaking, and (2) it runs 10 parallel loops of simple tasks that yield and then return a value.
The sample code producing the error can be found here: https://bitbucket.org/tpurtell/ios-gc-task-crash . It reproduces quickly on my iOS device (9.2.1) in debug mode (regardless of whether the debugger is attached). I can't seem to reproduce it on the simulator or in release mode. It produces these two unhandled exceptions (reported as NullPointerException via unhandled or debugger UI).
>2016-01-22 18:21:55.676 task-crash[12319:3127110]
>Unhandled Exception:
>0 task-crash 0x002b930f mono_handle_exception_internal + 2306
>1 task-crash 0x002b8a07 mono_handle_exception + 30
>2 task-crash 0x002b37d5 handle_signal_exception + 48
>3 ??? 0x17140a08 0x0 + 387189256
>at System.Threading.Tasks.Task.FinishStageThree () [0x00045] in /Users/builder/data/lanes/2799/179ef070/source/maccore/_build/Library/Frameworks/Xamarin.iOS.framework/Versions/git/src/mono/external/referencesource/mscorlib/system/threading/Tasks/Task.cs:2387
>at System.Threading.Tasks.Task`1<T_REF>.TrySetResult (T_REF) <0x000d0>
>at System.Threading.Tasks.UnwrapPromise`1<T_REF>.TrySetFromTask (System.Threading.Tasks.Task,bool) <0x002b0>
>at System.Threading.Tasks.UnwrapPromise`1<T_REF>.ProcessInnerTask (System.Threading.Tasks.Task) <0x000a0>
>at System.Threading.Tasks.UnwrapPromise`1<T_REF>.ProcessCompletedOuterTask (System.Threading.Tasks.Task) <0x00168>
>at System.Threading.Tasks.UnwrapPromise`1<T_REF>.InvokeCore (System.Threading.Tasks.Task) <0x00048>
>at System.Threading.Tasks.UnwrapPromise`1<T_REF>.Invoke (System.Threading.Tasks.Task) <0x00050>
>at System.Threading.Tasks.Task.FinishContinuations () [0x0007c] in /Users/builder/data/lanes/2799/179ef070/source/maccore/_build/Library/Frameworks/Xamarin.iOS.framework/Versions/git/src/mono/external/referencesource/mscorlib/system/threading/Tasks/Task.cs:3661
>at System.Threading.Tasks.Task.FinishStageThree () [0x00045] in /Users/builder/data/lanes/2799/179ef070/source/maccore/_build/Library/Frameworks/Xamarin.iOS.framework/Versions/git/src/mono/external/referencesource/mscorlib/system/threading/Tasks/Task.cs:2387
>at System.Threading.Tasks.Task.FinishStageTwo () [0x00074] in /Users/builder/data/lanes/2799/179ef070/source/maccore/_build/Library/Frameworks/Xamarin.iOS.framework/Versions/git/src/mono/external/referencesource/mscorlib/system/threading/Tasks/Task.cs:2358
>at System.Threading.Tasks.Task.Finish (bool) [0x00049] in /Users/builder/data/lanes/2799/179ef070/source/maccore/_build/Library/Frameworks/Xamarin.iOS.framework/Versions/git/src/mono/external/referencesource/mscorlib/system/threading/Tasks/Task.cs:2252
>at System.Threading.Tasks.Task.ExecuteWithThreadLocal (System.Threading.Tasks.Task&) [0x00068] in /Users/builder/data/lanes/2799/179ef070/source/maccore/_build/Library/Frameworks/Xamarin.iOS.framework/Versions/git/src/mono/external/referencesource/mscorlib/system/threading/Tasks/Task.cs:2857
>at System.Threading.Tasks.Task.ExecuteEntry (bool) [0x0006f] in /Users/builder/data/lanes/2799/179ef070/source/maccore/_build/Library/Frameworks/Xamarin.iOS.framework/Versions/git/src/mono/external/referencesource/mscorlib/system/threading/Tasks/Task.cs:2781
>at System.Threading.Tasks.Task.System.Threading.IThreadPoolWorkItem.ExecuteWorkItem () [0x00000] in /Users/builder/data/lanes/2799/179ef070/source/maccore/_build/Library/Frameworks/Xamarin.iOS.framework/Versions/git/src/mono/external/referencesource/mscorlib/system/threading/Tasks/Task.cs:2728
>at System.Threading.ThreadPoolWorkQueue.Dispatch () [0x00096] in /Users/builder/data/lanes/2799/179ef070/source/maccore/_build/Library/Frameworks/Xamarin.iOS.framework/Versions/git/src/mono/external/referencesource/mscorlib/system/threading/threadpool.cs:859
>at System.Threading._ThreadPoolWaitCallback.PerformWaitCallback () [0x00000] in /Users/builder/data/lanes/2799/179ef070/source/maccore/_build/Library/Frameworks/Xamarin.iOS.framework/Versions/git/src/mono/external/referencesource/mscorlib/system/threading/threadpool.cs:1196
>at (wrapper runtime-invoke) object.runtime_invoke_dynamic (intptr,intptr,intptr,intptr) <0x00100>
>21 task-crash 0x002c237f mono_jit_runtime_invoke + 1150
>22 task-crash 0x00300cf5 mono_runtime_invoke + 88
>23 task-crash 0x0031a653 worker_thread + 930
>24 task-crash 0x0031e355 start_wrapper + 400
>25 task-crash 0x0034b075 inner_start_thread + 148
>26 libsystem_pthread.dylib 0x20ac2c7f <redacted> + 138
>27 libsystem_pthread.dylib 0x20ac2bf3 _pthread_start + 110
>28 libsystem_pthread.dylib 0x20ac0a08 thread_start + 8
>2016-01-25 11:28:17.650 task-crash[15146:3593921]
>Unhandled Exception:
>0 task-crash 0x002ad30f mono_handle_exception_internal + 2306
>1 task-crash 0x002aca07 mono_handle_exception + 30
>2 task-crash 0x002a6ec3 mono_arm_throw_exception + 106
>3 task-crash 0x0025a9b8 throw_exception + 64
>at System.Threading.Tasks.AwaitTaskContinuation.<ThrowAsyncIfNecessary>m__0 (object) [0x00000] in /Users/builder/data/lanes/2799/179ef070/source/maccore/_build/Library/Frameworks/Xamarin.iOS.framework/Versions/git/src/mono/external/referencesource/mscorlib/system/threading/Tasks/TaskContinuation.cs:885
>at System.Threading.QueueUserWorkItemCallback.WaitCallback_Context (object) [0x0000e] in /Users/builder/data/lanes/2799/179ef070/source/maccore/_build/Library/Frameworks/Xamarin.iOS.framework/Versions/git/src/mono/external/referencesource/mscorlib/system/threading/threadpool.cs:1291
>at System.Threading.ExecutionContext.RunInternal (System.Threading.ExecutionContext,System.Threading.ContextCallback,object,bool) [0x00081] in /Users/builder/data/lanes/2799/179ef070/source/maccore/_build/Library/Frameworks/Xamarin.iOS.framework/Versions/git/src/mono/external/referencesource/mscorlib/system/threading/executioncontext.cs:581
>at System.Threading.ExecutionContext.Run (System.Threading.ExecutionContext,System.Threading.ContextCallback,object,bool) [0x00000] in /Users/builder/data/lanes/2799/179ef070/source/maccore/_build/Library/Frameworks/Xamarin.iOS.framework/Versions/git/src/mono/external/referencesource/mscorlib/system/threading/executioncontext.cs:530
>at System.Threading.QueueUserWorkItemCallback.System.Threading.IThreadPoolWorkItem.ExecuteWorkItem () [0x0002a] in /Users/builder/data/lanes/2799/179ef070/source/maccore/_build/Library/Frameworks/Xamarin.iOS.framework/Versions/git/src/mono/external/referencesource/mscorlib/system/threading/threadpool.cs:1268
>at System.Threading.ThreadPoolWorkQueue.Dispatch () [0x00096] in /Users/builder/data/lanes/2799/179ef070/source/maccore/_build/Library/Frameworks/Xamarin.iOS.framework/Versions/git/src/mono/external/referencesource/mscorlib/system/threading/threadpool.cs:859
>at System.Threading._ThreadPoolWaitCallback.PerformWaitCallback () [0x00000] in /Users/builder/data/lanes/2799/179ef070/source/maccore/_build/Library/Frameworks/Xamarin.iOS.framework/Versions/git/src/mono/external/referencesource/mscorlib/system/threading/threadpool.cs:1196
>at (wrapper runtime-invoke) object.runtime_invoke_dynamic (intptr,intptr,intptr,intptr) <0x00100>
>12 task-crash 0x002b637f mono_jit_runtime_invoke + 1150
>13 task-crash 0x002f4cf5 mono_runtime_invoke + 88
>14 task-crash 0x0030e653 worker_thread + 930
>15 task-crash 0x00312355 start_wrapper + 400
>16 task-crash 0x0033f075 inner_start_thread + 148
>17 libsystem_pthread.dylib 0x20ac2c7f <redacted> + 138
>18 libsystem_pthread.dylib 0x20ac2bf3 _pthread_start + 110
>19 libsystem_pthread.dylib 0x20ac0a08 thread_start + 8

Comment 1 (id 183990), kumpera, 2016-01-26 00:10:55 +0000:
Hey Ludo,
We got a TPL bug here.

Comment 2 (id 184057), ludovic, 2016-01-26 11:16:47 +0000:
Hello TJ,
Thank you very much for the very detailed report and simple repro! I will work on that right away.
The second crash occurs because an exception has bubbled up through the async/await code. In this case, the TPL enqueues a new work item on the thread pool that simply rethrows the exception, which crashes the process. This is the expected behavior for an unhandled exception on a thread pool thread. I will investigate further where this exception comes from.

Comment 3 (id 184822), ludovic, 2016-01-29 19:11:41 +0000:
Hello TJ,
The issue lies in the way the GC scans registers on ARM. We were missing some of them, which led to an object being freed while it was still reachable. The fix is available here: https://github.com/mono/mono/pull/2540 . With it, I can run for tens of thousands of collections without a crash.
I am checking with @kumpera when and where we will have a release with this fix.
Thank you again,
Ludovic

Comment 4 (id 184828), tj, 2016-01-29 19:33:02 +0000:
Interesting, thank you very much for tracking this down! :)

Comment 5 (id 185215), tj, 2016-02-02 06:06:25 +0000:
It definitely seems a bit weird that PC, SP, or LR would hold a GC root that isn't already scanned for other reasons. The patch seems to add a lot of roots, and perhaps this alters the timing so that it makes the problem "disappear".

Comment 6 (id 185275), ludovic, 2016-02-02 13:36:37 +0000:
The previous issue arose when we suspended the thread on the loop-back branch of Interlocked.Exchange. The implementation is as follows:
> 0x17220c <+92>: dmb sy // full memory barrier
> 0x172210 <+96>: ldrex lr, [r0] // lr= *r0
> 0x172214 <+100>: strex r12, r1, [r0] // *r0 = r1, r12 = success?
> 0x172218 <+104>: cmp r12, #0 // r12 == success?
> 0x17221c <+108>: bne 0x17220c // loop back if not success - it would crash if suspended here
> ...
> 0x172228 <+120>: mov r1, lr // r1 = lr, we would scan r1, but it's too late.
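For readers less familiar with ARM load-linked/store-conditional sequences, the loop quoted above can be approximated in portable C11. This is only a rough analogue, not the code Mono generates: `exchange_ptr` is a hypothetical helper name, and the compare-and-swap retry loop stands in for the `ldrex`/`strex`/`bne` sequence.

```c
#include <stdatomic.h>

/* Hypothetical helper: atomically store new_value into *slot and
 * return the value that was there before. */
static void *exchange_ptr(_Atomic(void *) *slot, void *new_value)
{
    /* Read the current value (the role played by "ldrex lr, [r0]"). */
    void *old = atomic_load(slot);

    /* Try to publish new_value; on failure 'old' is refreshed with the
     * current contents and we loop back (the "strex" + "bne" pair). */
    while (!atomic_compare_exchange_weak(slot, &old, new_value)) {
        /* retry */
    }

    /* In the assembly quoted above, the old value is held in LR until the
     * final "mov r1, lr"; this return plays the same role. */
    return old;
}
```

After a successful exchange the slot no longer refers to the old value, so the caller's copy (LR in the listing) may be the only remaining reference to it.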
The issue was that we did not scan the LR register: it is at word 15 of MonoContext, but on ARM we only scanned 14 words. We would therefore neither pin nor even mark the object involved in the CAS. LR would then be left holding garbage, since it no longer pointed to a valid object.
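The effect of scanning too few words can be shown with a small C sketch. Everything here is an assumption made for illustration: `SavedContext` is a simplified stand-in and not the real MonoContext definition, and its layout (a `pc` word followed by 16 general registers) is chosen only because it places LR at word 15, as described above. `scan_finds` and `lr_reference_seen` are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified saved context; NOT the real MonoContext. */
typedef struct {
    uintptr_t pc;         /* word 0 */
    uintptr_t regs[16];   /* words 1..16: r0..r15, so LR (r14) is word 15 */
} SavedContext;

#define CONTEXT_WORDS (sizeof(SavedContext) / sizeof(uintptr_t))

/* Conservative scan: report whether any of the first n_words words
 * of the saved context equals obj. */
static bool scan_finds(const SavedContext *ctx, size_t n_words, uintptr_t obj)
{
    uintptr_t words[CONTEXT_WORDS];
    memcpy(words, ctx, sizeof words);
    if (n_words > CONTEXT_WORDS)
        n_words = CONTEXT_WORDS;
    for (size_t i = 0; i < n_words; i++) {
        if (words[i] == obj)
            return true;   /* a real collector would pin or mark here */
    }
    return false;
}

/* Put the only reference to an object in LR, then scan n_words words. */
static bool lr_reference_seen(size_t n_words)
{
    static int object;                   /* stands in for a managed object */
    SavedContext ctx;
    memset(&ctx, 0, sizeof ctx);
    ctx.regs[14] = (uintptr_t)&object;   /* LR */
    return scan_finds(&ctx, n_words, (uintptr_t)&object);
}
```

Under these assumptions, `lr_reference_seen(14)` returns false because the scan stops before reaching LR, while `lr_reference_seen(CONTEXT_WORDS)` returns true.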
The time taken by scanning is not relevant here; what mattered was the location at which the thread was suspended. So scanning more registers does not change the timing in any way that matters, and performance-wise, going from 14 to 34 scanned words makes no substantial difference.

Comment 7 (id 185347), tj, 2016-02-02 18:38:23 +0000:
Ah, I see. The specially named register is not so special here :). Now I can see why this was so tricky to track down. I am hopeful this might fix some other random crash-type issues that I see from the field but was never able to get a concrete repro case for, since it seems it would affect many things, including concurrent collection types. Excited to try it out!

Comment 8 (id 189619), Rajneeshk, 2016-03-01 14:06:28 +0000:
I have checked this issue, and I am able to successfully build and run the test sample provided in the bug description.
[Build Info: Master]
XamarinStudio-6.1.0.54_233e0422ff49ca09152bf44dc3c7b246e138a13e
monotouch-9.9.0.12_10607332fe620552a853075523b151b2d5b188cd
MonoFramework-MDK-4.5.0.17.macos10.xamarin.universal_c8ed3d4e7be69aa2cf6e95570983d40ddb29fa52
Screencast: http://www.screencast.com/t/I8gWAmMRtKh
This issue has been fixed in master. I will re-verify it when the fix is merged into the release branch.
Thanks..!
Device info: iPhone 4s iOS Version 9.2(13C75)
Environment Info: https://gist.github.com/Rajneesh360Logica/323f4278de7f7ea50229
Application Output: https://gist.github.com/Rajneesh360Logica/8e7ba9298b44c72314a2

Comment 9 (id 190028), Rajneeshk, 2016-03-03 13:47:41 +0000:
I have checked this issue with the following build from C6SR2:
MonoFramework-MDK-4.3.2.485.macos10.xamarin.universal_eb15c95a294b05dd54a836596957ac275ffccf20
I observed that the issue no longer occurs with this build. I am able to successfully build and run the test sample provided in the Bug 38012 description.
Screencast: http://www.screencast.com/t/ySisudGS
Device info: iPhone 4s iOS Version 9.2(13C75)
This issue has been fixed, so I am closing it.
Thanks..!