On 31/03/2017 13:21, Alex Bennée wrote:
> Anyway I think I'm getting closer to narrowing it down. On record I see
> replay_current_step jump backwards with this:
>
> /* A common event print, called when reading or saving an event */
> static void print_event(uint8_t event)
> {
> static int event_count;
> static uint64_t last_step;
> uint64_t this_step = replay_get_current_step();
>
> fprintf(stderr, "replay: event %d is %d @ step=%#" PRIx64 "\n",
> event_count, event, this_step);
>
> if (this_step < last_step) {
> fprintf(stderr,"%s: !!! step %d went backwards
> %#"PRIx64"=>%#"PRIx64"!!!\n",
> __func__, event_count, last_step, this_step);
> abort();
> }
> last_step = this_step;
> event_count++;
> }
>
> void replay_put_event(uint8_t event)
> {
> assert(event < EVENT_COUNT);
> replay_put_byte(event);
> print_event(event);
> }
>
> The jump back is fairly consistent in my case (event 66 in the vexpress
> run) but I'm fairly sure that is the bug. I can't see any reason for the
> step count to go backwards - indeed that breaks the premise of .
Good catch! I suspect it's the "else" case in cpu_get_icount_raw:
icount = timers_state.qemu_icount;
if (cpu) {
if (!cpu->can_do_io) {
fprintf(stderr, "Bad icount read\n");
exit(1);
}
icount -= (cpu->icount_decr.u16.low + cpu->icount_extra);
}
Between
timers_state.qemu_icount += count;
and
timers_state.qemu_icount -= (cpu->icount_decr.u16.low
+ cpu->icount_extra);
the I/O thread can read a value that is later rolled back. I think you
need to do this the other way round: add something to icount in
cpu_get_icount_raw rather than taking it off:
icount = timers_state.qemu_icount;
if (cpu) {
if (!cpu->can_do_io) {
fprintf(stderr, "Bad icount read\n");
exit(1);
}
icount += cpu->pending_icount
- (cpu->icount_decr.u16.low + cpu->icount_extra);
}
where cpu->pending_icount is set before cpu_exec, and folded into
timers_state.qemu_icount afterwards.
Also, here:
if (use_icount) {
int64_t count;
int decr;
timers_state.qemu_icount -= (cpu->icount_decr.u16.low
+ cpu->icount_extra);
cpu->icount_decr.u16.low = 0;
cpu->icount_extra = 0;
this should be dead code because tcg_cpu_exec also clears the two
decrementer fields. So if you can replace the three assignments with
assertions on cpu->icount_decr.u16.low and cpu->icount_extra, that would
also simplify the code and remove race opportunities.
Paolo
>>>>>> - replay puts back unexecuted instructions (qemu_icount -= counter)
>>>>>>
>>>>>> But virtual timer cannot take in account non-executed instructions
>>>>>> (counter) and
>>>>>> therefore it reads only qemu_icount, which provides incorrect time.
>>>>>>
>>>>>>> But the timers are all enabled:
>>>>>>>
>>>>>>> (gdb) qemu timers
>>>>>>> Processing Realtime timers
>>>>>>> clock QEMU_CLOCK_REALTIME is enabled:true, last:-9223372036854775808
>>>>>>> Processing Virtual timers
>>>>>>> clock QEMU_CLOCK_VIRTUAL is enabled:true, last:-9223372036854775808
>>>>>>> timer 34297350016/1 (cb:0x555555a2e952 <ptimer_tick>)
>>>>>>> timer 503290000000/1000000 (cb:0x555555bd4d1d <ra_timer_handler>)
>>>>>>> Processing Host timers
>>>>>>> clock QEMU_CLOCK_HOST is enabled:true, last:1490191319270134000
>>>>>>> Processing Virtual RT timers
>>>>>>> clock QEMU_CLOCK_VIRTUAL_RT is enabled:true,
>>>>>>> last:-9223372036854775808
>>>>>>
>>>>>> Timers are processed only at checkpoints recorded in the log.
>>>>>> When replay is stuck, probably there is a pending checkpoint in the log
>>>>>> and pending instructions in CPU (because iothread breaks its
>>>>>> synchronization).
>>>>>
>>>>> How does this work for instructions that sleep (like WFI)? The WFI will
>>>>> cause us to exit the loop but I assume we need a "real" timer to trigger
>>>>> a firing of whatever the WFI is waiting for as time according to icount
>>>>> isn't advancing.
>>>>
>>>> This is virtual_rt for. It advances even when vCPU is stopped.
>>>
>>> So I'm not seeing this happen:
>>>
>>> process_icount_data: icount not increased 117302332749=>117302332749 RT:-1
>>>
>>> I wonder if the advance of virtual_rt has been deadlocked? The -1
>>> suggests its not even setup.
>>
>> qemu_start_warp_timer says that it is setup only when there are active
>> virtual timers.
>>
>>
>>>> By the way, I made small description of the replay log format
>>>> to be later included into docs/replay.txt:
>>>>
>>> Thanks. That's really useful.
>>>
>>> Do you think it would be worth including the dumper script I wrote in
>>> ./scripts/? I mainly wrote it to aid my understanding of the code flow
>>> but it might be useful in future investigations?
>>
>> Usually I get human-readable logs in record and replay modes to find
>> divergence between them, but I think your script might be useful in some
>> cases.
>
> Cool. I'll include it in the icount series once I have something to post
> (fingers crossed I'm getting close).
>
> --
> Alex Bennée
>