We encountered a core dump of ceph-osd. According to the following information from gdb, the prior_version of PGLog entry 8336'960 pointed to 8336'957, while the code expected 8336'959.

According to the output of "ceph-objectstore-tool ... --skip-journal-replay --pgid 5.1 --op log", entries 8336'957, 8336'958, 8336'959 and 8336'960 were all present. However, 8336'957 was MODIFY, 8336'958 and 8336'959 were ERROR, and 8336'960 was MODIFY. So the prior_version of 8336'960 appears to skip the two ERROR entries and point to the last MODIFY entry.

An intuitive solution, to me, is to execute "last = i->version" only when "!i->is_error()". Alternatively, the consistency check (such as "i->prior_version == last") should be done against "entries" in 7879efdd6b14770b287c672641dc2461e491f9b0 instead of "orig_entries".
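
To illustrate the first suggestion, here is a minimal standalone sketch. It is not the actual Ceph code: Version, Entry, prior_chain_consistent and sample_log are hypothetical simplified stand-ins, and the prior_version values for 8336'957 and for the two ERROR entries are assumptions made only for illustration. The flag advance_on_error models whether "last" is also updated on ERROR entries (the behaviour we believe led to the crash) or only on non-ERROR entries (the proposed change).

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a log version such as 8336'960 (epoch'version).
struct Version {
  uint32_t epoch;
  uint64_t version;
  bool operator==(const Version& o) const {
    return epoch == o.epoch && version == o.version;
  }
};

// Hypothetical, heavily simplified log entry.
struct Entry {
  Version version;
  Version prior_version;
  bool error;  // true for ERROR entries, false for MODIFY
  bool is_error() const { return error; }
};

// Returns true if every non-ERROR entry's prior_version matches "last".
// advance_on_error == true : "last" is updated on every entry.
// advance_on_error == false: "last" is updated only when !is_error().
bool prior_chain_consistent(const std::vector<Entry>& entries,
                            bool advance_on_error) {
  bool have_last = false;
  Version last{0, 0};
  for (const Entry& e : entries) {
    if (have_last && !e.is_error() && !(e.prior_version == last)) {
      return false;  // the real code would hit an assert here
    }
    if (advance_on_error || !e.is_error()) {
      last = e.version;
      have_last = true;
    }
  }
  return true;
}

// The four entries from the report. The prior_version of 8336'957 and of
// the ERROR entries is assumed; only 8336'960 -> 8336'957 is from gdb.
std::vector<Entry> sample_log() {
  return {
      {{8336, 957}, {8336, 956}, false},  // MODIFY
      {{8336, 958}, {0, 0}, true},        // ERROR
      {{8336, 959}, {0, 0}, true},        // ERROR
      {{8336, 960}, {8336, 957}, false},  // MODIFY, prior is 8336'957
  };
}
```

With this model, prior_chain_consistent(sample_log(), true) fails on 8336'960 because "last" has advanced to 8336'959, while prior_chain_consistent(sample_log(), false) passes because "last" is still 8336'957.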

PS: As a workaround until the code is changed, we used ceph-objectstore-tool to remove the problematic PG (since we still had enough valid replicas), and the OSD then started successfully.