I never said it was the solution to all collisions, but it will cover
most of the (very rare and never reported) overwrites at zero cost.
> Firstly, it can show collisions only after a document has already
> been overwritten, so the damage is already done.
Again, keep in mind that most of the code does:
* getDocument(DocumentReference)
* modify the XWikiDocument instance
* saveDocument()
so if getDocument() fails, you are not going to overwrite anything.
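To make that concrete, here is a rough sketch of the guard I have in
mind in XWikiHibernateStore#loadXWikiDoc (the helper name and exception
constants are illustrative, not the exact current code):

    // Guard for XWikiHibernateStore#loadXWikiDoc: once the row matching
    // the hashed id has been loaded, verify that it actually belongs to
    // the requested document. On a hash collision the stored full name
    // differs from the requested one, so we fail the load instead of
    // handing back (and later overwriting) another document's data.
    private void checkNoIdCollision(String requestedFullName,
        String loadedFullName) throws XWikiException
    {
        if (!requestedFullName.equals(loadedFullName)) {
            throw new XWikiException(XWikiException.MODULE_XWIKI_STORE,
                XWikiException.ERROR_XWIKI_STORE_HIBERNATE_READING_DOC,
                "Id collision: loading [" + requestedFullName
                    + "] returned data for [" + loadedFullName + "]");
        }
    }

With such a check in place, the save step of the get/modify/save
sequence is never reached when the load hit a colliding row.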
> Secondly, loadXWikiDoc has to be called for a document which doesn't
> exist anymore; I guess this doesn't happen so often, since the system
> won't list it anymore.
Not sure which use case you are referring to here. Are you talking
about a document being deleted and then a document with a different
reference but the same hash being saved?
>
> On 2 February 2018 at 13:36, Thomas Mortagne <thomas.morta...@xwiki.com>
> wrote:
>
>> For documents, what could help a lot already, without any performance
>> penalty, is to compare the loaded document reference with the passed
>> one in XWikiHibernateStore#loadXWikiDoc. That's because most of the
>> code in XWiki applies the following logic: getDocument(), modify it,
>> saveDocument().
>>
>> On Fri, Feb 2, 2018 at 12:19 PM, Marc Sladek <marc.sla...@synventis.com>
>> wrote:
>> > Hi Denis,
>> >
>> > Thanks a lot for your answer. I know it's been a while, but I'd still
>> > like to follow up on it, since it's quite a fundamental issue.
>> >
>> >> Therefore, by improving the hash algorithm, the size of the ids, and
>> >> the quality of the hashed key, we have considered ourselves to be safe
>> >> enough for normal usage.
>> >
>> > Still, with enough bad luck, documents and objects may be overwritten
>> > without a trace. This is not a stable implementation. Even worse, if
>> > hash collisions happen in the future on any XWiki installation (or have
>> > already happened since 4.x), they probably won't be easily associated
>> > with this issue, because it's nearly impossible to debug.
>> >
>> > While I do now understand the motivation to stick with hashes, I'm
>> > still not sure why collision detection would be difficult to introduce,
>> > and why it's even "impossible for some API". Let me briefly outline an
>> > idea:
>> >
>> > In XWikiHibernateStore#saveXWikiDoc on L615
>> > <https://github.com/xwiki/xwiki-platform/blob/stable-9.11.x/xwiki-platform-core/xwiki-platform-oldcore/src/main/java/com/xpn/xwiki/store/XWikiHibernateStore.java#L615>
>> > an exists check on the doc id is already performed. If
>> > xwikidoc.fullName is also selected in that HQL, a comparison with
>> > doc.getDocumentReference() can expose an imminent collision before any
>> > data is overwritten. At the very least an XWikiException should be
>> > thrown in this case; see the sketch below. A similar check could be
>> > done before saving BaseObjects on L1203
>> > <https://github.com/xwiki/xwiki-platform/blob/stable-9.11.x/xwiki-platform-core/xwiki-platform-oldcore/src/main/java/com/xpn/xwiki/store/XWikiHibernateStore.java#L1203>
>> > to avoid collisions on object IDs.
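>> >
>> > A rough sketch of the extended exists check (session/query usage and
>> > exception constants are illustrative, not the exact current code):
>> >
>> >     // Extend the existing exists check to also fetch the full name
>> >     // currently stored under this id.
>> >     Query query = session.createQuery(
>> >         "select doc.fullName from XWikiDocument as doc where doc.id = :id");
>> >     query.setLong("id", doc.getId());
>> >     String existingFullName = (String) query.uniqueResult();
>> >
>> >     // A row exists but belongs to a different document: the id hash
>> >     // collided, so refuse to save instead of silently overwriting.
>> >     if (existingFullName != null
>> >         && !existingFullName.equals(doc.getFullName())) {
>> >         throw new XWikiException(XWikiException.MODULE_XWIKI_STORE,
>> >             XWikiException.ERROR_XWIKI_STORE_HIBERNATE_SAVING_DOC,
>> >             "Id collision: id [" + doc.getId() + "] already belongs to ["
>> >                 + existingFullName + "]");
>> >     }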
>> >
>> > I don't think a change like this would be difficult to implement; I
>> > could provide a PR of that sort. The performance penalty would have to
>> > be measured on your systems though, since the full name isn't indexed
>> > AFAIK.
>> >
>> > Regards
>> >
>> > Marc Sladek
>> > synventis gmbh
>> >
>> > On 30 November 2017 at 15:21, Denis Gervalle <d...@softec.lu> wrote:
>> >
>> >> Hi Marc,
>> >>
>> >> Here are some answers:
>> >>
>> >> 1) MD5 was already a dependency of our oldcore and using SHA1 would have
>> >> added a dependency without bringing much benefit. Since we only used 64
>> >> bits of the MD5 anyway, I doubt using SHA1 would have provided a better
>> >> distribution.
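>> >>
>> >> To make that concrete, the id derivation essentially hashes the
>> >> serialized entity key with MD5 and keeps the first 64 bits as a long.
>> >> A minimal sketch (not the exact key format used in XWikiDocument and
>> >> BaseElement):
>> >>
>> >>     import java.nio.ByteBuffer;
>> >>     import java.nio.charset.StandardCharsets;
>> >>     import java.security.MessageDigest;
>> >>
>> >>     // MD5 produces 16 bytes; only the first 8 (64 bits) are kept as
>> >>     // the Hibernate id, whatever the strength of the full digest.
>> >>     static long toId(String key) throws Exception {
>> >>         MessageDigest md5 = MessageDigest.getInstance("MD5");
>> >>         byte[] digest = md5.digest(key.getBytes(StandardCharsets.UTF_8));
>> >>         return ByteBuffer.wrap(digest, 0, 8).getLong();
>> >>     }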
>> >>
>> >> 2) Such collision detection is difficult to introduce in the existing
>> >> code base; for some APIs it is even impossible. What you experienced
>> >> with the 32-bit ids was my motivation for the changes in 4.x, and I
>> >> can say, based on my long XWiki experience, that even with the poor
>> >> 32-bit ids, very few users were affected. Therefore, by improving the
>> >> hash algorithm, the size of the ids, and the quality of the hashed
>> >> key, we have considered ourselves to be safe enough for normal usage.
>> >>
>> >> 3) That's the toughest point. I cannot answer for the original
>> >> decision, as I wasn't involved yet, but regarding the changes
>> >> introduced in 4.0, a change had been considered. The ids are only
>> >> there to satisfy Hibernate and its loading mechanism. If we had used a
>> >> counter, we would have had to manage a conversion table between ids
>> >> and entity references, with all the additional complexity that implies
>> >> (consistency issues, caching, ...). This is because we use entity
>> >> references to point directly to documents (or even objects) everywhere
>> >> in XWiki. Introducing that behaviour while keeping all the existing
>> >> API unchanged would have been a huge amount of work, and it would
>> >> probably have introduced a performance penalty as well. This is why we
>> >> gave up and went for an improved hash solution. IMO, if we had to make
>> >> such a change, we would be better off rewriting the storage service
>> >> completely, and even stopping using Hibernate, which, to be honest,
>> >> does not bring much benefit to XWiki with its ORM aspects.
>> >>
>> >> But if you really want the complete answers, you can look at those
>> >> threads:
>> >> http://xwiki.markmail.org/thread/fuprtrnupz2uy37f
>> >> http://xwiki.markmail.org/thread/fsd25bvft74xwgcx
>> >>
>> >> Regards,
>> >>
>> >> --
>> >> Denis Gervalle
>> >> SOFTEC sa - CEO
>> >>
>> >> On 30 Nov 2017, 14:14 +0100, Marc Sladek <marc.sla...@synventis.com>,
>> >> wrote:
>> >> > Dear XWiki devs
>> >> >
>> >> > We are using the XWiki platform for our applications but sadly are
>> >> > still stuck on 2.7.2. Lately we ran into issues on a large database
>> >> > and noticed "disappearing" BaseObjects. We were able to link this to
>> >> > XWIKI-6990 <http://jira.xwiki.org/browse/XWIKI-6990>, where Hibernate
>> >> > IDs collided (hash collisions) and overwrote other objects without
>> >> > any trace - neither visible in the history nor in a log file.
>> >> >
>> >> > We analysed the solution you implemented in 4.0+ in XWikiDocument
>> >> > <https://github.com/xwiki/xwiki-platform/blob/stable-8.4.x/xwiki-platform-core/xwiki-platform-oldcore/src/main/java/com/xpn/xwiki/doc/XWikiDocument.java#L841>
>> >> > and BaseElement
>> >> > <https://github.com/xwiki/xwiki-platform/blob/stable-8.4.x/xwiki-platform-core/xwiki-platform-oldcore/src/main/java/com/xpn/xwiki/objects/BaseElement.java#L237>
>> >> > and noticed that you changed the 32-bit String#hashCode to a 64-bit
>> >> > MD5 hash, which makes a collision less likely. I have a few questions
>> >> > regarding your solution:
>> >> >
>> >> > 1) Is there any specific reason why you chose MD5 over SHA-1 or
>> >> > SHA-2?
>> >> >
>> >> > 2) Collisions are still possible and would be extremely hard to
>> >> > notice, since they are completely silent. Have you considered
>> >> > implementing collision detection, to at least log occurring
>> >> > collisions - or, even better, to reserve 1-2 bits of the 64 as a
>> >> > collision counter in case a collision happens?
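>> >> >
>> >> > For illustration, one hypothetical way to reserve the top two bits:
>> >> >
>> >> >     // Keep 62 bits of hash; the top 2 bits count collisions (0-3).
>> >> >     // On a detected collision, the save could retry with an
>> >> >     // incremented counter to obtain a distinct id.
>> >> >     static long makeId(long hash64, int collisions) {
>> >> >         return (((long) collisions & 0x3L) << 62) | (hash64 >>> 2);
>> >> >     }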
>> >> >
>> >> > 3) To question the concept of generating a hash for an ID in
>> >> > general: wouldn't a database-defined "auto increment" be a much more
>> >> > robust solution for the Hibernate IDs? A collision would be
>> >> > impossible, and clustering/scalability would still be possible with
>> >> > e.g. the InnoDB "interleaved" auto-increment lock mode. Why did you
>> >> > choose a hash-based solution in the first place?
>> >> >
>> >> > I'm sorry if these questions have already been answered on the dev
>> >> > mailing list or in issues; please link me to them, since I couldn't
>> >> > find any concrete answers.
>> >> >
>> >> > Thanks for your time and regards
>> >> >
>> >> > Marc Sladek
>> >> > synventis gmbh
>> >>
>>
>> --
>> Thomas Mortagne
>>
--
Thomas Mortagne