- I don’t update {{seen}} link because transient set can be mutated in-place
- Are transducers meant to be used from multiple threads? Because even existing implementation clearly has race condition. I imagine fixing that would be costly (we’ll need a synchronized section), so maybe it should be a specialized transducer that you use only when needed?

I should also say transducers are not expected to be used from more than one thread at a time, so there are no race problems. But being used from multiple threads over time requires proper safe publication.

bq. But being used from multiple threads over time requires proper safe publication.

Does that imply that no transients could be used in transducers (because underlying arrays on which transient impl is based are mutated in place, so different threads could potentially see different states of transient object)?

Does that also mean that {{partition-by}} and {{partition-all}} should be fixed (they use {{java.util.ArrayList}} which, being array of references, has no safe publication semantics)?

bq. Transient sets can NOT be mutated in place - you must use the return value.

I was thinking that {{clojure/core.clj}} and {{clojure.lang.ATransientSet.java}} are both part of Clojure internals, colocated, so can share a little bit of internal knowledge about each other. It seems safe to do that, because that knowledge does not leak outside, and, if at any point impl of ATransientSet would change, core.clj could be updated accordingly in the same release. I wouldn’t do that in any third-party library, of course.

{quote}Does that imply that no transients could be used in transducers (because underlying arrays on which transient impl is based are mutated in place, so different threads could potentially see different states of transient object)?{quote}

Transients require only that they are asked by no more than a single thread at a time and so are safe to use in a transducer. However, they should guarantee safe publication. core.async channels already do this as an artifact of their implementation, but other transducing contexts may not.

Transients should NEVER be used as "mutate in place", regardless of concurrency. While they will appear to "work" in some circumstances, this is never correct (eventually an update operation will return a new instance and if you are mutating in place, your data will then be missing). This is discussed and correct examples are shown at http://clojure.org/reference/transients.

{quote}Does that also mean that partition-by and partition-all should be fixed (they use java.util.ArrayList which, being array of references, has no safe publication semantics)?{quote}

Maybe that doc http://clojure.org/reference/transients should be updated re: transients are not safe to use from multiple threads because changes made by one thread are not necessarily visible to another. Even if they don’t compete

I would say that test is demonstrating a bug in transient sets/maps and you should file a ticket for that as it's a lot more important than this enhancement.

distinct should be able to use transients in both the transducer and lazy seq impls. The issue with contains? not working on transients is actually a separate ticket - http://dev.clojure.org/jira/browse/CLJ-700 that will likely require some class hierarchy rearrangement. I don't think we would take this change until that is fixed (so that you can avoid relying on the class and Java method variants).

I have to admit my test was demonstrating something else: there were no proper thread isolation. So it was a concurrency issue, not “safe publication” issue. My current understanding is this:

bq. Transients require thread isolation. Use of a particular transient instance should be controlled either by using it in an single-threaded scope, or in a framework that enforces this.

That guarantee implicitly presumes that there’s happens-before relation between transient usage from multiple threads. There’s no other way to define “only one thread is in this section at a time”.

That, in turn, means that all writes that happened in thread 1 are visible in thread 2, regardless to volatility of the variables involved. In fact, we can remove all volatiles from transients implementation and probably make them faster, because, by asking “no more than one thread at a time” we enforce users to establish happens-before between sections, and that would give us all the safe publication guarantees we need.

Also, long-living transients (e.g. in a transducers associated with a queue, for example) will hold a reference to a thread that created them. Is that a bad thing? Should we switch to boolean flag instead?