We need a thread-safe interface to mutable state, for use in library code that does not otherwise use concurrency.
We have two choices:

Use MVars. A non-concurrent implementation might implement them in terms of IORef, for example.

Use STM. Easier to use, but not entirely trivial to implement, even in a single-threaded implementation, because exceptions have to abort a transaction (sample implementation).
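For the first choice, here is a minimal sketch of how a single-threaded implementation might build MVars on top of IORef. All names are hypothetical stand-ins for Control.Concurrent.MVar. Because no other thread exists to fill or empty an MVar, an operation that would block forever raises an error instead.

```haskell
import Control.Exception (onException)
import Data.IORef

-- Hypothetical single-threaded MVar: an IORef holding Maybe a.
newtype MVar a = MVar (IORef (Maybe a))

newMVar :: a -> IO (MVar a)
newMVar x = MVar <$> newIORef (Just x)

newEmptyMVar :: IO (MVar a)
newEmptyMVar = MVar <$> newIORef Nothing

-- With only one thread, taking from an empty MVar would block forever.
takeMVar :: MVar a -> IO a
takeMVar (MVar r) = do
  contents <- readIORef r
  case contents of
    Just x  -> writeIORef r Nothing >> return x
    Nothing -> ioError (userError "takeMVar: empty MVar would block forever")

-- Likewise, putting into a full MVar would block forever.
putMVar :: MVar a -> a -> IO ()
putMVar (MVar r) x = do
  contents <- readIORef r
  case contents of
    Nothing -> writeIORef r (Just x)
    Just _  -> ioError (userError "putMVar: full MVar would block forever")

-- The usual library idiom for protected state; the old value is put back
-- if the update function throws.
modifyMVar_ :: MVar a -> (a -> IO a) -> IO ()
modifyMVar_ m f = do
  x <- takeMVar m
  x' <- f x `onException` putMVar m x
  putMVar m x'
```

Library code written against this interface would run unchanged on a concurrent implementation that supplies the real MVar.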
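For the second choice, this hypothetical sketch (not the linked sample implementation) shows why exceptions complicate even a single-threaded STM. Each transaction keeps an undo log of TVar writes and replays it if an exception escapes. It deliberately omits retry and orElse, which would need a scheduler.

```haskell
import Control.Exception (Exception, SomeException, throwIO, try)
import Data.IORef

-- Hypothetical single-threaded TVar and STM.
newtype TVar a = TVar (IORef a)

-- An STM action receives the transaction's undo log.
newtype STM a = STM { runSTM :: IORef [IO ()] -> IO a }

instance Functor STM where
  fmap f (STM m) = STM (fmap f . m)

instance Applicative STM where
  pure x = STM (\_ -> return x)
  STM mf <*> STM mx = STM (\undoLog -> mf undoLog <*> mx undoLog)

instance Monad STM where
  STM m >>= k = STM (\undoLog -> m undoLog >>= \x -> runSTM (k x) undoLog)

newTVar :: a -> STM (TVar a)
newTVar x = STM (\_ -> TVar <$> newIORef x)

readTVar :: TVar a -> STM a
readTVar (TVar r) = STM (\_ -> readIORef r)

-- Record how to restore the old value before overwriting it.
writeTVar :: TVar a -> a -> STM ()
writeTVar (TVar r) x = STM $ \undoLog -> do
  old <- readIORef r
  modifyIORef undoLog (writeIORef r old :)  -- newest undo action first
  writeIORef r x

throwSTM :: Exception e => e -> STM a
throwSTM e = STM (\_ -> throwIO e)

-- On any exception, run the undo actions newest first (so the oldest
-- value is restored last) and rethrow.
atomically :: STM a -> IO a
atomically (STM m) = do
  undoLog <- newIORef []
  result <- try (m undoLog)
  case result of
    Right x -> return x
    Left e -> do
      undos <- readIORef undoLog
      sequence_ undos
      throwIO (e :: SomeException)
```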

Concurrent foreign calls are required. A concurrent foreign call allows other Haskell threads to make progress before the foreign call returns.
Rationale:

concurrent foreign calls are required to guarantee progress of other Haskell threads when one thread makes a blocking call.

concurrent foreign calls are required for implementing I/O multiplexing, a principal use of concurrency.

concurrent foreign calls are required to guarantee timely responsiveness of an interactive application in the presence of long-running foreign calls.
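As one existing data point, GHC runs "safe" foreign imports concurrently when a program is linked with its threaded runtime (-threaded); on the non-threaded runtime the same call stalls every Haskell thread. The sketch assumes a POSIX system (for sleep(3)); blockingCallAsync is a hypothetical helper.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (MVar, newEmptyMVar, putMVar, takeMVar)
import Foreign.C.Types (CUInt (..))

-- POSIX sleep(3), imported "safe" so that GHC's threaded runtime can let
-- other Haskell threads run while it blocks.
foreign import ccall safe "unistd.h sleep"
  c_sleep :: CUInt -> IO CUInt

-- Start a blocking foreign call in its own Haskell thread and return an
-- MVar that receives the call's result when it finishes.
blockingCallAsync :: CUInt -> IO (MVar CUInt)
blockingCallAsync secs = do
  done <- newEmptyMVar
  _ <- forkIO (c_sleep secs >>= putMVar done)
  return done
```

The caller can service a GUI or network connection in the meantime and later collect the result with takeMVar.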

Concurrent/reentrant foreign calls are required. A reentrant foreign call is a foreign call that calls a foreign-exported Haskell function. A concurrent/reentrant foreign call is both concurrent and reentrant.
Hence, the Haskell system must be able to process call-ins from arbitrary
external OS threads.
Rationale:

the main loop of a GUI may block (hence concurrent) and make callbacks (hence reentrant); we need to support this kind of usage.

providing concurrent/reentrant foreign calls does not impose significant extra overhead on the rest of the system. For example, a call-in can check a thread-local variable (fast) to see whether it arose from a foreign call.
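A reentrant call can be illustrated with C's qsort and a Haskell comparison function. This sketch uses GHC's "wrapper" imports to turn a Haskell closure into a C function pointer; sortViaC and compareCInts are hypothetical names. Because qsort calls back into Haskell, the import must be "safe".

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
import Control.Exception (bracket)
import Foreign
import Foreign.C.Types

type Compare = Ptr CInt -> Ptr CInt -> IO CInt

-- "wrapper" creates a C function pointer for a Haskell closure.
foreign import ccall "wrapper"
  mkCompare :: Compare -> IO (FunPtr Compare)

-- qsort invokes the comparator, i.e. re-enters Haskell, so it is "safe".
foreign import ccall safe "stdlib.h qsort"
  c_qsort :: Ptr CInt -> CSize -> CSize -> FunPtr Compare -> IO ()

compareCInts :: Compare
compareCInts pa pb = do
  a <- peek pa
  b <- peek pb
  return (fromIntegral (fromEnum (compare a b)) - 1)  -- LT/EQ/GT -> -1/0/1

-- Sort a list by copying it to a C array and calling qsort on it; the
-- function pointer is freed even if the call throws.
sortViaC :: [CInt] -> IO [CInt]
sortViaC xs =
  withArrayLen xs $ \n arr ->
    bracket (mkCompare compareCInts) freeHaskellFunPtr $ \cmp -> do
      c_qsort arr (fromIntegral n) (fromIntegral (sizeOf (0 :: CInt))) cmp
      peekArray n arr
```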

Foreign calls will be able to specify independently whether they
are concurrent, reentrant, or both. (The syntax and the sense of the annotations are still to be decided; see below.)
Rationale:

these annotations can have a profound impact on performance in some implementations.

Bound threads are not required, but allowed as an extension, and we
will specify their meaning.
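For reference, GHC already provides bound threads in Control.Concurrent. This sketch (withBoundThread is a hypothetical helper) runs an action that relies on OS thread-local state, such as an OpenGL context, on a bound thread when the runtime supports them, and runs it in place otherwise.

```haskell
import Control.Concurrent
  (isCurrentThreadBound, rtsSupportsBoundThreads, runInBoundThread)

-- Use a bound thread if the runtime has them (GHC's -threaded RTS);
-- runInBoundThread would fail on a runtime without bound-thread support.
withBoundThread :: IO a -> IO a
withBoundThread act
  | rtsSupportsBoundThreads = runInBoundThread act
  | otherwise               = act
```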

Outstanding issues

There follows a list of issues on which decisions are still to be
made. A numbered item, e.g. 1.2, indicates a question; an item ending
in a letter, e.g. 1.2.a, indicates a possible choice for question 1.2.

1. Cooperative or preemptive concurrency?

Choice 1.a. The spec requires cooperative concurrency, and preemption is
allowed as an extension. Both would be specified precisely in
terms of what progress and fairness guarantees the programmer can
expect.

Pros

Allows many more implementations, including Hugs (although Hugs
needs to be updated to handle concurrent and
concurrent/reentrant foreign calls, and non-blocking I/O).

Preemption isn't always required; a common case is an
application that relies on concurrency for I/O multiplexing,
where most threads are usually blocked.

Cooperative systems can be faster, and are simpler to implement
(see state threads reference).

Cons

Portability problems: if a programmer develops a concurrent
application on a preemptive system, there is no guarantee that
it will work as expected on a cooperative system, and the
compiler/runtime can give no useful feedback.

Need to specify which operations are "yield points" in library
documentation.

Long-running pure code must be refactored into the IO monad so
that explicit yield points can be inserted.
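To illustrate that refactoring, here is a hypothetical sketch: a long-running pure sum rewritten as an IO loop that calls yield after each chunk, so that under cooperative scheduling other threads get a chance to run. The chunking scheme is an assumption, not part of any proposal.

```haskell
import Control.Concurrent (yield)
import Data.List (foldl')

-- Sum a list, yielding to other threads after every chunk.
sumWithYields :: Int -> [Int] -> IO Int
sumWithYields chunkSize = go 0
  where
    size = max 1 chunkSize          -- a chunk size below 1 would never advance
    go acc [] = return acc
    go acc xs = do
      let (chunk, rest) = splitAt size xs
          acc' = foldl' (+) acc chunk
      acc' `seq` yield              -- finish this chunk's work, then yield
      go acc' rest
```

Choosing the chunk size trades scheduling latency against the overhead of calling yield.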

Choice 1.b. Preemption is required by the spec.

Pros

Simpler from the programmer's point of view: no yield, no
worrying about latency; "write code as if the current
thread is the only one."

Cons

Imposes significant implementation constraints. Essentially
only GHC and YHC would be able to implement it. JHC has no
concept of thunks, which is a barrier to implementing general
preemption.

Even in a preemptive system, deadlocks are easy to program, and
arbitrary starvation can result from laziness: evaluating
arbitrary expressions while holding an MVar can prevent other
threads from running. The fact that we therefore require seq
and possibly deepSeq is disturbing, as is the notion that the
programmer must think about "what is evaluated" when
programming concurrent code.
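A small sketch of the kind of strictness management this forces on programmers, assuming GHC's Control.Concurrent.MVar; expensive, addLazy and addStrict are hypothetical names. The lazy version leaves a thunk in the MVar, so whichever thread forces it later pays the cost, possibly while holding the MVar; the strict version evaluates the expensive part before touching the MVar.

```haskell
import Control.Concurrent.MVar (MVar, modifyMVar_, newMVar, readMVar)
import Control.Exception (evaluate)

-- Hypothetical expensive pure computation.
expensive :: Int -> Int
expensive n = sum [1 .. n]

-- Problematic: stores the unevaluated thunk (t + expensive n).
addLazy :: MVar Int -> Int -> IO ()
addLazy total n = modifyMVar_ total (return . (+ expensive n))

-- Better: force the expensive part outside the critical section, and use
-- $! so that only an evaluated Int is put back.
addStrict :: MVar Int -> Int -> IO ()
addStrict total n = do
  v <- evaluate (expensive n)
  modifyMVar_ total (\t -> return $! t + v)
```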

1.b.1. Include thread priorities or not?

Pros

Some applications require it

Priorities affect the fairness/progress guarantees, so
allowing for them from the outset may be simpler.

Cons

Hard to implement, no implementations yet.

2. Syntax for foreign call annotations.

2.1. choices for concurrent calls:

Choice 2.1.a. we annotate concurrent calls:

a. concurrent
b. mayblock
c. mightblock
d. blocks
e. longrunning

Rationale for using the term "block": blocking is the main
reason for wanting concurrent calls. Concurrent calls allow
the progress guarantee to be retained in the presence of a
blocking foreign call. A foreign call that just takes a long
time is still making progress.

Rationale for not using the term "block": the fact that the
call blocks is immaterial, the property we want to provide is
that it doesn't impede progress of other Haskell threads. A
long-running call is indistinguishable from a blocked call in
terms of the progress of other threads.

We often don't know whether a library call will block or not
(it isn't documented), whereas saying a call should run
concurrently with other threads is a choice the programmer can
reasonably make.

Choice 2.1.b. we annotate non-concurrent calls:

Rationale for annotating the non-concurrent calls: this is a
performance issue. It is always correct to make a concurrent
call, but it might be more efficient to make a non-concurrent
call if the call does not block. An implementation might
implement all calls as concurrent, for simplicity.

Against: John Meacham says "The FFI is inherently unsafe. We
do not need to coddle the programmer who is writing raw FFI
code."

a. nonconcurrent
b. noblock
c. returnsquickly
d. fast
e. quick

2.2. choices for non-reentrant calls:

a. nonreentrant
b. nocallback

Rationale for annotating the non-reentrant calls, as opposed
to the reentrant ones: we want the "safe" option to be the
default (as in the FFI spec).
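For comparison, GHC's existing FFI already marks calls this way, with the safe option as the default. An "unsafe" import promises that the call neither blocks for long nor calls back into Haskell, which corresponds roughly to "nonconcurrent" plus "nonreentrant" in the terms above. The import names below are hypothetical.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
import Foreign.C.Types (CDouble (..))

-- Cheap call: must not block for long and must not call back into Haskell.
foreign import ccall unsafe "math.h sin"
  c_sinFast :: CDouble -> CDouble

-- Default ("safe") call: may call back into Haskell and, on GHC's
-- threaded runtime, lets other Haskell threads run meanwhile.
foreign import ccall safe "math.h sin"
  c_sinSafe :: CDouble -> CDouble
```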

2.3. should we annotate foreign calls according to whether they need
to access thread-local state (TLS) or not?

Pros

a call that doesn't need access to thread-local state, called from a bound thread, can be executed much more quickly on an implementation that doesn't run the Haskell thread directly on the bound OS thread, because it doesn't need to context switch.

Cons

libraries that require TLS, e.g. OpenGL, often have many fast TLS-using functions. So implementations that need the no-TLS annotation in order to get good performance will probably still get poor performance from libraries that need TLS anyway.

3. Semantics of IORefs

MVar operations must be strictly ordered; that is, a thread must never
observe MVar operations performed by another thread out of order.

We have a choice when it comes to IORefs, however. (Note that
this only affects true multiprocessor implementations of concurrent
Haskell).

Choice 3.a Specify a weak memory model, in which IORef updates
may be observed out of order, but specify that certain operations
(eg. MVar operations) constitute sequence points around which no
re-ordering may happen.

Choice 3.b Specify a strong memory model in which no re-ordering is
observable.

Pros

Some processors provide this anyway (current generations of x86, x86-64)

The implementation will require some synchronisation in any case in order to prevent threads from observing partially-written closures. For example, if one thread builds a closure and writes its address into an IORef, there must be a write barrier (and possibly a read barrier depending on the CPU) to prevent other threads from following the pointer and not finding the closure at the end of it. This synchronisation may be enough to provide the strong memory model anyway.

Strong memory models are easier to program with, and leave fewer possibilities for a program to behave unexpectedly on a different processor or Haskell implementation.
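Under choice 3.a, programs must place sequence points wherever ordering matters. GHC's Data.IORef documents a model along these lines, in which atomicWriteIORef and atomicModifyIORef' act as barriers. This hypothetical publish/consume sketch uses them so that a reader that sees the flag also sees the payload.

```haskell
import Data.IORef

-- Writer: store the payload, then raise the flag. atomicWriteIORef is a
-- barrier in GHC, so the payload write is not reordered after it.
publish :: IORef Int -> IORef Bool -> Int -> IO ()
publish payload ready x = do
  writeIORef payload x
  atomicWriteIORef ready True

-- Reader: read the flag through a barrier, and only then the payload.
tryConsume :: IORef Int -> IORef Bool -> IO (Maybe Int)
tryConsume payload ready = do
  ok <- atomicModifyIORef' ready (\b -> (b, b))
  if ok then Just <$> readIORef payload else return Nothing
```

Under choice 3.b, plain writeIORef and readIORef would be enough.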

MVar Guarantees

Alternate, simpler proposal: full memory barrier at every putMVar and takeMVar.

Perhaps a better phrasing of the first proposal exists. In practice, from a user's point of view, it would be hard to tell the difference between the two models, but we should say something concrete on the matter.

Misc library stuff

yield is guaranteed to choose another thread if one exists and is
runnable.

sleep guarantees that the thread will wait at least as long as its argument
specifies; it may be blocked for longer.
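GHC's closest existing operation is threadDelay, whose documentation promises only a lower bound on the delay. This sketch (sleepAtLeast is a hypothetical helper) measures how long the thread was actually suspended, using GHC.Clock from base 4.11 or later.

```haskell
import Control.Concurrent (threadDelay)
import GHC.Clock (getMonotonicTime)

-- Sleep for at least the given number of microseconds and return the
-- elapsed wall-clock time in seconds, which may exceed the request.
sleepAtLeast :: Int -> IO Double
sleepAtLeast micros = do
  start <- getMonotonicTime
  threadDelay micros
  end <- getMonotonicTime
  return (end - start)
```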

I/O

I/O operations from System.IO, System.Directory, System.Process (and others?) do not prevent other threads from making progress when they are waiting for I/O to complete.

We could provide a lower-level non-blocking I/O interface along the lines of threadWaitRead, threadWaitWrite, perhaps in Control.Concurrent.IO.

Optional extensions to basic standard

These are optional extensions a compiler may implement. In some
implementations they may entail a run-time cost to non-concurrent code, or a
compiler might need a special option to enable them. However, a compiler is
not required to provide more than one concurrency model, as long as it can meet
the requirements of the standard and of any options it claims to support.

If a compiler documents that it supports one of the following options, then it
must adhere to the rules of that option as well.

Optional Feature 1 - Preemption

The standard only requires a progress guarantee: some thread is always
running and making progress. If an implementation supports context switching
during arbitrary computations and meets the stronger
fairness guarantee below, then it can be said to support the 'Preemption' option.

Fairness Guarantee

no starvation

new library calls provided

mergeIO, nmergeIO

Optional Feature 2 - OS threads

The implementation additionally allows the following:

foreign exported functions, and function pointers created by foreign import "wrapper",
can be invoked from multiple OS threads

Notes

Sharing

Although not mentioned in the standard, the use of concurrency may affect
the lazy sharing of computations. Consult an implementation's documentation if
this might be an issue for you.

unsafePerformIO

Using concurrent operations inside unsafePerformIO or unsafeInterleaveIO
may have unforeseen consequences; check an implementation's documentation for
details before depending on any particular behavior.
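A widely used example is the GHC idiom for library-global mutable state, which creates an MVar inside unsafePerformIO. It relies on implementation behaviour: the NOINLINE pragma is what stops GHC from inlining, and thereby duplicating, the newMVar call. The names counter and nextId are hypothetical.

```haskell
import Control.Concurrent.MVar (MVar, modifyMVar, newMVar)
import System.IO.Unsafe (unsafePerformIO)

-- One global MVar, created on first use; NOINLINE keeps it unique in GHC.
counter :: MVar Int
counter = unsafePerformIO (newMVar 0)
{-# NOINLINE counter #-}

-- Return a fresh number; modifyMVar makes this safe to call from many threads.
nextId :: IO Int
nextId = modifyMVar counter (\n -> return (n + 1, n))
```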