Comments

Hi!
This patch adds depend clause support.
In GOMP_task, before queueing the task, if task has any depend clauses
we look up the addresses in a hash table (in the parent task, because
only sibling tasks are ordered through depend clause), and if there
are any dependencies on the earlier started tasks, the new task
isn't added to team, parent and taskgroup task queues, but instead
just added into the earlier task's depender vectors. Each task
has also an integer number of how many other tasks it depends on.
When a task on which something depends on finishes, if parent exists,
it's records are removed from parent's depend address hash table,
and even if parent doesn't exist anymore, we decrement num_dependees
of every task mentioned in the dependers vector. If any of those
counters go to zero, we insert them into all the relevant task queues.
Tested on x86_64-linux. Will commit tomorrow unless somebody complains,
but in any case would appreciate review of the changes.
2013-09-26 Jakub Jelinek <jakub@redhat.com>
* libgomp.h: Include stdlib.h.
(struct gomp_task_depend_entry): New type.
(struct gomp_task): Add dependers, depend_hash, depend_count,
num_dependees and depend fields.
(struct gomp_taskgroup): Add num_children field.
(gomp_finish_task): Free depend_hash if non-NULL.
* libgomp_g.h (GOMP_task): Add depend argument.
* hashtab.h: New file.
* task.c: Include hashtab.h.
(hash_entry_type): New typedef.
(htab_alloc, htab_free, htab_hash, htab_eq): New inlines.
(gomp_init_task): Clear dependers, depend_hash and depend_count
fields.
(GOMP_task): Add depend argument, handle depend clauses. Increment
num_children field in taskgroup.
(gomp_task_run_pre): Don't increment task_running_count here,
nor clear task_pending bit.
(gomp_task_run_post_handle_depend_hash,
gomp_task_run_post_handle_dependers,
gomp_task_run_post_handle_depend): New functions.
(gomp_task_run_post_remove_parent): Clear in_taskwait before
signalling corresponding semaphore.
(gomp_task_run_post_remove_taskgroup): Decrement num_children
field and make the decrement to 0 MEMMODEL_RELEASE operation,
rather than storing NULL to taskgroup->children. Clear
in_taskgroup_wait before signalling corresponding semaphore.
(gomp_barrier_handle_tasks): Move task_running_count increment
and task_pending bit clearing here. Call
gomp_task_run_post_handle_depend. If more than one new tasks
have been queued, wake other threads if needed.
(GOMP_taskwait): Call gomp_task_run_post_handle_depend. If more
than one new tasks have been queued, wake other threads if needed.
After waiting on taskwait_sem, enter critical section again.
(GOMP_taskgroup_start): Initialize num_children field.
(GOMP_taskgroup_end): Check num_children instead of children
before critical section. If children is NULL, but num_children
is non-zero, wait on taskgroup_sem. Call
gomp_task_run_post_handle_depend. If more than one new tasks have
been queued, wake other threads if needed. After waiting on
taskgroup_sem, enter critical section again.
* testsuite/libgomp.c/depend-1.c: New test.
* testsuite/libgomp.c/depend-2.c: New test.
Jakub

On 09/26/2013 11:36 AM, Jakub Jelinek wrote:
> +struct gomp_task;> struct gomp_taskgroup;> +struct htab;> +> +struct gomp_task_depend_entry> +{> + void *addr;> + struct gomp_task_depend_entry *next;> + struct gomp_task_depend_entry *prev;> + struct gomp_task *task;> + bool is_in;> + bool redundant;> +};
I'm a bit confused about the combination of linked lists and reallocated
arrays. When did you decide to use what?
> + if ((flags & 8) && thr->task && thr->task->depend_hash)> + {> + struct gomp_task *parent = thr->task;> + struct gomp_task_depend_entry elem, *ent = NULL;> + size_t ndepend = (uintptr_t) depend[0];> + size_t nout = (uintptr_t) depend[1];> + size_t i;> + gomp_mutex_lock (&team->task_lock);> + for (i = 0; i < ndepend; i++)> + {> + elem.addr = depend[i + 2];> + ent = htab_find (parent->depend_hash, &elem);> + for (; ent; ent = ent->next)> + if (i >= nout && ent->is_in)> + continue;> + else> + break;
I wonder if we ought always defer tasks with dependencies and skip this lock
and search? Unless the taskgroup is truly weird, we *are* going to have
dependencies between the tasks. Dunno what exactly to do with final_tasks
that have unfulfilled dependencies...
I also think it would significantly clean up the code to declare a struct with
a variable tail for the depend argument. That would eliminate all of the
casting to uintptr_t and give names to the first two entries.
> + if (tsk->dependers == NULL)> + {> + tsk->dependers> + = gomp_malloc (8 * sizeof (struct gomp_task *));> + tsk->dependers[0]> + = (struct gomp_task *) (uintptr_t) 1;> + tsk->dependers[1]> + = (struct gomp_task *) (uintptr_t) (8 - 2);> + tsk->dependers[2] = task;> + task->num_dependees++;> + continue;
Another place for which a struct with variable tail would significantly clean
up the code. And here's where I wonder why you're using realloc'd arrays here
as opposed to another linked list?
Perhaps what we need are smaller linked list entries like
struct gomp_task_depend_node {
struct gomp_task *task;
struct gomp_task_depend_node *next;
struct gomp_task_depend_node *prev;
};
and a different hash table entry like
struct gomp_task_depend_head {
void *addr;
struct gomp_task_depend_node *outs;
struct gomp_task_depend_node *ins;
size_t n_ins;
};
If we scan the ndepend entries twice, we can find out how many nodes we need,
and allocate them with the task like you do now. Scanning the ndepends array
twice can be sped by only looking up the hash table entries once -- allocate a
local array of size ndepend entries and cache the lookups.
I'd say we don't need a count of the n_outs because all of them on the list
must be sequentially dependent. Thus any new task simply depends on the
previous task in the outs list. Thus imo it makes sense to have ins/outs point
to the tail of the list as opposed to the head.
Is is really worthwhile to detect redundant dependencies? It seems just as
easy to add multiple dependencies and let them just fall out naturally.
OTOH, perhaps you should just go ahead with this patch and we can evolve it
incrementally. I don't see anything technically wrong with it.
r~

On Thu, Sep 26, 2013 at 03:54:09PM -0700, Richard Henderson wrote:
> On 09/26/2013 11:36 AM, Jakub Jelinek wrote:> > +struct gomp_task;> > struct gomp_taskgroup;> > +struct htab;> > +> > +struct gomp_task_depend_entry> > +{> > + void *addr;> > + struct gomp_task_depend_entry *next;> > + struct gomp_task_depend_entry *prev;> > + struct gomp_task *task;> > + bool is_in;> > + bool redundant;> > +};> > I'm a bit confused about the combination of linked lists and reallocated> arrays. When did you decide to use what?
I initially wanted to use linked lists only, but while I can statically
preallocate the chains for the hash table, for the depender -> dependee
chains where a task may depend on many other tasks that would mean having to
allocate small structures (or pool allocate them, per team?).
> I wonder if we ought always defer tasks with dependencies and skip this lock> and search? Unless the taskgroup is truly weird, we *are* going to have> dependencies between the tasks. Dunno what exactly to do with final_tasks> that have unfulfilled dependencies...
I think final tasks aren't a problem, if the parent is a final task, then
all the children are non-deferred, thus we never record any dependencies
and the test for that will be cheap too (because parent->depend_hash will be
NULL). The problem is if (0) tasks, the spec says that they must be
non-deferred unless they depend on some earlier non-finished task. But
the cost in that case is primarily in taking the lock/unlock; the search
will stop on the first dependency found, if there aren't any, nothing will
be recorded and we don't jump to the defer label, if there are some, as soon
as we discover first we jump there.
> I also think it would significantly clean up the code to declare a struct with> a variable tail for the depend argument. That would eliminate all of the> casting to uintptr_t and give names to the first two entries.
I agree if we keep using realloced vectors that flexible array would make it
cleaner.
> Perhaps what we need are smaller linked list entries like> > struct gomp_task_depend_node {> struct gomp_task *task;> struct gomp_task_depend_node *next;> struct gomp_task_depend_node *prev;> };
The dependee -> depender vectors resp. linked lists are just pushed to first
(the only thing needed during insertion is to have a cheap check if the last
inserted task is the current one, to avoid having the same task multiple
times in the vector/linked list), and then just walked once when the
dependee finishes, so no removal is needed there, it can be freed at once;
thus, for linked list, it would be enough to use non-doubly linked list for
that. For the hash table chains, unless we want to always lookup the hash
table and walk the chains for removal, we need doubly linked list.
> > and a different hash table entry like> > struct gomp_task_depend_head {> void *addr;> struct gomp_task_depend_node *outs;> struct gomp_task_depend_node *ins;> size_t n_ins;> };
You mean that the hash table instead would contain the structures, or
pointers to these structures? If the latter (not sure what n_ins would be
for), then we'd again need to pool alloc them.
> > If we scan the ndepend entries twice, we can find out how many nodes we need,> and allocate them with the task like you do now. Scanning the ndepends array> twice can be sped by only looking up the hash table entries once -- allocate a> local array of size ndepend entries and cache the lookups.> > I'd say we don't need a count of the n_outs because all of them on the list> must be sequentially dependent. Thus any new task simply depends on the> previous task in the outs list. Thus imo it makes sense to have ins/outs point> to the tail of the list as opposed to the head.
Ah, haven't thought about it this way, yes, you're right that for out/inout
dependencies it is enough to remember in the hash table the last one,
because the dependencies will form a chain on the same address and serialize
the tasks.
> Is is really worthwhile to detect redundant dependencies? It seems just as> easy to add multiple dependencies and let them just fall out naturally.
I just didn't want to have duplicates in the hash table chains, the
redundant flag is just a sign that the entry doesn't need to be removed
from the hash table chains.
> OTOH, perhaps you should just go ahead with this patch and we can evolve it> incrementally. I don't see anything technically wrong with it.
Perhaps. What if I do just minor cleanup (use flexible array members for
the reallocated vectors, and perhaps keep only the last out/inout task
in the hash table chains rather than all of them), retest, commit and then
we can discuss/incrementally improve it?
Jakub