
This is in preparation for the upcoming flux-sched PR
and requires @grondo's job.submit change that is available
at wreck-experimental.

The new wreck.state.submitted event will carry piggybacked
job request info such as the number of nodes and walltime, and
the scheduler will use this front-loaded information to
cut down on KVS accesses.

This also removes the null-to-null job transition code path,
legacy code that worked around a race condition from back
when JSC was using KVS watch for job state monitoring.

Well... I copied the latest from that commit, but it seems there are previous commits that I should have cherry-picked as well. At this point, I think it would be better if you, @grondo, could do this... I'm afraid I will lose some commit history.

I can use PerfExplore and compare the performance between two versions.

Not critical, I was just wondering if there was something akin to soak.sh for use with flux-sched, or if perhaps it would be useful to adapt soak.sh for a haha test for PRs that might affect scheduler and core performance.

The 'reserved' state is meant only for a reserved KVS directory
for a job which has not yet been submitted or run (i.e. reserved
for wreck as writer). In the case of jobs submitted via flux-submit
this state is unnecessary, so remove the initial reserved state for
submitted jobs, and the corresponding duplicated code that was
a result.

Add support for the new wreck.state.submitted event with which
job request info such as nnodes and walltime is piggybacked.
Schedulers can use this augmented information to reduce
KVS accesses to fetch job request information for performance
optimization.
Eliminate the null->null transition code path, legacy code
that dealt with a race condition when JSC was using KVS
watch for monitoring state changes.
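
To illustrate the idea in the commit message above, here is a minimal sketch of how a scheduler-side handler might prefer request info piggybacked on the event and fall back to a KVS lookup only when it is missing. This is not the flux-core or flux-sched code: the JSON payload layout, the `jobid` field, the `FAKE_KVS` dictionary, and both function names are assumptions made up for illustration. Only the `nnodes` and `walltime` names come from the commit message.

```python
import json

# Hypothetical stand-in for the KVS: job id -> job request record.
FAKE_KVS = {42: {"nnodes": 4, "walltime": 3600}}

# Counts simulated KVS accesses so the saving is visible.
stats = {"kvs_lookups": 0}


def kvs_get_job_request(jobid):
    """Simulate the slow path: one KVS access per job (hypothetical helper)."""
    stats["kvs_lookups"] += 1
    return FAKE_KVS[jobid]


def handle_submitted_event(payload):
    """Return (jobid, nnodes, walltime) for a submitted-job event.

    Uses the request info piggybacked on the event when both fields are
    present, and falls back to a KVS lookup otherwise.
    """
    msg = json.loads(payload)
    jobid = msg["jobid"]  # assumed field name, not taken from flux-core
    if "nnodes" in msg and "walltime" in msg:
        return jobid, msg["nnodes"], msg["walltime"]
    req = kvs_get_job_request(jobid)
    return jobid, req["nnodes"], req["walltime"]


# An augmented event carries the request info; a bare one does not.
augmented = json.dumps({"jobid": 42, "nnodes": 4, "walltime": 3600})
bare = json.dumps({"jobid": 42})
```

Handling `augmented` never touches the simulated KVS, while handling `bare` costs one lookup per job, which is the access the piggybacked fields are meant to avoid.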

I think if we merge this now it will break the current flux-sched, so we'll need to wait until flux-framework/flux-sched#295 is ready so they can go in together. At least I think this is the case, @dongahn or @SteVwonder, please advise if otherwise.

You might try rebasing on @dongahn's branch now, then it will be a kind of noop to rebase on new master once this is merged (sorry if you've already done this). Hopefully the flux-sched PR won't require any more than trivial changes to this PR.

I think if we merge this now it will break the current flux-sched, so we'll need to wait until flux-framework/flux-sched#295 is ready so they can go in together. At least I think this is the case, @dongahn or @SteVwonder, please advise if otherwise.

I sort of want both PRs to go in as soon as possible, given that they are needed for Splash. I think the only issue with the current sched PR is in the emulator code, which @trws won't use. Maybe we can merge the sched PR as is and fix the emulator problem later. This will also help me to do another PR for the lightweight R. @SteVwonder?

@dongahn, given @trws's needs for splash, I also think we should get this in ASAP. If it is OK to merge this, let's let @garlick push the button.

An alternative would be to branch off flux-core and flux-sched/master with a splash branch where we can make more experimental and gratuitous changes, then merge back to master the salvageable code when splash firedrill is over.

It doesn't seem wrong to push master forward for this, given that the exec system will be replaced and that will require this sched/exec interface to be overhauled anyway. I'll push the button in a few minutes if there are no immediate objections.

I am fine with pushing these through and having the emulator temporarily broken. I can look at flux-framework/flux-sched#295 now and see what is going on. Hopefully, I can put together a PR by the end of the day.

On 28 Mar 2018, at 11:01, Mark Grondona wrote:
@dongahn, given @trws's needs for splash, I also think we should get this
in ASAP. If it is OK to merge this, let's let @garlick push the button.
An alternative would be to branch off flux-core and flux-sched/master
with a `splash` branch where we can make more experimental and
gratuitous changes, then merge back to master the salvageable code
when splash firedrill is over.
