Hi Joe - you can just sit back and relax :) Your patch will go it's way
through pu, then opu and finally into the development branch for R14B04.
Regards
/siri
2011/9/14 Joe Williams <>
> Siri,
>> Anything else need to be done to graduate/merge this patch?
>> Thanks.
> -Joe
>>> --
> Name: Joseph A. Williams
> Email: > Blog: http://www.joeandmotorboat.com/> Twitter: http://twitter.com/williamsjoe>> On Tuesday, September 6, 2011 at 8:51 AM, Joe Williams wrote:
>> Great, let me know if any other changes are needed.
>> Thanks!
>> -Joe
>>> --
> Name: Joseph A. Williams
> Email: > Blog: http://www.joeandmotorboat.com/> Twitter: http://twitter.com/williamsjoe>> On Tuesday, September 6, 2011 at 12:45 AM, Siri Hansen wrote:
>> Hi Joe - looks good now! :)
> Thanks for your effort!
> /siri
>> 2011/9/5 Joe Williams <>
>> Siri,
>> Please refetch and let me know if the test works for you now.
>> Thanks!
>> -Joe
>>> --
> Name: Joseph A. Williams
> Email: > Blog: http://www.joeandmotorboat.com/> Twitter: http://twitter.com/williamsjoe>> On Monday, September 5, 2011 at 2:32 AM, Siri Hansen wrote:
>> Hi Joe - had a look at your changes... and it turned out to be quite
> interesting...
>> First, I saw that you hadn't updated the test suite with the new exit
> reason - but when I executed the test it did not fail... :o
>> It turned out that you catched the call to ?t:fail/1 - which actually only
> does an exit... I assume you meant "case catch" and not "catch case"...
> However, catching this is not at all necessary here, since it will be
> catched by the rpc:call anyway - it will return {badrpc, {'EXIT',
> {suspended_supervisor, _}}} .
>> Also, two other comments on the test case:
>> * when the call to get_supervised_procs return something unexpected -
> please print the return value in the test log (using e.g. ct:log/2) in order
> to make trouble shooting easier
> * stopping the node could be moved outside of the test case, in order to
> avoid hanging nodes if the test fails. This you can do by adding something
> like
>> supervisor_which_children_timeout(cleanup, Conf) ->
> stop_node(node_name(supervisor_which_children_timeout)).
>> This will then be called during end_per_testcase. I see that there are
> other test cases that don't do this, and I will clean up that before the
> next release.
>> Regards
> /siri
>>> 2011/9/1 Joe Williams <>
>> Seems good to me, I went ahead and made the changes.
>> Please refetch and let me know if there is anything else.
>> -Joe
>>> --
> Name: Joseph A. Williams
> Email: > Blog: http://www.joeandmotorboat.com/> Twitter: http://twitter.com/williamsjoe>> On Thursday, September 1, 2011 at 12:59 AM, Siri Hansen wrote:
>> Thanks Joe! This looks very good. I have two minor comments only:
>> 1) Maybe you could change the first line of the commit message to say
>> General improvements to release_handler_1:get_supervised_procs
>> or
>> General improvements to get_supervised_procs in release_handler_1
>> 2) And when I looked at the tests it occured to me that it might be a good
> idea to change the error reasons to something more descriptive than
> 'release_handler_error'. What do you think? I know that there will be error
> messages in the shell, but I find it better if even the returned error
> reason gives a hint about what failed. My suggestions for the three calls to
> error/1 respectively (please change if you have better ideas):
>> suspended_supervisor
> {which_children_failed,Other}
> {get_modules_failed,Other}
>> Please let me know what you think about this.
>> Regards
> /siri
>> 2011/9/1 Joe Williams <>
>> Siri,
>> I have added tests and squashed my commits, rewriting the commit message
> while I was at it. Please refetch and let me know if this is up to your
> standards for graduation.
>> -Joe
>>> --
> Name: Joseph A. Williams
> Email: > Blog: http://www.joeandmotorboat.com/> Twitter: http://twitter.com/williamsjoe>> On Tuesday, August 30, 2011 at 11:41 PM, Siri Hansen wrote:
>> Hi Joe - Yes, you can put your application inside
> release_handler_SUITE_data/lib. And hopefully you can use the
> functions create_and_install_fake_first_release
> and create_fake_upgrade_release - you can see examples of that in
> release_handler_SUITE.erl. Please let me know if you need any more advice on
> the testing.
> Regards
> /siri
>>> 2011/8/30 Joe Williams <>
>> Siri,
>> I could use some advice on how best to test this code. Currently I have a
> little dummy application with the proper supervision tree to trigger this
> upgrade case. Here are the steps I use to reproduce:
>>https://gist.github.com/da109fb6939ef7aac031>> Note that this is using my topic branch so you can see the additional
> logging.
>> I noticed that there is a release_handler_SUITE_data directory in the sasl
> tests, should I just add another directory specifically for this and store
> the dummy application there for testing?
>> Thanks.
>> -Joe
>>> --
> Name: Joseph A. Williams
> Email: > Blog: http://www.joeandmotorboat.com/> Twitter: http://twitter.com/williamsjoe>> On Tuesday, August 30, 2011 at 1:44 AM, Siri Hansen wrote:
>> Great Joe - this is much better :)
> Could you please add some tests for this also?
> Thanks
> /siri
>> 2011/8/29 Joe Williams <>
>> Siri,
>> Please fetch this branch again.
>> I have added errors where I had my functions returning empty lists. I
> believe this should bubble up to release_handler causing a restart similar
> to the timeout behavior we currently have.
>> -Joe
>>> --
> Name: Joseph A. Williams
> Email: > Blog: http://www.joeandmotorboat.com/> Twitter: http://twitter.com/williamsjoe>> On Monday, August 29, 2011 at 9:35 AM, Joe Williams wrote:
>> Siri,
>> In case #2 the node would be in an "unpacked" state but perhaps that isn't
> possible since the upgrade may be partially installed already. I'll work on
> implementing #1 and reply back soon.
>> -Joe
>>> --
> Name: Joseph A. Williams
> Email: > Blog: http://www.joeandmotorboat.com/> Twitter: http://twitter.com/williamsjoe>> On Monday, August 29, 2011 at 7:31 AM, Siri Hansen wrote:
>> Hi Joe -
> I think I would prefer solution 1), although that's probably mostly because
> I don't really understand solution 2)... What do you mean by "stop the
> upgrade from completing"? in which state would the node be after this?
> /siri
>> 2011/8/26 Joe Williams <>
>> Siri,
>> That sounds correct, with the current patch there is that risk. In my case
> I would see the error message post-upgrade and restart things as needed but
> I certainly see your point. The VM restarting is a brutal but idiomatic way
> to deal with this issue, let it fail :).
>> I think there are two possibilities here, 1) continue with the restart
> behavior but make sure we print error messages before we do or 2) print
> error messages but stop the upgrade from completing if we catch the bad
> case. Thoughts?
>> -Joe
>> --
> Name: Joseph A. Williams
> Email: > Blog: http://www.joeandmotorboat.com/> Twitter: http://twitter.com/williamsjoe>> On Friday, August 26, 2011 at 1:08 AM, Siri Hansen wrote:
>> Hi again, Joe!
>> Sorry for being so slow - but I still don't really understand :(
> My concerns are really about whether or not we should allow the upgrade to
> be performed in this case. For sure I think we should
>> 1) avoid the timeout, and
> 2) let the user know what the problem is
>> but is it correct to let the upgrade pass after this? Is it not an error
> situation?
>> It seems to me that we risk getting into a situation where we believe
> that the system is upgraded, but in fact there could be branches of the
> supervisor tree where process have not had the chance to run their
> code_change functions. I mean - even if we print the error report, there is
> no guarantee that it is really detected unless the operation actually fails.
>> Please correct me if I completely misunderstood the situation.
>> Regards
> /siri
>>> 2011/8/25 Joe Williams <>
>> Siri,
>> I ran into two issues that this patch addresses. Check out the commit
> message at
>https://github.com/joewilliams/otp/commit/9c3a53789326cdd929f1c3b4525716b1c0abfe87 for
> the details. In both cases I found that in production an error in the logs
> was preferable to the restart of the VM since both are easily fixable with a
> small application change or in the case of the suspended supervisor using a
> different app up. Also see this comment in release_handler_1 regarding the
> supervisor,
>https://github.com/erlang/otp/blob/dev/lib/sasl/src/release_handler_1.erl#L454 which
> suggests this corner case is known by at least a few people. Currently there
> is no way to know *why* your VM just restarted after the upgrade in either
> case.
>> Let me know if you have any other questions.
>> -Joe
>>> --
> Name: Joseph A. Williams
> Email: > Blog: http://www.joeandmotorboat.com/> Twitter: http://twitter.com/williamsjoe>> On Thursday, August 25, 2011 at 6:35 AM, Siri Hansen wrote:
>> Hi again, Joe!
>> Could you please explain a bit about the situation where you discovered
> this problem? I agree that the timeout and VM restart is not very good, and
> it makes sense to check if the supervisor is suspended. But I'm not really
> sure if it is correct to allow the upgrade to continue when this error
> occurs. Even if an error message is printed, I guess it could be quite easy
> to miss this fact... and the question is if that would be a problem or not?
> Why is the supervisor suspended in the first place?
>> Regards
> /siri
>>> 2011/8/25 Siri Hansen <>
>> Hi Joe - I've just started looking at this. Do you think it would be
> possible to add a test case for it?
> Regards
> /siri
>>> 2011/8/24 Joe Williams <>
>> Anything I can do regarding this patch? I have happily been running it in
> production since I submitted it to the list in June.
>> -Joe
>>> --
> Name: Joseph A. Williams
> Email: > Blog: http://www.joeandmotorboat.com/> Twitter: http://twitter.com/williamsjoe>> On Wednesday, July 6, 2011 at 3:43 PM, Joe Williams wrote:
>> Anything I can do to help this patch graduate?
>> Thanks!
>> -Joe
>>> --
> Name: Joseph A. Williams
> Email: > Blog: http://www.joeandmotorboat.com/> Twitter: http://twitter.com/williamsjoe>> On Tuesday, June 14, 2011 at 12:26 PM, Joe Williams wrote:
>> Updated this branch, please refetch.
>> git fetch git://github.com/joewilliams/otp.git release_handler_1
>>> --
> Name: Joseph A. Williams
> Email: > Blog: http://www.joeandmotorboat.com/> Twitter: http://twitter.com/williamsjoe>> On Friday, June 10, 2011 at 8:52 AM, Joe Williams wrote:
>> Great, thanks!
>>> --
> Name: Joseph A. Williams
> Email: > Blog: http://www.joeandmotorboat.com/> Twitter: http://twitter.com/williamsjoe>> On Friday, June 10, 2011 at 8:51 AM, Raimo Niskanen wrote:
>> On Thu, Jun 09, 2011 at 08:20:51AM -0700, Joe Williams wrote:
>> Please fetch:
>> git fetch git://github.com/joewilliams/otp.git release_handler_1
>> This is a different branch with a better commit message and no white space
> changes.
>>> Excellent. I will include your patch in 'pu' after rewording the
> summary line to imperative form.
>>>>> --
> Name: Joseph A. Williams
> Email: > Blog: http://www.joeandmotorboat.com/> Twitter: http://twitter.com/williamsjoe>>> On Thursday, June 9, 2011 at 7:44 AM, Joe Williams wrote:
>> Nothing specific, just wondered if anyone had any thoughts on how I dealt
> with a couple of corner cases in installing releases.
>> I'll fix things up and get back shortly.
>> --
> Name: Joseph A. Williams
> Email: (mailto: <>)
> Blog: http://www.joeandmotorboat.com/> Twitter: http://twitter.com/williamsjoe>>> On Thursday, June 9, 2011 at 12:11 AM, Raimo Niskanen wrote:
>> On Wed, Jun 08, 2011 at 03:41:37PM -0700, Joe Williams wrote:
>> Any thoughts/feedback on this patch? I realize it doesn't follow the
> guidelines (https://github.com/erlang/otp/wiki/Submitting-patches) exactly
> and will clean it up soon.
>>> Anything in particular? I just got caught up in tideous merge work
> yesterday and missed to include your patch in 'pu', I was about
> to take it now.
>> But if you have a cleanup I can wait for it...
>>>> --
> Name: Joseph A. Williams
> Email: (mailto: <>)
> Blog: http://www.joeandmotorboat.com/> Twitter: http://twitter.com/williamsjoe>>> On Tuesday, June 7, 2011 at 2:33 PM, Joe Williams wrote:
>> git fetch git://github.com/joewilliams/otp.git (
>http://github.com/joewilliams/otp.git) (
>http://github.com/joewilliams/otp.git) release_handler
>>> _______________________________________________
> erlang-patches mailing list
> (mailto:<>
> )
>http://erlang.org/mailman/listinfo/erlang-patches>>>> --
>> / Raimo Niskanen, Erlang/OTP, Ericsson AB
>>> _______________________________________________
> erlang-patches mailing list
> (mailto:<>
> )
>http://erlang.org/mailman/listinfo/erlang-patches>>> --
>> / Raimo Niskanen, Erlang/OTP, Ericsson AB
>>>>>>> _______________________________________________
> erlang-patches mailing list
>>http://erlang.org/mailman/listinfo/erlang-patches>>>>>>>>> _______________________________________________
> erlang-patches mailing list
>>http://erlang.org/mailman/listinfo/erlang-patches>>>>>>>>>>>>>>-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://erlang.org/pipermail/erlang-patches/attachments/20110914/62ab0e71/attachment-0001.html>