Nilesh: well, you could also just fix the firewalling for your CI nodes. that might still be necessary after https://review.openstack.org/467480 is merged, depending on the details of your setup. you may also want to continue this discussion in #openstack-qa, as it is a devstack issue

pabelanger: i can't find anywhere that stackalytics creates a dump of its data as a backup, so i'm just going to assume this is broken until the next gerrit restart (we're probably coming due for one anyway; the last restart was 7 days ago)

clarkb: it's true that the traceback doesn't have the image info; however, it does have the node id, and all nodepool log entries can be cross-referenced by that. so it's easy to get this: http://paste.openstack.org/show/610578/
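
(for illustration, a rough sketch of that cross-reference; the node id and log path below are made-up examples, not taken from the paste:)

#!/usr/bin/env python
# Pull every log line mentioning a given node id so a launch failure
# can be correlated with the image/provider entries around it.
# The default node id and the log path are hypothetical.
import sys

node_id = sys.argv[1] if len(sys.argv) > 1 else '0000001234'
with open('/var/log/nodepool/debug.log') as log:
    for line in log:
        if node_id in line:
            sys.stdout.write(line)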

clarkb, mordred: i don't think tracebacks are only for when "code" breaks. we spent years trying to figure out what underlying cloud problems were causing nodepool issues because exceptions were being masked, so that's a dangerous idea to proceed from. i'm actually more okay with this particular patch (which seems like a pretty decent log line) than i am with the idea that we should not log tracebacks unless "code" is broken.

clarkb, mordred: to riff on mordred's commit message -- sometimes we have needed tracebacks to determine the root cause of data errors. if we're going to start masking them, i just want us to think very carefully about it and make sure we're not hobbling ourselves.

mordred, clarkb: if we're pretty sure there's no chance of a traceback adding extra information here, okay. i'm not sure we should mask the actual error we get from nova though. if we wanted to report this to a cloud operator, we might want to give the request id, etc, which is in the error string.
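
(a rough sketch of the kind of log line being discussed; the helper name and exception handling here are hypothetical, not the actual nodepool code:)

import logging

log = logging.getLogger('nodepool.launcher')

def launch_node(provider, image):
    # Hypothetical helper for illustration only.
    try:
        return provider.create_server(image)
    except Exception as e:
        # Keep the error string from nova (request id etc.) in the
        # message, and keep the traceback via exception() rather than
        # masking it.
        log.exception('Launch of %s on %s failed: %s', image, provider, e)
        raise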

Shrews: pabelanger: thinking about the failed uploads thing a bit more, I think keeping only the last failure should be sufficient? That notifies the user that there is or was a failure, and then they have to look at the logs regardless, which will show the true extent of the problem
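
(a minimal sketch of the "keep only the last failure" idea; the structure and field names are hypothetical:)

import time

# Only the most recent failure per (image, provider) is retained;
# the logs remain the place to see the full history.
last_failures = {}

def record_upload_failure(image, provider, error):
    last_failures[(image, provider)] = {
        'error': str(error),
        'timestamp': time.time(),
    }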

I was also just thinking about logs while looking at this - it seems like at some point we should audit our use of info and debug. I think we all tend to use the debug log to track down issues, but it's so chatty that 'normal' errors like the image missing in osic are easy to miss

failures happen all the time, and most of them aren't interesting. if you expect success, then looking through the logs is a better way to assure yourself; or, in our case, looking at graphs of error rates is generally a better way to find out when it's worth looking into issues

mtreinish: i haven't forgotten about the health api frequency analysis, but these are the top 100 most requested paths according to the current log retention. i haven't had time to work on response size analysis yet since i've determined that i'll need to switch from a shell script to python to do that bit (for the sake of my own sanity)... http://paste.openstack.org/show/610590
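
(a rough idea of what that python bit might look like; the log path and the combined-log-format assumption are mine, not from the actual script:)

#!/usr/bin/env python
# Sum bytes sent per request path from an Apache combined-format
# access log and print the largest totals.  Log path and format are
# illustrative assumptions.
import collections
import re

LOG = '/var/log/apache2/health-api-access.log'
PATTERN = re.compile(r'"[A-Z]+ (?P<path>\S+) HTTP/[^"]*" \d{3} (?P<size>\d+)')

sizes = collections.Counter()
with open(LOG) as f:
    for line in f:
        m = PATTERN.search(line)
        if m:
            sizes[m.group('path')] += int(m.group('size'))

for path, total in sizes.most_common(100):
    print('%12d %s' % (total, path))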

mordred: i'd rather not encumber the specs.o.o site with expectations that random applications will rely on it being stable/present long-term. i think we should create a dedicated domain name we can use for this, which gives us a lot of flexibility about where and how we serve it in the future

pabelanger: if there are more propose-openstack-ansible-update-osa-test-scripts jobs currently queued, you need to restart again. I don't see a job in the queue, so we should be good - but please check again later.

jeblair: I'm looking at the etherpad for lists.o.o snapshotting and it doesn't list the services to disable (but not stop). Looking at ps output, it appears that exim and mailman are the two I want to disable; are there others?

mordred: the zypper package in ubuntu is fairly broken indeed. I had to jump through quite a few hoops to get diskimage-builder to install an opensuse env quickly so that we can do the actual image build with a working zypper

clarkb: yeah, the current goal is a voting devstack job, but it's good that I understand all the setup steps so that we can do it again for the next distro. 42.3 is just around the corner and I'd want to have that as well