#436: SSH access to systems in Beaker lab
--------------------------------------+---------------------
Reporter: atodorov | Owner: tflink
Type: defect | Status: new
Priority: major | Milestone:
Component: Blocker bug tracker page | Version:
Keywords: | Blocked By:
Blocking: |
--------------------------------------+---------------------
= bug description =
Currently systems in Beaker lab can be accessed only through bastion.fp.o
which is not as convenient as direct SSH into the system.
There's also the question whether or not to open the systems directly to
the Internet.
This needs to be discussed with infra. Filing here so it doesn't get lost.
--
Ticket URL: <https://fedorahosted.org/fedora-qa/ticket/436>
Fedora QA <http://fedorahosted.org/fedora-qa>
Fedora Quality Assurance

Now that we've had a second flavor of this issue (running out of
inodes on a buildmaster) hit us, it's probably time to address log data
retention.
At the moment, we don't have a log data retention policy, which has led
to filling up disks with logs. We need some policy for how long we're
going to keep this data but I don't want to just decide something
without some form of discussion/documentation.
When we had this problem with AutoQA, we implemented a cronjob that
would delete logs older than 30 days but we also had a lot less disk to
work with back then.
There are 2 forms of log data that this new policy would affect: the
artifacts created by task execution and the build logs/data stored by
the buildmaster. Both are relatively simple file-based data which can
be removed without any consequences other than the data no longer
being available.
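For reference, an age-based cleanup along the lines of the old AutoQA
cronjob could be sketched roughly as below. This is only an illustration
under assumptions: the function names are made up for the example, the
30-day default is borrowed from the AutoQA job rather than being a
proposed policy, and file age is judged by mtime. It defaults to a dry
run so nothing is deleted unless explicitly requested.

```python
import os
import time


def find_old_files(root, max_age_days=30, now=None):
    """Return sorted paths under root whose mtime is older than max_age_days."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    old = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if os.lstat(path).st_mtime < cutoff:
                    old.append(path)
            except OSError:
                # File vanished or is unreadable; skip it.
                continue
    return sorted(old)


def delete_old_files(root, max_age_days=30, dry_run=True, now=None):
    """Delete old files under root; with dry_run=True only report them."""
    old = find_old_files(root, max_age_days, now)
    if not dry_run:
        for path in old:
            os.remove(path)
    return old
```

Note that this only removes the files themselves; anything that links
to them (question 2 below) would be left dangling.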
The questions raised so far are:
1. How long is long enough to keep log and execution data?
2. Should we be cleaning up anything that references builds/artifacts
(like links in resultsdb) before we delete them?
3. Do we want to put resources into figuring out whether the result was
a PASS or FAIL before deleting it?
4. Should FESCo be involved in this decision?
Thoughts or Suggestions? I really don't want to spend much time on this
but that statement does seem to come out of me when we're about to
spend too much time on a topic (at least some of which ends up being my
fault) :)
Tim

Hey folks! so for a quick Saturday morning project I had nirik spin me
up a test Fedora infra cloud node and tried to deploy openQA on it.
And it worked!...more or less.
There's one big problem: it seems there's no nested virt in the infra
cloud. Running without KVM is obviously a non-starter for real use, but
as a PoC I was able to deploy everything and even kick off a test (by
setting QEMU_NO_KVM for the machines and uninstalling qemu-kvm from the
worker) to prove that it basically works. The test timed out in
anaconda init because qemu without KVM is so slow.
However, nirik says they're adding some new nodes to the cloud and
they can try enabling nested virt for those. If that works, I think we
could look at moving to a docker-ized deployment in the infra cloud
for our production deployment. That has a range of benefits: most
obviously non-Red Hat people could actually access our production
instance (woo!) - which means we can do stuff like linking to results
from bug reports or the compose_check reports - but another one is
that the cloud lives right next to the servers that host dl.fp.o and
alt.fp.o, so ISO downloads would be a lot faster and aren't using paid
bandwidth.
I filed an infra trac ticket for the nested virt request:
https://fedorahosted.org/fedora-infrastructure/ticket/4894 . So let's
hope that works out! It'd be awesome to get openQA out in the, well,
open.
--
Adam Williamson
Fedora QA Community Monkey
IRC: adamw | Twitter: AdamW_Fedora | XMPP: adamw AT happyassassin . net
http://www.happyassassin.net

It's that time again - time to start upgrading our instances of
Taskotron.
I'm going to start with taskotron-dev today and hopefully I'll finish
before too long. This is going to be a messier and longer upgrade than
most due to the change to dnf - all of our roles need to be updated and
checked.
I'll send out another email when everything's back up.
Tim

So, er, yeah....sorry :/ I was updating openQA to the latest tests and
stuff today and accidentally updated the system completely. That seemed
to stop openQA working properly - I don't think the openSUSE folks are
testing the 'stable' release against current openSUSE any more, and I
think some stuff changed in the more recent Mojolicious versions
they've sent out which breaks the old 'stable' release.
So I tried to fix it for a bit and couldn't, and can't find a way to
downgrade stuff back again. So I decided to just bump up to the latest
'unstable' openQA - it's working fine on happyassassin and upstream is
actually running it on openqa.opensuse.org if you look (it's got the
newer web UI and stuff). So I did that, and then went to reboot the
machine to get a clean start...and it never seems to have woken up
again. Sooo...yeah, I busted it. Sorry :( I tried poking around in the
RH provisioning system where the box seems to live, but couldn't get it
to wake up again (hope I didn't make it worse).
Can anyone who has access to the management console or whatever please
get the machine up and running again? Sorry again!
--
Adam Williamson

As a heads up, taskotron.stg is having some issues at the moment
stemming from a full disk. It'll be a few weeks before production hits
the same disk problem, so we have some time to fix things before it
becomes really critical.
As an odd side note - this is the second time that we've been saved from
a production outage because stg has been running longer than
production :)
The timing of this isn't great for fixing it quickly as there's an
outage for a critical infra system starting shortly and I don't think I
can get it fixed before that starts.
I'll send out another email once it's been fixed.
Tim

Monday is a holiday in the Czech Republic and with so many of the
regular suspects not available, I don't really see a point in having
the regular qa-devel meeting on Monday.
If anyone can think of a topic that should be discussed as a group now,
reply to this thread and we can still meet. Otherwise, we'll just sync
up as needed outside the meeting.
Tim

Hi folks!
So a thing I kinda hate doing and hence don't do often enough is update
the common bugs pages to mention when updates are available to fix the
listed issues. So I wrote a script to help! You can find it in the
fedora-qa git repository:
https://git.fedorahosted.org/cgit/fedora-qa.git/tree/commonbugs-update
It requires python-bugzilla, python-fedora and python-wikitcms.
Please do be a bit careful when using it - read the diff it prints
before making changes, and ideally also check mediawiki's history diff
after editing.
At its core what it's meant to do is this: spot when there's an update
to fix an issue but the issue doesn't mention it, and edit the issue
appropriately. As I wrote it, there turned out to be quite a bit more
detail than that, and I even had to tweak the wiki templates a bit. So
to explain some of the less-obvious bits:
There's a big difference between the actions 'Ignore' and 'Skip'.
'Skip' just means 'don't do anything at all about this issue right now'
- it results in no wiki edit. 'Ignore' means 'mark this issue such that
future runs of this script will ignore it'; it adds a magic comment to
the issue, and future runs of the script (by *anyone*, not just you)
will not show that issue. So please be careful with how you wield
Ignore.
The action 'Testing' is for use when the update is in testing, the
action 'Stable' is for use when the update is stable. 'Testing' uses
the Common_bugs_update_testing template, 'Stable' uses the
Common_bugs_update_released template and moves the issue to the
'Resolved issues' section.
There are a couple of actions 'Testing no update' and 'Stable no
update'. What these do is mark that there's an update for the issue,
but it doesn't really *solve* the issue - it's meant for use with
things like installer bugs, where we can ship an update but we can't
fix the frozen images. These actions use new params I added to the wiki
templates today. If you use the 'Stable no update' action the issue
will *not* be moved to the 'Resolved issues' section by default.
Any time you use one of the 'change' actions, you'll be offered the
option to manually edit the text afterwards. If you say yes it'll be
opened in an editor ($EDITOR is respected) and you can tweak it however
you like. You'll notice the text display has a magic line at the top,
that looks like this:
#MOVETORESOLVED: True
(or maybe False). That lets you change whether the issue will be moved
to the 'Resolved issues' section or not. If the word 'True' or 'true'
appears anywhere on that line, the issue will be moved; if not, it
won't. The default value is set to True if you use the 'Stable' action,
False in any other case.
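To make the magic line's behaviour concrete, here is a minimal sketch of
the rule as described above. It is not the code from commonbugs-update:
the function names are invented for the example, and treating a missing
marker line as "don't move" is an assumption on my part.

```python
MARKER = "#MOVETORESOLVED:"


def default_marker(action):
    """Build the magic line: True only for the 'Stable' action."""
    return "{} {}".format(MARKER, action == "Stable")


def should_move_to_resolved(text):
    """Decide from the first line of the edited text whether to move the issue."""
    lines = text.splitlines()
    first = lines[0] if lines else ""
    if not first.startswith(MARKER):
        # Assumption: no marker line means the issue stays where it is.
        return False
    # 'True' or 'true' anywhere on the marker line counts as yes.
    return "True" in first or "true" in first
```

So leaving the line as written for 'Stable' moves the issue, and
changing it to anything without 'True' or 'true' in it keeps the issue
in place.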
Don't trim trailing newlines in the editor, they're there for a reason.
mediawiki whitespace handling is just the worst. :)
I've used the tool to update the F21, F22 and F23 pages today, so there
are no changes needed right now (it will display a few issues, those
are ones I reckon should be 'skipped' but not 'ignored' for now), but
we can use this to hopefully keep them up to date better in future.
Note the tool won't work right on <F20 pages as they don't use the
templates for referencing updates.
The code is somewhat ugly for now, I might clean it up at some point,
but I already spent ~1.5 days longer on this than I expected to :/
--
Adam Williamson

Hey, folks!
So while I was looking at needle cleanup yesterday I talked to the
openQA folks about the ENV-foo needle tags. I understand them a bit
more now.
These really don't have any special magic function: they're *used*
like any other tag. This means that we don't need to keep sprinkling
them around everywhere and, indeed, probably shouldn't.
They still have special significance in openQA in two fairly small
ways. The interactive needle editor in the webUI will automatically
create ENV-foo tags for several openQA variables - i.e. if the
variable FLAVOR is set for a test, the needle editor will add an
ENV-FLAVOR-(value) tag by default. However, there really isn't a good
reason to *keep* these tags unless we actually want them, according to
upstream.
The other thing is that os-autoinst still has code allowing the value
of those variables to be specified via OS environment variables. This
is apparently basically a hangover from an earlier openQA design, when
*all* the configuration stuff that's now handled with the openQA
variables was handled with OS environment variables. As far as I can
tell, it's something we should pretty much pretend does not exist.
So, I'd suggest we strip most of the ENV- tags from our existing
needles, and stop including them in new needles unless we actually
want them. That is, we should only apply ENV-DESKTOP to screenshots
which are specific to a particular desktop and which we don't want to
match for any other desktop, ENV-DISTRI we should probably never use
as we don't have two 'distributions' and likely wouldn't ever mix RHEL
tests with Fedora tests(?), ENV-FLAVOR we should only use for needles
we actually want to match only for a specific flavor, etc. When
creating new needles we should strip the ENV- tags unless they're
actually desired: upstream say that's what openSUSE needle creators
are meant to do (though sometimes they forget).
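As a rough idea of what the stripping could look like, here is a small
sketch that drops ENV- tags from a needle unless they are explicitly
kept. It assumes a needle's JSON file carries its tags in a top-level
"tags" list; the function name, the example needle and the tag values
are all made up for illustration, and this is not the cleanup PR itself.

```python
import json


def strip_env_tags(needle, keep=()):
    """Return a copy of a needle dict without ENV- tags not listed in keep."""
    cleaned = dict(needle)
    cleaned["tags"] = [
        tag for tag in needle.get("tags", [])
        if not tag.startswith("ENV-") or tag in keep
    ]
    return cleaned


# Hypothetical needle content, as the needle editor might leave it.
example = json.loads("""
{
  "area": [{"xpos": 0, "ypos": 0, "width": 100, "height": 50, "type": "match"}],
  "tags": ["anaconda_main_hub", "ENV-DISTRI-fedora",
           "ENV-FLAVOR-server", "ENV-DESKTOP-kde"]
}
""")

# Keep only the desktop tag, because this screenshot is KDE-specific.
cleaned = strip_env_tags(example, keep={"ENV-DESKTOP-kde"})
print(cleaned["tags"])
```

The keep set is where the "unless we actually want them" judgment
goes: anything not listed there is dropped.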
I will propose a further cleanup PR today.
This does leave us with an awkward case: ENV-INSTLANG. I wasn't aware
of this stuff when I changed it to 'ENV-LANGUAGE' in the non-English
language test PR. As the ENV values are still somewhat 'special' in
the code, I don't like the ENV-LANGUAGE name any more (it *looks* like
one of these 'special' tags but in practice it isn't, as the openQA
code does nothing with it like it does for all the other ENV tags), so
I'll change that.
However there's a bit of a snag to just using ENV-INSTLANG:
os-autoinst actually has 'en_US' set as a hardcoded default value for it.
Upstream says that really should move to the openSUSE main.pm, but for
now it's in os-autoinst. So we'll either have to use a tag called,
say, just LANGUAGE, and keep stripping ENV-INSTLANG tags whenever they
show up (through someone using the needle editor and not removing it,
or whatever), or we'll have to use ENV-INSTLANG but try and override
the en_US default somehow (I haven't looked at the exact code flow
here to see if that's viable). Anyway, I'll come up with something.
--
Adam Williamson