Hello,
I have two students interested in a diploma thesis called "Yum plugin
for suggesting packages based on usage":
http://bit.ly/18hrHbL
TL;DR - from an anonymized access log, create a database of suggested
packages using data mining techniques and provide a Yum plugin that
would suggest "Users of vim also installed: ctags, git, ..."
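A minimal sketch of how such suggestions could be derived, assuming the
data is already reduced to a mapping from anonymized client to the
packages it fetched. Plain co-occurrence counting stands in here for
whatever data mining technique the thesis ends up using; the function
name and the input shape are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_suggestions(installs_by_client, top_n=3):
    """Map each package to the packages most often fetched by the same clients.

    installs_by_client: dict of anonymized client id -> iterable of package names
    (hypothetical input shape, not taken from the proposal).
    """
    co_counts = defaultdict(Counter)
    for packages in installs_by_client.values():
        # Count every unordered pair once per client, in both directions.
        for a, b in combinations(sorted(set(packages)), 2):
            co_counts[a][b] += 1
            co_counts[b][a] += 1
    return {
        pkg: [name for name, _ in counts.most_common(top_n)]
        for pkg, counts in co_counts.items()
    }
```

For example, if three clients fetched vim and git and two of them also
fetched ctags, the entry for "vim" lists git first and ctags second.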
I am going to create a Fedora Feature wiki page shortly, describing this
in more detail. Our goal is to offer this project for integration into
Fedora later on, or at least to provide Fedora packages for it.
To do that, we need a good source of data. It would be best to collect
access logs from one or two main Fedora mirrors. We would provide a
short script in Python that would parse access logs, anonymize the data
(IP addresses hashed with a salt) and keep only the relevant entries
(RPM files from the latest Fedora release or updates repositories). That
would be phase one, which should give us sample data.
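A rough sketch of what the phase-one script could look like. It assumes
Apache-style access log lines and mirror paths that contain
/releases/19/ or /updates/19/; both the log format and the path
patterns are assumptions, as are all function names.

```python
import hashlib
import re

# Assumed Apache common/combined log layout: ip, two fields, [date], "request", status.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "(?:GET|HEAD) (?P<path>\S+)[^"]*" (?P<status>\d{3})'
)
# Assumed path layout for the Fedora 19 release and updates repositories.
RELEVANT = re.compile(r"/(?:releases|updates)/19/.*\.rpm$")


def anonymize_ip(ip, salt):
    """Return a salted SHA-256 digest in place of the client address."""
    return hashlib.sha256(salt + ip.encode("utf-8")).hexdigest()


def package_name(path):
    """Strip directory, '.rpm', and the version-release.arch suffix from an RPM path."""
    filename = path.rsplit("/", 1)[-1]
    return filename[: -len(".rpm")].rsplit("-", 2)[0]


def anonymize_log(lines, salt):
    """Yield (anonymized client, package name) for successful RPM downloads."""
    for line in lines:
        match = LOG_LINE.match(line)
        if not match or match.group("status") != "200":
            continue
        path = match.group("path")
        if not RELEVANT.search(path):
            continue
        yield anonymize_ip(match.group("ip"), salt), package_name(path)
```

The salt has to stay secret and constant for the whole collection
period, otherwise requests from the same client can no longer be
grouped together.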
Phase two would be to integrate this script with logrotate, and for one
Fedora release cycle (Fedora 19) the script would collect relevant
anonymized data into a file. The final suggested-package database would
be created from this file (or maybe files, to allow us to move them on
the fly out of the stat directory).
The big (legal) question is whether we are able to provide this
anonymized data to the public, or whether we want to sign an NDA with
all people involved. I am CCing Tom for this question.
I need your help with connecting to relevant people. Any comments are
appreciated.
Many thanks and I hope this effort will lead to improving user
experience with Fedora packaging.
--
Later,
Lukas "lzap" Zapletal
irc: lzap #theforeman

Greetings.
We are now in the infrastructure freeze leading up to the Fedora 19
final release.
You can see a list of hosts that do not freeze by checking out the
ansible repo and running the freezelist script:
git clone http://infrastructure.fedoraproject.org/infra/ansible.git
scripts/freezelist -i inventory/inventory
Anything listed as freezes is frozen until 2013-07-03 (or later if the
release slips). Frozen hosts should have no changes made to them without
a signoff on the change from at least 2 sysadmin-main or rel-eng members.
Thanks,
kevin

Um, sorry, this is kind of a last-minute afterthought, so I'm not expecting
it for F19 launch or anything crazy. But:
For F19, we're putting cloud images on the mirrors. In the staging tree, the
URL is like this:
https://dl.fedoraproject.org/pub/alt/stage/19-RC1/Images/x86_64/Fedora-x8...
and I presume that the final pattern will be
../19/Images/x86_64/Fedora-x86_64-19-20130624-sda.qcow2
(Or possibly s/Images/Cloud/ -- I had thought that's what we previously
agreed but I need to dig that up.)
Is there a way that we could make permanent URLs for the following:
- per-version sorted list of URLs to the image on mirrors
- per-version redirect to the closest (or at least randomly chosen) image suitable
for giving to OpenStack Glance directly
- an unversioned URL redirecting to the latest per-version list
- an unversioned URL redirecting to the latest-version redirect
I'm just guessing that mirrormanager is the best place for this. I'm happy
with any other solution....
--
Matthew Miller ☁☁☁ Fedora Cloud Architect ☁☁☁ <mattdm(a)fedoraproject.org>

The infrastructure team will be having its weekly meeting tomorrow,
2013-06-27 at 19:00 UTC in #fedora-meeting on the freenode network.
Suggested topics:
#topic New folks introductions and Apprentice tasks.
If any new folks want to give a quick one line bio or any apprentices
would like to ask general questions, they can do so in this part of the
meeting. Don't be shy!
#topic Applications status / discussion
Check in on status of our applications: pkgdb, fas, bodhi, koji,
community, voting, tagger, packager, dpsearch, etc.
Mention any new releases, bugs we need to work around, or things to note.
#topic Sysadmin status / discussion
Here we talk about sysadmin related happenings from the previous week,
or things that are upcoming.
#topic Fedora 19 release tasks
We need to make sure we are all ready for Fedora 19 release and there's
nothing that could cause problems for release.
#topic Private Cloud status update / discussion
#topic Upcoming Tasks/Items
https://apps.fedoraproject.org/calendar/list/infrastructure/
#topic Open Floor
Submit your agenda items as tickets in the trac instance and send a
note replying to this thread.
More info here:
https://fedoraproject.org/wiki/Infrastructure/Meetings#Meetings
Thanks
kevin

Last week when we were talking about spawning rdiff-backup to back up
our systems, we diverged into discussing app/apache logs and the
somewhat complicated system we currently have for grabbing those logs.
Right now we have a list of hosts on log02 that it should grab logs
from. Those hosts need to have rsyncd running on them to allow access
from log02 to fetch the /var/log/httpd/ path from them.
That requires 2 things to be coupled and it is a bit awkward if you set
up a host that is tricky to access from log02 or isn't on the vpn.
In general I also am not in love with having to have rsyncd listening
on systems - even if it is ip-restricted.
So the thought was we could do something like this on log02:
1. set up an ssh key on log02 that can run rsync to /var/log/httpd on
all hosts
2. make any host that needs to have its logs retrieved be marked in
the ansible inventory host/group vars
3. git clone public-ansible-repo onto log02
4. use group_by to construct a group of the hosts which can then be
retrieved using rsync.
The sole reason for using ansible here is so we can keep the log sync
info in our inventory and to parallelize the retrieval of logs.
This is more or less identical to what we talked about for backups
using rdiff-backup.
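To make the retrieval step concrete, here is a rough sketch in plain
Python of pulling /var/log/httpd from several hosts in parallel with
rsync over ssh. It is not the ansible setup proposed above: in that
setup the host list would come from the inventory and ansible would
handle the parallelism. The host names, key path and destination
directory are hypothetical, and the destination directories are assumed
to exist already.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def rsync_command(host, key_path, dest_root):
    """Build the rsync-over-ssh command that copies one host's httpd logs."""
    return [
        "rsync", "-a",
        # Assumes key_path contains no spaces, since rsync splits this string.
        "-e", "ssh -i {}".format(key_path),
        "{}:/var/log/httpd/".format(host),
        "{}/{}/".format(dest_root, host),
    ]


def fetch_logs(hosts, key_path, dest_root, workers=4):
    """Run one rsync per host in parallel; return host -> rsync exit code."""
    def fetch(host):
        result = subprocess.run(rsync_command(host, key_path, dest_root))
        return host, result.returncode

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(fetch, hosts))
```

A call such as fetch_logs(["app01.example.org"], "/root/.ssh/logsync",
"/srv/logs") would return a dict with a non-zero exit code for any host
that could not be reached.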
When we were discussing this Luke mentioned then using
tbgrep (https://pypi.python.org/pypi/tbgrep) to search the resulting
files and compile a set of tracebacks our apps are dumping out.
If we have all the logs on log02, generating a report like this would be
pleasantly kept away from the rest of our hosts and could give us
reasonably useful reports of brokenness.
I'd love some feedback on whether this is all crazy or not :)
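As an illustration of the kind of report meant here, the following is a
hand-rolled sketch that collects Python tracebacks from log lines and
counts identical ones. It does not use tbgrep or its API, and it assumes
the traceback lines appear in the log without any per-line prefix, which
real Apache error logs may not satisfy.

```python
from collections import Counter

HEADER = "Traceback (most recent call last):"


def extract_tracebacks(lines):
    """Yield each complete traceback found in the lines as one string."""
    current = None
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith(HEADER):
            # Start (or restart) collecting at every traceback header.
            current = [line]
        elif current is not None:
            current.append(line)
            # The first non-indented line after the header is the exception line.
            if not line.startswith(" "):
                yield "\n".join(current)
                current = None


def traceback_report(lines):
    """Return (traceback text, count) pairs, most frequent first."""
    return Counter(extract_tracebacks(lines)).most_common()
```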
-sv

Hi there,
For several weeks (I don't really know when transifex-client was
updated on the builders), we have had issues pulling POs for the
websites. Two errors were introduced by the transifex-client update:
- it has to use https now (fixed with previous patch)
- .transifexrc has to be readable and writable now (only readable before).
Please see the first patch here.
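For illustration only, and not the content of the patch: a small Python
sketch of making sure a file such as .transifexrc is both readable and
writable by its owner. The function name is hypothetical.

```python
import os
import stat


def ensure_owner_read_write(path):
    """Add owner read and write bits to path if missing; return the resulting mode."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    wanted = mode | stat.S_IRUSR | stat.S_IWUSR
    if wanted != mode:
        os.chmod(path, wanted)
    return wanted
```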
I am also doing another commit to improve the pulling script, so that it
sends us errors, and to correct the hostname.
It is pretty simple and used only for websites.
Thanks,
--
Kévin Raymond
(Shaiton)