2013 Fedora Infrastructure tasks

overview

This page is to help us collect things we want to work on and get done in 2013. Initially it will serve to help us organize what we want to get done at the upcoming FUDCon Lawrence (hackfests, presentations, etc.). After that it may be repurposed to note those things we are actually going to work on in the coming year.

fudcon

Let's coordinate and gather here the things we want to do at FUDCon. (Don't forget to add these to the main FUDCon page as soon as we have decided on them.) I am planning to have a high-level "These are things we want to work on" session Saturday morning. Hopefully everyone can attend that, and then we can go off and do those things.

technical sessions (Friday)

Several infrastructure folks will be giving tech talks. Please attend and heckle^Whelp.

hackfests (Saturday and Sunday)

cloudy with a chance of infrastructure - finish up stuff around private clouds, move to production.

revamp our apprentice/new contributor process - figure out a way to get more people involved long term. (more mentoring?)

ansible - figure out any setup and questions, timetable to replace puppet

lightning talks (Friday)

2013

This will be a list of things we want to get done in the timeframes below.

2013 infrastructure FAD

The FAD worked great for getting 2 factor auth done. If we can get funding, we should consider holding another one on a different topic. Ideas welcome here.

monitoring - fix nagios, revamp how we manage it, make it stop bothering us all, but still tell us about issues, etc.

In the Fedora 19 cycle

Move the publictest instances to the cloud and set a sundown (expiry) date for them.

Move all dev instances to the cloud.

Make a push-based fasClient with ansible; replace the fasClient cron job on the infra boxes with it.

In the Fedora 20 cycle

Idea box for 2013 and beyond

Integrate jenkins into our infrastructure and framework (pingou)

Make a clearer division between back-end and front-end in our (web) apps (pingou)

Our current app model makes it difficult to determine which app is causing a problem, so our solutions tend to be pretty coarse-grained. Given the failure-prone state of our apps, we should adopt a model that makes it simpler to see where problems are coming from. As our apps stabilize, we can move to an environment that shares more resources.

ARM servers in infrastructure

Discuss issues around using some ARM instances for our needs.

Would likely need to use Fedora instead of RHEL

What things would be good for them?

Revamp nagios

Use check_mk on all machines and add a small number of custom checks on top.

Automate adding nodes, etc

Extend 2 factor auth or other security measures past sysadmin groups?

hosted? pkgs? specific groups?

signed commits?

Fedorahosted-ng

Ditch trac for something better?

gitlabhq or other easier interface for git repos?

Decentralize!

Search engine? Try to get dpsearch working again?

old stuff from 2011 / 2012

Here's stuff we talked about in the past and never got done:

Upgrade TurboGears1 apps to TurboGears2

Write automated tests using TG2's test framework

Fix the FAS authenticators to be less chatty

Put FAS session information into memcached

Update FAS to have an admin console (no more direct db needs)

Update pkgdb to have an admin console (no more direct db needs)

Fix the Django auth providers to be faster

Automated hosted projects (*)

Automated creation of new machines -- run one command and it's up

glusterfs/cloudfs fedorapeople filesystem

Replicate db so that we don't have a SPOF

logging sucks (*)

Client IPs show up at the proxies, but we also need them passed through to the app servers. (*)

FAS needs to log more actions to its database (this is in a new version of FAS; we just need to upgrade)

Do periodic reinstallations of guests (like app servers) so that we know nothing has been changed outside of puppet.

Reduce koji's resources

Finish and deploy coprs

The puppet node names do not match the hostnames in nagios. Add aliases to the nagios hostnames so that the two match up correctly. This will allow us to trigger passive checks using nsca. (ties to nagios revamp)

Set up a schedule for rebooting hosts (to test for broken hardware when it's not a critical point in the release cycle)
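
For the nagios alias item above, here is a minimal sketch of how a puppet node name could be translated to its nagios host name before a passive check result is submitted. It is only an illustration: the alias table, the host names, and the function names are all hypothetical, and how the real aliases would be stored is still undecided. The one thing taken from nsca itself is the tab-separated line format that send_nsca reads for service checks (host, service description, return code, plugin output).

```python
# Hypothetical alias table: puppet node name -> host name as nagios knows it.
# The entries are made up for illustration only.
NAGIOS_ALIASES = {
    "app01.example.org": "app01",
}

# Standard nagios plugin return codes: OK, WARNING, CRITICAL, UNKNOWN.
VALID_RETURN_CODES = (0, 1, 2, 3)


def nagios_host(puppet_node, aliases=NAGIOS_ALIASES):
    """Return the nagios host name for a puppet node.

    Falls back to the puppet node name when no alias is defined.
    """
    return aliases.get(puppet_node, puppet_node)


def passive_result(puppet_node, service, return_code, output,
                   aliases=NAGIOS_ALIASES):
    """Format one service check result as a send_nsca input line."""
    if return_code not in VALID_RETURN_CODES:
        raise ValueError("return_code must be one of 0, 1, 2, 3")
    fields = [nagios_host(puppet_node, aliases), service,
              str(return_code), output]
    # send_nsca expects tab-separated fields terminated by a newline.
    return "\t".join(fields) + "\n"


if __name__ == "__main__":
    line = passive_result("app01.example.org", "puppet_run", 0, "OK")
    print(line, end="")
```

The resulting line would be piped to send_nsca on the reporting host. Keeping the lookup in one function means the alias source can later be swapped for whatever the nagios revamp settles on.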
