Monday, June 13, 2011

GDM & Login Cleanups

We are inching closer and closer to going live with the new GNOME desktop upgrade for our thin clients. With all of the complexities of hundreds of icons and the nuances of having hundreds of users, it really does take 6-12 months to prep a server for this type of deployment. Not only do you have to consider the GNOME server, but also all of the remote servers and software applications. We are also weighing upgrades and connection techniques (RDP vs Citrix vs UIS).

We are up to about 45 concurrent users testing the new server, and have uncovered a nasty issue in GDM related to two users trying to authenticate at the same time. Halfline/Ray was awesome as usual and created a patch, which is already posted. I now just need to get it built on openSUSE 11.4 and test it.

My focus last week was on optimizing the user experience in authenticating and then the time it takes before avant-window-navigator launches.

GDM Tuning

These are the changes I made to GDM to make it run faster and produce a dialog as soon as possible after the XDMCP request.

I opened the .ui file of the password dialog box with glade-3 and made it a fixed size. Previously the "animation" process was kind of slow over remote display. It would display with no entry widget and then wiggle a bit and resize larger and then complete. My strings and logos won't be changing, so it was ok to test with my art/strings and size accordingly. This improves the user presentation and seems to make it a bit faster.

I turned off server side GDM wallpapers, hesitantly. I like the idea of keeping as much as possible on the server in case one wants to make changes in the future, but by turning this off I was able to speed up the process of getting the password dialog. I made a change to the thin client build that gives the Xserver a black background (-br), so the old school gray/white grids are gone. Users only see this for a few seconds at most each day, so I have grown more comfortable with this change.

The /tmp/orbit-gdm directory basically never deletes any linc* files, and at one point we had a million temporary files (after many, many months of me not knowing about it). I now have a cron job that cleans these out each night with linc-cleanup-sockets. I'm sure that creating temp files gets slower when the directory is packed full of old ones.
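For anyone who wants to see the shape of such a cleanup, here is a rough sketch in Python. To be clear, this is not what the cron job runs: linc-cleanup-sockets does the real work, and this is a simpler age-based stand-in. The function name, the one-day cutoff, and the "linc*" pattern default are all assumptions for illustration. An age-based rule could also remove a socket that a long-running session still uses, so the cutoff would need care.

```python
import time
from pathlib import Path


def remove_stale_files(directory, pattern="linc*", max_age_seconds=24 * 3600, now=None):
    """Delete non-directory entries matching `pattern` that are older than the cutoff.

    Hypothetical helper: age is judged by modification time only.
    Returns the list of paths that were removed.
    """
    now = time.time() if now is None else now
    removed = []
    for path in Path(directory).glob(pattern):
        try:
            # lstat inspects the entry itself rather than following a symlink
            too_old = now - path.lstat().st_mtime > max_age_seconds
            if too_old and not path.is_dir():
                path.unlink()
                removed.append(path)
        except FileNotFoundError:
            # Something else removed the entry between listing and unlinking
            continue
    return removed
```

Run nightly against /tmp/orbit-gdm, something like this would keep the directory from growing without bound.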

I removed several .desktop files in the gdm startup group that we didn't need. It was attempting to start another pulseaudio process, even though pulse is already running on the thin client. This was probably creating some nastiness and delays.

Xsession Tuning

I have hand selected some gconf settings that are loaded when users log in and found that they were taking a few seconds to load. On a home/personal computer forcing in settings would never be done, but in a production environment where people are working 24 hours of the day...I have to ensure a working session each and every time. These gconf settings remove any customizations that might cause failure. So what I did was break them into smaller groups and then fire them off into the background concurrently. This was able to shave off several seconds from the point that you enter your password and hit ENTER to the point the panel appears. I also reviewed every line of the script and removed any items that were unneeded or obsolete.
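The idea of splitting the settings into groups and running the groups side by side can be sketched as follows. This is only an illustration in Python, not the actual Xsession script. The function names and the group size of 10 are assumptions, and unlike a shell script that backgrounds each group, this version waits for every group to finish before returning.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def chunk(items, size):
    """Split a list into consecutive groups of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def apply_group(commands):
    """Run one group's commands in order and return their exit codes."""
    return [subprocess.run(cmd, check=False).returncode for cmd in commands]


def apply_concurrently(commands, group_size=10, runner=apply_group):
    """Run each group in its own thread and collect results in group order."""
    groups = chunk(commands, group_size)
    if not groups:
        return []
    with ThreadPoolExecutor(max_workers=len(groups)) as pool:
        return list(pool.map(runner, groups))
```

Each command would be an argument list such as ["gconftool-2", "--type", "bool", "--set", "/apps/example/some_key", "true"], where the key shown is made up. Settings inside a group still run one after another, so anything order-dependent should stay in the same group.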

This morning I put all of these changes live; they had been tested on a VM clone of the server and worked fine. So far the results are promising. The GDM dialog appears almost immediately upon request and the panel appears about 4 seconds after the password is accepted.

This week I'll continue to ensure that all icons are brought over from the older GNOME desktop, do QA work, and test the VM copy that is being generated once a week. We are experimenting with techniques to have the VM server detect first boot after copy and transition to new IPs automatically.

Also, to turn the conversation in a productive direction, I think anon's comment may have been about ROI.

Without profiling, you're basically guessing when you make the changes. ("seems to make it a bit faster") True, some of your guesses are educated, which makes them very likely. But it's still trial and error, which is effective, but not efficient.

@all: I did in fact generate a profile before I started. Yup, even Government employees work that way. :) I did not post that information because the profile was not consistent. The server has user loads, and the numbers would vary from time to time during the day. The profile had milestone markers and the number of seconds it took to reach each point. From that I could see exactly what was slowing down the login, and it was clearly all of the gconf settings. It clearly is faster, and the times that I noted were based on averages during various server loads. 40 people give you wild fluctuations in speed from second to second. Cold starts and a controlled environment were not possible; I worked in these changes while people continued to work.
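For the curious, a milestone profile of that kind can be sketched roughly like this in Python. This is not the tooling I used, just an illustration of the idea: record the seconds elapsed at each named point, then average each point across several logins. The class and function names are made up for the example.

```python
import time
from collections import defaultdict
from statistics import mean


class MilestoneTimer:
    """Record seconds elapsed since creation at each named milestone."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._start = clock()
        self.marks = {}

    def mark(self, name):
        # Store the elapsed time from the start to this milestone
        self.marks[name] = self._clock() - self._start


def average_milestones(runs):
    """Average each milestone's elapsed seconds across several runs."""
    samples = defaultdict(list)
    for marks in runs:
        for name, seconds in marks.items():
            samples[name].append(seconds)
    return {name: mean(values) for name, values in samples.items()}
```

Averaging over runs taken at different times of day is what smooths out the second-to-second swings from server load.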