Sometimes software performance tricks are found through a methodical, thorough search. Sometimes it takes divergent thinking and the courage to try crazy ideas. Sometimes an idea is just the beginning and needs to be followed by a lot of hard work.

How can we foster a time period in which everyone can try different ideas to improve the performance of the software we're working on? Everyone on the team has at least several months of experience with the software and is very good at it.

Do you agree that divergent thinking will help find ways to improve software performance? Why or why not?

What techniques will enable us to quickly try out an optimization idea? Is fast coding speed necessary for getting good results from the try-out?

Finally, how much "time" should be allocated to ensure good results without creating the possibility of slacking off?

Is experimentation necessary to prove that "a faster way to do something" exists? (Added 2011-06-07)

(For bounty purposes only - 2011/06/07: the team size is 2-4 developers, with no dedicated QA. All code, unit tests, and performance tests are done by the developers. Because of the nature of the project, profiler results are useful in showing proportional execution time even when they do not reveal a single bottleneck.)

When you say improve the performance, are you talking strictly from a performance/benchmark perspective, or do you mean more intuitive UI, better workflow, etc., i.e. a better product?
– Richard DesLonde Jun 2 '11 at 6:16

Programmers tend to be smart and creative (these being prerequisites to be any good at programming), so it's always good to let them try out a wide range of ideas when solving problems. There are, however, two things that are important to remember when attempting to improve performance (I am assuming that by "performance" you mean reducing execution time):

Algorithmic optimizations tend to work much better than anything else. As a trivial example: whatever you do to your bubblesort implementation, for sufficiently large inputs even an extremely slow implementation of quicksort will eventually be faster (the sketch after these two points makes this concrete).

Doing anything performance-related is completely nonsensical unless you measure (profile) and base whatever you do on the results.
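
To make both points concrete, here is a minimal sketch (mine, not from any answer here): it pits a bubble sort against std::sort (an introsort, i.e. quicksort-family) on identical data, and it measures instead of guessing. The input size of 20,000 is arbitrary and the crossover point is machine-dependent, but past it, no constant-factor tweak to the bubble sort can save it:

```cpp
#include <algorithm>
#include <chrono>
#include <iostream>
#include <random>
#include <utility>
#include <vector>

// O(n^2) no matter how carefully the inner loop is tuned.
void bubble_sort(std::vector<int>& v) {
    for (std::size_t i = 0; i + 1 < v.size(); ++i)
        for (std::size_t j = 0; j + 1 < v.size() - i; ++j)
            if (v[j] > v[j + 1]) std::swap(v[j], v[j + 1]);
}

int main() {
    std::mt19937 rng(42);                       // fixed seed: repeatable measurements
    std::uniform_int_distribution<int> dist(0, 1000000);
    std::vector<int> a(20000);
    for (int& x : a) x = dist(rng);
    std::vector<int> b = a;                     // identical input for both algorithms

    auto t0 = std::chrono::steady_clock::now();
    bubble_sort(a);
    auto t1 = std::chrono::steady_clock::now();
    std::sort(b.begin(), b.end());              // O(n log n)
    auto t2 = std::chrono::steady_clock::now();

    using ms = std::chrono::duration<double, std::milli>;
    std::cout << "bubble sort: " << ms(t1 - t0).count() << " ms\n"
              << "std::sort:   " << ms(t2 - t1).count() << " ms\n";
}
```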

My main point is that it's important to make sure you're on the same page with everyone about these things before you start a period of wild experimentation. It's always a shame to find out afterwards that your less experienced co-workers were trying things that could never work (and that you could have told them so up front).

Sadly, I can't speak from experience, but I've heard that Atlassian has a single day where employees are allowed to work on whatever they want and then present their ideas in a sort of party atmosphere. Apparently it turned out well for them.
But I'd have to agree with Andersen and say that when it comes to performance, creative and out-of-the-box ideas are less important than profiling to find which parts of the process take the most time. Perhaps once you've profiled your system, you could give everyone a day to come up with ideas for speeding up the important sections. After they present their ideas, you can pick which ones to try.

One successful practice we did on some of my previous teams was having the concept of Deep Dives. A few folks would get together in a conference room, determine some user scenario, and just start either stepping through code or looking at profiler logs. In some cases, the data clearly showed bottlenecks that allowed us to convince skeptics that there really were perf issues in their own code! To make this successful, there were some key tenets we followed:

Try to focus on critical scenarios or code paths where bottlenecks are suspected. Don't waste time optimizing stuff that doesn't need to be optimized (if it ain't broke...).

Keep the group small and focused, made up of the people who know the code best. Don't forget to include the tester and program manager for the feature - they have key insights and can benefit from either participating or gathering info on how they can test better.

Start the session by having the area owner give a high-level architectural block diagram and an overview of the area: what the key components are and, briefly, what they do. You'd be surprised how often the block diagram didn't reflect reality once we dug into the code. (Actual quote: "I didn't know we still used that component. I thought we got rid of that years ago!")

Expect to find functional bugs as well as perf issues. That's a good thing. Also, expect that, sometimes, you won't find anything significant. That can be a good thing, too.

Expect to have several long sessions. These are working meetings: get comfortable and work through it. You get a lot more done when you can all collaborate for extended stretches.

Take notes, good notes. If you use a defect tracking database, consider opening issues immediately to keep track, even if they are low priority.

Avoid having the entire team participate in a "Performance Push." These usually don't produce the results that management expects, for the reasons that Thorbjørn Ravn Andersen mentioned in another answer. You'll get great gains in some areas, regressions in areas people aren't familiar with, and it's hard to predict or track how much gain you should get in order to say "you're done." That's a challenging conversation to have with management.

The reason you might need to improve the speed of your software is that something in it is noticeably slow. If that's not the case, then optimizing is a waste of time. But if something is slow, then do the task.

... And to do the task, there are two steps in this order:

See if the function that is doing the task is efficiently written. Does it have a good or a poor algorithm? Is it accessing the database efficiently? Is it looping 100 times when once would do? (A sketch of this last shape follows these two steps.) Often, simple inspection of the code can find the one obstacle, and fixing it will not only solve the problem but make you a better programmer at the same time.

Don't spend more than an hour or so on step 1. If you can't find the problem within an hour, use a profiler to find the spot in question. Once you know the problem point, you can go back to step 1 and do it again, bearing down to find the best way to improve the code you've identified.
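
As an illustration of what that inspection often turns up, here is the "looping 100 times when once would do" shape and its fix. This is a hypothetical sketch; load_customers and normalize are invented stand-ins (imagine load_customers hitting a database):

```cpp
#include <algorithm>
#include <cctype>
#include <iostream>
#include <string>
#include <vector>

std::vector<std::string> load_customers() {      // stand-in for an expensive call
    return {"Alice", "Bob", "Carol"};
}

std::string normalize(std::string s) {           // lowercase for comparison
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return std::tolower(c); });
    return s;
}

// Before: reloads the customer list and re-normalizes the query
// inside the loops, over and over.
int count_matches_slow(const std::vector<std::string>& queries) {
    int hits = 0;
    for (const auto& q : queries)
        for (const auto& c : load_customers())   // loaded once per query
            if (normalize(c) == normalize(q)) ++hits;
    return hits;
}

// After: hoist the load and the query normalization out of the loops.
int count_matches_fast(const std::vector<std::string>& queries) {
    const auto customers = load_customers();     // loaded once, total
    int hits = 0;
    for (const auto& q : queries) {
        const auto nq = normalize(q);
        for (const auto& c : customers)
            if (normalize(c) == nq) ++hits;
    }
    return hits;
}

int main() {
    std::vector<std::string> queries{"alice", "dave"};
    std::cout << count_matches_slow(queries) << " "
              << count_matches_fast(queries) << "\n";  // both print 1
}
```

The fix is nothing clever, just code motion: stop paying for the same work N times.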

Knowing how to make a simple, efficient design to start with. If you get this part wrong, experimentation will not make much difference.** For example, know how to tell when using a code generator is a winning design approach.

Knowing how to tune software by locating the activities that are (a) expensive on a percentage basis and (b) replaceable with something better. Everybody knows you should "use a profiler", but that's not enough; a sketch of what "replaceable" can look like follows the footnote below.

** Exceptions might be tight hardware-dependent code, like graphics rendering, processor-pipeline or CUDA behavior, or experimenting with network or DB protocols, where you just need to get familiar with the best way to use them.
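
As a hedged sketch of what "expensive on a percentage basis and replaceable with something better" can look like: if samples keep landing inside a pure function called repeatedly with the same inputs, a small cache replaces most of that cost. parse_config here is a hypothetical stand-in for whatever the samples actually point at:

```cpp
#include <iostream>
#include <map>
#include <string>

struct Config { int parsed_fields; };

// Trivial stub so the sketch compiles; imagine heavy parsing here.
Config parse_config(const std::string& text) {
    return Config{static_cast<int>(text.size())};
}

// Replace "parse every time" with "parse once per distinct input".
const Config& cached_parse(const std::string& text) {
    static std::map<std::string, Config> cache;              // key: raw text
    auto it = cache.find(text);
    if (it == cache.end())
        it = cache.emplace(text, parse_config(text)).first;  // miss: parse once
    return it->second;
}

int main() {
    const std::string cfg = "mode=fast\n";
    for (int i = 0; i < 1000; ++i)
        cached_parse(cfg);                                   // 999 cache hits
    std::cout << cached_parse(cfg).parsed_fields << "\n";    // prints 10
}
```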

ADDED: There is something that many programmers of large systems find surprising: in large, perfectly well-constructed programs, there can be big, invisible performance problems that profilers can't find, because they are not localized to routines. They are part of the general structure of the program, even though the program may be written in the very best style.
Just to give a concrete example, here is a program with source code (in C++) that does a job.
It is distilled from a real C program I worked on.

It does what it was intended to do, but what fraction of its time is not really necessary?
How much could it be sped up?

Well, in the first version of the program, something perfectly reasonable-looking and nonlocal (invisible to a profiler) was taking 33.3% of the time. When it was replaced, that time was saved, and that was the second version of the program.

In the second version of the program, something else (invisible to any profiler) that could be removed was taking 16.7% of the time. Removing it led to version 3.

In version 3, 13% was removed. Out of what was left, 66% was removed.
Out of what was left after that, 61% was removed!

Then finally, out of what was left after that, 98% was removed!

So what's the big picture? Out of every 1000 cycles spent by the original program, how many were removed?
998!
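
Checking the arithmetic against the percentages above: the remaining fraction is (1 − 0.333) × (1 − 0.167) × (1 − 0.13) × (1 − 0.66) × (1 − 0.61) × (1 − 0.98) ≈ 0.0013, i.e. about 1.3 cycles out of every 1000, so roughly 998 removed and a compounded speedup of nearly 800×.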

Every program is different, but in my experience every large program has a series of time-taking issues that profilers won't find, that manual sampling will, and that, if the programmers are truly going for maximum performance, can be removed for large speedups.