It sometimes strikes me that the biggest challenge for an engineer whose application is degrading in performance is a lack of information.

Imagine that you go through the weekly performance report generated from your application's access log, and find that response times have been much worse since July. All you can do is pray that it doesn't get even worse next time.

6 Answers

Without more details on the type and nature of your application, it is hard to give more than general hints:

Measure, measure, measure. Profile your app to detect the performance bottlenecks. Then measure again after any changes, to verify the effects of the changes.
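As a minimal sketch of the "measure, measure" advice, here is how profiling might look with Python's built-in cProfile; `slow_report` is a hypothetical stand-in for one of your own functions, not anything from the original question:

```python
# Minimal profiling sketch using Python's stdlib cProfile/pstats.
# slow_report() is an invented stand-in for one of your own functions.
import cProfile
import io
import pstats

def slow_report():
    # Some busywork for the profiler to attribute time to.
    out = ""
    for i in range(2000):
        out += str(i)
    return out

profiler = cProfile.Profile()
profiler.enable()
slow_report()
profiler.disable()

stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(10)
report = stream.getvalue()
print("slow_report" in report)  # the function shows up in the profile output
```

Re-running the same profile after each change is the "measure again" half: the numbers, not your intuition, confirm whether the change helped.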

Identify what changed from June to July that could cause the observed performance degradation. Do you have more users? More transactions? More accumulated data in the DB? More network traffic? (Of course, you can have more than one of these at the same time.) This may also help identify where the weak spots of your app are. Or it may even point you to external entities, like a web service you depend on that has slowed down, making your app wait longer.

Define "slow" and "fast enough", in agreement with your users. Do they primarily care about throughput (average amount of requests/data handled per time period) or latency (average response time)? These require different kinds of optimization.
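The throughput/latency distinction can be made concrete with a few lines of code. This sketch computes both from access-log samples; the log format (epoch second, response time in milliseconds) is invented for illustration:

```python
# Sketch: deriving average latency and throughput from access-log samples.
# The two-column format (epoch second, response time in ms) is made up.
log_lines = [
    "1311200000 120",
    "1311200001 340",
    "1311200002 95",
    "1311200002 405",
]

samples = [(float(ts), float(ms)) for ts, ms in (line.split() for line in log_lines)]
avg_latency_ms = sum(ms for _, ms in samples) / len(samples)       # latency
elapsed_s = samples[-1][0] - samples[0][0]
throughput_rps = len(samples) / elapsed_s if elapsed_s else float("inf")  # throughput

print(f"average latency: {avg_latency_ms:.1f} ms")
print(f"throughput: {throughput_rps:.1f} req/s")
```

Note that the two numbers can move independently: batching work often raises throughput while making individual requests slower, which is exactly why you need your users to tell you which one they care about.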

Thanks for the answer. One of my current questions is that I don't think our code changes should lower performance, yet it does happen in our production environment.
– Vance, Oct 1 '11 at 14:27


@Vance, lower performance may be a result of the same code processing more data. E.g. querying a table may work lightning fast initially for a few hundred rows, but performance may gradually worsen as more and more rows accumulate in the table. You need to choose your algorithm / query taking into account not only the current, but also the expected amount of data to be processed in the future.
– Péter Török, Oct 1 '11 at 15:24
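The effect described in that comment, the same code slowing down as rows accumulate, is easy to demonstrate with the stdlib sqlite3 module; the table and column names here are invented:

```python
# Sketch: the same unindexed query gets slower as the table grows.
# Uses Python's stdlib sqlite3; table/column names are invented.
import sqlite3
import time

def time_lookup(row_count):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE orders (id INTEGER, customer TEXT)")
    con.executemany("INSERT INTO orders VALUES (?, ?)",
                    ((i, f"cust{i}") for i in range(row_count)))
    start = time.perf_counter()
    # No index on `customer`, so this is a full table scan.
    con.execute("SELECT * FROM orders WHERE customer = ?", ("cust1",)).fetchall()
    elapsed = time.perf_counter() - start
    con.close()
    return elapsed

small = time_lookup(100)
large = time_lookup(100_000)
print(f"100 rows: {small:.6f}s   100,000 rows: {large:.6f}s")
```

Adding `CREATE INDEX idx_customer ON orders(customer)` before the query keeps the lookup roughly constant as the table grows, which is the "choose for the expected amount of data" point in practice.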


@PéterTörök, exactly, and this is why it is very poor practice for developers to develop against a small dataset.
– HLGEM, Oct 17 '11 at 19:37

If you do not have sufficient information, you need to collect it. If you do not have a hard goal for what counts as satisfactory performance, you need to set one (otherwise you cannot tell whether you have reached it):

Information is typically collected in several ways:

Profiler attached. Collects lots of information and can help you pinpoint slow spots.

Logging steps. You write your own log statements. The log entries get timestamped, and you can look at the steps you take to identify what is "too slow" for you. Do it in a machine-readable way so you can post-process the logs to gather additional information.
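The "logging steps" approach might look like the following sketch: one JSON object per line so a post-processing script can compute per-step durations. The step names are invented:

```python
# Sketch: timestamped, machine-readable step logging (one JSON object per
# line), plus the post-processing pass that turns it into durations.
import json
import time

def log_step(step, stream):
    stream.append(json.dumps({"ts": time.time(), "step": step}))

events = []
log_step("request_start", events)
time.sleep(0.01)                    # stand-in for real work
log_step("db_query_done", events)
log_step("response_sent", events)

# Post-processing: duration between consecutive steps.
parsed = [json.loads(e) for e in events]
for prev, cur in zip(parsed, parsed[1:]):
    print(f'{prev["step"]} -> {cur["step"]}: {cur["ts"] - prev["ts"]:.3f}s')
```

In a real application the lines would go to a log file rather than a list, but the shape of the data (and of the post-processing) is the same.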

Database surveillance. What queries are being done? How fast are they? Is it fast enough according to your performance goal?
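Database surveillance can start as small as a timing wrapper that checks every query against your performance goal. In this sketch, sqlite3 is just a stand-in for your real database driver, and the 50 ms budget is an arbitrary example:

```python
# Sketch: wrap query execution so each statement is timed against a budget.
# sqlite3 stands in for your real driver; the 50 ms budget is arbitrary.
import sqlite3
import time

BUDGET_S = 0.050
slow_queries = []  # (sql, elapsed) pairs that blew the budget

def timed_execute(con, sql, params=()):
    start = time.perf_counter()
    rows = con.execute(sql, params).fetchall()
    elapsed = time.perf_counter() - start
    if elapsed > BUDGET_S:
        slow_queries.append((sql, elapsed))
    return rows

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE requests (id INTEGER)")
timed_execute(con, "SELECT count(*) FROM requests")
print(f"queries over the {BUDGET_S * 1000:.0f} ms budget: {len(slow_queries)}")
```

Most databases also have server-side equivalents (slow-query logs, statistics views) that answer "what queries are being done, and how fast?" without touching application code.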

Does the debugger have a "pause" button, or can you interrupt the program by typing Ctrl-C or some such keystroke?

If the program is getting slower and slower, it is doing at least one thing it doesn't need to do, and doing more and more of it.
Since it could be running in, say, 1/3 of the time it currently takes, that means 2/3 of the time it's doing things it doesn't need to do.
If you take an X-ray snapshot at a random point in time, there is a 2/3 chance you will see it doing the unnecessary thing(s).

Pause it while it's running, and examine what it's doing and why.
Especially examine every line of code on the call stack.
See if you can explain to yourself or someone else precisely why, in detail, that particular instant of time was being spent.
(You don't need to measure anything. You need to see if what it is doing can be gotten rid of.)

Repeat this a few times, like 5 or 10 times.

If what the program is doing at that point in time is not really necessary, and you catch it doing it on more than one of the times you stopped it, you have found something you can fix that will give you a big speedup, guaranteed.
The bigger the problem is, the quicker you will find it.
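You can approximate the "pause and read the stack" step in-process with Python's stdlib traceback module. The frames below (`generate_report`, `format_rows`) are invented, but walking the captured stack line by line and asking "why is this instant being spent?" is exactly the exercise described above:

```python
# Sketch: capture the call stack at an arbitrary moment (the moral
# equivalent of hitting "pause") and read every frame on it.
# generate_report/format_rows are invented stand-ins for your own code.
import traceback

def snapshot_stack():
    # Capture the full call stack right now, as formatted lines.
    return traceback.format_stack()

def generate_report():
    return format_rows()

def format_rows():
    return snapshot_stack()

stack = generate_report()
# Each frame is a "why, precisely, is this instant of time being spent?"
for frame in stack:
    print(frame.rstrip())
```

In a long-running process you would trigger the snapshot from a signal handler (or just use a debugger's pause button); the point is that you read the stack, rather than aggregate it into counts.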

It's got nothing to do with requirements.
It's got nothing to do with measurements.
It's got everything to do with just "cleaning house", by this method.
Here's a fairly typical example.

Edit:
It's conventional wisdom to hear "measure measure" or "use a profiler".
What is not conventional is to hear how much speedup was achieved that way.
The few times I've heard the results of profiling, it was like 10% to 40%, or a factor of 1.1 to 1.4.
That's pretty anemic.
If a series of problems is found and fixed, there is a compounding effect, as shown in the example above.

P.S. Here's an example in C++ of a three-orders-of-magnitude speedup, containing all source code versions, copies of samples, and a blow-by-blow description of how it is done. Some programmers have learned/discovered how to do this, but most have not. It couldn't be simpler.
To this day, I am still totally mystified that it is not common knowledge.
The only explanation I can see is that teachers don't work with programs large enough to require this kind of tuning.
What they do is teach gprof, for no other reason than that it's there, so they can teach it and move on.
What that does is infect their students with all the incredibly persistent myths of performance tuning, resulting in exactly the problems you describe.

P.P.S. In case the point isn't clear, any thread, whether it is alone or among thousands, has a certain minimum amount of work it absolutely must do to accomplish its purpose. Anything it is doing beyond that is taking extra time.
In the example linked to above (which is only one particular example - every app is different) these "bottlenecks" were removed:

33.3 % in push_back

11.1 % in out-of-line indexing

7.4 % in Add/Remove

31.9 % in new and delete

9.3 % in getting Nth list element

6.1 % in character I/O

Adding up to over 99%!
By removing them you get orders of magnitude speedup.

Now the kind of thing I hear is "Well sure, it's silly to do 2, and 6 wasn't necessary either."
Hey, nolo contendere, but what about the other four "bottlenecks"?
If you don't fix them, how much speedup do you get?

If you want serious speedup, whether or not a profiler is used, you have to clean out all of the problems!
Any that are missed will become the dominant speed limiters.

@MikeDunlavey, the easiest way to determine whether a thread is idle is to profile it and see whether it uses any CPU time.
– user1249, Dec 14 '11 at 22:31

For future readers: there was a long discussion about the feasibility of this method for heavily multithreaded programs (like loaded web servers), where manually inspecting the call stack of a sample to see which parts of the program are active, and so build a mental image of where time is spent, becomes much harder. A tool to help locate the non-idle threads in a sample was suggested, and, in my opinion, the easiest way to identify non-idle threads is to profile them.
– user1249, Dec 15 '11 at 22:51

Also for future readers: We were not contending over how effectively the method works in single threads, as in this concrete example. Neither of us has seriously tried it in the case of thousands of threads. If someone has a severe performance problem in a thousand-thread application, there would be no harm in finding out, and potentially a large benefit.
– Mike Dunlavey, Dec 22 '11 at 17:17

...or simply just use a profiler.
– user1249, Dec 22 '11 at 17:24

@Thorbjørn: Go for it. Take the little C++ app in cim2 in that sourceforge project, and make it 700 times faster, using any profiler you like (without taking a peek at how I did it :)
– Mike Dunlavey, Dec 22 '11 at 18:05

1> Run perfmon/Task Manager (on Windows) or top (on Unix) to see whether CPU, disk, network, or memory are being used heavily/thrashing/flatlining, and investigate as appropriate. Ideally disk/network/memory should not be anywhere near using all the available resources. CPU at 100% is OK when you are doing something CPU-intensive. Check that you are not running more threads than CPUs (unless the application is light on CPU use and disk is not an issue).

2> Do an isolation test to see if there are competing processes on the hardware that are using resources that could affect performance.

3> Check to see whether concurrency (if applicable, e.g. for a web app) has increased. If it has, you need to look at your application to find the bottlenecks that need to be addressed. Usually these will be in the database (index and table-locking issues, slow queries) but they may be in the infrastructure setup (i.e. enough CPU, memory).

4> At this point you probably have a data-volume issue. Now look to see what has changed. Are data files getting larger? Are database tables getting larger? If data sizes are growing, you need to understand what the application is doing with the data to determine what course of action to take. If you use a database, check that the appropriate indexes are used and that queries are optimized. A SQL profiler is handy here, as is visual inspection of the code (assuming you are, or know, someone competent in database optimization). You are looking for the longest-running queries and the most frequently run queries. Another option is to archive or delete old or irrelevant data, if possible. Also ask whether all the data really needs to be processed and stored. Finally, look for locking situations; code inspection and SQL profilers will help there too.
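For the "check that the appropriate indexes are used" part of step 4, most databases can show you the query plan. A sketch with the stdlib sqlite3 module (the table and index names are invented; your own database will have its own EXPLAIN tooling):

```python
# Sketch: asking SQLite whether a query uses an index, before and after
# creating one. Table/index names are invented for illustration.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE events (id INTEGER, created TEXT)")

plan_before = con.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM events WHERE created = ?", ("2011-07-01",)
).fetchall()
print(plan_before)  # a full table scan

con.execute("CREATE INDEX idx_events_created ON events(created)")
plan_after = con.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM events WHERE created = ?", ("2011-07-01",)
).fetchall()
print(plan_after)   # an index search
```

Comparing the plan of your longest-running query before and after an index (or a query rewrite) is a much more direct check than re-running the whole workload and hoping.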

5> If the application is CPU-intensive, look at optimizing the CPU-intensive portions of the code. Visual inspection, trace statements while monitoring CPU usage, and profiler tools will all help identify code that could be changed. The solution may be a different algorithm, or removing/rescheduling processing tasks. A set of experienced eyes may be of assistance here.
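A sketch of step 5's "different algorithm" outcome: repeated membership tests against a list (a linear scan each time) replaced by a set (a hashed lookup). The data is invented; the point is that the fix is algorithmic, not hardware:

```python
# Sketch: the same question answered by two algorithms with very
# different costs. The data is invented for illustration.
import time

haystack = list(range(10_000))
needles = list(range(0, 10_000, 3))

start = time.perf_counter()
hits_list = sum(1 for n in needles if n in haystack)      # scans the list each time
list_time = time.perf_counter() - start

haystack_set = set(haystack)
start = time.perf_counter()
hits_set = sum(1 for n in needles if n in haystack_set)   # hash lookup each time
set_time = time.perf_counter() - start

print(hits_list == hits_set)  # same answer, very different cost
print(f"list: {list_time:.4f}s   set: {set_time:.4f}s")
```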

6> Remove any unneeded debug/logging code. Remove anything that is not really needed for the application to run.

Again, the key is to understand what and how the application is operating, and thinking about how these steps interact with the hardware/database.

Performance tuning is a critical skill for any application that hits a database. Every major database has a large book written about how to performance-tune it. Get the book for your database backend, read it cover to cover, and then start taking the steps you need to identify and resolve performance issues. If your database was designed by someone without this knowledge, you likely have a bad design, with badly designed queries and possibly poor indexing as well. In that case you might want to read about refactoring databases too, as you will likely need structural changes in addition to fixes to the indexes and queries. This is complex stuff; you can't learn it all from questions on a website like this one. Just praying it will get better is counterproductive; there are many things you can do to make it better. And yes, as time goes on and more data is added, the queries will get slower and slower, especially if they are poorly designed.

There are also non-database reasons for poor performance, but get the database tuned properly and it is likely that many of your issues will go away.

Putting aside the rant that just restates the question, the question is asking what to do after the fact. Saying "well you should've planned ahead" isn't helpful.
– user8, Oct 2 '11 at 11:08

@Mark, yes, it may not be helpful, but none of the answers so far has given a better solution (most probably because there isn't one). Perhaps this mistake can serve as a good lesson for the future.
– Darknight, Oct 2 '11 at 16:07

"none of the answers so far have given any better solutions". I beg to differ. OTOH, I agree that typical approaches to profiling/scaling don't work, but that's hardly evidence that nothing works.
– Mike Dunlavey, Oct 3 '11 at 13:03