I'm not an experienced C/C++ programmer at all. In fact I haven't looked at it since college. My primary background is in Java, but currently I'm spending most of my days with php, perl and now python, which I love.

Anyway, my employer has presented me with a new problem to solve. The backend of one of our major web sites generates .eps files, which go directly to press. Currently this backend is written in Perl, runs on linux, uses free libraries, and the jobs are executed as a batch process within cron. It worked beautifully in the early days of the site. Now that the demand on the system has increased dramatically, the perl code is not only too slow, but also too resource intensive. The client is unhappy about the performance. Depending on the size of the order, it can take twenty minutes for the print file to end up on the press.

Firstly we plan on moving the code to a more powerful machine. Secondly we want to abandon the cron job, and code a daemon that would probe for new orders and immediately dump them into a processing queue with each job in the queue executing in its own thread. Currently the jobs are executed in a single process on a single processor machine.

My question is this: how safe is C++ development on Linux? I've heard horror stories about thread safety, buggy, non-compliant compilers, and code breaking between compiler versions. Our orders are stored in a MySQL database, and the mysql++ libraries are not thread-safe. What does everyone suggest? Go with a purely C solution and sidestep the C++ thread-safety issues (I'm not really big on this one)? Take the leap into C++ development on Linux with threads? Stick with what I know and try to implement the system in Java? Or, finally, abandon threading on Linux with C++ and fork a new process for each job?

04-24-2004

Salem

I haven't seen anything here which would warrant using threads, so worrying about thread safety seems a non-issue to me.

How often does your cron run?
If it's every 20 minutes, then it's no wonder that jobs take up to 20 minutes to arrive, and that the CPU seems VERY busy for a time.
And if the CPU then sits idle for the next 15 minutes until the next cron interval, I would be suspicious of your claim that "the perl code is not only too slow, but also too resource intensive". Perhaps all you need is much better scheduling of the work to be performed.

Have you used a perl profiler on the code you have to find out where the time is being spent?
Since Perl can easily call C, perhaps recoding just the expensive bit of the task in C would be a worthwhile step.

> Currently the jobs are executed in a single process on a single processor machine.
Have you watched this process run, say using 'top', to see how much CPU time it is using? If it's getting 80%+ of the CPU all the time it is running, then it's committed to the single task at hand. Threading will only make your long tasks that much longer to run.

If, on the other hand, it's consistently below, say, 20% even when it's apparently very busy, it means it's doing a lot of waiting around for things like file I/O. In this case, you may get some benefit from a threading approach.
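For a number that is easier to log than a glance at 'top', here is a rough sketch in Python that compares the CPU time a child process used against the wall-clock time it took. `cpu_share` is a hypothetical helper name, and the command passed to it is a placeholder for whatever actually runs the conversion; it relies on the Unix-only `resource` module.

```python
import resource
import subprocess
import sys
import time


def cpu_share(command):
    """Run command to completion and return cpu_seconds / wall_seconds.

    Close to 1.0 suggests the job is CPU-bound; a value well below 1.0
    suggests it spends most of its time waiting (disk, database, swap).
    """
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.monotonic()
    subprocess.run(command, check=True)
    wall = time.monotonic() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    # User time plus system time consumed by the child we just waited for.
    cpu = (after.ru_utime - before.ru_utime) + (after.ru_stime - before.ru_stime)
    return cpu / wall if wall > 0 else 0.0


if __name__ == "__main__":
    # Placeholder job: a child that only sleeps, so the share should be low.
    sleeper = [sys.executable, "-c", "import time; time.sleep(0.2)"]
    print("cpu share: %.2f" % cpu_share(sleeper))
```

The 80% and 20% figures above are rules of thumb, so treat the ratio as a hint about where to look next, not as a verdict.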

> How safe is C++ development on linux?
Probably safer than on many Windows platforms. The compilers and the operating system put up with a lot less in the way of crap coding practices.

> I've heard horror stories about thread-safety
Probably because so few people really know how to code using threads, and see it as some "magic bullet" to solve their problems. Read this
Though read the Powerpoint original if you can.

> buggy, non compliant compilers
Far more true of Windows compilers, IMO.
VC++ 6 was a complete mess from a standards point of view.

> code breaking between compiler versions.
All down to crappy coders who learnt "dialect" C from crappy books espousing the virtues of void main and the like.

> Our orders are stored in a MySQL database, and the mysql++ libraries are not thread-safe
See the link above - that's why you need locks all over the place. Locks make your apparent parallelism all sequential again, so are you any better off than you were before?
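A toy illustration of that point, in Python, and very much a sketch rather than a measurement of mysql++: `run_workers` is a hypothetical helper, and `time.sleep` stands in for a call into a library that is not thread-safe and therefore has to be wrapped in one shared lock.

```python
import threading
import time


def run_workers(count, job_seconds, shared_lock=None):
    """Start count threads that each do one 'job' and return the elapsed time.

    If shared_lock is given, every job holds it for its whole duration,
    which is what wrapping a non-thread-safe library call looks like.
    """
    def job():
        if shared_lock is not None:
            with shared_lock:
                time.sleep(job_seconds)  # stand-in for the protected call
        else:
            time.sleep(job_seconds)      # stand-in for thread-safe work

    threads = [threading.Thread(target=job) for _ in range(count)]
    start = time.monotonic()
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return time.monotonic() - start


if __name__ == "__main__":
    print("one big lock: %.2fs" % run_workers(4, 0.1, threading.Lock()))
    print("no lock:      %.2fs" % run_workers(4, 0.1))
```

With the lock held for the whole job, four workers take roughly four times as long as one, which is the "all sequential again" effect. How bad it is in practice depends on how much of each job really has to sit inside the lock.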

> fork a new process for each job.
If MySQL is safe at the process level, perhaps.
Again, it depends on the CPU utilisation of a single process. If it's over 80%, then multiple processes will simply make the already time consuming ones that much slower.

04-24-2004

uberdog

Salem,

Thank you for the informative reply. Let me see if I can answer some of your questions.

Quote:

I haven't seen anything here which would warrant using threads, so worrying about thread safety seems a non-issue to me.

The service is running on a smallish server (800 MHz single processor and a gig of RAM). The reason for all of the thread consideration is that we are moving it to a multiprocessor machine (four 550 MHz processors and 4 GB of RAM). Threading, if done correctly, should help on the new machine, correct?

The cron job runs every 5 minutes. If there is a big order waiting to be processed or a lot of orders, the cron jobs begin to overlap, all the memory is consumed (one of the libraries has a leak in it) and kswapd starts going nuts. The kernel starts shutting processes down, and we have to give the server some serious attention. Granted this only happens once a month or so, but it's still extremely stressful and embarrassing.

Quote:

Have you used a perl profiler on the code you have to find out where the time is being spent?

Actually we haven't, and I probably should. I have used crude profiling and timing techniques, and most of the time seems to be spent in a C library (written by the last programmer that had my job) that converts various image formats to eps, which is the meat of the problem.

Quote:

That won't make thread issues go away

I've always heard (probably from the types of programmers you're talking about) that C code is relatively safe compared to C++ due mainly to the way C++ exceptions are handled by the gcc compiler. So any code that uses C++ exception handling is inherently unsafe for threading and leads to stack-thrashing if it is compiled with the gnu compiler. Is this true?

Anyway, thanks for your time and the link. I'll check it out, and get back to you if I have any more questions.

04-25-2004

Salem

> Threading, if done correctly, should help on the new machine, correct?
It should - but you need to check some details of your specific OS and threading implementation. You might end up with different processes running on different processors, but threads within a process all stuck on the same processor (not a good thing).
If the split is only at the process level, then one process per job will get the work spread over all the processors, and solve the thread-safe problems with MySQL you highlighted previously.
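One process per job could be sketched along these lines in Python. `run_jobs` and its parameters are hypothetical names, and the commands are placeholders for whatever invokes the converter on one order. A side effect worth noting: when a job's process exits, any memory it leaked goes back to the OS.

```python
import subprocess
import sys
import time


def run_jobs(commands, max_parallel=4, poll_interval=0.05):
    """Run each command as its own process, at most max_parallel at a time.

    Returns a dict mapping each job's position in commands to its exit code.
    """
    pending = list(enumerate(commands))
    running = {}     # job index -> Popen object
    exit_codes = {}  # job index -> exit code
    while pending or running:
        # Fill any free slots with waiting jobs.
        while pending and len(running) < max_parallel:
            index, command = pending.pop(0)
            running[index] = subprocess.Popen(command)
        # Collect the jobs that have finished since the last pass.
        for index, process in list(running.items()):
            code = process.poll()
            if code is not None:
                exit_codes[index] = code
                del running[index]
        if running:
            time.sleep(poll_interval)
    return exit_codes


if __name__ == "__main__":
    # Placeholder jobs: three children that exit immediately.
    jobs = [[sys.executable, "-c", "pass"] for _ in range(3)]
    print(run_jobs(jobs, max_parallel=2))
```

Setting `max_parallel` to the number of processors is only a starting guess; if the jobs turn out to be disk-bound, a lower number may do just as well.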

> that converts various image formats to eps
What's your hard disk utilisation like when this is happening?
If it's pretty high on your single CPU solution, then it's going to be a serious bottleneck on your quad-CPU solution.
However, watch you don't get all that aberrant swap behaviour mixed in with any of your disk usage stats.

> the cron jobs begin to overlap,
Would detecting this and backing off the 2nd cron help?
A 'simple'

Code:

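# last_cron_not_done must be a command or shell function; inside [ ] it is just a non-empty string (always true)
if last_cron_not_done; then exit 0; fi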

Implementing last_cron_not_done is an easy exercise for the reader ;)

Or maybe a loop around that which sleeps periodically (say 10 seconds) for up to a total of 5 minutes before finally exiting (when the next cron will start anyway)
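For what it's worth, one common way to do the "is the last run still going?" check is an advisory lock on a lock file. The sketch below is Python and Linux/Unix only (it uses `fcntl.flock`); `try_lock`, `acquire_with_backoff` and the lock file path are all made-up names for illustration, not anything from the existing system.

```python
import fcntl
import os
import tempfile
import time


def try_lock(path):
    """Return an open file holding an exclusive lock on path, or None
    if some other run already holds the lock."""
    handle = open(path, "w")
    try:
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        handle.close()
        return None
    return handle  # the lock is released when this file is closed


def acquire_with_backoff(path, total_wait=300, interval=10, sleep=time.sleep):
    """Retry every interval seconds for up to total_wait seconds.

    Returns the locked file, or None if the previous run never finished.
    """
    waited = 0
    while True:
        handle = try_lock(path)
        if handle is not None or waited >= total_wait:
            return handle
        sleep(interval)
        waited += interval


if __name__ == "__main__":
    # Hypothetical lock file location.
    lock_path = os.path.join(tempfile.gettempdir(), "eps_batch.lock")
    lock = acquire_with_backoff(lock_path, total_wait=0)
    if lock is None:
        print("previous run still going, backing off")
    else:
        print("lock acquired, safe to process orders")
        lock.close()
```

Because the kernel drops the lock when the holding process exits, a crashed run does not leave a stale lock behind, which is the usual weakness of the "does the file exist?" approach.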

> all the memory is consumed (one of the libraries has a leak in it) and kswapd starts going nuts
Unless this gets fixed on the new machine, the cron backoff idea above would be needed anyway. At best you're only going to defer the same problem from happening again. If the workload catches up to the machine again, you'll be back with the same problem.

> converts various image formats to eps, which is the meat of the problem.
Ah, sounds like the Perl is just a wrapper around this then. If it's fairly simple Perl, and not much of it, then I doubt you'll get much benefit out of trying to optimise that.
The C code on the other hand would seem to be definitely worth further attention. 'gprof' is the profiler for C code.
Gather some test data (some simple, some complex) and feed that into a profiled version of the C code. Make sure you have a couple of examples of each kind of image format (one type may be much more expensive in comparison to other formats).

I can't answer your last question - I don't know nearly enough C++ to comment.
Exceptions are pretty new to C++, and pretty complex in their own right. Mixing all that up with threads is probably too experimental for something which you regard as being pretty mission critical.

04-28-2004

EvBladeRunnervE

Quote:

What's your hard disk utilisation like when this is happening?
If it's pretty high on your single CPU solution, then it's going to be a serious bottleneck on your quad-CPU solution.
However, watch you don't get all that aberrant swap behaviour mixed in with any of your disk usage stats.

Well, besides the coding issues, which Salem hit right on the head, what type of HDs are you running on this rig? You mention it being an old machine, so are you using old 5200rpm or lower drives? That could well be one of your major problems: HD read/write/seek speeds.

04-28-2004

uberdog

Quote:

you mention it being an old machine, so are you using old 5200rpm or lower drives? That could possibly be one of your major problems, HD Read/Write/seek speeds.

I'm sorry, when I said old, I meant relatively old. It's a Compaq TaskSmart C4000, which has two 18.2 GB SCSI drives, so the drive hardware should be solid enough.