bradcathey has asked for the
wisdom of the Perl Monks concerning the following question:

After lurking around the monastery for several months, and realizing I've written a lot of sloppy code, I've decided to rewrite a small web-based content manager a la PM style (warnings, strict, taint, etc.) and to make it more efficient (smaller). The script currently works fine, but is about 1000 lines (I know this is short for some of you system admin folks), and besides using the aforementioned good coding practices, I'd like to trim down the code by about 2/3rds.

My question (and I think this is a rather fundamental one): do I write a large program with a bunch of subroutines, or have a smaller program with a bunch of modules (as in 'use Foo;')? In the current program, the scripts are all related to the content mgr., but are performing a variety of tasks: mostly basic MySQL stuff like INSERT, UPDATE, SELECT, and then prepping variables for HTML::Template.

I've done some searching of PM and Google and have found a few things, among others, a rather passionate node on where to put your subs, or a piece by tilly on writing long programs (more on how to organize functions within the program). Found some caveats on variable scoping causing problems in longer scripts.
I'm sure there are many others, they just aren't coming to the fore.

So, any advice, previous nodes, articles, books, CPAN modules to shed some light on this topic, or is it beyond the scope of SoPW?

I would suggest writing it this time as a large program with a bunch of subroutines, BUT try to group your subroutines together in a logical fashion with clear separators between the sections that you come up with. Try to make each function make sense on its own, and limit how much functions in each section call functions in the other sections.

There are several reasons that I suggest this. The most important is that recognizing design elements requires some experience. If you are not in the habit of writing modules, then you probably lack that skill and can spend a lot of time thrashing about without making much progress. However, you can probably write functions more easily and, after the fact, recognize somewhat natural divisions. Focussing on writing loosely coupled functions will make it likely that natural divisions can more easily be recognized. If the result turns out well, you can then try moving some of those sections into modules after the fact.

The process of trying to reflect on the result afterwards should help develop some of the design skills that you will need in the future to modularize up front. But not facing that in addition to everything else that you change will make the first attempt more likely to succeed.
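To make the "one program, grouped subroutines, clear separators" layout concrete, here is a minimal sketch. Every name in it (save_page, load_page, template_params) is made up for illustration, and a plain hash stands in for the MySQL tables so the sketch runs on its own. It is one possible arrangement, not the only reasonable one.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# ---------------------------------------------------------------
# Main flow: reads top to bottom and delegates everything else
# ---------------------------------------------------------------
my %store;                                  # stand-in for the database
save_page(\%store, 'home', 'Welcome');
my $params = template_params(load_page(\%store, 'home'));

# ---------------------------------------------------------------
# Storage section: only these subs touch the data store
# ---------------------------------------------------------------
sub save_page {
    my ($store, $name, $body) = @_;
    $store->{$name} = { name => $name, body => $body };
    return;
}

sub load_page {
    my ($store, $name) = @_;
    return $store->{$name};
}

# ---------------------------------------------------------------
# Presentation section: only these subs know the template fields
# ---------------------------------------------------------------
sub template_params {
    my ($page) = @_;
    return { TITLE => ucfirst $page->{name}, BODY => $page->{body} };
}
```

The presentation section never reaches into the store directly, and the storage section knows nothing about templates. If the sections stay that independent, each one is a natural candidate for a module later.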

I find that the easiest way to decide what should be a subroutine and what should be a separate module is by drawing a system diagram.

First determine the logical functional units that the system is composed of. Then draw a system diagram that shows the interaction between logical function blocks. A diagram is better than a thousand words. Before I proceed on a large project, I always draw lots of system diagrams and flowcharts to help me visualize the system. Design deficiencies can be spotted quite easily.

Rule of thumb for designing a good system:

1) Minimum coupling. Design your functional blocks so that they have minimum coupling between them. In other words, minimise interaction between modules at the same level.
2) What the subroutine does can be described by one sentence without the word 'and' or 'or'. In other words, a clearly defined single task for each subroutine.
3) Keep the size of a module to no more than a few pages.
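As a small illustration of rule 2, here is a sketch in which each subroutine can be described in one sentence without 'and' or 'or'. The sub names and the 80-character limit are assumptions chosen for the example, not anything from the original program.

```perl
use strict;
use warnings;

# Removes leading and trailing whitespace from a string.
sub trim_whitespace {
    my ($text) = @_;
    $text =~ s/^\s+|\s+$//g;
    return $text;
}

# Reports whether a title has an acceptable length (1 to 80 characters).
sub is_valid_title {
    my ($title) = @_;
    return length($title) > 0 && length($title) <= 80;
}

# Converts a title into a lowercase, dash-separated URL slug.
sub title_to_slug {
    my ($title) = @_;
    (my $slug = lc $title) =~ s/[^a-z0-9]+/-/g;
    $slug =~ s/^-+|-+$//g;
    return $slug;
}

# The caller combines the single-task subs; none of them calls another.
my $title = trim_whitespace('  Hello, Perl Monks!  ');
my $slug  = is_valid_title($title) ? title_to_slug($title) : undef;
```

A single sub that trimmed, validated, and slugified in one go would need an 'and' in its description, which is the warning sign the rule is pointing at.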

There are plenty of good references and books on good software design practice. I find the following link on software design quality management quite interesting.

The script currently works fine, but is about 1000 lines (I know this is short for some of you system admin
folks), and besides using the aforementioned good coding practices, I'd like to trim down the code by about 2/3rds.

First of all, rewriting something you are familiar with from the ground up
with best practices in mind is an excellent choice. But, I will caution you
that making it smaller isn't really a good design goal. Simpler, yes.
Smaller will very likely be a pleasant side effect of better design.

Go with modularization. I am sure you'll get many suggestions (good ones
too) about this, so I'll offer up what I feel might be a good way for you
to think about it, rather than specific suggestions of what to put where.

You've coded it all up once already, and you've used a couple of modules.
Review your main code with the following in mind: If you could simply wish
any module into existence (within reason and logic), can you imagine any
that would simplify your design? "Gee, if the Such::and::Such module
existed, I'd only have to do this and this, which is really the core of my
application, and not all this other stuff".

Take your time, and consider the kinds of work you are already *not* doing
because HTML::Template or DBI already existed. What other kind of work
would it be nice to *not* be doing in your mainline code? Jot down all the
ideas that arise, even contradictory ones. Further thought, or perhaps
questions and musings here, can help you choose among and refine these core
ideas. These are your potential module candidates.

As for recommendations for subs at top or bottom, the thread you referenced
contains most of the available arguments, so there is no reason to rehash them.

This sounds like an exercise in refactoring. And as such I agree with tilly and roger that you should not go to the level of breaking your code into modules until you have broken it into a set of subroutines and converted unnecessary duplication into subroutine calls with differing parameters.
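A sketch of what "converting duplication into subroutine calls with differing parameters" might look like for the INSERT statements mentioned in the question. The sub name build_insert and the table and column names are hypothetical. The sketch only builds the SQL string and bind list, so it runs without a database.

```perl
use strict;
use warnings;

# Before refactoring, a script might repeat a block like this once per
# table, changing only the table name and the column list each time.

# After: one sub, with the table and the row passed in as parameters.
# Table and column names are interpolated into the SQL, so they must come
# from the program itself, never from user input; values use placeholders.
sub build_insert {
    my ($table, $row) = @_;
    my @cols = sort keys %$row;            # fixed order for columns and values
    my $sql  = sprintf 'INSERT INTO %s (%s) VALUES (%s)',
        $table, join(', ', @cols), join(', ', ('?') x @cols);
    return ($sql, @{$row}{@cols});         # statement, then bind values
}

my ($sql, @bind) = build_insert('pages', { title => 'Home', body => 'Welcome' });

# With a connected DBI handle in $dbh, the result could then be run as:
#   $dbh->do($sql, undef, @bind);
```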

Once you've refactored the code into a set of generalized subs, it's quite possible that you will notice that they can be further grouped by some reasonable criterion and thus be extracted into an external module for reuse. Avoid trying to refactor the subs to match some preordained module structure, as it's easy to end up trying to force a square peg into a round hole. Rather, use your intuition to break the subs into the appropriate tasks and sizes that seem natural for the code and your own thinking. If this results in something suitable for modularization then so be it, but all too often it's unnecessary.

---
demerphq

First they ignore you, then they laugh at you, then they fight you, then you win.
-- Gandhi

I learned A LOT about Perl by looking at my (and others!) old Perl code and going through the refactoring process - trying to improve what was already there, applying my new knowledge of Perl, and exploiting CPAN modules that I had become familiar with.

It's a good sign that you are critical of your own code and wonder if it's really the best that it can be. There are wayyy too many developers/admins/others that are content to write half-assed, barely functional scripts/programs/apps, stick them in production, and never look at them again.

Hanlon's Razor - "Never attribute to malice that which can be adequately explained by stupidity"

You should also take a look at CGI::Application. From the docs: "CGI::Application is intended to make it easier to create sophisticated, reusable web-based applications. This module implements a methodology which, if followed, will make your web software easier to design, easier to document, easier to write, and easier to evolve."

Since no one else seems to have mentioned this, here's my extra $.02 on modularizing a big application:

Think about grouping subroutine functions in terms of things that you are likely to find useful in other apps, versus things that are likely to be unique to this particular app.

For example, you mention doing a lot with MySQL. Some of the code you write for that might be stuff you'll want to do again in some other project -- so that's a good candidate for a separate module.

That sort of thinking could guide your refactoring so that you spend a little more time now trying to separate the "app specific" from the "general purpose" (where "general" just means "you personally might use this bit again somewhere else"). Then you'll find yourself spending less time later on, as you create (or refactor) other apps, because their design and coding will benefit directly from stuff you've already done.
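One way to picture the split between "general purpose" and "app specific" is the sketch below. The package My::SqlUtil, its where_clause sub, and the pages table are all invented for the example. In practice the package would sit in its own file; it is inlined here so the sketch runs as a single script.

```perl
use strict;
use warnings;

# General purpose: nothing in here mentions pages, templates, or this app,
# so another project could reuse it unchanged.
{
    package My::SqlUtil;    # hypothetical; would normally live in My/SqlUtil.pm

    # Build a WHERE clause and bind list from a hash of column => value.
    sub where_clause {
        my ($cond) = @_;
        my @cols = sort keys %$cond;
        return '' unless @cols;
        my $sql = 'WHERE ' . join ' AND ', map { $_ . ' = ?' } @cols;
        return ($sql, @{$cond}{@cols});
    }
}

# App specific: knows about the content manager's "pages" table.
sub published_pages_query {
    my ($section) = @_;
    my ($where, @bind) = My::SqlUtil::where_clause(
        { section => $section, status => 'published' });
    return ("SELECT id, title FROM pages $where", @bind);
}

my ($sql, @bind) = published_pages_query('news');
```

The test of whether the split is right is simple: if where_clause ever needs to know what a "page" is, it has stopped being general purpose.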

I am sure that many of the Perl Wizards on this site will condemn my methods, and indeed you yourself will likely turn your nose up and laugh at what are (I admit it) unconventional and "sloppy" programming techniques. I have only been in the Perl business for 18 months, so you probably all have years of experience on me. Still, I'm gonna tell you what I do because, most important of all, it works for me.

I'm in the business of writing Perl programs of roughly 500-4000 lines. They are not beginners' stuff, but still they aren't incredibly Perl-ish, as I tend to err on the side of making them readable by others in the dept. who may be fresh to Perl.

My strategy is to have one main script that contains the fundamental logic of sequential operation, and then I have a load of other files with tasty sub-procedures in them that I use like libraries.

Take for example my current project, a pdf metaeditor. I have one main script "editpdfmetadata.pl" and then another script "pdf-utils.pl" that contains frequently used pdf operations; getTitle, setTitle, getAuthor etc... Again in DB applications I often separate out DB operations into a different script (or two) just so I know where everything lives. I never allow my scripts to grow beyond 300/400 lines of code.

The main failing in my scripts that people will undoubtedly hate me for is using loads of require statements. I usually have a hefty bundle of them at the top of my main script to import all the sub-procedures that I need to use.

It's not the most elegant way to program Perl, but it works for me. Comments about using require statements like this would be appreciated, as it's nice to get others' views.

And how is someone reading the script to know what function is defined where? See my reply to Simpler alternative to modules for my thoughts on what is wrong with just using a ton of requires, how to do it better, and why the alternative is better.

Unless it is obvious by reading the program from which required file each function comes (e.g. have "require 'quality.pl'" only import functions that are named 'quality_*', which would still be a fairly bad way to do it), then I'd use the Exporter module and use the modules (first turning them into proper modules with a package statement) with an import list so that the program is self-documented as to where functions originate from.
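Here is a minimal sketch of the Exporter approach with an import list, using the PDF helpers mentioned earlier as the example. The module name My::PdfUtils and the hash-based "document" are assumptions for illustration. Normally the package would be saved as My/PdfUtils.pm and end with "1;". It is wrapped in a BEGIN block here, with a manual %INC entry, only so that the example runs as one file.

```perl
use strict;
use warnings;

BEGIN {
    package My::PdfUtils;                        # hypothetical module name
    use Exporter 'import';                       # reuse Exporter's import()
    our @EXPORT_OK = qw(get_title set_title);    # nothing exported by default

    sub get_title {
        my ($doc) = @_;
        return $doc->{title};
    }

    sub set_title {
        my ($doc, $title) = @_;
        $doc->{title} = $title;
        return $doc;
    }

    # Single-file trick only: mark the module as already loaded so that
    # the "use" below does not go looking for My/PdfUtils.pm on disk.
    $INC{'My/PdfUtils.pm'} = __FILE__;
}

# The import list documents exactly where each function comes from.
use My::PdfUtils qw(get_title set_title);

my $doc   = set_title({}, 'Annual Report');
my $title = get_title($doc);
```

Compared with a stack of require 'something.pl' lines, a reader who meets get_title in the main script can find its origin by searching for the name in the use lines at the top.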

Also, the Camel book (blue edition, p.128) recommends use instead of require because use acts as a BEGIN block, and in essence, globally declares the modules at compile time (or something to that effect).
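The compile-time versus run-time difference can be seen in a short sketch. It uses only the core modules List::Util and POSIX. The @order array is just a device for recording when each line ran.

```perl
use strict;
use warnings;

our @order;    # records the order in which things actually ran

push @order, 'run time: first statement';
BEGIN { push @order, 'compile time: BEGIN block' }

# "use List::Util qw(max);" is documented as equivalent to this BEGIN block:
BEGIN { require List::Util; List::Util->import(qw(max)); }

# Because the import happened at compile time, max is already known to the
# parser here and can even be called without parentheses.
my $largest = max 3, 9, 4;

# A plain require runs only when execution reaches this line and imports
# nothing, so the function is called by its fully qualified name.
require POSIX;
my $rounded = POSIX::floor(2.7);
```

After this runs, the BEGIN entry comes first in @order even though it appears second in the source, which is the compile-time behavior the Camel book is describing.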