OK, this might just be me going insane from quitting smoking (my mind is nuts), but it has actually rung true for me anyway.

I know multiple files are better and preferred, especially as the program gets larger and larger, which makes sense in theory as a way to organize the code.

However, for me the opposite seems to be true. I find it makes more sense to have fewer files/modules and larger single files. Before, I thought I was just a newbie and that was that. But now I think I have figured out why this is so. Every time I try to split my program into multiple files, or create a program from scratch with multiple files, I get confused, whereas it normally feels comfortable to have it all in one class. I end up reprogramming in this module what is already in another module, pretty much reinventing the wheel that I invented a module ago. However, when I have fewer, larger files, I don't reprogram what I have already programmed.

Here is a perfect example of what I mean. This is a program I wrote in the last 2 days, a remake of my original quit smoking program. The original program is one single file. When I want to search for a piece of code in it to "modify" or whatever, I know exactly where to go. I also never rewrote code in it. However, this is the newer version. Within the last 2 days I have split this code and that code up into different modules, pretty much to the point where I confused myself. If I need something, I am not sure which module it is in. I also think I rewrote code in multiple modules to do the same thing.

So I guess the overall thing is, I find it easier to program fewer, larger files with a class that does a lot, rather than smaller files with a class that does this and a class that does that, etc.

Of course I could just be losing my mind from quitting smoking, I don't know.

I would like to not get into a bad habit and be set in my ways before I realize that they are bad habits.

If you don't want to download it and just want the gist of it: this is the program with multiple files.

There is a cost associated with having to manage multiple files, just as there is a cost associated with having to manage very large files. Eventually the cost of having a massive file will outweigh the cost of having multiple files, and that is when you split. When you hear "multiple files are better than a single large file", "large file" usually means several thousand lines. The reason your experience did not match that statement is probably that you haven't worked with a sufficiently large code base yet (your quit smoking program is only ~200 lines of Python). While multiple files may be better than a single large file, a single moderately sized file may often be better than a multitude of small files. This is especially true if we're talking about splitting what used to be a single-file script into several modules. The cost of having a two-file program vs a one-file program is higher than the cost of having a three-file program vs a two-file program (in the latter case you're already tracking multiple files; in the former, you have a single file that can be moved around independently).

Splitting stuff for the sake of splitting is not something you should be doing in Python, though other languages (*ahem* Java) may encourage it. Each module should be a logically coherent whole -- everything in the module should be related (the exception being "utils" modules often found in projects which basically contain a mish-mosh of functionality that did not belong elsewhere; and such modules are frowned upon by some). If this is the case, then there should rarely be any confusion about where the functionality that you want to edit is. E.g. in your case, the splitting between main.py and terminal_display.py seems very artificial.

A lot of the split/don't split decisions are down to personal preference. The costs I mentioned above are, at least in part, subjective. Different people will have very different ideas about when to split. Find the balance that works for you. As a rule of thumb, consider splitting a file when:

It is getting too long. What is "too long" is a judgement call; but if there is a noticeable delay when your editor loads the file from disk, or when doing fuzzy search through the file's lines, that's a hint.

You have a set of functionality that you keep copy-pasting between scripts -- it probably belongs in its own module.

Your application has a non-trivial architecture. Splitting logically distinct parts of the architecture into their own modules may make the application easier to understand.

There is some functionality that you want to isolate from the rest of your application, either because it is dodgy (e.g. an adapter that deals with some external APIs/data sources that are likely to change in the future), or because it is likely to be re-used (e.g. a parser for a file format you often work with).

I would generally agree with it being a personal choice, but that's only for personal programs. For team projects you have to have a consensus with your team about breaking it down into multiple files. Also, when your work is going to be used by other people, you have to try and consider what is going to be easier for them to understand.

I would also agree that a 200 line file is not worth splitting up. To take an example from work (SAS, not Python, but it's a general programming issue), my latest program has eight files with 90 + 425 + 75 + 374 + 2354 + 610 + 371 + 717 = 5016 lines of code. And notice the size differences. Why haven't I broken down that 2354 line file? Because 1800 lines of it are one huge set of related conditional statements for the 300+ different searches that the program is required to run. You have to break files down when and how they make sense and relate to each other. That huge file is the meat of the program; the rest creates files based on those searches, emails those files, creates summary reports, and so on.

OK, that makes sense. Thanks. I originally thought "everyone" preferred to split as much as you can.

I guess the question then is this: I have another program that is 1000+ lines of code. It is currently one file, one class. Now I wish I had split it up before, because currently I keep adding to it, since I don't want to take the time to rewrite it split up. So now I just keep appending code to it as it's needed.

So I guess my thought when splitting that 200 lines of code was that it was meant for future use, as I append more and more code to it. I feel it will get to the point where I wish I had split it up, like the other program I have. Of course, if I did this with every program, I would end up with a ton of modules with very few lines of code in each, with the intention of adding to them in the future.

It's usually better to take the time to split up a file once you decide that it should be split up, rather than trying to pro-actively split things up from the beginning. At the start, you don't know what shape your program will take as you develop it, so you have no idea what the right file structure for it would be. Incorrectly partitioning your program will cause more trouble than it will solve (as you have experienced first hand).

The exception to that is if you're starting on a project that is similar to what you've done before, and can reasonably anticipate what the final thing will look like. A lot of framework-based projects tend to be like that. E.g. the Django web framework comes with a tool that automatically generates a predefined directory structure for your project, because the majority of Django projects tend to have that structure (and the framework defaults rely on it). But even then you might sometimes find that your initial guess was wrong and you have to re-structure it further down the line.

Re-structuring, and more generally refactoring, is an intrinsic aspect of software development -- it's very hard to get things right from the start (especially if you're working on something new). And it is (almost) always better to put the effort into righting the mistake once you discover it, rather than plowing ahead with whatever you have at the moment (it will only get harder to refactor the further you go).

setrofim wrote:It's usually better to take the time to split up a file once you decide that it should be split up, rather than trying to pro-actively split things up from the beginning. At the start, you don't know what shape your program will take as you develop it, so you have no idea what the right file structure for it would be. Incorrectly partitioning your program will cause more trouble than it will solve (as you have experienced first hand).

I would say that if you are starting a project that is likely to get big enough that it will need to be split up, then you should determine the shape of the program before you start programming. It saves you a lot of trouble going down dead ends in the design and then having to back out and start over.

ichabod801 wrote:I would say that if you are starting a project that is likely to get big enough that it will need to be split up, then you should determine the shape of the program before you start programming. It saves you a lot of trouble going down dead ends in the design and then having to back out and start over.

This is the Big Design Up Front debate. Basically, it depends. If you have enough information and experience to design your system up front, you should probably do it. In other situations, it is often better to create a rough initial outline for the system and let details like the source directory structure fall out of the iterative development process organically. I'm being very general, obviously. In some cases, certain aspects of the structure are obvious from the start (like when writing a Django app), in which case it makes sense to implement them. But fussing over module separation at the very start, in my experience, is not very productive; YMMV.

I think I am going to go your way, setrofim, because I am spending more time fussing with modules, where this goes, etc., instead of actually getting it done. I think that would suit my way of thinking best.

I think I am just going to write out my programs, and then spend the time to adjust and split them if they end up being large. Most of my programs get forgotten and/or are small anyway. Plus I am doing hobby programming, so I have no boss or other programmers to account for.

setrofim wrote:This is the Big Design Up Front debate. Basically, it depends. If you have enough information and experience to design your system up front, you should probably do it. In other situations, it is often better to create a rough initial outline for the system and let details like the source directory structure fall out of the iterative development process organically. I'm being very general, obviously. In some cases, certain aspects of the structure are obvious from the start (like when writing a Django app), in which case it makes sense to implement them. But fussing over module separation at the very start, in my experience, is not very productive; YMMV.

I think that's a misstatement of the core issue. The core issue is good, quick, or cheap: pick two. The "Big Design Up Front" is an argument of the agile crowd, and in my experience agile programming is about picking quick and cheap. I prefer to pick good, and I am generally forced to pick cheap.

ichabod801 wrote: in my experience agile programming is about picking quick and cheap.

Not really. Agile is about giving you as much control over the quick/cheap/quality lever as possible, even in later stages of the project. Yes, in practice, due to market forces, the lever ends up being continuously jammed into the quick/cheap corner. But there is nothing about agile per se that prevents writing quick/quality or quality/cheap software, if that's what the user/customer wants (which it almost never is, even when they say it is). Certainly, the claim that the lack of rigorous design up front leads to badly designed software has little merit.

I apologize if I've been snippy or rude in this thread. I was posting to it while dealing with the all-day-webconference-meeting-from-hell. It probably didn't help that it was about a horrible piece of software made by agile developers.

My two cents about this: I recently had a Java project where I tried to set up an abstract class to define some vague functionality, and then inherit from and refine it, but I ended up using the interface of the new object so heavily that the abstract class was a huge hindrance. I also recently wrote a primitive Lisp interpreter in Java, and my design that I did up-front there worked, made sense, and made my life SO much easier. It was for a class, and I saw a peer's code where they had done typing with huge, ugly, difficult to understand switch statements (similar to an if-else tree in Python) for handling types instead of letting Java's good-enough-for-that OOP do the heavy lifting.

My point is, the more experience you get, the easier it'll be to make these decisions, and the more likely they'll work out. But as you're learning, you'll screw up, and learn from it. I'm afraid my peer didn't learn his lesson, and will get chastised on a large project for it at some point. But I encourage you to experiment with what you want to, whether that means big files for now, or breaking files up, and then when you're feeling like the project is cumbersome the way it is, experiment with something else, since growing as a programmer doesn't depend on getting a job (like he frightfully will). Don't expect it to always work out; be ready for it, and be happy that when something goes wrong, you've learned something useful!

I don't have much time at the moment to join in the debate properly, but I will throw in two important elements of object oriented programming which, to me, seem related to this discussion and which I didn't see mentioned while scanning this thread, namely cohesion and coupling.

pedros wrote:I don't have much time at the moment to join in the debate properly, but I will throw in two important elements of object oriented programming which, to me, seem related to this discussion and which I didn't see mentioned while scanning this thread, namely cohesion and coupling.

It is interesting that you should bring those up. Coupling and cohesion are indeed related to the concept of modularity, and so are relevant to a discussion about modularising one's program (though it's perhaps going into a higher level of detail than this discussion has been at so far). They are not really specific to object orientation but are more general software engineering concepts. One can talk about coupling/cohesion between/within classes, but one can also talk about coupling/cohesion between/within packages, modules, components, files, or pretty much any other unit of software.

Coupling is basically the measure of how tightly a unit of software is integrated into the rest of the system (the external dependencies it has). Conversely, cohesion is the measure of how tightly the unit of software is integrated within itself (the internal dependencies it has). These can be calculated in various ways depending on the constructs offered by the language, the level at which they are being measured, and what you are planning on doing with them. Typically, they are expressed as ratios within the range (0.0, 1.0).

Roughly speaking, you want fairly loose (low) coupling and tight (high) cohesion. This would indicate that the components within your module are all related to each other and that your module has a limited number of external dependencies that may affect or be affected by any changes that you make to the module.
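To make the two definitions a bit more concrete, here is a toy sketch in Python. It is not a standard metric: it simply assumes you already have a table of which names live in which module and which names each one refers to, and it reports the fraction of a module's references that point outside it (coupling) or inside it (cohesion). The function name, the `members`/`uses` tables and the example module names are all made up for illustration.

```python
def coupling_and_cohesion(module, members, uses):
    """Return (coupling, cohesion) for one module under a toy definition.

    members: dict mapping module name -> set of names defined in it.
    uses:    dict mapping a name -> set of names it refers to.
    Both results are fractions of the module's outgoing references.
    """
    own = members[module]
    internal = external = 0
    for name in own:
        for target in uses.get(name, ()):
            if target in own:
                internal += 1   # reference stays inside the module
            else:
                external += 1   # reference crosses the module boundary
    total = internal + external
    if total == 0:
        return 0.0, 0.0         # nothing referenced, nothing to measure
    return external / total, internal / total


# Hypothetical two-module program.
members = {"display": {"draw", "clear"}, "main": {"run"}}
uses = {"draw": {"clear"}, "run": {"draw", "clear"}}

print(coupling_and_cohesion("display", members, uses))  # (0.0, 1.0)
print(coupling_and_cohesion("main", members, uses))     # (1.0, 0.0)
```

Real tools use more elaborate definitions, so treat the numbers as illustrative only.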

Coupling and cohesion are typically used when analyzing existing software, rather than creating new software (hence they haven't been mentioned until now). They have been used to estimate defect levels in different parts of software, costs of making changes (such as adding new features to existing software), dependency analysis, and as a general "goodness" metric for software.

While these metrics are fairly widely used, their practical usefulness is still being debated.

Coupling/cohesion metrics only really make sense for large code lines. In smaller code lines you are more likely to encounter pathological cases (on either end of the loose/tight spectrum), simply because of the nature of smaller programs (you'll probably have one or two core components that are very tightly coupled, surrounded by a few smaller satellite components). Since the aspects of software that cohesion/coupling are supposed to proxy for (difficulty of understanding, modifying and reusing code) apply less in smaller systems, these metrics are close to meaningless there.

Since software is very heterogeneous, and code lines vary wildly, it only makes sense to compare them within the same code line, either between different parts (e.g. identifying components that are likely to be troublesome and merit extra attention during testing/code review) or between releases (estimating the number of defects or the testing effort that is likely to be required compared to the previous release). But general statements like "coupling above 0.86 is bad and you should really look into refactoring that class" are not going to hold up in practice.

As with most software quality metrics, the predictive power of coupling/cohesion is generally comparable to that of SLOC (source lines of code) count. In other words, while the metrics may give you a somewhat accurate prediction for effort/defect rate/etc, you can get a prediction that is basically as good from just analyzing SLOC, which is usually much easier to measure and understand.
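For what it's worth, a crude SLOC count really is only a few lines of Python. The sketch below is a deliberately naive version: it assumes that "source line" means any line that is neither blank nor a comment-only line, so docstrings and lines with trailing comments count as code. The function name and the sample text are made up for illustration.

```python
def count_sloc(source):
    """Count lines that are neither blank nor comment-only."""
    count = 0
    for line in source.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith("#"):
            count += 1
    return count


sample = '''
# a comment-only line is skipped
import sys

def main():
    print(sys.argv)  # a trailing comment still counts as code
'''

print(count_sloc(sample))  # 3
```

Proper counting tools handle multi-line strings and other languages, but even this version is enough to compare files within one project.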

Thus, IMHO, coupling and cohesion may be useful in analyzing large existing code lines (e.g. when you're planning on introducing major changes), but not when writing new software. In the latter case, I find a good heuristic for design quality is "how easily can you explain it to someone unfamiliar with the code?"

setrofim wrote:Basically, you would just be creating more work for yourself. You don't really gain anything by creating a multitude of files, each with only a few lines in it. The habit of sticking each class into its own file (and multi-level namespacing into different directories) is one of the reasons why it's a nightmare to write Java without an IDE. In Python a single .py file doubles up as a module and comes with its own namespace. If you put each class into its own file, your code would be full of "from my_class import MyClass" kind of lines.

I would say that the extra work is a worthwhile trade off for code readability and maintainability although I readily admit that it may simply be familiarity. I take it that there is no equivalent to PHP's __autoload function in Python?

In any case, writing loosely coupled code is, while sometimes more tricky, usually a way to better and more testable code. I'm just a new guy here so I don't know the ratio of professionals to hobbyists, and I apologise if I am being either too obvious or boring. Consider the following:
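(The snippets from the original post are missing here. What follows is only a minimal Python sketch of the kind of contrast being described, not the original code. `SomeClass` and `another_object` follow the names used in the post, and `some_method` stands in for someMethod; `AnotherObject`, `TightlyCoupled`, `MockObject` and the `fetch` method are hypothetical names added for illustration.)

```python
class AnotherObject:
    """Stand-in for a collaborator that is awkward to set up in a test."""

    def fetch(self):
        return "real data"


class TightlyCoupled:
    """First example: the class creates its own collaborator."""

    def __init__(self):
        self.another_object = AnotherObject()

    def some_method(self):
        return self.another_object.fetch().upper()


class SomeClass:
    """Second example: the collaborator is passed in from outside."""

    def __init__(self, another_object):
        self.another_object = another_object

    def some_method(self):
        return self.another_object.fetch().upper()


class MockObject:
    """Implements just enough behaviour for a test of some_method."""

    def fetch(self):
        return "mock data"


print(TightlyCoupled().some_method())           # REAL DATA
print(SomeClass(MockObject()).some_method())    # MOCK DATA
```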

In the second example, SomeClass and anotherObject are loosely coupled, which allows us to unit test the behaviour of someMethod without needing a full implementation of anotherObject. Instead we can pass in a mock object with just enough of its behaviour implemented for our test. The design pattern used in the second example is called Dependency Injection.

*sigh* here we go again. OK, let me preface my response by saying that I think we're really bikeshedding here. At the end of the day, if you know what you're doing, use whatever works for you. Having said that..

elija wrote:

setrofim wrote:Basically, you would just be creating more work for yourself. You don't really gain anything by creating a multitude of files, each with only a few lines in it. The habit of sticking each class into its own file (and multi-level namespacing into different directories) is one of the reasons why it's a nightmare to write Java without an IDE. In Python a single .py file doubles up as a module and comes with its own namespace. If you put each class into its own file, your code would be full of "from my_class import MyClass" kind of lines.

I would say that the extra work is a worthwhile trade off for code readability and maintainability

Quite the opposite. When I said you're creating more work for yourself, I meant going forward, i.e. in terms of maintainability. It means that in order to do pretty much anything--debug defects, add features, refactor, whatever--you have to deal with, and keep track of, a bazillion files. You'll have a ton of imports all over the place, and will have to keep shuffling those. You're increasing the chances that you'll miss a file in your commit and so end up pushing a breaking change. You're duplicating information, since the same thing is encoded in the name of the module and the name of the class -- you're making your namespaces deeper without adding anything semantically (this is not an issue in languages like Java, since there files do not have namespaces), and when you rename a class, you have to remember to rename the file as well (easy to miss this when you're using a refactoring tool to rename the class at the place of instantiation). Sure, none of these are "major" issues and they can be mitigated by using a decent IDE; but why create the hassle in the first place? How is putting every single thing into its own file going to help with maintainability? Splitting your program out into a few logically coherent modules is enough.

I don't believe readability really comes into this. How is code split into several files more readable than the same code in one file (appropriately commented and laid out)? Splitting things that are closely related to each other but are loosely coupled to the rest of the code into a separate file makes sense, and I'm certainly not advocating keeping everything in a single file (though that has its advantages, e.g. the bottle micro framework is purposefully staying a single-file library). But splitting each class into its own file for the hell of it is going too far and does not make things "more readable", only more annoying as you have to keep jumping between files.

elija wrote:I take it that there is no equivalent to PHP's __autoload function in Python?

Not really, no (at least if I understood what __autoload does correctly from the docs -- I don't know PHP). In Python, you have to explicitly load modules before you can use them. There is the "from module import *" notation which imports the contents of the module into the current namespace without having to explicitly list them, but it's not the same thing (and the use of this notation is very much discouraged). "Explicit is better than implicit." and all that.
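To illustrate what "explicitly load" means in practice, here is a small sketch. It uses the standard library's `importlib.import_module`, which loads a module by its name at runtime; that is the nearest thing I would reach for, but note it is still an explicit call, not something that fires automatically when an unknown class is used. The `load` helper is a hypothetical name for illustration.

```python
import importlib

# The usual, preferred form: name exactly what you import.
from json import dumps


def load(module_name):
    """Load a module by name at runtime; still an explicit request."""
    return importlib.import_module(module_name)


json_module = load("json")

# Both routes end up at the same function object.
print(json_module.dumps is dumps)  # True
```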

elija wrote:In any case writing loosely coupled code is, while sometimes more tricky, usually a way to better and more testable code.

That's true, up to a point. Trying to make the coupling as loose as possible results in horrendously over-abstracted code that is impossible to comprehend (let alone maintain or test). People end up writing plugin frameworks that are only ever going to load one plugin, overly verbose APIs, enums with only one value (but hey, we might add more later). One of the advantages of dynamic typing is that you can write future-proof and flexible code without over-engineering things.
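As a small sketch of what I mean by the dynamic typing point: the function below only assumes that whatever it is given has a `write()` method, so it works with an in-memory buffer, a real file, or a home-made object, without any interface or plugin machinery being declared up front. The names `write_report` and `ListSink` and the sample text are made up for illustration.

```python
import io


def write_report(lines, out):
    """Write each line to anything that provides a write() method."""
    for line in lines:
        out.write(line + "\n")


class ListSink:
    """A home-made target; it never declares that it is 'file-like'."""

    def __init__(self):
        self.chunks = []

    def write(self, text):
        self.chunks.append(text)


buffer = io.StringIO()
sink = ListSink()
write_report(["smoke-free days: 2"], buffer)
write_report(["smoke-free days: 2"], sink)

print(buffer.getvalue() == "".join(sink.chunks))  # True
```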

elija wrote:In the second example, SomeClass and anotherObject are loosely coupled, which allows us to unit test the behaviour of someMethod without needing a full implementation of anotherObject. Instead we can pass in a mock object with just enough of its behaviour implemented for our test. The design pattern used in the second example is called Dependency Injection.

I'm not sure how this relates to our discussion. Dependency injection does not necessitate putting classes into separate files (and the act of moving a class into its own file does not somehow loosen its coupling -- that is determined by what the class depends on and what depends on it).