Month: March 2014

1. Introduction:

Design patterns are a paradigm that is constantly discussed, not only by people trying to apply them as solutions to the programming problems they face, but also by their eager opponents. No wonder a few other SAPM blog posts were devoted to the topic. And yet, I think I have some additional information to share about them.

2. Summary:

This article introduces the unfamiliar reader to software design patterns, and to the advantages and problems they bring where big projects are concerned. I also state the reasons why, in my opinion, some of the flaws of design patterns can be solved with the use of software frameworks, and how frameworks help their users comprehend the pattern paradigm. In the last part, I briefly present the Qt/C++ programming framework and describe how some basic design patterns are used in its architecture and what implications this has for the client code of that library.

3. Background:

Design patterns are solutions to recurring software design problems that engineers face in their projects. They are techniques that have proven themselves in many different contexts and help to write code that is much more reusable. They are recorded in various catalogs (e.g. [1], [2]) along with a description of the context in which they are supposed to be used and a list of the advantages and consequences of applying them. There are many benefits to knowing and using design patterns [3]:

they can reduce development time as these solutions are given and widely used;

solutions are tested in a number of implementations, thus the design based on them is very robust;

they are very important in large-scale projects, since they enhance documentation and facilitate communication among team members.

Apart from these, inexperienced programmers may also benefit from reading the patterns, by developing good design skills and avoiding “reinventing the wheel”.

Despite all the good that design patterns bring, there are potential pitfalls in using them. Among the problems are a steep learning curve, the temptation for novice programmers to abuse them ([4]), and the additional effort of modifying a design to take them into account. Also, they are not pieces of code that can simply be dropped into an application under development, but abstractions that are yet to be implemented. What is more, as stated in a lecture, they may be an indicator of a “language smell” [5]: the development language missing some features. And design patterns add to code duplication! So what can we do then? Are these real problems? In my opinion, design patterns are the right tool to use in software projects, especially large ones. However, I also think that some of the aforementioned drawbacks may be easily overcome with the use of appropriate software frameworks, which themselves incorporate the ideas of design patterns.

4. Frameworks:

“A framework is a reusable design of all or part of a system that is represented by a set of abstract classes and the way their instances interact” [6]. It is a form of design reuse, provided by third parties in order to facilitate application development. Physically, it is a set of modules or libraries that extend the functionality of the language in use. To be reusable, it usually implements design patterns and forces the developer to think in terms of these paradigms. If a programming team uses this kind of framework, many of the issues with design patterns resolve themselves.

Firstly, I need to defend the capabilities of programming languages. If such a framework can be used in a given language, then the language intrinsically supports being extended with reusable solutions. Thus, it is not susceptible to the “language smell” phenomenon. This is the case, for instance, with C++. C++ is said to be much more difficult than Java, which has many more design patterns already embedded. However, when used with a framework such as Qt ([7]), it becomes just as easy to use. It can become even more useful than other languages if the framework is designed specifically for the problem the application is to solve: for instance, communication applications will benefit much more from the patterns used in ACE ([8]).

Additionally, when using frameworks, it is much easier for developers to write higher-quality code with fewer bugs. This is so because they only have to focus on developing concrete solutions instead of abstractions and the application’s internal skeleton. Not to be forgotten, the most popular frameworks tend to be used by thousands of people, so it is highly probable that their bugs have already been spotted and fixed.

Thirdly, using established frameworks helps to get the use of design patterns right. The frameworks themselves tend to be well documented, which gives developers a better understanding of what is going on. Also, developers have to implement client code only. That means they are freed from making some of the design decisions and are forced by the framework to structure their components correctly.

Last but not least, open source frameworks give a great opportunity to learn how to implement design patterns correctly in the real world. Even though such implementations are provided in the design pattern catalogs (e.g. [2]), they are considered there in isolation from everything else. In real life, we sometimes have to compromise on the single responsibility rule or take some other external factors into account. The source code of such frameworks is the best way to see how top developers deal with such issues. This is one of the reasons why I want to introduce the reader to the Qt Framework.

5. Qt Framework and its Design Patterns:

Qt Project [7] is a cross-platform framework originally designed for C++ developers, which eventually also came to support CSS(-like), QML and JavaScript languages. The project has been continuously developed and maintained for more than 18 years. Although it was significantly refactored a few times at the beginning, the current release is rather stable. It is a mature and very well engineered project, which follows many of the established design patterns (more than 20, [15]). Of course, the library not only exposes the patterns to the client code, but also uses them extensively itself. This in itself is evidence of how crucial design patterns are to a solid design. I suppose that a quick overview of the design patterns used in Qt may be beneficial for some readers and encourage them to explore the topic in more detail on their own.

5.1 Strategy Pattern:

One exemplary use of the Strategy pattern (an algorithm encapsulated in an object) in Qt is QValidator. This is an abstract class that provides validation of text input through the QValidator::validate(…) method (to make the class more general, it also allows the input to be repaired with QValidator::fixup(…)). The context object that uses the validator (be it a line edit, a spin box or other) depends only on this interface; on every input change it checks whether the input is valid and takes appropriate action (e.g. if an integer value is expected but arbitrary text is given, its background goes red). If no available validator meets one’s demands, one has to inherit from QValidator and override validate(…), which is pure virtual, and optionally fixup(…), which has a default implementation.

Fig. 1. QLineEdit with QValidators class diagram.

The solution is really nice, since the validation code may be reused in every class that requires textual input. It is also much more compact than either creating subclasses of QLineEdit (and of each other input widget) that validate input, or registering a validation object on text change events (as is done e.g. in the Java Swing library).
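
To make the idea concrete without depending on Qt itself, here is a minimal sketch in plain C++ of the same arrangement. The names Validator, IntRangeValidator and LineEdit are hypothetical stand-ins of my own, not the Qt classes, and the acceptance rules are deliberately simplified (non-negative integers only).

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for QValidator: the strategy interface.
class Validator {
public:
    enum class State { Invalid, Intermediate, Acceptable };
    virtual ~Validator() = default;
    // Classify the input; every concrete strategy must implement this.
    virtual State validate(const std::string& input) const = 0;
    // Optionally repair the input in place; the default does nothing.
    virtual void fixup(std::string& /*input*/) const {}
};

// One concrete strategy: non-negative decimal integers within [min, max].
class IntRangeValidator : public Validator {
public:
    IntRangeValidator(int min, int max) : min_(min), max_(max) {}

    State validate(const std::string& input) const override {
        if (input.empty()) return State::Intermediate;   // user may still type
        if (input.size() > 9) return State::Invalid;     // keeps std::stoi in range
        for (char c : input)
            if (c < '0' || c > '9') return State::Invalid;
        const int value = std::stoi(input);
        if (value > max_) return State::Invalid;          // more digits cannot help
        if (value < min_) return State::Intermediate;     // more digits might help
        return State::Acceptable;
    }

private:
    int min_;
    int max_;
};

// The context: it knows only the Validator interface, never a concrete class.
class LineEdit {
public:
    void setValidator(std::shared_ptr<const Validator> validator) {
        validator_ = std::move(validator);
    }

    // Stores the text and returns true unless the strategy rejects it.
    bool setText(const std::string& text) {
        if (validator_ && validator_->validate(text) == Validator::State::Invalid)
            return false;
        text_ = text;
        return true;
    }

    const std::string& text() const { return text_; }

private:
    std::shared_ptr<const Validator> validator_;
    std::string text_;
};
```

A line edit given std::make_shared<IntRangeValidator>(0, 150) would accept "42" and refuse "abc" or "200". The same validator object could be handed to any other input widget, which is exactly the reuse the pattern is meant to buy.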

5.2 Lightweight with Proxy:

Some of the Qt classes (like QVector, QList, QMap or QDomNode) implement the Proxy pattern. This means they don’t explicitly contain the data that they are supposed to have (e.g. the array of stored elements in a vector). Instead, they consist of a pointer to a shared data block that contains a reference count and the data. The classes take care of housekeeping, such as modifying the counter depending on how many instances have access to the data. Copying of the data occurs lazily (so-called copy-on-write), when one of the instances is about to modify it. Such an approach has a few benefits over traditional, explicitly shared data. First of all, the objects are lightweight and copying them is fast, because it only involves copying the pointer and incrementing the counter. Secondly, if most of them are used in read-only mode, no additional memory is used, no matter how many copies are made (apart from the memory for the pointers). It also saves programmers from having to choose whether to pass and return instances by reference or by value (passing by value carries an unnoticeable performance penalty).
As for the programmer, he doesn’t have to be aware of the copying nuances. However, if he wants the same behaviour in his own classes, Qt encourages him to get it by inheriting from QSharedData and referencing it with QSharedDataPointer or QExplicitlySharedDataPointer.
When discussing the Proxy pattern, it’s worth noting that Qt has its own smart pointer classes, which help with handling pointers. Among them are QPointer (a guarded pointer), QSharedPointer (a reference-counting pointer) and QWeakPointer (a weak reference to a shared pointer). Using smart pointer classes helps prevent memory leaks and dangling pointers.
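
The copy-on-write behaviour described in this section can be sketched in plain C++ with std::shared_ptr playing the role of the reference-counted data block. CowIntVector is a hypothetical class of my own, not Qt code, and unlike Qt's containers this sketch makes no attempt to be safe when instances are used from several threads.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical copy-on-write vector of ints: a thin proxy around shared data.
class CowIntVector {
public:
    CowIntVector() : data_(std::make_shared<std::vector<int>>()) {}

    // Copying a CowIntVector uses the compiler-generated copy constructor,
    // which only copies the pointer and bumps the reference count.

    // Read access never copies the underlying data.
    int at(std::size_t index) const { return data_->at(index); }
    std::size_t size() const { return data_->size(); }

    // Write access detaches first, so other holders keep the old data.
    void append(int value) {
        detach();
        data_->push_back(value);
    }
    void set(std::size_t index, int value) {
        detach();
        data_->at(index) = value;
    }

    // True while both objects still point at the same data block.
    bool sharesDataWith(const CowIntVector& other) const {
        return data_ == other.data_;
    }

private:
    // Make a private deep copy, but only if somebody else shares the block.
    void detach() {
        if (data_.use_count() > 1)
            data_ = std::make_shared<std::vector<int>>(*data_);
    }

    std::shared_ptr<std::vector<int>> data_;
};
```

After CowIntVector b = a; the two objects share one block, and the deep copy happens only at the first b.set(…) or b.append(…). This is why passing such objects by value costs so little as long as they are only read.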

5.3 Observer pattern:

One of the big advantages of Qt over other C++ frameworks is the ease of implementing communication between objects (those inherited from QObject): the signals and slots mechanism. The mechanism provides type safety and loose coupling between the caller and the callee, unlike the standard callback scenario. It is realised with an additional meta-data structure added to objects at compilation time. In the end, however, everything comes down to the Observer pattern.
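
To show how little is conceptually needed, here is a minimal Observer sketch in plain C++. It is not how Qt implements signals and slots (there is no meta-object data and no connection management here); Signal and Counter are hypothetical names, and std::function stands in for the slot.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical, heavily simplified signal: a typed list of observers.
template <typename... Args>
class Signal {
public:
    using Slot = std::function<void(Args...)>;

    // Register an observer; the argument types are checked at compile time.
    void connect(Slot slot) { slots_.push_back(std::move(slot)); }

    // Call every registered observer with the given arguments.
    void notify(Args... args) const {
        for (const Slot& slot : slots_) slot(args...);
    }

private:
    std::vector<Slot> slots_;
};

// The subject: it announces changes without knowing who listens.
class Counter {
public:
    Signal<int> valueChanged;

    void setValue(int value) {
        if (value == value_) return;   // no change, no notification
        value_ = value;
        valueChanged.notify(value_);
    }

    int value() const { return value_; }

private:
    int value_ = 0;
};
```

A listener is attached with counter.valueChanged.connect([](int v) { /* react */ }); and Counter never learns the listener's type, which is the loose coupling described above.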

5.4 Composite and Template Method:

As in many other GUI frameworks, Qt utilizes the Composite design pattern to display its widgets (classes inherited from QWidget) and to handle user input events. The widgets are arranged in a hierarchical structure (a parent-children relationship), which reflects their z-order (depth on the screen).

Fig. 2. Composite class diagram in case of QWidgets.

As may be seen in the class diagram in Fig. 2, Qt’s Composite is slightly different from the traditional one. There is no distinction between Leaf and Composite, because every QWidget is allowed to add children at any time.

When the screen is to be redrawn, an appropriate event (QEvent) is sent to the parent and its QWidget::paintEvent(…) method (a Template Method) is invoked. Then the event is propagated down to the children.

When a mouse click event (or any other user input event) is concerned, it is first dispatched to the top-most child and the appropriate QWidget::mouse*Event(…) method is invoked. If the method handles it (QEvent::accept()), no further action is taken. Otherwise, the event is propagated up towards the parent, calling at each level of the hierarchy the widgets that might be interested in it (i.e. the click occurred within their bounding rectangle).

Fig. 3. Example scenario of click event.

Fig. 4. Object diagram for example setup.

Fig. 5. Sequence diagram for a given scenario.

To visualize these steps, let’s assume a scenario in which a mouse click event is generated, as in Fig. 3. Note that the button is disabled, so it will not be interested in the event. The object diagram for this setup is presented in Fig. 4. Fig. 5 shows a simplified sequence diagram for the message dispatching scenario. First, qApp sends the event to the button. The button doesn’t accept the event, so qApp tries its parent, which is the QTabWidget tab.
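
The dispatching scenario above can be imitated in a few lines of plain C++. This is a sketch under my own assumptions, not Qt's event system: Widget, Rect and dispatchClick are hypothetical names, rectangles are given in global coordinates to keep the code short, and a widget "accepts" a click simply when it is enabled and flagged as handling clicks.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

struct Rect {
    int x, y, w, h;
    bool contains(int px, int py) const {
        return px >= x && px < x + w && py >= y && py < y + h;
    }
};

// Hypothetical composite node: every widget may own children (no Leaf class).
class Widget {
public:
    Widget(std::string name, Rect rect, bool enabled, bool handlesClicks)
        : name_(std::move(name)), rect_(rect),
          enabled_(enabled), handlesClicks_(handlesClicks) {}

    Widget* addChild(std::unique_ptr<Widget> child) {
        child->parent_ = this;
        children_.push_back(std::move(child));
        return children_.back().get();
    }

    // Deepest widget under the point; children added later are treated as on top.
    Widget* childAt(int px, int py) {
        if (!rect_.contains(px, py)) return nullptr;
        for (auto it = children_.rbegin(); it != children_.rend(); ++it)
            if (Widget* hit = (*it)->childAt(px, py)) return hit;
        return this;
    }

    // Returns true when this widget accepts the click.
    bool mousePressEvent() const { return enabled_ && handlesClicks_; }

    Widget* parent() const { return parent_; }
    const std::string& name() const { return name_; }

private:
    std::string name_;
    Rect rect_;
    bool enabled_;
    bool handlesClicks_;
    Widget* parent_ = nullptr;
    std::vector<std::unique_ptr<Widget>> children_;
};

// Offer the click to the top-most widget first, then walk up the parents
// until somebody accepts it. Returns the accepting widget or nullptr.
Widget* dispatchClick(Widget& root, int px, int py,
                      std::vector<std::string>* trace = nullptr) {
    for (Widget* w = root.childAt(px, py); w != nullptr; w = w->parent()) {
        if (trace) trace->push_back(w->name());
        if (w->mousePressEvent()) return w;
    }
    return nullptr;
}
```

With a window containing a tab widget that in turn contains a disabled button, a click on the button is offered to the button first and then accepted by the tab, mirroring the sequence described for Fig. 5.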

6. Summary:

In my opinion, design patterns are a very useful paradigm. They are a set of tools which, if used wisely, may pay off, especially where large programming projects are concerned. Undeniably, many inexperienced software developers may find them a great source of knowledge for improving their design skills. However, they are no silver bullet and may be overused or misused, leading to the degradation of a project. Software engineers need to be careful when applying them: they have to consider all the advantages and trade-offs that the pattern catalog mentions. This is a skill which still requires experience. However, if a developer has the possibility to use programming frameworks such as Qt (or ACE, .NET, Java), it is much easier for him to make design decisions and start his journey with design patterns.

We already know that a lot of IT projects fail, and that they have had a high failure rate for a long time. I recently had a discussion with a colleague who worked at Initrode, and although the company was successful and a major player in the world market, he was perplexed about how his manager handled the team. It seemed that his manager’s core task was to estimate the time for each task, then track it and adjust it if the task took a different time, and that act of adjusting was what perplexed my colleague the most:

Why would you do something, if you knew you would have to change it later?

Plans are Nothing

Indeed, by that definition, in a continuously changing and intangible environment like computer science any type of plan is pointless. We learned that estimating is difficult and requires a lot of knowledge and feedback to improve the quality of our estimates. However, there is an inherent danger in estimating costs and tasks in IT, because the domain of computer science keeps changing continuously, which invalidates our knowledge of the problem.

Planning is Everything

Putting deadlines aside, if we were to divide development methodologies by how much of the whole plan is needed in advance, we would get the following two extreme cases (from lectures):

figure out how to do it the best way, then do it

just do it, and figure out if it is right later

The first choice aims to identify any possible “future” technical debt and tries to overpay it in advance, hoping that once the debt arises the payment will be enough. The second acknowledges the debt and delays paying it, hoping that once the debt arises the resources available will be enough.

I will not argue which side of the spectrum is better, as it mostly depends on the project in question and is a topic for a whole other blog post. I just wanted to make a simple observation:

And by planning I do not mean plans (hint above). I would like to call it “the ability to change plans based on the current situation”, but then I would be describing a principle of Agile methodology; the truth is that planning is a more fundamental action than that.

Proper planning increases the knowledge of the current state of the project and improves the estimates about the project; proper planning is somehow orthogonal to all software methodologies: all of them require it, they just make different assumptions about their estimates and evaluate them in a different order.

Therefore, no matter what methodology you use and what you do, go back to your plan, re-evaluate it and make sure it properly reflects the state of your project. Do it as often as you can, even if it is just adjusting the task times for your team.

Open source software is software that can be freely used, changed, and shared (in modified or unmodified form) by anyone. [1]

Generally, open source software is maintained by individuals who work on a particular project for free. However, various companies allow and even encourage their engineers to contribute to open source for various reasons. In my experience, the hardest part of contributing to open source is getting started, so I will give some advice and pointers to some resources to anyone wanting to contribute to open source.

Why you should collaborate

Working on an open source project gives software engineers the opportunity to work on a set of skills they wish to enhance while benefiting from the support they get from other contributors.

Secondly, experience gained from open source contribution is a great topic to talk about in an industry interview. It shows that you got to practice a set of skills that are important for the job you are applying for.

Last but not least, it is a great way to give back to the community. Most of us use open source software (e.g. Android, Linux), so I consider it only fair to give a hand in fixing bugs, writing tests or documentation, or helping in any other way that will make the project better.

On a side note, during lectures we’ve been told that companies choose to create an open source initiative that will go hand in hand with or support some other source of revenue (e.g. their servers or the ads they serve). It makes sense, and I don’t want to disagree, only to add that, for example, I have seen interviews where Google engineers were saying that the reason why Chrome started is that they were Firefox, but Mozilla was being slow with improvements, so they decided to create their own browser to create competition for Mozilla. Another version I heard is that they just thought: why not have their own browser? I would have found the second version hard to believe, but then it’s very similar to how Gmail started, so it is plausible. Unfortunately, I can’t provide references for these two stories, as I saw them a while back and can’t really remember where.

How to start

The first time I contributed to open source was in the second year of my degree when we were required to contribute to an open source project of our choice as part of a Software Engineering coursework. We did receive advice on how to pick a project: ensure there are multiple contributors to the project (minimum 2), ensure the project is not stale (i.e. there have been recent contributions made to the project), pick something that seems interesting to you. In my case, lack of experience did take its toll because while I did try to follow recommendations I still didn’t fully understand them.

Since then, I’ve become slightly wiser. The advice that follows is not a secret; it is the usual advice one finds in blog posts of this type. But it is also what experience has taught me, and I hope it will help anyone who is thinking of contributing to open source but doesn’t know where to start.

Don’t just look for a project with a couple of developers; look for a project that has a whole community behind it. In most cases, these communities have a forum, a mailing list or IRC channels which they use to communicate, and which are great media for getting in touch with people already working on the project and asking for help. It is often the case that an active community will be beginner-friendly and have good tutorials that will get you started. [2] and [3] are great examples of good communities where one can start contributing to open source. In my opinion, it would have been a good idea for my course to allow students to contribute to such projects as long as each student brings different contributions (e.g. fixes different bugs, adds tests for different parts of the project or writes documentation for different parts of the code).

When checking whether a project is stale, don’t only check when the last release was made (this should be in the last month) but also how often code submissions are made and how active the community is in terms of posts.

[4] is also a great resource that helps people to start contributing to open source. They provide tutorials that teach people how to use tools such as version control and how to work with tar archives. They also have a database of tasks that need work, which can be filtered by project, skills or difficulty. Moreover, most of the tasks have mentors, people who are ready to help anyone willing to contribute to their project.

Finally, anyone can start their own open source project then get other people to contribute to the project. If you have an idea get some friends and start your own open source community.

Once we agree that effort estimation for project completion is better when based on actual measurements, we tend to phase out human input. I think we should use measurements to give a perspective to developers and start the estimating discussion from there. This gives more information when estimating, but ultimately leaves the developers responsible for the resulting estimation; this is a good thing, since the measurements cannot capture everything that we, as humans, know about a project and it has other positive consequences that I will discuss.

Context

Consider effort estimation. We do it before starting a new project so that we know how many resources (developers, time) we need to allocate to it. Take the simple example of a team of developers who have worked together before. They have experience of previous projects of different sizes and try to estimate effort directly (together! estimation should not be done only by management). They apply some method to reach a consensus (such as Three-Point or Wideband Delphi estimating). After doing this for a while, they realise that for each estimation the discussion always includes some metrics: they need to get some data first before estimating.
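
As an illustration of the Three-Point method mentioned above, here is a small C++ sketch using the common PERT weighting, where the expected effort is (optimistic + 4 × most likely + pessimistic) / 6 and the standard deviation is approximated as (pessimistic − optimistic) / 6. The struct and function names are my own, and summing the variances of several tasks assumes the tasks are independent, which real projects rarely guarantee.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// One task estimated by the team, in any consistent unit (e.g. person-days).
struct ThreePointEstimate {
    double optimistic;
    double mostLikely;
    double pessimistic;

    // PERT-weighted expected effort.
    double expected() const {
        return (optimistic + 4.0 * mostLikely + pessimistic) / 6.0;
    }

    // Rough spread of the estimate.
    double standardDeviation() const {
        return (pessimistic - optimistic) / 6.0;
    }
};

// Expected effort for a whole list of tasks.
double totalExpected(const std::vector<ThreePointEstimate>& tasks) {
    double sum = 0.0;
    for (const ThreePointEstimate& task : tasks) sum += task.expected();
    return sum;
}

// Combined spread, assuming (as a simplification) independent tasks.
double totalStandardDeviation(const std::vector<ThreePointEstimate>& tasks) {
    double variance = 0.0;
    for (const ThreePointEstimate& task : tasks) {
        const double sd = task.standardDeviation();
        variance += sd * sd;
    }
    return std::sqrt(variance);
}
```

For a task estimated at 4 days optimistic, 6 most likely and 14 pessimistic, the expected effort is (4 + 24 + 14) / 6 = 7 days. The numbers only open the discussion; the team still owns the final figure.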

Now suppose the team realises that effort is difficult to estimate directly. They want to automate the process a little, based on their findings. They start to base their estimates on other metrics, that they also estimate. Examples of such metrics are the projected size of a project and measures of complexity such as the number of its components. Now they start measuring previous projects in order to have some data to base estimates on. Since they know the resulting values of this process are not very precise, they try to use many metrics to at least get an accurate estimate.

This seems like a natural progression. However, I think we jump too quickly to relying on data directly and phasing out estimates done by developers. We know that however hard we try, the measurements will not be able to capture all that we know about a project’s size and complexity. We also know that project development is not only about continuous coding, it is also about a lot of refactoring, design decisions and collaboration. People have a better feeling about the latter than what our measurements can capture, especially in a familiar team of developers.

Consequences

First, better estimates. Developers cannot blame the methods for bad estimations when the final responsibility is on their shoulders. They will be more involved and this will result in more frequent updates to the project timeline. They will tend to more quickly admit to estimation mistakes because the method tells them to expect the initial estimates might be wrong. Since their input is valued over the measurements, they will be more confident and interested in repairing the mistakes proactively. This avoids the situation where a developer tries to conform to initial estimates at the cost of a bad product.

Second, developers are no longer incentivised wrongly. If the estimation is based only on measurements, developers might develop a bias to confirm those measurements. Suppose they estimated the size of the project to be between 10KLOC and 20KLOC. Then the effort estimates will be based on previous projects of the same size. Even if it turns out that this project needs 30KLOC for completion, it does not mean the effort estimation was wrong; there are other factors to take into consideration, and the size estimates and measurements should not be analysed individually. However, because the developers estimated less than 30KLOC, they are then incentivised to develop more concisely; this might consume unnecessary effort and end up sabotaging the project. Now suppose instead that the developers were given responsibility for the final effort estimate. Then the initial LOC estimate is not that important, it was just an aid; if it changes, the developers will be more likely to adjust the effort estimate rather than try to prove they were right about that particular measurement.

Finally, developers are encouraged to follow trends in their behaviour. We can build stronger teams when we treat developers as people who want to get better at what they do, not just as coding machines. They are not encouraged to prove their measurements right, they are now encouraged to use that information to their best interest. This might translate to new creative ways of measuring their development effort. Since they are using the data directly for estimates, they can identify factors better than any outsider and most importantly interpret them better. This could lead for example to developers starting to hashtag their commits so they can easily differentiate between e.g. drafting, refactoring, code writing etc.

Conclusion

Involving the developers in the final effort estimates gives them incentives to make better measurements, adjust more frequently and finally translates to a method of constantly improving estimates — now the developers have the power to repair their mistakes, and they are best informed to do this.

Note: if some of the concepts or names are unfamiliar and Google is not helping, keep in mind I wrote my article around these two lectures: estimation and measurement.

The concept of free software wasn’t very common 15 years ago, when everyone became able to get a personal computer. Commercial software products dominated the market and almost everybody was using either Windows 98 or XP; documents were written in Word, Excel was used for carrying out different types of computation, BsPlayer for watching movies, and even WinRar was paid, although there was a bug: once you started the trial, you stayed with it forever. When I bought my first computer, I didn’t even know that there was an operating system called Linux which could be downloaded and installed for free. And I didn’t need to, because Windows offered everything I needed. I was able to play games, watch movies, chat with friends, browse the internet, etc. Most PC users had limited technical knowledge, and commercial software developers were aiming to design functional and user-friendly software.

This time cannot, however, be compared to the early 70s, when the users were actually the programmers and most programs were developed by computer science academics and corporate researchers working in collaboration. The software products of that time were generally distributed under the principles of openness and collaboration, because they weren’t seen as a commodity. Software was free mainly because of the need for improvements and continuous integration into new hardware platforms. This happened because different academic cultures were involved in the development process, and when the software was given to somebody else, it came with its source code so that it could be tailored to fit the specific requirements of the user. [1]

Nowadays, open source software is more widely used than 15 years ago, because people have realized that it has some major advantages over commercial software. First of all, it offers more flexibility than commercial software solutions, because it can be modified directly by users to fit their own needs. Of course, this means that users have to be more experienced in terms of technical knowledge and have to understand the source code, but it still leads to some positive outcomes: users become co-developers, and having more developers significantly increases the rate at which the software evolves. Users can also fix bugs, write documentation or implement new features. This factor also leads to security improvements. Commercial software is made by highly skilled professionals who are aware of the potential security problems, but they cannot cover all of the potential security breaches before releasing their product. Instead, they give the product to the end user and, if a security problem arises, they simply provide a patch with the necessary fixes. In the case of open source, because the codebase is reviewed by many more people, such security risks can be discovered at an early stage, so that they do not affect the end users.

Another advantage for users is obviously the fact that they pay nothing for using such a product. The industry has estimated that the use of open source programs saves up to $60 billion a year. Just a simple comparison here: people who work with Java do not have to pay for a specific IDE (Eclipse, NetBeans, etc.) in order to use its full set of features, but those who use the .NET framework have to pay around $500, which is the starting price for Microsoft Visual Studio Professional 2012. This is particularly useful for a range of different user groups, from freelance developers to startups and even big companies that want to implement a low-cost strategy for their development process. [2]

Open source programs are not dependent on the company or author that originally created them. In the case when the company fails or the author is no longer interested in supporting the particular product, the community can continue to keep the project alive and even improve it.

Of course, there are some disadvantages of open source programs as well. The most obvious one is that an open source product can tend to evolve more in line with developers’ wishes than with the needs of the end user. This could lead to problems for the users. If they do not know how to use a particular product, they will have to either spend some more time on it or pay for training. In many cases where the user does not pay for the product itself, he or she pays for the support. If that is not the case, then fixing a particular problem may take longer than usual, because open source software tends to rely on its community of users to respond to and fix problems. There is also the problem of high maintenance overhead. As the source code is freely available over the internet, there is potential for a constant stream of user suggestions for bug fixes and patches. This tends to lead to a flood of patches, which makes the source code significantly more complex and can potentially lead to structural quality issues. And structural problems lead to a higher cost of maintenance. [3]

To summarize, open source has its own advantages and disadvantages. Using an open source product may be very useful in particular situations, but for that to happen one should understand its advantages and disadvantages compared with its commercial counterpart.

You are in the office, it is 8 pm, Friday…

…your screen glares with stormy Jenkins weather reports, and you nervously observe another build failing for the nth time. Tests are continuously failing, their descriptions are vague, and the BVT needs another 3 hours to complete. Most of the APIs were hurriedly documented, there are gaps, some are just copy-pastes, and there is rarely a meaningful comment in the code. The guy who did the latest Git merge between the branches you are working on screwed up some work and made the commit history even less readable; another guy, the original owner of the troubling module, quit a few months ago, and you are struggling to find a single piece of information consistently describing how it works.

What seemed like an implementation of a simple feature, actually turned out to be more like awakening a monster. A monster made of aggregated technical debt; bits and pieces of poor software engineering decisions that initially appeared working, until today.

And now, you are likely to spend an entire weekend coding through this mess. This might end in two ways… creation of another sloppy, superficial patch and deepening the technical debt even further, or, a Monday disaster. A meeting, where you, your team-mates and managers eventually acknowledge the lamentable state of the project. All try to approximate the impact on deadlines, work out an approach to solve the issue and if it is not already too late – after some time – the project might get back on track, if it is, the project dies.

I have been fortunate to work for one of the biggest global software companies as a software engineer, experiencing the pros and cons of work in a corporate environment. All I write is based on my experiences and observations, and while the above story is slightly exaggerated to draw your attention, it touches on a few important aspects of the software engineering process. These aspects often lack attention and seem to disappear within a bag of “more serious” development activities, but when ignored they can build up over time and bring any project closer to the above scenario.

Software documentation

Not writing documentation is obviously bad; writing it, however, can also be done in numerous bad ways.

First of all, poor content quality. Because commenting code and writing documentation are widely considered a boring duty, they often do not receive proper attention throughout the development cycle. It is common to see documentation postponed until the very last minute of a project and then written hastily together with other clean-up activities. When as little effort as possible is put into this process, the only thing we can expect is inconsistent, flawed and unhelpful documentation. If such a habit exists within a development team and is silently accepted, all subsequent efforts by developers to build on the poorly documented codebase will lead to frustration, and frustration rarely gives rise to an incentive to improve the situation.

Developers also tend to underestimate the value of documentation while the other co-creators are still around. They can easily obtain the required explanation from them; however, the situation gets much more complicated when these co-creators leave the company or change teams. I have seen people who were quitting being desperately asked for last-minute documentation contributions in their final week of work. Well, it certainly was not of the highest quality 😉 Another problem appeared when someone changed teams and his former teammates had to fight for his time with the new management. It was a cause of many delays and clashes.

Another important aspect of documenting software is the way we store the documentation. While this issue does not touch code comments, it is really important to have a consistent documentation storage policy within a project. I have seen a large, mature project documented using:

files stored on lab’s FTP server,

email conversations,

database entries within a collaboration suite server,

wiki articles,

plan items and tasks within a software life-cycle management platform.

Years pass and some storage methods go offline; there are various migrations, and the people responsible for maintaining the content change. While company policy required the primary design document to be accessible, all the subsequent improvements, bug fixes and lots of tiny tools used somewhere in the process were documented across all these places. This poses a real threat to the project: daily development routines require you to travel the office in search of people “who know where to look” and to dig through various sources, putting the pieces together in order to gather any meaningful documentation. The project grows and matures, and people rotate. The lack of a consistent, long-term and persistent storage policy for documentation might result in a completely unmaintainable project, where every development effort is a pain and a threat to the stability of the platform.

Software testing

Once again, not writing test suites is bad, but using them carelessly can be dangerous.

Software development often involves tight deadlines and a lot of stress, just like in the opening story of this article, and with them comes the temptation to “temporarily disable” a troublesome test case. A developer might comment out the whole test body or just slightly modify its behaviour. Then he promises himself to fix it as soon as he gets out of the blind alley. It rarely happens… First of all, needing to disable a test case in order to implement a feature usually means you do not fully understand what either of them does: neither the purpose and scope of the test case, nor the impact of your incoming patch. This is a shortcut to introducing a really bad regression into the project. These workarounds pile up over time, and the careless developer rarely finds time to fix them early enough. Secondly, if such a practice is exceptional and limited to just a few developers, it might be easy to curb. However, if it is commonly applied by the whole team, the project is doomed.
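If a test really has to be switched off for a while, it helps to at least do it visibly. Below is a minimal sketch using Python’s unittest module; the function under test (apply_discount), the test names and the ticket id are hypothetical, made up for this illustration. The skip is explicit, carries a reason, and shows up in the test report instead of silently vanishing in a commented-out block.

```python
import unittest


def apply_discount(price, percent):
    """Hypothetical function under test; it does not validate its input yet."""
    return round(price * (1 - percent / 100), 2)


class DiscountTests(unittest.TestCase):
    def test_regular_discount(self):
        self.assertEqual(apply_discount(200.0, 25), 150.0)

    # Explicit, reported skip with a reason (the ticket id is a placeholder),
    # instead of commenting the test body out.
    @unittest.skip("TICKET-0000: negative percentages are not rejected yet")
    def test_negative_percent_is_rejected(self):
        with self.assertRaises(ValueError):
            apply_discount(200.0, -5)


def run_suite():
    """Run the tests and return the result object, including the skip list."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(DiscountTests)
    result = unittest.TestResult()
    suite.run(result)
    return result
```

The result returned by run_suite() lists the skipped test together with its reason, so the debt stays visible in every report until someone pays it off.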

Another problem with test suites is their runtime. The purpose of most testing techniques is to deliver meaningful feedback on an incoming patch. To empower the software development process and help build a better product, this feedback has to be quick. All the beneficial effects of rich test suites and good code coverage are undermined by a prolonged suite runtime. In a project I have worked on, the main test suite guarding the way into the main development branch had an execution time of around 10 minutes. For this rather large project, it struck a good balance between the complexity of the testing techniques and their runtime. However, another test suite, a BVT, had a runtime of over 4 hours. While this kind of testing has a slightly different purpose, its sluggishness caused a serious breakdown in the project. During the final iteration of one critical project phase, the build cluster got swamped with the workload caused by this BVT suite. Developers could hardly obtain any meaningful feedback from it and all continuous integration systems were congested. The decision was made to disable it. A few weeks later, after the suite was re-enabled, serious regressions were found and a significant delay was introduced into the plan.

Code review

Finally, the powerful concept of code review: another useful practice, provided it happens not only on paper.

If your team has a policy of formal code review, requiring e.g. a senior software engineer or component owner to go through your modifications, this is right and really beneficial for the project. What I observed causing problems is a policy of more informal peer review, where you are only obliged to ask any member of your team to approve your changes. The risky behaviour here is that pairs of befriended developers start to form and strike deals between themselves. They begin to assign each other as reviewers of their modifications and tend to silently accept everything without any deeper consideration, partly due to the usually high workload of an average developer and partly due to plain laziness. Once again, when no proper attention is paid to this aspect of the software engineering process, under a false impression of saving time, great risk is introduced into the project. If two pairs of eyes can hypothetically fish out twice as many flaws, skipping a real review comes close to doubling the chances of committing faulty code.

Conclusion

The aspects discussed are part of a broader discussion about technical debt and good software engineering practices. Technical debt forms an inseparable part of any engineering project; it cannot be avoided, so it has to be kept under control. People responsible for managing a project should carefully weigh short-term benefits that deepen technical debt against well-thought-out long-term engineering decisions. The same applies to regular software developers, whose actions may reflect the overall strategy of the project or be out of control, forming micro-risks that add up. As a consequence of poor engineering decisions, a project might approach a state of technical death. Such a state is rarely recoverable, and rewriting any bigger project from scratch is barely feasible.

Introduction

The type system has always been a major feature in a programming language, to the extent that it is at the base of a primary classification of languages.
In fact, dynamically typed languages are often perceived as scripting languages, very productive and fun to work in but unsuitable for large-scale development.
Statically typed languages, on the other hand, are seen as the natural tool for a large project, at the cost of verbosity and complexity.
This article will explore a third approach, specifically aimed at getting the best of both worlds: optional type annotations.

Dynamic type systems: productivity

A language is said to be dynamically typed if it performs the type checking at runtime.
This means that one does not have to write explicit type signatures, resulting in less code to write and quicker development times.

However, this is not the main reason why dynamically typed languages are more productive. In fact, this misconception comes from comparing Ruby, Python et similia with Java and C++, which traditionally require an explicit type signature for almost every variable.
Languages whose static type system performs type inference can be nearly as concise as their dynamic counterparts, as one can easily see by comparing, for example, Python with Haskell.

What really favours productivity is that dynamically typed languages require much less design upfront to produce working software.
In fact the major designing issue with statically typed languages is modelling one’s problem domain in a way that satisfies the constraints imposed by the type system, no matter how expressive it is.

This really hinders productivity when coupled with the constant, fast paced requirements and design changes that every real world software project undergoes.

Furthermore, some features like reflection, meta-programming and dynamic dispatch are simpler and less cumbersome in a dynamically typed language, as anyone who’s written template meta-programming in C++ probably already knows.

Static type systems: scalability

So, why is everyone insisting on using Java and C++ for large scale software, instead of using dynamically typed languages?
Well, it turns out there is a problem with them: they don’t scale well.

As our system has grown, a lot of the logic in our Ruby system sort of replicates a type system, either in our unit tests or as validations on models. I think it may just be a property of large systems in dynamic languages, that eventually you end up rewriting your own type system, and you sort of do it badly. You’re checking for null values all over the place.

Alex Payne on why Twitter is switching to Scala for server-side code
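To make the quote concrete, here is a small sketch in Python of what “rewriting your own type system” tends to look like in a dynamic language. It is not code from Twitter; the create_user function and its fields are invented purely for illustration.

```python
def create_user(name, age):
    """Build a user record, checking by hand what a static type checker
    would otherwise verify before the program even runs."""
    if name is None:
        raise ValueError("name must not be None")
    if not isinstance(name, str):
        raise TypeError("name must be a str")
    if age is None:
        raise ValueError("age must not be None")
    # bool is a subclass of int in Python, so it has to be excluded explicitly.
    if isinstance(age, bool) or not isinstance(age, int):
        raise TypeError("age must be an int")
    return {"name": name, "age": age}
```

Four of the five statements in the function body exist only to police types and null values, which is exactly the pattern the quote complains about.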

Choosing the best tool for the job: optional type annotations.

All the flamewars on static vs. dynamic essentially boil down to this: choosing between productivity and flexibility vs stability and scalability.
The problem is that this choice is being made by the language, when instead it should be made by the programmer: she should be able to make a trade-off depending on the specific problem her code is trying to solve.

I believe that dynamic typing should be the default, as the advantages it brings are simply too valuable to give up on. However, there are cases in which enforcing the type a function expects at compile time is simpler and sounder, and this additional guarantee should not get in one’s way when it is not needed.
This can be fully achieved by using optional type annotations.

The most mature example of a successful application of this approach is Clojure. Clojure is a dialect of Lisp that runs on the JVM, and is receiving quite a lot of attention lately for its power and expressivity. Being a Lisp, it is dynamically typed, but its core.typed library adds an optional static type system using annotations. For example, this function checks whether one integer is divisible by another:

The interesting part is that the annotation (the top line) can be added later without modifying the actual code, letting the programmer decide which functions should be left generic and which should not.

Another very important point is that this library was originally written by an external developer and only later adopted as an official Clojure contrib library; native language support for this feature was not necessary. This bodes well for the inclusion of such a feature in other languages like Python or Ruby.
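The same idea can be sketched in Python, used here only as an analogue and not as the core.typed example discussed above. The sketch assumes Python 3 function annotations and an external checker (such as mypy) that is run separately; the function names are made up. The interpreter itself does not enforce the annotations, so both versions behave identically at runtime.

```python
def divides(divisor, number):
    """Unannotated: fully dynamic, nothing is checked before the code runs."""
    return number % divisor == 0


def divides_checked(divisor: int, number: int) -> bool:
    """Same body, plus optional annotations that an external checker can verify."""
    return number % divisor == 0
```

The programmer decides function by function which version to write: divides stays generic, while a call such as divides_checked("3", 9) is something a static checker can flag before the program runs.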

Conclusion

Many dynamically typed languages are powerful and fun to work in, but they are often set aside when it comes to large scale software, due to the perceived unsuitability that the lack of a static type system implies.
An optional system based on annotations however, as we have seen, succeeds at bringing the best of both worlds, allowing programmers to do what is always required from them: choosing the best tool for the job.

Introduction

A big problem nowadays when dealing with large-scale software projects is deciding between reusing existing code and designing a fresh chunk of code that does approximately the same thing. At first glance the decision should be clear and code reuse should be chosen. However, there are strong arguments for both options.

In this article we will illustrate some of the drawbacks of both approaches and propose a trade-off solution that would improve code quality and time management within a large software company. We will then present the potential improvements alongside the challenges that such a solution would pose.

Background

A large project is usually split into smaller, more manageable chunks, which can be developed separately and integrated afterwards, with specific requirements outlined at the beginning. When developing a sub-task of the project, the algorithms and methods that compose it are usually not new concepts but rather rearrangements of existing ones that produce different results.

Given the two options available (reuse or design from scratch), the first would seem to be the best solution, as it should be less time-consuming and less intellectual effort would be wasted on things that are already developed. However, the code to be reused was usually written by another developer, and this is one of the challenges, as we will see next.

Developers differ greatly in how they design and implement code. Reading another person’s code and understanding the full extent of its functionality is usually harder than designing new code. This is why code reuse is often regarded as the more difficult and messy approach, and the second choice (designing new code) is mostly regarded as preferable.

Software maintenance and well-documented code would be a solution to this problem. However, given the low probability that a developer will reuse even his own code, little effort is put into making the structure and line of thought clear. In most cases the code will never be reused, and in the best case it will only serve as mild inspiration.

We will now present a comparison between the two approaches together with their advantages and drawbacks in order to get a better idea about what can be improved.

The two Rs: Reusing vs. Redesigning

When considering a task it is very common to divide it into small “atomic” chunks and deal with them separately. Most of the software programs that are currently developed have a lot of these “atomic” chunks in common. Re-usage of existing code would save a lot of time when dealing with familiar or already developed parts. However, the code that is to be reused is usually not in a very friendly form and has to be refactored and adapted to the current set-up.

On the other hand, redesigning everything from scratch is viewed as easier and more convenient, given the level of concentration needed to identify potential flaws and inconsistencies in the existing code when trying to integrate it with the rest of the project. A new, clean version of the required code would give the developer a better overview and a better understanding of the underlying structure and hidden advantages. The downside, as mentioned before, is wasted time: whenever redundant work is performed, time is considered to be wasted.

A mix between the two would significantly improve code quality and time management within a large software company.

In-house open source – better code quality

First we will define the concept of “in-house” open source and then proceed with describing the underlying aspects and additional measures that would be implemented in order to produce quality code and encourage re-usage.

This type of open source refers to the source code that is made available within a large software company. We can look at it as an intranet or a private public code pool. The idea is to focus on a smaller group of developers that can be motivated to create code which not only satisfies the project-wise requirements but also is “friendly” enough to be reused.

A database would have to be set up to hold all these reusable chunks. We can look at it as a virtual code library where each “atomic” chunk falls into a category and/or has specific tags that make it identifiable with a given task.
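As a rough illustration of such a library, here is a minimal in-memory sketch in Python. A real system would sit on top of an actual database; the class names, fields and sample entries below are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class CodeChunk:
    """One reusable "atomic" chunk registered in the in-house library."""
    name: str
    category: str
    tags: frozenset = field(default_factory=frozenset)
    description: str = ""


class CodeLibrary:
    """In-memory stand-in for the database of reusable chunks."""

    def __init__(self):
        self._chunks = {}

    def register(self, chunk):
        self._chunks[chunk.name] = chunk

    def find(self, category=None, tags=()):
        """Return the names of chunks in the category (if given) that carry all the tags."""
        wanted = set(tags)
        return sorted(
            chunk.name
            for chunk in self._chunks.values()
            if (category is None or chunk.category == category)
            and wanted <= set(chunk.tags)
        )


library = CodeLibrary()
library.register(CodeChunk("csv_reader", "io", frozenset({"parsing", "csv"})))
library.register(CodeChunk("json_reader", "io", frozenset({"parsing", "json"})))
library.register(CodeChunk("retry", "networking", frozenset({"resilience"})))
```

A developer starting a new task could then call library.find(category="io", tags=["csv"]) to see whether a suitable chunk already exists before writing a new one.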

Also, in order to motivate the people involved in creation and re-usage, incentives must be provided. We will consider the initialization of such a system and potential evolution.

In the beginning all code will have to be new, in order to secure the quality of it. The chunks that are identified as being general and reusable would be well documented and structured. At this stage, extra work is required from the developers as they have to perform two tasks instead of one. However, once the code library starts to be populated, the advantages of such an approach would start to show. We now consider that the code library has a considerable size. When dealing with new projects, developers now would have to work less than average, given the re-usability of existing code.

In order to set up such a system, some means of motivation are required. If the extra work brought no advantages, the drive to work towards a common goal would disappear. An internal referencing system would solve the acknowledgement issue, while a bonus-driven system would address the incentive problem.

Advantages and challenges of the approach

To sum up the discussion presented above we will identify some of the aspects that the proposed approach aims to address and improve together with eventual challenges that it may encounter.

the initial development part is the most important and the most difficult: new code is being created that has to be both functional for the current task and reusable. The second property can prove to be the most challenging, as the code must be easy to read and understand from an objective point of view (that of another developer);

once the library is set up, it will serve the developers with good-quality reusable code and thus spare them a lot of time that would otherwise have been spent on redundant development;

the efficiency of the approach is directly proportional to the passage of time and the size of the company;

more time in the long run would allow developers to focus on the key aspects of an idea/algorithm and deliver better-quality code;

bonuses awarded to developers who are being “cited” mean that they are motivated to produce even more reusable content;

when considering the company’s success there are two possible leads: either keep the library private and thus increase efficiency and delivery rate, or make the library available for purchase, with the trade-off that the advantage is lost.

Conclusion

Even though code re-usage can be viewed as an improbable action when dealing with new projects, given the right circumstances and set-up, it can prove to be a very powerful tool and count towards improving the efficiency of the whole company that promotes it.

Introduction

What exactly is evolution and how do we address it in the context of software? In the dynamic world we live in, evolution is needed in order to survive. It is part of natural selection, in which the most adaptable organisms survive ever-changing environments. So the process of evolution is simply long-term change in response to the conditions that surround you. So how do we address it in terms of software? Well, if you think about it, software is not that different from the real world, since humans have built it in a way that mimics real-world examples, such as general architectural patterns and abstract classes modelled on real-world concepts.

The difference between software and the real world is that software has evolved at tremendous speed over the past decade. In the non-virtual world, change and evolution take thousands of years and adaptation happens at a much slower rate; in software, we are the ones who determine its growth and evolution. Since virtual environments change fast, software needs to evolve and be built in an adaptable manner, otherwise it will not survive the changes. This article discusses software evolution and maintenance, and explains some rules, models and challenges in the context of software’s survival in this computer age.

What is Software Evolution?

Software Evolution is a discipline in software engineering based on the fact that change is inevitable and that software needs to evolve continuously in order to “survive” and remain functional. It is basically an innovative process of developing and improving the software in accordance with the standards of “now”. Since the “now” is always changing, the software needs to change with it, which results in an evolutionary manner of software development.

The main idea of software evolution is therefore to keep the software up to date and avoid software ageing, code decay and other unwanted problems that accumulate over time. It may involve re-development from scratch and migrating or adapting software, whereas software maintenance addresses smaller issues such as bug fixes and minor enhancements. In other words, software maintenance is the process of preserving and improving an existing system without making big changes to it, while software evolution is the process of making big changes to the system’s central design so that it works under the new conditions in its environment.

The Eight Rules of Software Evolution:

Manny Lehman, the “Father of Software Evolution”, identified eight rules of software evolution (commonly known as Lehman’s laws), which describe the forces that either drive change in software or slow it down.

The first rule: “Continuous change”. A system must be continuously adapted, or else it becomes progressively less satisfactory in use. This is easy to understand: software environments change constantly, and a system that does not adapt becomes less and less functional in them (eventually becoming a legacy system).

The second rule: “Increasing complexity”. As a system evolves its complexity increases, and unless additional work is done to reduce this complexity, the system eventually becomes so complex that it is hard to maintain and adapt.

The third rule: “Self-regulation”. The evolution of a constantly evolving system is a self-regulating process.

The fourth rule: “Conservation of organizational stability”. The work rate of an evolving system tends to be constant over its operational lifetime, or over phases of that lifetime.

The fifth rule: “Conservation of familiarity”. As a system evolves and grows, familiarity with it needs to be maintained. In general, the bigger the system, the less people understand about its inner workings, so the developers and even the customers need to keep that in mind.

The sixth rule: “Continuous growth”. More functionality needs to be implemented in such systems in order to keep the customer satisfied. This brings an overhead of its own, since the more functionality is implemented, the more resources are needed to maintain it. Nevertheless, for a system to keep evolving it needs to add functionality or adapt its existing functionality to new conditions.

The seventh rule: “Declining quality”. The quality of software degrades with time if it is not changed. The software itself may stay the same, but environments, hardware and many other things change constantly, so previously working software may not work under the new conditions. Its quality can therefore be said to decline with time.

The eighth rule: “Feedback system”. Evolving systems are multi-level, multi-loop, multi-agent feedback systems, and need to be treated as such in order to achieve significant improvement.

Challenge Accepted?

Since many factors have to be considered when implementing Software Evolution, it is logical that many challenges arise. The most common ones are preserving and improving software quality, management issues, a thin theoretical background, a lack of evolution tools, the limited experience of software developers and other external factors.

Preserving and improving a software system’s quality while upgrading it is one of the most common issues in Software Evolution. To maintain quality, functionality needs to either improve or remain similar, which is hard to guarantee when changing the system. One way of dealing with this is to test before and after the changes and compare the behaviour of the two versions. But then the question arises of whether the tests themselves were correct.
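The before-and-after comparison can be sketched in a few lines of Python. Everything here is hypothetical: legacy_total stands for the system before the change, evolved_total for the system after it, and the sample inputs would in practice have to be chosen with care, since the comparison is only as good as the inputs it is fed.

```python
def legacy_total(prices):
    """Stand-in for the behaviour of the system before the change."""
    total = 0
    for price in prices:
        total = total + price
    return total


def evolved_total(prices):
    """Stand-in for the re-engineered version of the same functionality."""
    return sum(prices)


def behaviour_differences(old, new, inputs):
    """Return every input on which the two versions disagree."""
    return [value for value in inputs if old(value) != new(value)]


SAMPLE_INPUTS = [[], [1, 2, 3], [10, -10], [0.5, 0.25]]
```

An empty result from behaviour_differences(legacy_total, evolved_total, SAMPLE_INPUTS) says only that the two versions agree on these samples, which is precisely why the correctness of the tests themselves remains an open question.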

Management issues that might arise when evolving a system include allocating proper resources to a task that is hard to estimate, as well as determining the need for such work and explaining it to upper management. Another issue is whether to evolve a legacy system or develop a similar system from scratch.

Another challenge is that the assignments we are given at school or university are mainly single-release, so most software developers start out with no experience of maintaining code or working on an evolving system. In our careers, by contrast, we mostly work on multi-release software and systems that are much more evolution-oriented than our assignments and practicals.

Software Evolution Models

Some examples of SE models are the waterfall model and the spiral model. The waterfall model is a five-step model with maintenance as its fifth step. The spiral model consists of small prototype releases and a linear model of software evolution. It is based on communication with the customers (and thus on achieving their satisfaction), risk analysis, and customer evaluation of releases. Its biggest advantage is that it is very applicable in real-world scenarios.

Another is the staged model, a descriptive model of software evolution in the long run. It consists of four stages. The first is initial development, in which the first version of the system is produced. The second is active evolution: the system is still at an early stage, so small and simple changes can be implemented easily and even major re-engineering does not carry much overhead. The third is servicing, which consists mainly of maintaining an already operational system; this is where knowledge of the system’s inner workings may deteriorate. The fourth is phase-out, when the system is no longer operational under the current environmental conditions, even small changes would be too costly to implement, and the system is to be replaced by new solutions.
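The four stages, as listed above, form a simple one-way progression. A minimal Python sketch of that progression follows; the names Stage and next_stage are invented for this illustration and do not come from any published formalisation of the model.

```python
from enum import Enum


class Stage(Enum):
    """The four stages of the staged model, in the order given in the text."""
    INITIAL_DEVELOPMENT = 1
    ACTIVE_EVOLUTION = 2
    SERVICING = 3
    PHASE_OUT = 4


def next_stage(stage):
    """Return the stage that follows, or None once the system is phased out."""
    if stage is Stage.PHASE_OUT:
        return None
    return Stage(stage.value + 1)


def remaining_stages(stage):
    """List every stage a system still has ahead of it, in order."""
    upcoming = []
    following = next_stage(stage)
    while following is not None:
        upcoming.append(following)
        following = next_stage(following)
    return upcoming
```

The sketch deliberately has no way back: in this reading of the model, a system that has slipped into servicing does not return to active evolution.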

Conclusion

Software is one of humanity’s most complex inventions and because of this many issues in maintaining and improving software arise. The need for improvement is obvious though. We as humans always strive for comfort and innovations. We are curious by nature, we strive for improvement and in the long run have always tried to invent things that would make our everyday life easier. It is the same with software evolution. It is the discipline of improving and redesigning in order to achieve better results or similar results but in the current situation.

The Perfect Software

This blog post was inspired by the article They Write the Right Stuff by Charles Fishman, published in the Dec 1996/Jan 1997 issue of Fast Company magazine. The article describes the software development process used by the on-board Shuttle group, which writes the software that runs on the control computers inside the Space Shuttle for NASA.

The publication quotes some impressive numbers.

Consider these stats: the last three versions of the program — each 420,000 lines long — had just one error each. The last 11 versions of this software had a total of 17 errors. Commercial programs of equivalent complexity would have 5,000 errors.

The software is said to have been delivered on time and within budget. If this cannot be called nearly perfect software, I do not know what can.
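To put the quoted numbers side by side, here is a quick back-of-the-envelope calculation in Python. It uses only the figures from the quote; treating the 5,000 commercial errors as the counterpart of the 17 errors across 11 versions is my reading of the quote, not something the article spells out.

```python
LINES_PER_VERSION = 420_000       # from the quote
ERRORS_LAST_VERSION = 1           # each of the last three versions
ERRORS_LAST_11_VERSIONS = 17      # from the quote
COMMERCIAL_ERRORS = 5_000         # assumed to cover the same 11 versions


def errors_per_kloc(errors, lines):
    """Defect density: errors per thousand lines of code."""
    return errors / (lines / 1000)


shuttle_density = errors_per_kloc(ERRORS_LAST_VERSION, LINES_PER_VERSION)
average_errors_per_version = ERRORS_LAST_11_VERSIONS / 11
commercial_ratio = COMMERCIAL_ERRORS / ERRORS_LAST_11_VERSIONS

print(f"{shuttle_density:.4f} errors per 1,000 lines")
print(f"{average_errors_per_version:.2f} errors per version on average")
print(f"commercial code: about {commercial_ratio:.0f} times as many errors")
```

That works out to roughly 0.0024 errors per thousand lines in the latest versions, about 1.55 errors per version on average, and a commercial figure about 294 times higher.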

In this post I will try to argue that these statistics are not something out of the ordinary. Every software engineering project could boast similar numbers. It is just that achieving them comes at a price (both literally and figuratively) which we find to be too high to pay.

They Write the Right Stuff breaks down the on-board Shuttle group’s process into 4 principles. I will follow the same structure and explain why, in my opinion, we choose not to uphold them.

1. Big Design Up Front

At the on-board shuttle group, about one-third of the process of writing software happens before anyone writes a line of code. […] Take the upgrade of the software to permit the shuttle to navigate with Global Positioning Satellites, a change that involves just 1.5% of the program, or 6,366 lines of code. The specs for that one change run 2,500 pages […]

Here are the reasons that I believe dissuade some software companies from specifying their software in detail before any programming begins.

First, designing up front is hard. This process requires experience and a deep understanding of the problem that the software is going to solve. It is hard not only from the purely technical point of view, but psychological as well. I believe that it is fair to liken the process of requirement gathering to mind reading as clients will often find it difficult to articulate their needs clearly. Often it seems easier to start programming and solve any issues that arise once they become apparent. I do not find it surprising that such a patchwork of a project is more likely to contain errors and bugs than one that was carefully designed beforehand.

Second, the client is always right. In this day and age it might be difficult to convince your clients that they absolutely must go to numerous lengthy meetings and fund requirement gathering and documentation process when, in their mind, it will have no tangible benefits. While the civil engineering analogy is growing old, I still believe that it holds true. One would not expect the clients to change their mind about how many rooms they want after the foundations have been laid. Why should it not be true for software development?

Extreme Programming proponents argue that requirements always change. Of course, part of it is that times change and requirements change with them. Obviously, in those situations designing up front is not the right thing to do. However, I believe that more often than not it is a misunderstanding between the client and the developers that causes the changes. In that case, it was a choice not to invest more time in requirement gathering, and thus, I argue, a choice not to implement the best software possible.

Big design up front is obviously much easier for the Shuttle group because they, as the article points out, have a single client with a single project and they have a deep understanding of the problems at hand.

2. Code Review Performed by a Separate Team

The central group breaks down into two key teams: the coders — the people who sit and write code — and the verifiers — the people who try to find flaws in the code. The two outfits report to separate bosses and function under opposing marching orders.

With pair programming on the rise, similar techniques might be applied more widely in the future. However, I still think that pair programming is not quite on par with having a separate team dedicated purely to code review. I say this from personal experience: one develops a kind of bug-blindness when one is familiar with the code. This is exactly why having a separate team is so powerful: when you are not familiar with the code, it is easier to spot mistakes.

3. Every Change Is Documented

One is the history of the code itself — with every line annotated, showing every time it was changed, why it was changed, when it was changed, what the purpose of the change was, what specifications documents detail the change.

Times seem to have changed the scene for the better. What was described as something to be marvelled at in 1996 is now commonplace thanks to advances in version control software. With version control tools such as Git freely available, it is purely a question of discipline to match 1990s NASA in record keeping.

This is the only point laid down by Fishman that is a complete no-brainer — there are no drawbacks. Keeping good records does not require any extra time, money or human resources.
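
As a rough illustration of that discipline, the sketch below checks that a change record carries the pieces of information listed in the quote above: when the line was changed, why, for what purpose, and which specification document covers it. The `ChangeRecord` type, its field names and the `missing_fields` helper are all hypothetical, invented for this example; they do not come from the Shuttle group's process or from any version control tool.

```python
from dataclasses import dataclass, fields
from datetime import date
from typing import List, Optional


@dataclass
class ChangeRecord:
    """One documented change to a line of code (hypothetical structure)."""
    line_number: int
    changed_on: Optional[date] = None    # when it was changed
    reason: Optional[str] = None         # why it was changed
    purpose: Optional[str] = None        # what the change was meant to achieve
    spec_document: Optional[str] = None  # which specification details the change


def missing_fields(record: ChangeRecord) -> List[str]:
    """Return the names of the fields that were left empty or blank."""
    missing = []
    for field in fields(record):
        value = getattr(record, field.name)
        if value is None or (isinstance(value, str) and not value.strip()):
            missing.append(field.name)
    return missing
```

A check like this could run before a change is accepted, so that an incomplete record is rejected instead of relying on someone remembering to fill it in later.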

4. Learning From Past Mistakes

Like computer models predicting the weather, the coding models predict how many errors the group should make in writing each new version of the software. True to form, if the coders and testers find too few errors, everyone works the process until reality and the predictions match.

The process is so pervasive, it gets the blame for any error — if there is a flaw in the software, there must be something wrong with the way it's being written, something that can be corrected. Any error not found at the planning stage has slipped through at least some checks. Why? Is there something wrong with the inspection process? Does a question need to be added to a checklist?

In a fast-paced environment it might be difficult to find time to reflect on every single bug and ask yourself whether anything could have been done to prevent it. Are there any other places in the codebase that could suffer from the same bug?
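
The idea of comparing predicted and found errors, quoted at the start of this section, can be sketched in a few lines. The article does not describe the Shuttle group's actual model, so this example assumes the simplest one I can think of: a constant defect density (errors per thousand lines of code) taken from past versions. The function names and the 20% tolerance are my own assumptions.

```python
def expected_errors(historical_errors: int, historical_kloc: float, new_kloc: float) -> float:
    """Estimate errors in new code from past defect density (errors per KLOC)."""
    if historical_kloc <= 0:
        raise ValueError("historical_kloc must be positive")
    return historical_errors / historical_kloc * new_kloc


def needs_more_inspection(found: int, expected: float, tolerance: float = 0.2) -> bool:
    """Return True if clearly fewer errors were found than the model predicted."""
    return found < expected * (1 - tolerance)


# Example: 50 errors in 100 KLOC of past code, 10 KLOC of new code.
prediction = expected_errors(50, 100.0, 10.0)
if needs_more_inspection(found=2, expected=prediction):
    print("Fewer errors found than predicted: keep inspecting.")
```

Finding too few errors is treated here as a warning sign, not as good news, which is the counter-intuitive part of the process described in the quote.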

Conclusions

All of these techniques seem to boil down to good documentation and not sparing time or resources when it comes to making sure that the software is top notch. The simple truth is that bug-free, on-time software is just more expensive than we (or our clients) are prepared to pay.

I hope that I have managed to persuade you that it is not that we cannot write nearly perfect software; it is just that, as with everything in life, we make trade-offs.