7 Answers

Enterprise Architect from Sparx Systems can forward- and reverse-engineer PHP, although I haven't used this myself.

As to diagramming, my answer is yes - and no.

I would not (do not) use UML to reverse-engineer code bases. Reverse-engineering tools tend to give you only the easy bits (static structures), and even when they do try to provide you with dynamic aspects these are usually incomplete.

But more to the point, I think a UML representation of the source is pointless. Source code is far easier to read; any decent editor will provide syntax highlighting and block folding. And don't forget the significance of the files that are not source code (e.g. Makefiles, IDE project definitions, etc.), which UML reverse-engineering tools most likely won't understand.

I use UML to document design, especially runtime and deployment aspects: what programs are built and how do they interact? Which ports does the server open? What threading model is used? How is each program configured? And so on. But I only very rarely refer to any source entities in any models.

Presuming this codebase is sufficiently large, I probably would not diagram it.

If the code is too tangled to grasp already, I'm not sure a maze of boxes and arrows will suddenly bring light to it. I've found that a UML tool lacks the context to give you the big picture you need to start. You could spend hours or days trying to understand a portion of the diagram that is truly inconsequential to the whole design, just because it seems like a good place to start.

The first tool I would grab in this situation is exploratory testing. Treat the system as a black box that you don't understand. If you're going to make substantial changes, you need to know what outputs are generated from a given set of inputs. Once you've locked that down you can also use these tests to verify your design moving forward. Exploratory testing gives you a clear way to divide and conquer your way through the codebase. Once you have some tests in place you can certainly build UML (or reorganize generated UML) to provide a basis for team understanding.
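As a rough sketch of the black-box idea, here in Python: feed a fixed set of inputs through the code, record the outputs as a baseline, and compare against that baseline after every change. The function `legacy_price` and the input cases are hypothetical stand-ins for whatever the inherited system exposes.

```python
def legacy_price(quantity, unit_price):
    """Hypothetical stand-in for an inherited routine we do not yet understand."""
    total = quantity * unit_price
    if quantity >= 10:
        total *= 0.9
    return round(total, 2)

# Hypothetical inputs chosen while exploring the system.
CASES = [(1, 2.5), (10, 2.5)]

def record_outputs(func, cases):
    """Run func on every input tuple and record what comes out."""
    return {repr(args): func(*args) for args in cases}

def changed_cases(baseline, current):
    """Return the inputs whose output differs from the recorded baseline."""
    return sorted(key for key in baseline if current.get(key) != baseline[key])

baseline = record_outputs(legacy_price, CASES)
# Later, after modifying the code, record again and compare:
print(changed_cases(baseline, record_outputs(legacy_price, CASES)))
```

In practice the baseline would be stored somewhere durable, such as a file under version control, so it survives between runs. The sketch keeps it in memory for brevity.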

After getting some idea of the ends of the system, I would move on to code review. My preference here would be code, but if your team is easily distracted by syntax and style issues you might find generated UML to be more productive. Regardless of whether you work from code or generated UML, I find that working with a partner, a projector and lots of whiteboards will move you to an understanding faster than staring at the screen alone.

You cannot really reverse-engineer PHP code to UML.
Reverse engineering is usually done from C, C++, C# or Java to UML.

In the past I inherited a huge Java project that came with only Java notes and a little printed documentation. We were never able to figure everything out, because nobody from the previous team was still working at the company. This is exactly what should not be done!

I am now working as a consultant and help companies put agile processes in place. What I use is UML class diagrams only, because they are easy for the whole team to understand and because they are incremental: a code change is immediately reflected in the UML model, and I can trace model changes using the local history. If all the developers leave the company, it will still be possible to refactor the project quickly, because the UML class diagrams are very detailed. We have one or more diagrams per package. Each important method is explained not only in the code but also with class and sequence diagrams, and so on. I fully understand that developers do not want a model-driven approach, because they feel they could be replaced by offshore teams. Having said that, managers should always be in control of the project and should not depend on code manipulation.

That is my two cents for today, but I still think that developers should be protected and not used as meat by companies. The problem is the level of education of the team and the investment in its training. A developer should think and not just code. He or she should also be able to do advanced architecture work. Finally, if he or she leaves the company, it should be possible to immediately restore and refactor existing projects.

It is possible to reverse-engineer PHP code to UML (i.e. class diagrams), but you usually can't get as much information as you can from, say, Java code. In PHP classes, member variables commonly have no type in their definition (even though declaring one is possible with recent PHP versions). So static analysis can't tell which other classes are referenced (even for a developer it is quite time-consuming to tell). As a result, the diagrams you get "off the shelf" show very few dependencies.
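The point about missing member types can be illustrated with a small sketch. It uses Python rather than PHP purely as an analogy, and the class names `Invoice`, `Customer` and `mailer` are made up for the example. The analyzer below only sees dependencies that are declared as types, which is roughly the position a diagramming tool is in:

```python
import ast

# Hypothetical class: one attribute has a declared type, the other does not.
SOURCE = '''
class Invoice:
    customer: "Customer"      # declared type: visible to static analysis

    def __init__(self, mailer):
        self.mailer = mailer  # no declared type: the class behind it is unknown
'''

def declared_dependencies(source):
    """Map each class name to the type names of its annotated attributes."""
    deps = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ClassDef):
            names = set()
            for stmt in node.body:
                if isinstance(stmt, ast.AnnAssign):
                    ann = stmt.annotation
                    if isinstance(ann, ast.Name):
                        names.add(ann.id)
                    elif isinstance(ann, ast.Constant) and isinstance(ann.value, str):
                        names.add(ann.value)
            deps[node.name] = names
    return deps

print(declared_dependencies(SOURCE))
```

Only `Customer` shows up as a dependency of `Invoice`; whatever class is passed in as `mailer` never reaches the diagram.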

Our company does reverse engineering with UML Lab, which lets you add project-specific templates and rules for analyzing a piece of software. You can try it yourself (with the 30-day trial). And the support team is really responsive ;)

Generally, a good IDE (with or without UML) helps a lot in understanding PHP software. I would recommend at least using Eclipse PDT (open source) or the Zend IDE (commercial) to get a grasp of the software internals. You can then draw diagrams of the software yourself and/or improve the generated diagrams step by step.

You're very optimistic. If your code base does not already have proper documentation and isn't easy to read, I doubt that it has an organized, structured hierarchy. So UML might just make everything look like a big mess (which it might be).

Software Engineering Radio covered this topic in Episode 148: Software Archaeology with Dave Thomas (of Pragmatic Programmer fame). The podcast had some insightful ways to get a handle on foreign code bases which included looking at the code in tiny fonts in a full-screen editor, walking the code in a debugger with assertions, and being careful of comments (which may no longer be accurate). I can't recall if he touched on tooling to analyze the source, but his ideas struck me as (well) practical.

Diagramming can help because you need to actively engage with the code. Actually, anything that requires you to read the code and synthesize something out of it helps. It can be a UML diagram or it can be just a sketch with some bubbles and arrows.

Another way to get a grasp of inherited software is to poke at it. You can add logging messages or fool around with a debugger.
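As a rough sketch of the "add logging messages" idea, here in Python: a small decorator can log every call and return value of a function you are curious about, without touching its body. The function `apply_discount` and the logger name `legacy_trace` are hypothetical, chosen only for illustration.

```python
import functools
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("legacy_trace")  # hypothetical logger name

def traced(func):
    """Log the arguments and the result of every call to func."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        logger.info("call %s args=%r kwargs=%r", func.__name__, args, kwargs)
        result = func(*args, **kwargs)
        logger.info("return %s -> %r", func.__name__, result)
        return result
    return wrapper

@traced
def apply_discount(total, code):
    """Hypothetical stand-in for an inherited function."""
    return total * 0.5 if code == "HALF" else total

apply_discount(10, "HALF")
```

Running the system with a few functions wrapped like this shows which code paths real inputs actually reach, and the decorator is easy to remove again afterwards.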

There is one technique I like most, because it combines both approaches: you synthesize something new and you poke at the code. That technique is trying to fit the code into a test harness. If the code base comes with some unit tests, then you are lucky. You can read the tests to understand the production code and add some more tests to interact with the code base without doing harm. If there are no tests, then you need to start somewhere. You need to poke at the code and synthesize something which is much more valuable than diagrams: tests which are in sync with the actual code and give you a safety net for future changes.
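A minimal sketch of what such a first test can look like, written with Python's `unittest`. The function `format_name` is a hypothetical piece of inherited code; the tests assert what the code does today, including a quirk, rather than what it ought to do.

```python
import unittest

def format_name(first, last):
    """Hypothetical inherited function whose behaviour we want to pin down."""
    return (last.upper() + ", " + first).strip()

class FormatNameCharacterization(unittest.TestCase):
    def test_typical_input(self):
        self.assertEqual(format_name("Ada", "Lovelace"), "LOVELACE, Ada")

    def test_empty_first_name_keeps_comma(self):
        # Observed behaviour, not necessarily intended: the comma survives.
        self.assertEqual(format_name("", "Lovelace"), "LOVELACE,")
```

Saved in a file, this can be run with `python -m unittest`. If a later refactoring changes either result, the failing test tells you before a user does.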

Knowledge about unit testing, refactoring techniques and software design is quite handy for wrapping tests around an existing code base. There are a lot of books on each topic in isolation, but very few that deal with all three in the context of an existing code base. I found the following two books very helpful: