Debian Dependency Map

December 16, 2008

Several of us use Debian GNU/Linux, and among its other features we have all appreciated the way Debian calculates software package dependencies to give us a stable package. At gnowledge.org we harvested the data from the packages.tgz file of Debian 4.0 and created a program that displays the dependencies graphically, as what we call dependency maps. Search for your favorite package and see how dependent it is on other packages.

One of the main reasons we are interested in Debian dependency maps is that we are inspired by that model and now want to create a similar knowledge.tgz file for all concepts and activities, as gnowledge.org is in the process of creating a gnowledge distribution. You will also be able to see how we are applying the Debian idea to the knowledge domain.

gnowledge.org is waiting for all of you to spread the message, specifically among teachers of any subject whatsoever, so that they add the dependency relations and we can soon have a gnowledge distro (a free knowledge distribution).

Robert Lemmen and I implemented a similar tool named “debgraph” for Debian as part of GSoC 2008. We focused on identifying dependency cycles and producing graphs that showed the packages involved in each cycle. Analysis tasks are performed via Lua scripts that use the debgraph API.
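For readers wondering what identifying dependency cycles can look like, here is a small Python sketch. It is not debgraph code and does not use the debgraph API. It assumes the dependencies are already in a dictionary mapping each package name to a set of names, and it uses Tarjan's strongly connected components algorithm, which is one common choice. The function name dependency_cycles is made up for illustration.

```python
def dependency_cycles(deps):
    """Return the groups of packages that depend on each other in a cycle."""
    index, low, on_stack = {}, {}, set()
    stack, groups = [], []

    def visit(node):
        index[node] = low[node] = len(index)
        stack.append(node)
        on_stack.add(node)
        for nxt in deps.get(node, ()):
            if nxt not in index:
                visit(nxt)
                low[node] = min(low[node], low[nxt])
            elif nxt in on_stack:
                low[node] = min(low[node], index[nxt])
        if low[node] == index[node]:
            # node is the root of a strongly connected component: pop it off.
            group = set()
            while True:
                member = stack.pop()
                on_stack.discard(member)
                group.add(member)
                if member == node:
                    break
            # Keep only real cycles: several packages, or a self-dependency.
            if len(group) > 1 or node in deps.get(node, ()):
                groups.append(group)

    for node in list(deps):
        if node not in index:
            visit(node)
    return groups
```

The recursion keeps the sketch short, but on a graph the size of a whole archive it could hit Python's recursion limit, so an iterative version would be safer there.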

Interesting! I will check that very soon. Can you consider writing an algorithm for cleaning the redundantly asserted dependencies? Most of the graphs will also look beautiful if the redundant links are pruned.
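One possible shape for such a cleaning step, as a Python sketch only: an asserted dependency is treated as redundant when its target is still reachable through some other chain of dependencies. The names reachable and prune_redundant are made up for illustration, and the input is assumed to be a dictionary mapping each package to a set of names. Edges are removed one at a time against the current graph, so the set of reachable packages never changes. On graphs with cycles the result can depend on the order in which edges are tried.

```python
def reachable(graph, start, goal, skip_edge):
    """Is goal reachable from start without using the single edge skip_edge?"""
    seen, todo = {start}, [start]
    while todo:
        node = todo.pop()
        for nxt in graph.get(node, ()):
            if (node, nxt) == skip_edge:
                continue
            if nxt == goal:
                return True
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return False

def prune_redundant(deps):
    """Drop every dependency that is still implied by another path."""
    pruned = {pkg: set(targets) for pkg, targets in deps.items()}
    for pkg in sorted(pruned):
        for target in sorted(pruned[pkg]):
            # Test against the already pruned graph so reachability is preserved.
            if reachable(pruned, pkg, target, (pkg, target)):
                pruned[pkg].discard(target)
    return pruned
```

For example, if "app" depends on "libgui" and "libc", and "libgui" itself depends on "libc", the direct edge from "app" to "libc" is dropped because it is already implied.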

Cool. By the way, we have used the same dependency information to analyze dependencies and have written an article about the distribution of dependency frequencies. It was published in Physical Review Letters; a summary can be found in this physorg entry.
We are continuing to analyze the data and will publish more on that. Just FYI.

Thanks for the information. We were not aware of your paper. We did a non-linear fit to the data. Please check the abstract version at http://www.gnowledge.org/scalefree. A full version will appear shortly. We will cite your work. Thanks again.

I’m interested in obtaining the graphs for some packages from the “experimental” distribution. Can you make available the software used to transform the Packages files into the input files for plotting?

Thanks John. We will use the tool “tred” and commit the result at gnowledge.org asap. But this will fix only the graph. I am suggesting that this reduction should happen within the packages.tgz file itself. Irrespective of which dependencies the maintainers of each package assert, we can generate a ‘reduced-package.tgz’ that contains only the required assertions by eliminating the redundant ones. I will contact the Debian packagers soon.

Martin, we are very soon going to harvest all of Debian, from the first release to date, and will include all categories of each release. This will be available permanently.

Just a note from a maintainer’s perspective: “pruning” redundant edges might give you a graph that is nicer to look at, but it loses some important information.

For example, I’m interested in this because of one of my packages (ekg2, currently in experimental). It is an instant messenger with a ton of plugins and (consequently) a ton of dependencies. It needs to be split into smaller packages, so that people who (for example) don’t use its GUI don’t need to install X Window.

I need to find a good balance, which is somewhere between:
– the current situation (single monolithic package that depends on half the archive), and
– its opposite (every plugin split into a separate package, which guarantees that you never have to install a dependency that you will not need, but with the downside of bloating the archive with a dozen “ekg2-foo”-like packages).

I hope that, by looking at the graph, I will be able to identify clusters of dependencies and split the package in a near-optimal way.

This means that if some edges are omitted, I will miss some important data needed to solve my problem.

Another thing is the size of the packages. It would be good if you could include some metric of “transitive size”, defined as “size of the package plus size of all its dependencies”. I guess that it would be easy if not for dependency cycles.
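To make the “transitive size” definition concrete, here is a minimal Python sketch. The function name transitive_size and both input dictionaries are assumptions for illustration: deps maps each package to the set of packages it depends on, and sizes maps each package to its size (for example, a value taken from the Installed-Size field). Keeping a set of already visited packages means each package is counted once, which also stops the walk from looping on dependency cycles.

```python
def transitive_size(pkg, deps, sizes):
    """Size of pkg plus the size of everything it depends on, directly or not."""
    seen, todo = {pkg}, [pkg]
    while todo:
        node = todo.pop()
        for nxt in deps.get(node, ()):
            if nxt not in seen:  # visited set: each package counted once
                seen.add(nxt)
                todo.append(nxt)
    # Packages with no known size contribute 0 (an assumption of this sketch).
    return sum(sizes.get(name, 0) for name in seen)
```

Note that this counts every alternative and every already installed package, so it is an upper bound on what a real installation would pull in.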
