Reverse dependencies #40

This feature was implemented but is currently turned off because it used too much memory. This should be investigated, the data structure adjusted, and the feature re-enabled. The number of reverse dependencies should be an important component of a package popularity/quality metric.

The main problem with dependencies is that there are just so many of them! So, the implementation allowed you to answer these questions for a given package very quickly:

/package/baz-1.0, /package/baz/reverse/summary

How many different packages have some version which depends directly on some version of baz?

How many different packages have some version which depends directly on baz-1.0?

/package/baz/reverse/summary

How many different packages have some version which depends directly or indirectly on some version of baz?

/package/baz/reverse/all

What are the different packages which have some version which depends directly or indirectly on any version of baz, and how many direct and indirect reverse dependencies do they each have?

/package/baz-1.0/reverse

What are the packages for which the latest version depends on baz-1.0, and what are those respective versions?

/package/baz-1.0/reverse/old

What are the different packages for which the latest version does not depend on baz-1.0, but some older version does, and what are those respective versions?

/packages/reverse

What are all packages which have direct reverse dependencies, sorted by the number of direct reverse dependencies, and what is the number of total (direct + indirect) reverse dependencies they have as well?
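To make the direct versus indirect distinction in these questions concrete, here is a minimal sketch that works on package names only and ignores versions. The names `DepGraph`, `directRevDeps`, `allRevDeps` and `revDepSummary` are hypothetical and are not taken from the disabled implementation.

```haskell
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

type PackageName = String

-- | Direct dependencies: package name -> names it depends on (any version).
type DepGraph = Map.Map PackageName (Set.Set PackageName)

-- | Invert the graph: package name -> packages that depend on it directly.
directRevDeps :: DepGraph -> Map.Map PackageName (Set.Set PackageName)
directRevDeps graph = Map.fromListWith Set.union
  [ (dep, Set.singleton pkg)
  | (pkg, deps) <- Map.toList graph
  , dep <- Set.toList deps
  ]

-- | All packages that depend on the given one, directly or indirectly,
-- found by a worklist traversal of the inverted graph.
allRevDeps :: Map.Map PackageName (Set.Set PackageName)
           -> PackageName -> Set.Set PackageName
allRevDeps rev start = go Set.empty (direct start)
  where
    direct p = Set.toList (Map.findWithDefault Set.empty p rev)
    go seen [] = seen
    go seen (p:ps)
      | p `Set.member` seen = go seen ps
      | otherwise           = go (Set.insert p seen) (direct p ++ ps)

-- | (direct count, direct + indirect count) for one package.
revDepSummary :: DepGraph -> PackageName -> (Int, Int)
revDepSummary graph pkg =
  ( Set.size (Map.findWithDefault Set.empty pkg rev)
  , Set.size (allRevDeps rev pkg)
  )
  where
    rev = directRevDeps graph

-- | Made-up example data: foo and qux depend on baz, bar depends on foo.
exampleGraph :: DepGraph
exampleGraph = Map.fromList
  [ ("foo", Set.fromList ["baz"])
  , ("bar", Set.fromList ["foo"])
  , ("qux", Set.fromList ["bar", "baz"])
  , ("baz", Set.empty)
  ]
```

In GHCi, `revDepSummary exampleGraph "baz"` evaluates to `(2,3)`: foo and qux depend on baz directly, and bar reaches it through foo. Recomputing the traversal on every request would be slow on the full index, which is presumably why the implementation precomputed these answers.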

Some of the problems of the implementation:

All flag-controlled alternatives are treated as true simultaneously so that no dependencies are missed (see getAllDependencies).

I am certain space leaks abound.

Because ReverseIndex is an acid-state component with frequent updates, these updates should take arguments that are as small as possible to keep the logs small, which means that it is better for operations to depend on internal state rather than on passed-in state.
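The first problem above, treating every flag-controlled alternative as true at once, can be illustrated with a toy conditional tree. This is only a sketch: `DepTree`, `allDependencies` and `dependenciesUnder` are hypothetical stand-ins, not the real getAllDependencies or Cabal's package description types.

```haskell
import qualified Data.Set as Set

type PackageName = String

-- | A toy stand-in for a package description with flag-guarded sections.
data DepTree
  = Deps [PackageName]             -- unconditional dependencies
  | IfFlag String DepTree DepTree  -- flag name, then-branch, else-branch
  | Both DepTree DepTree           -- two sections that both apply

-- | Over-approximation: take the union of both branches of every
-- conditional, as if each flag were true and false at the same time.
allDependencies :: DepTree -> Set.Set PackageName
allDependencies (Deps ps)      = Set.fromList ps
allDependencies (IfFlag _ t e) = allDependencies t `Set.union` allDependencies e
allDependencies (Both a b)     = allDependencies a `Set.union` allDependencies b

-- | For contrast: the dependencies under one concrete flag assignment.
dependenciesUnder :: (String -> Bool) -> DepTree -> Set.Set PackageName
dependenciesUnder _ (Deps ps) = Set.fromList ps
dependenciesUnder flags (IfFlag f t e)
  | flags f   = dependenciesUnder flags t
  | otherwise = dependenciesUnder flags e
dependenciesUnder flags (Both a b) =
  dependenciesUnder flags a `Set.union` dependenciesUnder flags b

-- | Made-up example: base always, text or bytestring depending on a flag.
exampleTree :: DepTree
exampleTree =
  Both (Deps ["base"])
       (IfFlag "use-text" (Deps ["text"]) (Deps ["bytestring"]))
```

Here `Set.toList (allDependencies exampleTree)` is `["base","bytestring","text"]`, while `Set.toList (dependenciesUnder (const True) exampleTree)` is `["base","text"]`. The union never misses a dependency, but it can report a combination that no single build actually uses.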

It stores a complete map from a depended-on package baz to the VersionRange that every version of every package with a dependency on baz places on it. This is in case a new version baz-1.2 gets uploaded, so that the list of reverse dependencies can be created for baz-1.2 (anything that depends on baz-1.1.* is excluded from that list).

It stores a complete map from a depended-on package version baz-1.2 to every version of every package which depends on it. For example, we know not only that foo depends on baz-1.2, but that foo-4.0, foo-4.1, etc. do. This is used for checking whether the latest version of foo depends on baz-1.2, or only an older/deprecated version does.

It stores a complete enumeration of all packages and versions, duplicating the main index. This is in case a package foo-4.4 is uploaded which depends on bar <2.0: foo-4.4 should be added as a reverse dependency to all versions of bar <2.0 in the index. The only way to get a list of such versions is to have a list of all existing packages.
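The three pieces of data described above could be modelled roughly as follows. This is a sketch under stated assumptions: `RevIndex`, its field names, `upload` and the toy `Range` type are all hypothetical, and `Range` only stands in for Cabal's much richer VersionRange.

```haskell
import Data.List (isPrefixOf)
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

type PackageName = String
type Version = [Int]

data PackageId = PackageId PackageName Version
  deriving (Eq, Ord, Show)

-- | Toy version range, e.g. Prefix [1,1] plays the role of 1.1.*
data Range = AnyVersion | EarlierThan Version | Prefix Version
  deriving (Eq, Show)

inRange :: Version -> Range -> Bool
inRange _ AnyVersion      = True
inRange v (EarlierThan u) = v < u
inRange v (Prefix p)      = p `isPrefixOf` v

data RevIndex = RevIndex
  { rangesOn      :: Map.Map PackageName (Map.Map PackageId Range)
    -- ^ depended-on name -> range each dependent version places on it
  , dependents    :: Map.Map PackageId (Set.Set PackageId)
    -- ^ depended-on version -> package versions that depend on it
  , knownVersions :: Map.Map PackageName (Set.Set Version)
    -- ^ every uploaded version, duplicating the main index
  }

emptyIndex :: RevIndex
emptyIndex = RevIndex Map.empty Map.empty Map.empty

-- | Register an upload of a package version with its direct dependencies.
upload :: PackageId -> [(PackageName, Range)] -> RevIndex -> RevIndex
upload pkg@(PackageId name ver) deps idx = RevIndex
  { rangesOn      = foldr addRange (rangesOn idx) deps
  , dependents    = Map.unionsWith Set.union
      [ dependents idx
      , Map.singleton pkg existingDependents
      , Map.fromListWith Set.union newEdges
      ]
  , knownVersions =
      Map.insertWith Set.union name (Set.singleton ver) (knownVersions idx)
  }
  where
    addRange (dep, range) =
      Map.insertWith Map.union dep (Map.singleton pkg range)
    -- Already-indexed packages whose range admits this new version.
    existingDependents = Map.keysSet
      (Map.filter (inRange ver)
                  (Map.findWithDefault Map.empty name (rangesOn idx)))
    -- Known versions of each dependency that the new package accepts.
    newEdges =
      [ (PackageId dep v, Set.singleton pkg)
      | (dep, range) <- deps
      , v <- Set.toList (Map.findWithDefault Set.empty dep (knownVersions idx))
      , inRange v range
      ]

dependentsOf :: PackageId -> RevIndex -> [PackageId]
dependentsOf p idx = Set.toList (Map.findWithDefault Set.empty p (dependents idx))

-- | Made-up uploads, in order: baz-1.1, foo-4.0, bar-2.0, then baz-1.2.
exampleIndex :: RevIndex
exampleIndex = foldl (\i (p, ds) -> upload p ds i) emptyIndex
  [ (PackageId "baz" [1,1], [])
  , (PackageId "foo" [4,0], [("baz", Prefix [1,1])])
  , (PackageId "bar" [2,0], [("baz", AnyVersion)])
  , (PackageId "baz" [1,2], [])
  ]
```

With this data, `dependentsOf (PackageId "baz" [1,2]) exampleIndex` is `[PackageId "bar" [2,0]]`: foo-4.0 only accepts baz-1.1.* and is excluded, as in the baz-1.2 example above. Note that `upload` receives only the new package and its dependencies and looks everything else up in the index, in the spirit of keeping update arguments small.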

The structure is complex, but it allows somewhat complex questions to be answered. For example, which packages depend only on an older version of some package, so they can be notified to upgrade to a newer version to help mitigate binary incompatibility? This could be detected for a package candidate. Or, which packages depend on some package only in their older versions, meaning that dependencies are being dropped over time (and may not be useful for determining popularity here)?
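The "latest version versus only older versions" distinction used in these questions can be sketched as a simple partition. `splitByLatest` and the example data are hypothetical; the sketch assumes the latest version of each package is already known.

```haskell
import Data.List (sort)
import qualified Data.Map.Strict as Map

type PackageName = String
type Version = [Int]

-- | Split the package versions that depend on some target into
--   (packages whose latest version depends on it,
--    packages where only older versions do, with those versions).
splitByLatest
  :: Map.Map PackageName Version   -- ^ latest version of each package
  -> [(PackageName, Version)]      -- ^ every version depending on the target
  -> (Map.Map PackageName Version, Map.Map PackageName [Version])
splitByLatest latest revDeps = (current, old)
  where
    -- Group the depending versions by package name.
    grouped = Map.fromListWith (++) [ (n, [v]) | (n, v) <- revDeps ]
    -- A package is "current" if its latest version is among them.
    isCurrent n vs = maybe False (`elem` vs) (Map.lookup n latest)
    (currentGroups, oldGroups) = Map.partitionWithKey isCurrent grouped
    current = Map.intersectionWith const latest currentGroups
    old     = Map.map sort oldGroups

-- | Made-up example data for reverse dependencies of baz-1.0.
exampleLatest :: Map.Map PackageName Version
exampleLatest = Map.fromList [("foo", [4,1]), ("bar", [2,0]), ("qux", [0,3])]

exampleRevDeps :: [(PackageName, Version)]
exampleRevDeps = [("foo", [4,0]), ("foo", [4,1]), ("bar", [1,0]), ("bar", [1,5])]
```

For this data, `splitByLatest exampleLatest exampleRevDeps` puts foo (at 4.1) in the first map and bar (with versions 1.0 and 1.5) in the second. The first map corresponds to the /reverse question and the second to /reverse/old; the second is also the set of packages that have dropped the dependency over time.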

As an out-there idea: there are conceivably other services that are just as complicated as revdeps that Hackage might want to offer, like finding similar packages using dependencies, analyzing build reports to make diagnoses, or suggesting tags. Ideally, smart clients could handle as many of these as possible, with the server only providing the UI for them. Can this be done with revdeps?