Types from imported modules "leaking" into a module's public interface

In my ongoing inquiry into module semantics and language treatment of module systems, this seems like an extremely broad "problem" insofar as 1) it may easily and often infect module designs and implementations; 2) it can create a host of implicit dependencies for a module "user" (or "importer", etc.); and 3) it might be addressed in many ways, both in terms of module design and language design.

In particular, I can imagine a language where a set of "public modules" S(M), whose types "infect" the public interface (arguments and return types) of a new module M1 which imports/uses S(M), are automatically "imported" into any other module M* that uses or imports M1. Thus S(M) is automagically imported into any such M*.

I can imagine another design where S(M) or a subset of S(M) must be explicitly imported into any M* using/importing M1 for M* to compile and function properly.

I imagine there are other treatments of this issue as well.

I also imagine there are discussions of (1) attempting to reduce the number of such dependencies between M* and S(M) via a module use/import and (2) perhaps more theoretical discussions about the relationships between modules that have or lack such dependencies (which I think will be very common minus efforts to prevent them.)

I would be very grateful for any pointers to the more or less accepted and received wisdom on these issues or to any particularly innovative or insightful treatment of these issues.

I have been thinking about similar problems. I don't claim a "cure" but my thoughts so far are:

1. I observed that dynamic languages do not need to export S(M) names from M1, because argument type checking (if it exists) is performed at runtime and therefore in the context of M1, and there is usually no type checking when names in M* are bound to return values of functions in M1. If M* wishes to create values to pass as arguments, it must import all or part of S(M).

2. For a static language, checking of function arguments still occurs in the context of M1, so the compiler knows the types imported from S(M) without them having to be in the interface.

3. Creation or manipulation of values of types from S(M) in M* can require importing of S(M), so the only problem is binding names in M* to return values from functions in M1.

4. In a language with type inferencing the name in M* need not be declared by the programmer, so the type need not be visibly exported from M1, it only has to be available to the type system. So the implementation must export information visible to the type system but outside the namespace visible to the user in M*.

5. The type system must be able to infer that a type visible in M* through importing S(M) explicitly is the same type as that in the hidden export from M1 from importing S(M) there, and the traditional module qualified name seems perfectly acceptable.
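
To make points 1 and 3 concrete, here is a minimal Python sketch. The mapping is my own assumption for illustration: the standard-library module `email.utils` plays M1, `datetime` plays S(M), and the script itself plays M*.

```python
# M* (this script) imports only M1. Here the stdlib module email.utils
# stands in for M1 and the stdlib module datetime stands in for S(M).
import email.utils

# M1's function returns a value whose type is defined in S(M).
stamp = email.utils.parsedate_to_datetime("Mon, 20 Jun 2011 10:00:00 +0000")

# M* can bind and use the value even though it has not imported datetime yet.
leaked_module = type(stamp).__module__  # home module of the returned type
year = stamp.year

# Creating a new value of an S(M) type is where the explicit import is needed.
import datetime
later = stamp + datetime.timedelta(days=1)

print(leaked_module, year, later.day)
```

The return type leaks into M* without any import, but only as a value; as soon as M* wants to construct something of that family of types, it has to name S(M) itself.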

I know people hate it, but I'm increasingly of the opinion that the "right thing" in terms of module imports is to have the importing module explicitly list all of the bindings (including types, etc) that it's importing. It's verbose, but it's easy to generate, easy to check, and its semantics are trivially simple. Also, it provides a good "grep target" when people want to discover where the definition for something is.

The more conventional "use module" type statement is far more opaque. Implicit-import issues like this require complicated rules to govern, and that requires people to think in terms of complicated rules. People are, to put it simply, bad at that.

Also, it provides a good "grep target" when people want to discover where the definition for something is.

I'm giving a talk at EuroPython 2011 about using grammars and parsers to scan source code. This sounds obvious, but only in the context of compilation, not as a way of specifying simple search targets and writing scripts that scan for specific information, which is what regexps are currently used for. Full language grammars are not required, either. So I'm basically trying to motivate people to use adequate tools.
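
As a rough illustration of parser-based scanning (not the tooling from the talk, just a sketch using Python's standard `ast` module), the following lists which module each imported name comes from. The `SOURCE` snippet and the `imported_bindings` helper are made up for the example.

```python
import ast

# Hypothetical source text to scan; any Python file's contents would do.
SOURCE = '''
from collections import OrderedDict, deque
import os.path as osp
from math import *
'''

def imported_bindings(source):
    """Map each locally bound import name to the module it comes from."""
    bindings = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ImportFrom):
            for alias in node.names:
                # `from m import *` shows up as the single name "*":
                # the scanner cannot tell which bindings it introduces.
                bindings[alias.asname or alias.name] = node.module
        elif isinstance(node, ast.Import):
            for alias in node.names:
                # `import a.b` binds `a`; `import a.b as c` binds `c`.
                local = alias.asname or alias.name.split(".")[0]
                bindings[local] = alias.name
    return bindings

bindings = imported_bindings(SOURCE)
print(bindings)
```

Explicitly listed names come out as precise "grep targets", while the star import is reduced to an opaque `"*"` entry, which is the opacity complained about above.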

It occurs to me that with regard to modules, types provide a kind of glue between module exports/imports.

What types "should" inhabit the default, global environment within which other types - and the modules that love them - are written? In C we have some fairly minimal set of types in the global environment: int, int[], char* and so forth. In other languages we might have far more types in the global environment: 'a vec, 'a list, 'k 'v dict, 'a Option, 'a 'b Either, 'd -> 'r, ('a,'b) - and so on.

Something of an old timer, I'm a bit prejudiced toward a minimal set of types inhabiting the global environment. But I can also see that a far richer set of types can limit coupling between modules and the sometimes cascade of inter-module dependencies this can create (complicating and inhibiting reuse, etc.).

In your case, the smaller the number of types inhabiting the implicit global environment, the more explicit imports one must write down on the "imports" side. Curious as to your thoughts.

I'm currently designing a module system more along the lines of parameterized import-by-API with assertions and preferences, basically serving as a weak form of code search. An interesting property is that the linker becomes a constraint solver whose answer is a tree of modules. (Assertions and preferences can introduce constraints across parameters and exports, and parameters to one module can be exports from another.) This enables an interesting stone-soup approach to development: you toss a bunch of modules into a vat along with the 'main' ingredient, and out pops a configured application (especially if some of the modules represent system resources).

That sounds a little similar to the code distribution system that dropped out of my work on automatic theory dependency management. My units of code distribution, which I call packages, consist of arbitrary bundles of definitions. The package system is just responsible for identifying (both in the common and mathematical sense) shared values. It's very simple because I don't ever have values that are in need of linking. If you have a declaration of something, but no definition, then you're implicitly (but formally) parameterized by the implementation. I first noticed how easy it was when I looked at Gilad's 'Ban on Imports' and realized that his complaints don't apply to my setup.

Jiazzi leveraged explicit parameterization and binding at the module level while leveraging by-name package binding for individual atomic unit implementations. We even provided some renaming mechanisms later on so that the module system could customize the namespace of implementations; it worked very well for Java.

Code search to me is very much a tool problem. A type system can support it somehow, but your linker shouldn't be making any fuzzy decisions about what code to use--unless you are doing some interactive exploratory programming, that could be really cool.

The idea that modules could configure and assemble themselves is not that new. I've seen some work on chemical-reaction-inspired PLs (we must have talked about this before on LtU), but it's still way out there. Better to help programmers make good choices, but still leave them to make the choice.

Developers need control, stability, security, and simplicity. The approach I've developed is precise if you need it to be, likely deterministic, capability-secure, and very content driven. I think it's simple, too, but I suspect other developers will get stuck on the radical departure from tradition.

'Fuzzy' doesn't seem the right word to describe such an approach, but I'm certainly aiming to leave some slack in the specification. Forcing developers to make a choice when they want to say: "I don't care" is just overspecification, and quite fragile.

Parameterized modules do solve many issues - they enable multiple configurations of a module, and support configuration from within the language. Unfortunately, if used directly, parameters expose all of a module's dependencies, which means you would be unable to decompose a module into smaller sub-modules without exposing the sub-module structure to clients.

To solve that issue you might generate a table of modules - as the parameter to other modules. A module could choose its dependencies from the table. But that raises the issue of: how do you search this table? how does a module find dependencies appropriate to its configuration? Is it easy on the developers? If modules are selected from this table by name (e.g. attribute name) then in general a module would fail to discover a 'better' implementation of a module if one is provided later, or if some versioning issue broke the feature they depended upon before.

That ad-hoc 'table' is the element I'm targeting in my 'parameterized import-by-API' approach. It gets replaced with a matchmaker concept, and standardized as part of the language.

There is one 'import' for each module or resource requested. The Variable in the 'import' identifies the matchmaker. A matchmaker is generally associated with a registry of resources and policies or filters for favoring some modules in the registry before others. Matchmakers are capability-secure - a module may obtain its initial matchmaker only by parameter, but can 'import' refined matchmakers from existing ones (e.g. to attenuate resource discovery, enable less-trusted modules, remap symbols, or compose with another matchmaker).

The module's exports and parameters are in the first line: 'module' Exports '<=' Parameters. The Exports pattern can contain variables that are defined or imported later in the module. The initial matching filter is covariance in Exports and contravariance in Parameters, and strict matching of any symbols. After that, we can filter by assertions, preferences, and type safety.

I expect modules to be relatively fine-grained, and I have an idea on how to support user-defined syntaxes (via syntax modules). Developers will probably learn plenty of cute idioms, such as 'prefer 1 false' to create a fallback module.

Ok, your modules are obviously not my modules... people say module and they could mean function, class, object, package, DLL/assembly, or whatever; it's a very useless term in that respect. So let's just ignore the term "module" and think about namespaces and constructs in those namespaces.

I've thought about this a lot in the context of my code wiki work, where essentially everyone shares the same flat namespace and dependencies are resolved via search. Again, the goal is to help programmers find things, and not to make decisions for them. There are also reasonable defaults (default bindings) that you can enhance to include bindings to other objects in the environment (SuperGlue could do that with declarative rules). To support programming bricolage, we might want to make some decisions automatically; e.g., I say "I want a zombie" and the system infers (based on existing zombie implementations in the code wiki) what the zombie could be for its underconstrained attributes. The system becomes fuzzy here, but it can help speed up development as the user doesn't have to define what a zombie is right away to begin testing and refining.

I still would like to see concrete examples of where your matchmaker would be useful. I would start there rather than explaining the grammar and semantics and letting us infer the examples ourselves.

everyone shares the same flat namespace and dependencies are resolved via search

Technically, I don't believe I'm using a namespace for modules. That is, you cannot identify objects in the space by name, therefore it isn't a namespace. The approach I'm pursuing is content centric. However, different matchmakers may provide content, or different views or search-policies for the same content.

I'm also not using a 'flat' space. I ended up rejecting that for security concerns. There are two security constraints I want to maintain: (1) more-trusted modules will not depend upon less-trusted modules without explicit allowance. (2) less-trusted modules will not have access to sensitive resources or information without explicit grant. One could presumably achieve these properties using some sort of cryptography and signatures techniques, but I've since found a much more elegant and simple, capability-oriented design based on trusted paths through a graph of fine-grained spaces.

I don't really know your code wiki work (a link would be nice) but I'm certainly interested in the concept of a wiki-based IDE. Rather than a global space, I sort of imagine a bunch of smaller wikis mashing up together through a combination of DVCS and relationships between module registries.

the goal is to help programmers find things, and not to make decisions for them

My goal is to support developers in expressing and achieving their requirements and preferences. I believe that if we don't express our requirements, we tend to forget about them.

You suggest using an IDE tool to 'search' for code, with the developer choosing one (by name) from among the provided options. Unfortunately, this means that the reasons for that choice are lost to future generations of developers (including yourself, inevitably).

As the codebase changes, the reasons for your decision may become invalid, or someone may add new options that are more to your preference. However, these changes occur silently, because the system cannot recognize these issues, and thus cannot alert developers of them.

The approach I take is: you write your search, requirements, and preferences directly into the code. You are basically telling the system how you would make your choice. The system then follows your formula, and chooses your choice for you, if you've expressed your requirements adequately. The most likely issue is that you underspecify but it works okay because there's only one preferred option that meets your specification! And, in that case, a good IDE could at least alert developers if ever the ambiguity actualizes.
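
A minimal Python sketch of that idea, under my own assumptions: candidates are described by plain attribute sets, requirements are a hard filter, preferences are a simple count, and ties are broken by name. None of these names (`Candidate`, `choose`, the attribute strings) come from the actual design.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Candidate:
    name: str
    attrs: frozenset

def choose(candidates, required, preferred):
    """Return (winner, tied): the best candidate that meets `required`,
    ranked by how many `preferred` attributes it has, plus any candidates
    that tie with it (an ambiguity worth reporting)."""
    eligible = [c for c in candidates if required <= c.attrs]
    if not eligible:
        return None, []
    def score(c):
        return len(preferred & c.attrs)
    best = max(score(c) for c in eligible)
    # Assumption: ties are ordered by name so the result is deterministic.
    top = sorted((c for c in eligible if score(c) == best), key=lambda c: c.name)
    return top[0], top[1:]

zombies = [
    Candidate("zombie_a", frozenset({"zombie", "humor", "wit"})),
    Candidate("zombie_b", frozenset({"zombie", "brains"})),
]

# Requirements and preferences live in the code, next to the import.
winner, tied = choose(zombies, required={"zombie"}, preferred={"humor", "wit"})

# Underspecified request: both candidates qualify equally, so the
# ambiguity is surfaced instead of being silently resolved.
loose_winner, loose_tied = choose(zombies, required={"zombie"}, preferred=set())
print(winner.name, [c.name for c in loose_tied])
```

The second call is the case described above: the specification is too loose, and the non-empty `tied` list is what an IDE could use to warn the developer.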

You could always cheat my system, e.g. by 'requiring' a module that exports a certain guid. That gets you fairly close to a namespace. So it isn't as though you're losing precision with my model.

Have you been considering the effects of code change over time, especially in large-scale multi-developers systems (like a wiki), and how that affects the issue of "making decisions for" the programmers? I.e. if you asked for zombie bob for his sardonic humor and dry wit, but those attributes were transferred over to zombie joe (while zombie bob became a money-grubbing brain muncher) then your program has essentially changed on you *outside* of your choice.

The issues I describe above occur for just a single decision point (which module do I choose here?). When we start composing modules, the reasons for one choice over another are easily invalidated by very subtle changes not just in the target module, but also in any module that it depends upon or that influences its parameter list.

Code-search in the sense I'm using it is considerably more expressive than the IDE feature; i.e. if a module exporting a 'zombie' depends upon a 'humor' module and a 'wit' module, then a code-search could potentially 'find' this configuration for you, in a similar sense to a grammar or logic program search. (Using parameters only, you would need to anticipate that your zombie module needs wit and humor modules, and then you'd need to trust that the zombie actually uses the modules you provided.) Perhaps this greater expressiveness is what you consider problematic. I imagine it could be an issue if it wasn't guided by requirements, preferences, and security constraints.

I don't foresee much need for configuring zombies, but configuration of dependencies and policy injection are the big reasons I developed matchmakers. With this module system, they'd just be used statically as well as dynamically.

When I think back on the decisions I've had to make in choosing between various libraries, I can't imagine formalizing the logic that went into the decision (and forget about automating the decision). Factors like price, quality of support, and trust (estimated fuzzily by considering the reputation of the author, how many teams are using it, etc.) are going to be very difficult to model.

Do you have in mind that somehow all the Zombie providers will be implementing compatible APIs and that the only differences between Zombie implementations will be in easily quantified dimensions? What if Zombies not only differ in price, but in pricing model? This looks like a hopelessly hard problem to me (if I understand what you have in mind).

Do you have in mind that somehow all the Zombie providers will be implementing compatible APIs

No. Import-by-API means a module that fails to provide a compatible API will simply be filtered from the search.

There are two initial easy-to-index filters: (1) the module exports API must be a superset of your requested imports API, (2) the module required parameters must be a subset of the parameters provided at import. Beyond these filters, assertions and preferences and general type compatibility can further filter 'configurations' of modules, but those filters are generally going to perform a lot slower.
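
The two easy-to-index filters can be sketched with plain set comparisons. This is only an illustration in Python; the registry layout, the module names, and the `matches` helper are hypothetical, and real APIs would be richer than sets of symbols.

```python
def matches(module, requested_exports, provided_params):
    """Apply the two cheap filters to one registry entry."""
    # (1) The module must export at least everything we asked to import.
    exports_ok = requested_exports <= module["exports"]
    # (2) The module must require no parameter we did not provide.
    params_ok = module["requires"] <= provided_params
    return exports_ok and params_ok

# Hypothetical registry entries for illustration only.
registry = [
    {"name": "zombie_full", "exports": {"zombie", "groan"}, "requires": {"mm", "wit"}},
    {"name": "zombie_needy", "exports": {"zombie"}, "requires": {"mm", "wit", "humor"}},
    {"name": "ghost", "exports": {"ghost"}, "requires": set()},
]

found = [m["name"] for m in registry if matches(m, {"zombie"}, {"mm", "wit"})]
print(found)
```

Only `zombie_full` survives: `zombie_needy` asks for a parameter that was not provided, and `ghost` does not export what was requested. The slower assertion, preference, and type checks would then run on this reduced set.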

The matchmaker also serves as a filter, basically controlling the 'space' of modules one is initially searching (and what other spaces can be discovered). The matchmaker is where you inject any filters on non-functional concerns such as source, price, stability, reputation for quality or performance. The matchmaker also carries 'global' policies that must penetrate layers of modules, such as favoring GTK over QT, or a desire to use truetype fonts if ever the option arises. There isn't just one matchmaker, though - e.g. given a matchmaker, you can 'import' another matchmaker to tweak policies or open a search space or filter results. (Though, you can only open search spaces that were known in the current space, due to the ocap model security.)

The 'formalization' really stops at the filters, standard matchmaker protocol, common structure for content, capability security. I don't formalize content, trust models, pricing models, reputation; those would need to be handled at the IDE layer, since the IDE is responsible for providing the matchmaker, providing the databases of modules, modeling system capabilities as modules, et cetera.

In this case, you would only be able to find this zombie module if you provided a matchmaker to argument 'mm', and a wit parameter. With the 'provides:' field used above, you can see how to use some constant symbols in the imports lists to easily filter modules by their exports.

Anyhow, import-by-API basically means we only import from things that are compatible with the model we're expecting to see, and we try to filter down to a promising set of modules ASAP before we start the more arduous process of model validation (via assert, prefer, and type safety).

Factors like price, quality of support, and trust (estimated fuzzily by considering the reputation of the author, how many teams are using it, etc.) are going to be very difficult to model.

Trust isn't something you can model in absolute terms anyway; it's fundamentally relative to each observer. That 'relativity' is captured by indirection through the matchmaker: which sources does this matchmaker trust? As noted above, the IDE is responsible for providing the initial matchmaker(s), sort of a powerbox pattern. Access to external resources, annotations of trust, that sort of policy would be expressed by humans in a manner the IDE understands and can impose upon the matchmaker for a project.

I have ideas for market integration based on attaching an 'e-purse' to a matchmaker. Costs and licensing issues could certainly be yet another constraint. But this isn't something I'm focusing on at the moment beyond a few thought experiments to ensure I'm not blocking myself from handling it in the future. My general position is that non-functional properties are handled by the matchmaker and good IDE integration.

I'm having a hard time understanding what the problem is. You're saying you don't know whether the modules M* should explicitly or implicitly import the modules S(M)? Isn't this just the question of whether to explicitly or implicitly import the transitive closure of dependencies? (Granted, you're considering only those modules M1 whose interfaces mention the types of S(M).) Sure there's a design decision to make here, but is there a problem?

Certainly in other contexts this introduces real problems -- consider named, versioned package management systems like Hackage/Cabal. Suppose package P's dependency on QuickCheck-1.0 is internal in the sense that no QuickCheck types appear in P's interface. Similarly, Q internally depends on QuickCheck-2.0. Now if some package R depends on P and Q, naïvely Cabal sees that R (transitively) depends on version 1.0 and on version 2.0 of QuickCheck, which it can't resolve to a single version and rejects. Conceptually there should be no problem, however, since the type system dictates that P and Q never emit or accept QuickCheck-typed values and thus no QuickCheck-1.0-typed value will be used in a QuickCheck-2.0-typed context (and vice versa). Here, the package system conservatively prevents a well-typed program because it cannot distinguish whether or not "imported modules'" types bleed into the "importing module's" type.
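
A toy Python model of the P/Q/R scenario, to show the difference between the two views. This is not how Cabal's solver is implemented; the `DEPS` table, the made-up version numbers for P and Q, and the "exposed in interface" flag are all assumptions for the sketch.

```python
# package -> list of (dependency, version, dependency's types appear in interface?)
DEPS = {
    "R": [("P", "1.0", False), ("Q", "1.0", False)],  # P/Q versions are made up
    "P": [("QuickCheck", "1.0", False)],
    "Q": [("QuickCheck", "2.0", False)],
    "QuickCheck": [],
}

def versions_to_reconcile(root, interface_aware):
    """Collect, per package, the set of versions that `root` must agree on."""
    needed = {}
    def visit(pkg, direct):
        for dep, version, exposed in DEPS[pkg]:
            # Interface-aware view: an indirect dependency matters to the
            # root only if its types leak through the intermediate interface.
            if interface_aware and not direct and not exposed:
                continue
            needed.setdefault(dep, set()).add(version)
            visit(dep, direct=False)
    visit(root, direct=True)
    return needed

def conflicts(root, interface_aware):
    """Names of packages required at more than one version."""
    needed = versions_to_reconcile(root, interface_aware)
    return sorted(dep for dep, versions in needed.items() if len(versions) > 1)

naive = conflicts("R", interface_aware=False)
aware = conflicts("R", interface_aware=True)
print(naive, aware)
```

The naive transitive closure reports QuickCheck as a conflict, while the interface-aware walk reports none, because neither P nor Q lets QuickCheck types escape.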

This issue relates to the traditional typing rule of `unpack` for System F existentials: the code that "uses" some existential term (module) cannot have any types defined by that module appearing in its own type. cf. (AB.3) in Abstract types have existential type (Mitchell and Plotkin, 1988).
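
For reference, the usual textbook form of that rule looks roughly like this. The notation is my own choice and may differ from the paper's (AB.3); the point is the side condition that the abstract type $X$ must not occur free in the result type $T_2$.

```latex
\frac{\Gamma \vdash t_1 : \exists X.\, T_{12}
      \qquad
      \Gamma,\, X,\, x : T_{12} \vdash t_2 : T_2
      \qquad
      X \notin \mathrm{FV}(T_2)}
     {\Gamma \vdash \mathtt{unpack}\ t_1\ \mathtt{as}\ \{X, x\}\ \mathtt{in}\ t_2 : T_2}
```

Read in module terms: a client may use the package's hidden type $X$ internally, but $X$ cannot show up in the client's own type, which is the "no leaking" condition discussed in this thread.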

A little Googling turned up design terminology I haven't heard much of in a few decades: "coupling and cohesion." It brought back many memories, and I then realized that part of my inquiry was as old as dirt (in PLT years).

I also realized that I was asking if there are current "linguistic" mechanisms to address the "coupling" problem.

A new, related question I have is whether increasing the number of types that inhabit the language's global environment will necessarily (or pragmatically) reduce "coupling" between modules. As there must be *some* type relationships between terms and types across modules, is adding ('a list), ('a vector) and ('k 'v dict) [or even, say, double or string] to the global environment, instead of relegating them to separate, language-level modules, going to decrease "coupling" between modules?

I'm happy that I have some suggested reading with regard to alternatives to traditional module "imports" and their possible transitive dependencies (if I have my terminology straight). But I also have some admittedly vague questions somewhere between module/library design and language or module/type design about what I guess I can now better characterize as progress in the "module coupling problem" over the last 30 years or so.