2 Answers

Some generalities

I'd first like to discuss some ideas behind DSLs and why they are useful, and then give a few pointers to specific examples. The main idea of a DSL (at least as I understand it) is that, for a given domain, there may be a number of primitive operations such that all or most other desired operations can be expressed as some combination of these. The main difference between a DSL and a simple API approach is that in a DSL you can nest these operations and combine them in non-trivial ways. Some typical ingredients of DSL construction are:

Parsing and creating an AST (Abstract Syntax Tree) of your code. This part may or may not be present, since in Mathematica we more or less program in parse trees (Mathematica expressions) already. One desired property for a parse tree is that it (and the heads out of which it is built) is completely inert, so that no piece of it evaluates. For example, in the SymbolicC case, the heads constructing the symbolic C expressions are completely inert. This is not an absolute requirement, since you may have other means to prevent evaluation, but, at least in my experience, it is often quite hard to control evaluation when some heads in the parse tree are allowed to evaluate. I can see two kinds of situations in which creating a separate parse tree (with different heads) may be needed:

The original expression (code) of the DSL can evaluate, and it is this evaluation which actually constructs the parse tree. This may be advantageous, since the original DSL code may be (much) more compact than the resulting AST.

The original DSL code is expressed in the "wrong degrees of freedom": that is, the operations natural for the end user of the DSL have a non-trivial mapping to the primitives in which it is most natural to implement the desired operations.

For the code formatter example mentioned below, both of the above points were true.
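To make the idea of inert heads concrete, here is a minimal sketch (the head names `astPlus`, `astTimes` and `astLeaf` are made up for illustration): heads with no definitions do not evaluate, and `HoldAllComplete` additionally shields their arguments from evaluation:

```mathematica
(* Hypothetical inert AST heads: they have no definitions, so the tree
   they build stays unevaluated; HoldAllComplete also shields the
   arguments from evaluation *)
ClearAll[astPlus, astTimes, astLeaf];
SetAttributes[{astPlus, astTimes, astLeaf}, HoldAllComplete];

(* This "parse tree" is completely inert: *)
tree = astPlus[astLeaf[1], astTimes[astLeaf[2], astLeaf[3]]]
(* astPlus[astLeaf[1], astTimes[astLeaf[2], astLeaf[3]]] *)
```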

Actual implementation of the DSL. This involves implementing the actual interactions between your DSL primitives. If your DSL will be interpreted, this is an interpreter which executes your AST; if it will be compiled, this is a code generator which generates Mathematica code from your AST. Either way, this is a necessary step, which forms the essence of your DSL and defines the composition of elements (language primitives).
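As a toy illustration of the interpreter route (the AST heads `astPlus`, `astTimes`, `astLeaf` and the function `interpret` are hypothetical names), a few recursive rules are enough to execute such an AST:

```mathematica
(* A toy interpreter for a hypothetical arithmetic AST *)
ClearAll[astPlus, astTimes, astLeaf, interpret];
interpret[astLeaf[x_]] := x;
interpret[astPlus[a_, b_]] := interpret[a] + interpret[b];
interpret[astTimes[a_, b_]] := interpret[a]*interpret[b];

interpret[astPlus[astLeaf[1], astTimes[astLeaf[2], astLeaf[3]]]]
(* 7 *)
```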

Frequent use of recursion. Recursion is natural in this setting, because it reflects the nested nature of programs and the composition of primitive operations. I would even consider its presence necessary for a DSL to be non-trivial. To give a few examples, the CCodeGenerate function from SymbolicC is deeply recursive, and in the code formatter (mentioned below) all stages of the formatter heavily use recursion. Recursion is also central in the example from the book of Wellin et al. (also mentioned below).
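To show how such recursion typically looks (again with made-up heads; this mimics in miniature what a recursive code generator does on a much larger scale), here is a generator producing C-style expression strings from a toy arithmetic AST:

```mathematica
(* A toy recursive generator: compiles a hypothetical arithmetic AST
   into a C-style expression string *)
ClearAll[astPlus, astTimes, astLeaf, toC];
toC[astLeaf[x_]] := ToString[x];
toC[astPlus[a_, b_]] := "(" <> toC[a] <> " + " <> toC[b] <> ")";
toC[astTimes[a_, b_]] := "(" <> toC[a] <> " * " <> toC[b] <> ")";

toC[astPlus[astLeaf[1], astTimes[astLeaf[2], astLeaf[3]]]]
(* "(1 + (2 * 3))" *)
```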

Specific examples

SymbolicC and DatabaseLink are two examples of Mathematica DSLs, which map Mathematica expressions to C code and SQL queries respectively. You can read their source code; it is quite instructive. In some sense, JLink can also be considered a DSL, and so can the new statistics-related functionality. For simpler examples, you can look at my code formatter, which implements a simple but non-trivial formatting DSL (although I did not perform the refactoring which would make this totally apparent). Another good example is the simple graphics DSL developed in "Introduction to Programming with Mathematica" by Wellin, Gaylord and Kamin (for the purposes of a pure Mathematica DSL, you can skip the parsing stage or, more precisely, skip the lexical analysis, replacing their custom syntax with Mathematica expression-based syntax). This may actually be the easiest example to start with.
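As a quick taste of SymbolicC, here is a minimal sketch (see the SymbolicC documentation for the full set of heads): one builds an inert symbolic representation of a C function and then generates the C source from it:

```mathematica
Needs["SymbolicC`"]

(* An inert symbolic representation of a small C function *)
square = CFunction["int", "square", {{"int", "x"}},
   CBlock[{CReturn[COperator[Times, {"x", "x"}]]}]];

(* The recursive generator turns it into C source text, roughly:
   int square (int x) { return x * x; } *)
ToCCodeString[square]
```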

Other ways to implement DSLs

One can take a less formal route and implement DSLs through code generation, achieved by writing macros. I mentioned macros in my answer on metaprogramming, and also mentioned the complications which currently accompany writing macros in Mathematica. Perhaps the largest one is that there are no true compilation and macro-expansion stages, since Mathematica is interpreted. But it should be possible to write a framework which would introduce these, and then use it. The advantage of this method is that you can figure out your DSL's details as you go, just by eliminating boilerplate code as you see it. This may be a big advantage when you find it hard to formally specify your DSL, either because you are still learning the domain or because it is not a priori clear what would make a good set of primitives.
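As a small sketch of the boilerplate-elimination idea (the name `withTiming` is made up), one can use Hold attributes to pass code unevaluated into a function that wraps it with standard scaffolding:

```mathematica
(* A toy "macro": wraps arbitrary code with timing boilerplate.
   HoldAll keeps the code unevaluated until we choose to run it *)
ClearAll[withTiming];
SetAttributes[withTiming, HoldAll];
withTiming[code_] :=
  Module[{time, result},
    {time, result} = AbsoluteTiming[code];
    Print["Evaluation took ", time, " seconds"];
    result];

withTiming[Pause[0.1]; 1 + 1]  (* prints the timing and returns 2 *)
```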

Thanks Leonid, very useful. I've used DatabaseLink. The functionality of "active documents" should include a structured query language, e.g. as found on the Bloomberg Professional Service (aka Terminal): type C US <Equity>CN<GO> to view headlines for Citigroup. This may be the preferred mode for clinical order-entry tasks. However, for clinical documentation, the envisioned input to the interpreter is natural language, not structured queries. The output is the marked-up document, and even interactive help or guidance to the user. Is this functionality feasible with a DSL?
– alancalvitti, Apr 23 '12 at 18:35

@alancalvitti Yep, sounds like a perfect candidate for a DSL. For the natural language processing, I agree with rcollyer that WA already has a very strong NLP component. The problem is, AFAIK, there are no direct hooks which would allow you to use that component alone. The conversion from natural language to a formal language can be a very tough problem. You may be better off overloading your formal language so that there are multiple ways of expressing the same thing - but even this can be a heavy burden.
– Leonid Shifrin, Apr 23 '12 at 18:42

@alancalvitti What I would perhaps try is to first make a formal DSL, and then try constructing another layer, also formal but heavily overloaded, which would compile into the formal DSL. In this way, you can at least separate these concerns.
– Leonid Shifrin, Apr 23 '12 at 18:43

This is not a direct answer to your question, but you may be reinventing the wheel. Have you considered incorporating the Wolfram|Alpha engine directly into your system? They have a number of products that can be used to build out this type of application.

Depends on how Alpha can be extended. Certainly not as an "appliance" - Alpha is a proprietary platform which further relies on a curated-data model. Never mind the NLP issues - if you search Alpha with the query "1 2 4 8 16", Alpha returns info only on the geometric sequence, whereas oeis.org returns 500+ sequences. From an analytics-performance standpoint, that's a sensitivity of almost zero; unacceptable... With respect to NLP, IBM/DeepQA developed Apache UIMA tools for their Watson system. Does Alpha communicate with these?
– alancalvitti, Apr 23 '12 at 18:00

Did you look at their products page that I linked to? Specifically, their enterprise solutions, which imply that you can use their tech to curate your data.
– rcollyer, Apr 23 '12 at 18:04

Yes, I followed the link, thanks. We discussed a possible partnership with Wolfram Research recently, but it would seem to entail Wolfram employees curating data. I don't think that model will work in healthcare. The FDA found Ambien spelled 400 different ways, just to give an example from RxNorm. Who's going to normalize all the terminology - Wolfram? Similarly, analytics must be tailored to the healthcare domain and require development through the Health IT lifecycle. Maybe this is more a business & licensing issue than a tech one.
– alancalvitti, Apr 23 '12 at 18:16

The curation of the data will always be a problem. But, with one of their appliances on your network, I don't see why they'd be the ones doing the curating, at least directly. Truthfully, I wasn't trying to be facetious with my answer, just trying to save you some work (with all the extra work it entails), especially since I've helped build systems with large requirements before.
– rcollyer, Apr 23 '12 at 18:20

We certainly would like to see the pieces fit together: open-source web services, UIMA NLP tools, and proprietary CKEs. Data models in healthcare are very complex and poorly specified. The Wolfram Curation Pipeline includes Normalization, Validation, Crosslinking, Analysis, Linguistics, and Expert Review. It's not clear how tightly coupled the data models are to the analytics. But we would like to talk to experts like you to help refine the requirements.
– alancalvitti, Apr 23 '12 at 18:46

Mathematica is a registered trademark of Wolfram Research, Inc. While the mark is used herein with the limited permission of Wolfram Research, Stack Exchange and this site disclaim all affiliation therewith.