On Sep 12, 2007, at 4:30 PM, Kashyap, Vipul wrote:
[snip]
>> In terms of whether you can do this using SQL querying
>> alone, based on our experience, it's unlikely. The problem is that
>> the types of clinical exclusion and inclusion criteria we saw on
>> clinicalTrials.gov cannot be easily reduced to SQL querying (at least
>> with the structured medical records we got from Columbia). From
>> discussions with other institutions, we know this isn't unique to
>> Columbia (i.e., there is a substantial "semantic gap" between what's
>> in the structured record and what is being queried by investigators
>> for clinical trials).
>
> [VK] It would be great if you could share specific examples of some
> criteria that were not expressible in SQL. We can then incorporate
> those into the use case and help make a case for SW technologies. On
> the other hand, taking a quick look at the SHER project at IBM, it
> looks like you are using a polynomial-time reasoner (CEL) for the
> matching. I may be mistaken, but my initial sense is that any CEL
> expression can likely be expressed in SQL/Relational Algebra, or
> vice versa.
Kavitha already pointed out that they aren't using CEL. However, at
least the SNOMED part is in EL++ and can be reasoned over using CEL or
the like. That doesn't mean you can use Relational Algebra (in any
sensible way), though.
If you look at the OWL 1.1 tractable fragments document:
<http://www.webont.org/owl/1.1/tractable.html>
in particular the section on computational properties:
<http://www.webont.org/owl/1.1/tractable.html#7>
It contains the following paragraph:
"The fact that data complexity stays LOGSPACE, means that one can
exploit relational database technology for instance checking and
conjunctive query answering. The fact that data complexity goes beyond
LOGSPACE means that query answering and instance checking require
more powerful engines than the ones provided by relational database
technologies. PTIME-hardness essentially requires Datalog
technologies. For the CoNP cases, Disjunctive Datalog technologies
could be adopted."
The data complexity of EL++ strongly suggests that a sensible
reduction to SQL is unlikely (i.e., you'll need Datalog-esque rules as
well).
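To make the "Datalog-esque rules" point concrete, here is a minimal
sketch in Python. Everything in it is an assumption for illustration:
the part_of facts, the entity names, and the single recursive rule are
invented, not taken from SNOMED or from the SHER project. The rule
mirrors an EL-style axiom of the form "anything that is part of a
HeartStructure is a HeartStructure". It has to be applied until a
fixpoint is reached, which is what a fixed, non-recursive relational
algebra expression cannot do for chains of arbitrary length.

```python
def saturate(part_of, seeds):
    """Apply the rule  H(x) <- part_of(x, y), H(y)  until nothing new is derived.

    part_of: set of (child, parent) pairs (hypothetical ABox facts).
    seeds:   individuals already asserted to be in H.
    """
    derived = set(seeds)
    changed = True
    while changed:  # naive fixpoint iteration, as a Datalog engine would do
        changed = False
        for child, parent in part_of:
            if parent in derived and child not in derived:
                derived.add(child)
                changed = True
    return derived


# Hypothetical toy data: a two-step part_of chain plus an unrelated fact.
part_of = {
    ("left_ventricle_wall", "left_ventricle"),
    ("left_ventricle", "heart"),
    ("femur", "leg"),
}
heart_structures = saturate(part_of, {"heart"})
print(sorted(heart_structures))
```

A single join answers the one-step case, two joins the two-step case,
and so on; the number of joins needed depends on the data, not on the
query, hence the need for recursion.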
Even logics with LOGSPACE data complexity can be tricky. The DL-Lite
family is the paramount example, and its members can have an
exponential blowup in the size of the query (since they need to
internalize parts of the TBox in the query, so each conjunct might
expand, and then the combinations of the expansions must be added to
the union of queries...er...as I recall :)).
So, basically, large queries with large, connected TBoxes will be
challenging, requiring clever optimization of the rewriting. This
isn't something you'll do by hand ;)
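A rough sketch of where the blowup comes from, in Python. This is a
deliberately simplified illustration and not an actual DL-Lite
rewriting algorithm: it only handles atomic concept inclusions (real
rewriters also deal with roles and existential restrictions), and the
concept names and the subclasses table are made up. Each conjunct is
expanded to itself plus its subclasses, and the rewritten query is the
union of one conjunctive query per combination of expansions.

```python
from itertools import product


def expansions(concept, subclasses):
    """Return the concept together with all of its transitive subclasses."""
    seen, stack = [concept], [concept]
    while stack:
        for sub in subclasses.get(stack.pop(), []):
            if sub not in seen:
                seen.append(sub)
                stack.append(sub)
    return seen


def rewrite(query_atoms, subclasses):
    """Build the union of conjunctive queries: one per combination of expansions."""
    per_atom = [expansions(atom, subclasses) for atom in query_atoms]
    return [tuple(combo) for combo in product(*per_atom)]


# Hypothetical TBox: each queried concept has two direct subclasses.
subclasses = {
    "Disorder": ["HeartDisorder", "LungDisorder"],
    "Drug": ["Statin", "BetaBlocker"],
    "Procedure": ["Biopsy", "Bypass"],
}
query = ["Disorder", "Drug", "Procedure"]
union = rewrite(query, subclasses)
print(len(union))  # 3 * 3 * 3 = 27 conjunctive queries
```

With n conjuncts that each have k expansions, the union has k^n
members, which is why the rewriting needs pruning and other
optimizations rather than naive enumeration.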
Cheers,
Bijan.