extensional is a macro for dynamic + multifile. It also asserts facts in the metamodel module, allowing introspection, saving, etc. Still some repetition. PlDoc comments: harder to extract metamodel info from these; more typing would be good. Metamodel directives don’t do much: graceful failure when there is no data; I/O.
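
The note above can be illustrated with a minimal sketch. This is not the actual bio(dbmeta) code: datapred/1 and save_facts/1 are hypothetical names introduced here, and the sketch assumes SWI-Prolog with everything loaded into a single file.

```prolog
% Sketch only: NOT the actual bio(dbmeta) implementation.
% datapred/1 is a hypothetical metamodel table of declared fact predicates.
:- dynamic datapred/1.

% extensional(+Name/Arity)
% Declare a fact predicate as dynamic and multifile, and record it in
% the metamodel so that it can be introspected or saved later.
extensional(Name/Arity) :-
    dynamic(Name/Arity),
    multifile(Name/Arity),
    assertz(datapred(Name/Arity)).

:- extensional(reaction_product/2).
:- extensional(reaction_reactant/2).

% Intensional predicate from the systems biology slide.
derivation_link(Input, Output, R) :-
    reaction_reactant(R, Input),
    reaction_product(R, Output).

% save_facts(+Stream): hypothetical helper that uses the metamodel to
% write out every stored fact of every extensional predicate.
save_facts(Stream) :-
    forall(( datapred(Name/Arity),
             functor(Head, Name, Arity),
             clause(Head, true) ),
           portray_clause(Stream, Head)).
```

With no data loaded, the query derivation_link(I, O, R) simply fails instead of raising an existence error, because the fact predicates are declared dynamic. After asserting reaction_reactant(r1, glucose) and reaction_product(r1, g6p), the same query binds I = glucose, O = g6p, R = r1.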

Same code for both in-memory and RDB: amazing!! Very powerful. Non-recursive only. Choose when to swap out the Prolog store and use the RDB. With recursive clauses, can choose to bind only the fact predicates.

Transcript

2.
Outline
  Biology and biological data integration: a brief introduction
  Obol: First experiences applying LP
  Blipkit: a reusable bioinformatics developer’s toolkit
    Modular structure
    I/O and relational database connectivity
  Some applications of Blipkit and LP
    Genes and genomics
    Phenotype matching
    Web applications
  Conclusions
    Where next? Some recommendations for the LP community

3.
The promise and challenges of biological research
  Why study biological systems?
    Because they’re fascinating
    Improve health
    Improve the environment
  BUT: Biology is hard
    Biological systems are extremely diverse
    Biology deals with phenomena at multiple levels of granularity
    There is a deluge of data
  Bioinformatics
    Biology as an information science
    Computational methods vital to understanding

9.
Data interrogation and discovery
  Sample of tasks
    Find mutations in regions upstream of neurotransmitter-producing genes
    Find drug targets or animal models for neurodegenerative diseases
    What biological pathways are enriched in high acidity environments?
  Answering each of these is difficult
    Manual aggregation from lots of databases
    Various kinds of inference required

12.
A better solution: Definite Clause Grammars
  Obol: A collection of domain specific DCGs
  Significant improvement over perl RegExs
    Declarative
    More expressive
    Integration with simple reasoning
    Bi-directional: can be used for term generation from logical expressions
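
A toy grammar gives a feel for the bi-directionality mentioned on this slide. It is a sketch under stated assumptions and not one of the actual Obol grammars: the lexicon, the logical-expression functors and name_expr/2 are all hypothetical, and term names are assumed to be already tokenized into word lists.

```prolog
% Sketch only: a toy grammar, not one of the actual Obol grammars.

% term(?Expr)// relates a logical expression to a list of words.
term(negatively_regulates(P)) --> [negative, regulation, of], process(P).
term(positively_regulates(P)) --> [positive, regulation, of], process(P).
term(regulates(P))            --> [regulation, of], process(P).

% process(?Process)// a tiny hypothetical lexicon of process names.
process(cell_proliferation) --> [cell, proliferation].
process(apoptosis)          --> [apoptosis].

% name_expr(?Words, ?Expr): the same grammar serves parsing and generation.
name_expr(Words, Expr) :-
    phrase(term(Expr), Words).
```

Parsing: name_expr([negative, regulation, of, apoptosis], E) binds E to negatively_regulates(apoptosis). Generation: name_expr(Ws, regulates(cell_proliferation)) binds Ws to [regulation, of, cell, proliferation]. No separate generator has to be written, which is the point of the declarative approach.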

20.
Results
  Obol grammars applied successfully to generate axioms for multiple ontologies
    particularly the Gene Ontology
    Still used frequently
  Lessons learned
    Small amount of basic LP goes a long way
    LP techniques not widely known in bioinformatics
    Different LP systems have different strengths
    Choosing between them is hard – and frustrating

33.
Anatomy of a blip domain package
  Model(s) of the domain
    dependencies to other domain modules
    extensional and intensional predicates
  I/O
    parsers/writers for small subset of bioinformatics file formats
      DCGs or external perl
    translators for common XML schemas
    Native prolog serialization of model ‘for free’
  Web UI
  Bridges
    Relational
    Other prolog models
    Ontology models

35.
Example from systems biology model

:- module(sb_db,
          [ reaction_product/2,
            reaction_reactant/2,
            reaction_modifier/2,
            derivation_link/3,
            …
          ]).
:- use_module(bio(dbmeta)). % metamodel

%% reaction_product(?R,?P) is nondet
% relation between a biochemical reaction and a molecular constituent
% produced in the reaction
:- extensional(reaction_product/2).

%% reaction_reactant(?R,?P) is nondet
% relation between a biochemical reaction and a molecular constituent
% that is consumed in the reaction
:- extensional(reaction_reactant/2).

%% reaction_modifier(?R,?P) is nondet
% relation between a biochemical reaction and a molecular constituent
% that plays a role in the process but is unmodified
:- extensional(reaction_modifier/2).

% --- INTENSIONAL PREDICATES ---

%% derivation_link(?Input,?Output,?Via)
% two species directly linked via a connecting
% reaction (excludes modifiers)
derivation_link(Input,Output,R):-
    reaction_reactant(R,Input),
    reaction_product(R,Output).

%...[snip]…

41.
Genome inference
  Deluge of genomic data
    Cost per genome decreasing
    Soon we will all know our genome sequence
    But what does it mean?
  Effective use of genomics data relies on deductive inference
    Many rules are logical: genome calculus
    Currently encoded using ad-hoc imperative code
  Probabilistic inference also useful
    But must be built on top of the logical inference

Formalization of gene expression
  Genome calculus
    operations on linear sequences
      subsequence, join, translate
    Certain sequence types are entailed by other sequences
  Calculus is surprisingly conserved across all life
    but biology is fuzzy and full of exceptions
      Archaea utilize different translation table
      Nematodes add trans-splicing
      Mammalian introns are huge
      Many genes are co-transcribed
      Viral genes overlap in different translation frames
      …
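
The three sequence operations named above can be sketched over lists of bases. This is an illustration only, not Blipkit's genome model: the predicate names are hypothetical, coordinates are assumed to be 0-based interbase positions, and the codon table is a five-codon excerpt of the standard genetic code plus its three stop codons.

```prolog
:- use_module(library(lists)).

% Sketch only: hypothetical predicate names, not Blipkit's genome model.

% codon(?Codon, ?AminoAcid): small excerpt of the standard genetic code.
codon([a,t,g], met).
codon([t,t,t], phe).
codon([a,a,a], lys).
codon([g,g,c], gly).
codon([t,g,g], trp).

% stop_codon(?Codon): the three stop codons of the standard code.
stop_codon([t,a,a]).
stop_codon([t,a,g]).
stop_codon([t,g,a]).

% seq_subsequence(+Seq, +Start, +End, -Sub): bases between two
% interbase positions (0-based, End exclusive).
seq_subsequence(Seq, Start, End, Sub) :-
    Len is End - Start,
    Len >= 0,
    length(Prefix, Start),
    append(Prefix, Rest, Seq),
    length(Sub, Len),
    append(Sub, _, Rest).

% seq_join(+Seqs, -Joined): concatenate sequences, e.g. exons to mRNA.
seq_join(Seqs, Joined) :-
    append(Seqs, Joined).

% seq_translate(+Bases, -AminoAcids): read codons until the first stop
% codon or the end of the sequence. Fails on a codon missing from the
% excerpt above or on a trailing partial codon.
seq_translate([], []).
seq_translate([A,B,C|_], []) :-
    stop_codon([A,B,C]), !.
seq_translate([A,B,C|Rest], [AA|AAs]) :-
    codon([A,B,C], AA),
    seq_translate(Rest, AAs).
```

For example, joining the two exon fragments [a,t,g,t,t] and [t,a,a,a,t,a,a] and translating the result gives [met, phe, lys]. The exceptions listed on the slide, such as alternative translation tables, would need extra arguments or rule variants that this sketch leaves out.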

54.
Disjunctive datalog implementation
  Adds:
    Constraints
    Disjunctions in rule heads
  Implementation
    DLV-Complex: allows functions in arguments
    Program written from scratch: Rules must be ‘safe’
  Results
    Scales over small regions
    Useful for detecting inconsistencies in data
  More research needed
    More efficient programs
    Use of relational database backend
    Further exploration of ASP semantics
      Genomic rules have many exceptions

55.
Prolog implementation
  Removes:
    rules that cause cycles with backtracking
  Implementation
    Optional use of Nested Containment List library (C + SWI FLI)
  Results
    Results can be incomplete due to missing rules
      E.g. intron :- exon, but not exon :- intron
    Ruleset can be tailored for dataset
    Scales over medium sized datasets
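
The intron/exon example can be made concrete with a small sketch. The exon/3 representation, the coordinates and the transcript name t1 are assumptions made for illustration; they are not taken from the actual ruleset.

```prolog
% Sketch only: hypothetical fact layout exon(Transcript, Start, End),
% with interbase coordinates and made-up data.
:- dynamic exon/3.

exon(t1, 100, 200).
exon(t1, 300, 400).
exon(t1, 550, 700).

% intron(?Transcript, ?Start, ?End)
% An intron is inferred as the gap between two consecutive exons of the
% same transcript. The converse rule (exon inferred from introns) is
% deliberately left out: having both would make the rules mutually
% recursive and loop under plain backtracking.
intron(T, Start, End) :-
    exon(T, _, Start),
    exon(T, End, _),
    Start < End,
    \+ ( exon(T, Mid, _), Mid > Start, Mid < End ).
```

With these facts, intron(t1, S, E) yields the gaps 200-300 and 400-550. A dataset that records only introns would yield no exons at all, which is the kind of incompleteness the slide refers to.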

57.
LP for genomics: conclusions
  No one paradigm is perfect
    Many axioms cannot be expressed in OWL
      but tools are good
    Disjunctive Datalog good for consistency checking in small regions
    More research required on efficiency of tabling solution, ASPs
    WAM solution most efficient
      Manually rewriting programs is tedious!
  Hybrid solutions useful
    RDBs for asserted facts

58.
Application: match.com for diseases
  Organisms have phenotypes
    characteristics under the control of the genes of that organism
  Related genes can have similar phenotypic effects
    even when the least common ancestor of the gene is 500m years ago
  Finding these genes can help understand
    disease
    evolution

66.
Web Applications
  http://berkeleybop.org/obo
    Web interface to Open Bio Ontologies
    Implemented in perl + SWI-Prolog
  Prototype for future development
    SWI-Prolog
    Production version in perl and/or java

67.
Experiences using LP for bioinformatics: conclusions
  A little bit of LP goes a long way
    The theory-application gap is largely untapped
  A variety of LP paradigms are useful
    ASP, datalog, DLs, prolog, ILP, …
    Interoperation can be hard!
  LP for ‘real world’ applications
    It is possible!
    Declarative approach arguably superior
    Web/database applications are a sweet spot
  We need to show more success stories
    ..and to dispel myths

68.
Recommendation: make it easier for users
  Documentation:
    Unify community knowledge in a single wiki
    Create a general LP mail list
      c.f. OWL/SemWeb community
  Tools:
    Program analysis
      Lint-like tool for tabled prologs, ASP
    Visualization
  Libraries
    CPAN for Prolog

69.
Recommendation: make it open-source
  Why
    Encourages collaboration
    Bioinformaticians love open source
    The people who fund bioinformaticians love open source
    Open source can still generate revenue
  How
    Deposit code in open source code repositories
      github, sourceforge, googlecode, etc
    Embrace Web 2.0
      blog it, put it on a wiki