use Harvester to retrieve additional citation meta-data that will be attached to the meta-data we already retrieve (i.e. every single harvester record may contain or point to additional citation records)

provide citation support in reading tools (context sensors that use citation data to provide additional information in the RT sidebar)

Make sure that the components can be integrated/extended for metadata/section parsers/editors later

Citation DAO Library

We can use the usual PKP DAO pattern for all citation data persistence requirements
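
As a rough illustration of what the DAO pattern buys us here, the following minimal sketch keeps all SQL inside one class so that callers only ever see Citation objects. It is written in Python against an in-memory SQLite database purely for illustration; the table layout, the field names and the method names (insert(), get_by_id()) are assumptions and are not taken from the existing PKP DAO classes.

```python
import sqlite3
from dataclasses import dataclass
from typing import Optional


@dataclass
class Citation:
    """Value object; the field names are assumptions made for this sketch."""
    raw_text: str
    state: str = "raw"
    citation_id: Optional[int] = None


class CitationDAO:
    """Hides all SQL behind a small persistence interface."""

    def __init__(self, connection: sqlite3.Connection) -> None:
        self._db = connection
        # Hypothetical schema, created on the fly to keep the sketch runnable.
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS citations ("
            "citation_id INTEGER PRIMARY KEY AUTOINCREMENT, "
            "raw_text TEXT NOT NULL, state TEXT NOT NULL)"
        )

    def insert(self, citation: Citation) -> int:
        """Store a new citation and write the generated id back to the object."""
        cursor = self._db.execute(
            "INSERT INTO citations (raw_text, state) VALUES (?, ?)",
            (citation.raw_text, citation.state),
        )
        self._db.commit()
        citation.citation_id = cursor.lastrowid
        return citation.citation_id

    def get_by_id(self, citation_id: int) -> Optional[Citation]:
        """Return the stored citation, or None if the id is unknown."""
        row = self._db.execute(
            "SELECT raw_text, state, citation_id FROM citations "
            "WHERE citation_id = ?",
            (citation_id,),
        ).fetchone()
        return Citation(*row) if row else None
```

A caller would create the DAO once, e.g. CitationDAO(sqlite3.connect(":memory:")), and then work only with insert() and get_by_id().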

Citation GUI Pages

We might need an extra step in the submission process and an extra page in the editorial process for citation editing/lookup

Pages have to be application specific so we cannot usually share them between applications.

We'll however try to move as much as possible into the GUI components library for re-use, so that each page only consists of a very high-level outline that assembles components from that library.

Apart from citation editing I don't think we'll invent new pages. It's more about integrating new components into existing pages.

Citation GUI Components Library

L8X's editing capability is a lot more demanding than anything I have seen so far in OCS/OJS.

My bet is that 90% of the migration effort will go into the GUI migration (MJ, can you comment, please?). We have to port from script.aculo.us to jQuery and from CakePHP's MVC implementation to PKP's (including Smarty). To achieve re-use between PKP applications and between pages we'll have to "componentize" the GUI more than it currently is in L8X.

AJAX Request Architecture

We should probably design an AJAX-specific, high-performance MVC controller architecture. This means implementing shortcuts in the request processing for AJAX requests wherever possible (performance bottleneck!)

Both the AJAX handler and the page handler will be based on the same base classes but will extend them differently

Make sure there is no AJAX security bypass (maintain the single point of entry and the common security infrastructure for all types of request)
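
The three points above can be sketched as follows. This is only an illustration in Python (the real handlers would be PHP classes), and every name in it (Request, BaseHandler, PageHandler, AjaxHandler, ROUTES, dispatch(), the "editor" role and the URL paths) is hypothetical. The idea is that the authorization check lives in the shared base class and is reached through a single dispatch function, while the AJAX handler only shortcuts the rendering step.

```python
import json
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class Request:
    path: str
    user_roles: Set[str] = field(default_factory=set)


class BaseHandler:
    """Shared by page and AJAX handlers so the security check lives in one place."""

    required_role = "editor"  # assumed role name

    def authorize(self, request: Request) -> bool:
        return self.required_role in request.user_roles

    def handle(self, request: Request) -> str:
        # Subclasses override load() and render() only, so they cannot
        # accidentally skip the authorization step.
        if not self.authorize(request):
            return "403 Forbidden"
        return self.render(self.load(request))

    def load(self, request: Request) -> dict:
        # Placeholder data instead of a real database lookup.
        return {"citation": "Doe, J. (2009). A title."}

    def render(self, data: dict) -> str:
        raise NotImplementedError


class PageHandler(BaseHandler):
    def render(self, data: dict) -> str:
        # Stand-in for the full template and layout pipeline.
        return "<html><body><p>%s</p></body></html>" % data["citation"]


class AjaxHandler(BaseHandler):
    def render(self, data: dict) -> str:
        # Shortcut: no page layout, just the serialized fragment.
        return json.dumps(data)


ROUTES: Dict[str, BaseHandler] = {
    "/citation/edit": PageHandler(),
    "/ajax/citation": AjaxHandler(),
}


def dispatch(request: Request) -> str:
    """Single point of entry for every request type."""
    handler = ROUTES.get(request.path)
    if handler is None:
        return "404 Not Found"
    return handler.handle(request)
```

With this shape an AJAX request from a user without the required role gets the same refusal as a page request, because both go through BaseHandler.handle().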

Installation/Infrastructure Requirements and Compatibility

No new initial installation requirements

Maintain PHP4 compatibility for initial installation

New installation requirements (additional software, PHP>4) only for optional plug-ins

select/configure parsing/lookup services: use the existing GUI elements from L8X (no AJAX required; if jQuery support is enabled, dependent sub-options will only appear when the main service is enabled)

Back-End Class Design

The Role of Plug-ins

What's our general approach to plug-ins?

Plug-ins use application-specific hooks and therefore cannot be shared between PKP applications.

We'll offload as much functionality as possible to WAL and use plug-ins as thin wrappers around it where necessary.

We should only use plug-ins when they are really necessary, i.e. when they improve the user experience and/or code maintainability/testability

When do we need plug-ins?

non-standard installation requirements that need to be isolated

complex configuration or user-interface requirements that clutter the core interface for first-time users and should be kept out of the way

isolation of application-specific citation adapter code in one place to keep the core code clean -> improved code modularization and maintainability

Where to place citation plug-ins?

We'll have to create a new citation plugin category if we need additional hooks or have many plug-ins.

Otherwise we prefer to use existing categories.

Citation Service

A Citation represents a raw or parsed citation. The Citation can be in one of four states: raw, parsed, revised, confirmed (=looked-up). It implements the value object pattern. This class will be part of WAL.

A CitationDAO class will interface with the database to persist the Citation class. It implements the DAO pattern. This class will be part of WAL.

A CitationManager helper class will provide a simple interface (parse()/lookup()) to citation services. It implements the service façade pattern. This class will be part of WAL.

We'll implement several CitationParser and CitationLookup strategies. Citation parsers and lookup services isolate additional installation or configuration requirements. They implement the strategy pattern so that the CitationManager can use them transparently. These classes will be part of WAL.

Both parsers and lookup services will be injected into the CitationManager by way of core configuration or plug-ins (see the discussion of the role of plug-ins above). Plug-ins are part of the individual applications (OJS, OCS, OMP, Harvester).
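
To make the division of labour between these classes more concrete, here is a minimal sketch in Python (the real classes would be PHP and part of WAL). Only the class names Citation, CitationParser, CitationLookup and CitationManager, the parse()/lookup() methods and the four states come from the description above. The toy strategies (RegexYearParser, StaticLookup), the fields attribute and the placeholder DOI are assumptions made for illustration.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, Iterable, Protocol

STATES = ("raw", "parsed", "revised", "confirmed")


@dataclass
class Citation:
    """Value object holding the raw string plus whatever has been extracted."""
    raw_text: str
    state: str = "raw"
    fields: Dict[str, str] = field(default_factory=dict)


class CitationParser(Protocol):
    def parse(self, citation: Citation) -> Dict[str, str]: ...


class CitationLookup(Protocol):
    def lookup(self, citation: Citation) -> Dict[str, str]: ...


class RegexYearParser:
    """Toy parser strategy: only extracts a four-digit year."""

    def parse(self, citation: Citation) -> Dict[str, str]:
        match = re.search(r"\b(1[5-9]\d{2}|20\d{2})\b", citation.raw_text)
        return {"year": match.group(1)} if match else {}


class StaticLookup:
    """Toy lookup strategy backed by an in-memory table, not a web service."""

    def __init__(self, known: Dict[str, Dict[str, str]]) -> None:
        self._known = known

    def lookup(self, citation: Citation) -> Dict[str, str]:
        return dict(self._known.get(citation.fields.get("year", ""), {}))


class CitationManager:
    """Facade: callers see parse()/lookup() and never the individual strategies."""

    def __init__(self, parsers: Iterable[CitationParser],
                 lookups: Iterable[CitationLookup]) -> None:
        # Strategies are injected, e.g. from core configuration or plug-ins.
        self._parsers = list(parsers)
        self._lookups = list(lookups)

    def parse(self, citation: Citation) -> Citation:
        for parser in self._parsers:
            citation.fields.update(parser.parse(citation))
        if citation.fields:
            citation.state = "parsed"
        return citation

    def lookup(self, citation: Citation) -> Citation:
        found: Dict[str, str] = {}
        for service in self._lookups:
            found.update(service.lookup(citation))
        if found:
            citation.fields.update(found)
            citation.state = "confirmed"
        return citation
```

The "revised" state is not set anywhere in this sketch; we assume it would be set by the citation editing GUI after a manual correction. A plug-in would contribute to the system simply by handing another parser or lookup object to the CitationManager constructor.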

Citation/Metadata Entities (OO analysis)

The overlap consists in all of these entities having a similar set of bibliographic meta-data (e.g. author, title, publication year, etc.)

Article, Paper and Monograph already inherit from a common base class (Submission). Unfortunately the semantic concept of a submission is quite different from that of bibliographic meta-data (=Citation).

Apart from that, most bibliographical meta-data accessors (e.g. author, title, etc.) are not shared among the different submission types but rather re-implemented for each of them, following a common nomenclature only (getPaperTitle() vs. getArticleTitle(), etc.).

It would be a major re-factoring to extract common meta-data from all cited entities and gather them in a common Metadata class that all named entities, including Citation, could use or inherit from.

All this makes it difficult to encapsulate Metadata in a shared class and re-use this class in the named entities, including Citation.

Luckily none of our real-world use cases (see Features above) forces us to share/convert data between existing entities and the Citation entity. So in fact implementing the Citation class apart from the other entities is more a theoretical than a practical problem. However, if we implement Citations separately from the other named entities, we'll considerably reduce the system's future flexibility and potential for re-use, and we clearly breach accepted OO best practices.

Should the need arise to share bibliographic meta-data between entities in the future, we would have to use the Proxy or Adapter patterns to do so. Conversion services/strategies could be implemented that extract/inject meta-data from/to all named entities. This is a little awkward but IMO the best option in practice. As we have no use case for this, it's all just thinking about the risks we assume in the worst case.
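
A minimal sketch of what such an adapter-based conversion could look like, in Python for illustration only. The getter names getArticleTitle() and getPaperTitle() are the ones mentioned above. The setters, the TitleAdapter class, the ACCESSORS table and the two conversion functions are hypothetical, and only the title is handled.

```python
from typing import Dict


class Article:
    def __init__(self, title: str) -> None:
        self._title = title

    def getArticleTitle(self) -> str:
        return self._title

    def setArticleTitle(self, title: str) -> None:  # assumed setter
        self._title = title


class Paper:
    def __init__(self, title: str) -> None:
        self._title = title

    def getPaperTitle(self) -> str:
        return self._title

    def setPaperTitle(self, title: str) -> None:  # assumed setter
        self._title = title


class Citation:
    def __init__(self) -> None:
        self.fields: Dict[str, str] = {}


class TitleAdapter:
    """Gives entity-specific accessors a common get_title()/set_title() face."""

    # Hypothetical table of per-entity accessor names.
    ACCESSORS = {
        Article: ("getArticleTitle", "setArticleTitle"),
        Paper: ("getPaperTitle", "setPaperTitle"),
    }

    def __init__(self, entity: object) -> None:
        getter, setter = self.ACCESSORS[type(entity)]
        self._get = getattr(entity, getter)
        self._set = getattr(entity, setter)

    def get_title(self) -> str:
        return self._get()

    def set_title(self, title: str) -> None:
        self._set(title)


def citation_from(entity: object) -> Citation:
    """Extract meta-data from a named entity into a new Citation."""
    citation = Citation()
    citation.fields["title"] = TitleAdapter(entity).get_title()
    return citation


def inject_into(citation: Citation, entity: object) -> None:
    """Inject meta-data from a Citation into a named entity."""
    TitleAdapter(entity).set_title(citation.fields["title"])
```

The awkward part mentioned above shows up in the ACCESSORS table: every new entity type and every new field needs another entry, which is the price of not sharing a common Metadata class.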

Implementation of the Citation/Metadata entities

We'll implement the Citation class with the above analysis in mind.

We imagine a MetadataProvider interface which we won't really implement (to maintain PHP4 compatibility) but enforce by convention.

A common abstract base class to "emulate" the interface is not an option in this case as we need to reserve inheritance for more central concerns of the classes (like Submission). So we just have to enforce the interface by convention.

This way we get a semantically correct but still rather flexible and intuitive link between Article, Monograph, Citation and Paper on one side and Schema/Record on the other.
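
"Enforcing by convention" could look roughly like the following sketch (Python for illustration; in PHP4 the equivalent check would rely on something like method_exists()). The method names extract_metadata() and inject_metadata() and the helper functions are hypothetical, since the operations of the MetadataProvider interface are not fixed yet. Note that Article keeps inheriting from Submission while Citation inherits from nothing, yet both satisfy the convention.

```python
from typing import Dict

# Hypothetical operation names; the real interface is still to be specified.
METADATA_PROVIDER_METHODS = ("extract_metadata", "inject_metadata")


def is_metadata_provider(obj: object) -> bool:
    """True if obj offers every method the convention asks for."""
    return all(
        callable(getattr(obj, name, None))
        for name in METADATA_PROVIDER_METHODS
    )


class Submission:
    """Inheritance stays reserved for the submission concept."""


class Article(Submission):
    def __init__(self, title: str) -> None:
        self.title = title

    def extract_metadata(self) -> Dict[str, str]:
        return {"title": self.title}

    def inject_metadata(self, record: Dict[str, str]) -> None:
        self.title = record.get("title", self.title)


class Citation:
    """Not a Submission, but follows the same convention."""

    def __init__(self) -> None:
        self.fields: Dict[str, str] = {}

    def extract_metadata(self) -> Dict[str, str]:
        return dict(self.fields)

    def inject_metadata(self, record: Dict[str, str]) -> None:
        self.fields.update(record)


def copy_metadata(source: object, target: object) -> None:
    """Move a record between any two objects that follow the convention."""
    for obj in (source, target):
        if not is_metadata_provider(obj):
            raise TypeError("%s is not a MetadataProvider" % type(obj).__name__)
    target.inject_metadata(source.extract_metadata())
```

The check only happens at run time, so a missing method is found late. That is the trade-off we accept for keeping the inheritance hierarchy free.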

Metadata schemes

One important question to answer will be which metadata schemes should be supported and how we represent them in the MetadataProvider interface. In other words: What is the minimal set of attributes/operations that the MetadataProvider should prescribe?

MJ, 2009-10-14: "In L8X, we use a sort of normalized mapping of the basic OpenURL 0.1 KEV [Key/Encoded-Value] format, but there is some crosswalk between the OpenURL 1.0 book/journal KEV formats as well as a little bit of DC [Dublin Core]. OJS and OCS also have their own variations on DC for article/paper metadata (can't say about OMP) - so we need to think about how this will be best represented in the metadata model."

My proposal is to implement the superset of all standards that we want to support.
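
One way to read the "superset" idea is an internal field list plus a crosswalk table per supported schema, as in the following Python sketch. The internal field names, the schema identifiers ("openurl10_journal", "dc") and the two functions are hypothetical. The keys rft.atitle, rft.jtitle, rft.aulast and rft.date are taken from the OpenURL 1.0 journal KEV format and title/date from Dublin Core; the selection is deliberately tiny and not a complete or authoritative mapping.

```python
from typing import Dict, Optional

# Internal superset field -> key in each schema (None = no direct equivalent
# assumed in this sketch).
CROSSWALK: Dict[str, Dict[str, Optional[str]]] = {
    "article_title": {"openurl10_journal": "rft.atitle", "dc": "title"},
    "journal_title": {"openurl10_journal": "rft.jtitle", "dc": None},
    "author_last_name": {"openurl10_journal": "rft.aulast", "dc": None},
    "date": {"openurl10_journal": "rft.date", "dc": "date"},
}


def export_record(record: Dict[str, str], schema: str) -> Dict[str, str]:
    """Translate internal fields to one schema, dropping what it cannot hold."""
    out: Dict[str, str] = {}
    for field_name, value in record.items():
        key = CROSSWALK.get(field_name, {}).get(schema)
        if key is not None:
            out[key] = value
    return out


def import_record(data: Dict[str, str], schema: str) -> Dict[str, str]:
    """Translate one schema's keys back to internal fields."""
    reverse = {
        keys[schema]: name
        for name, keys in CROSSWALK.items()
        if keys.get(schema)
    }
    return {reverse[key]: value for key, value in data.items() if key in reverse}
```

Exporting to a narrower schema is lossy by design (journal_title disappears in the "dc" export here), which is why the superset, and not any single standard, would be the stored form.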

Next Steps

implement citation service back-end

specify AJAX request architecture

specify GUI fragments/AJAX components

get specification approval from Alec, Brian, ...

Further Ideas

Implement citation output plug-ins for Chicago Manual of Style, American Medical Association, American Sociological Association and Council of Science Editors (see mails to pkp-support from Mark and John, 20/10/2009)

Don't kill L8X as a standalone application; integrate it with PKP WAL instead

Package/brand/SEOize parser/lookup library separately for re-use in other document-based OSS applications (ECM)

Let users configure "content types" (document types) to improve parsing and reduce manual work for batches of similar documents