Revision as of 13:13, 14 May 2018

Dryad's statistical display reflects its distinction between data packages and data files. A data package contains all the data files (and other related files) associated with a particular publication, so each package may be related to multiple data files. Dryad displays statistical information at both levels: the data package and the data files.

Functionality

We also display a "viewed" and "downloaded" count for each package and file (on both the Dryad Data Package and Dryad Data File item view pages).

Workflow

DSpace stores its usage statistics in a Solr index called "statistics", which has a Web-based administrative interface through which test queries can be run. Records stored in this index have an `owningColl` (owning collection) and an `id`. Records for bitstreams also have an `owningItem` (owning item). Dryad doesn't attach bitstreams directly to data packages, so when looking for download statistics, search by `owningItem` using the internal ID of the associated data file item.
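Because bitstreams hang off data file items rather than packages, one way to count downloads for a whole package is to query `owningItem` for all of its data files at once. The following is a minimal sketch of such a query string. The class and method names are hypothetical, and whether Dryad's own code aggregates package downloads this way is an assumption.

```java
import java.util.List;
import java.util.stream.Collectors;

// Sketch only: names are illustrative, not taken from the Dryad code base.
final class PackageDownloadQuerySketch {

    // Builds a Lucene query that matches bitstream (download) records
    // owned by any of the given data file items.
    static String downloadsForFiles(List<Integer> dataFileItemIds) {
        if (dataFileItemIds.isEmpty()) {
            throw new IllegalArgumentException("at least one data file ID is required");
        }
        return dataFileItemIds.stream()
                .map(String::valueOf)
                .collect(Collectors.joining(" OR ", "owningItem:(", ")"));
    }

    public static void main(String[] args) {
        // The IDs below are made-up internal item IDs.
        System.out.println(downloadsForFiles(List.of(101, 102, 103)));
    }
}
```

The resulting string can be pasted into the `q` field of the Solr administrative interface for the "statistics" index.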

Relationships between data files and data packages are not stored in the statistics index, since this is a concept that Dryad overlays onto DSpace. These relationships can, however, be gleaned from the item records in the "search" Solr index. The "search" index is where general searching takes place, and it stores the relationships between records as linked identifiers. The "search" index also has a Web-based administrative interface through which test searches can be performed.

Since Dryad stores data packages and data files in separate collections, overall statistics (such as those displayed on the Dryad home page) can be gathered by checking the total number of active records in each collection. The number of unique journals that have publications represented in Dryad is determined by querying the unique journal names in the `prism.publicationName` field of the "search" Solr index.
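One way to obtain the unique journal names is a Solr facet request on `prism.publicationName`, where each returned facet value corresponds to one distinct name. The sketch below only builds the request URL. The class name and base URL are hypothetical, and using faceting here is an assumption rather than a description of what the Dryad code does.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Sketch only: assumes faceting is an acceptable way to list distinct names.
final class JournalFacetSketch {

    // Builds a request that returns no documents (rows=0) but lists every
    // distinct value of prism.publicationName that occurs at least once.
    static String uniqueJournalsUrl(String searchSolrBase) {
        String allDocs = URLEncoder.encode("*:*", StandardCharsets.UTF_8);
        return searchSolrBase + "/select?q=" + allDocs
                + "&rows=0"
                + "&facet=true"
                + "&facet.field=prism.publicationName"
                + "&facet.mincount=1"
                + "&facet.limit=-1"
                + "&wt=json";
    }

    public static void main(String[] args) {
        // Placeholder URL; the real value comes from solr.search.server.
        System.out.println(uniqueJournalsUrl("http://localhost:8080/solr/search"));
    }
}
```

Faceting returns indexed terms, so this only yields whole journal names if the field is indexed untokenized; that property of the Dryad schema is not confirmed here.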

All these statistical functions are performed in Java code, which then writes the resulting values into the `pageMeta` (page metadata). The Dryad XSLT theme takes these values and displays them on the appropriate page when that page is rendered into HTML.

The Java classes that gather these Dryad-specific statistics are kept in the Dryad overlay of the xmlui module. The classes have been put into the org.datadryad.dspace.statistics package to indicate that they are specific to Dryad and not just a slight modification of an existing DSpace function. Here are the classes used and a brief explanation of each:

org.datadryad.dspace.statistics.SiteOverview

Generates the overall statistical summary displayed on the home page

org.datadryad.dspace.statistics.ItemStatsOverview

Pulls together the stats generated by ItemPkgStats and ItemFileStats and puts them into the page metadata

Caches the statistical generation so it doesn't need to be run at each page visit
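The caching idea can be illustrated with a small time-based cache. This is a generic sketch, not the Dryad class: the class name, the time-to-live policy, and the loader callback are all assumptions made for illustration.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.IntToLongFunction;

// Sketch only: a time-to-live cache keyed by internal item ID.
final class StatsCacheSketch {

    private static final class Entry {
        final long value;
        final long loadedAtMillis;

        Entry(long value, long loadedAtMillis) {
            this.value = value;
            this.loadedAtMillis = loadedAtMillis;
        }
    }

    private final Map<Integer, Entry> entries = new ConcurrentHashMap<>();
    private final long ttlMillis;
    private final IntToLongFunction loader; // e.g. a function that queries Solr

    StatsCacheSketch(long ttlMillis, IntToLongFunction loader) {
        this.ttlMillis = ttlMillis;
        this.loader = loader;
    }

    // Returns the cached count, recomputing it once the entry is older than the TTL.
    // Two threads may occasionally recompute the same entry; the last write wins.
    long get(int itemId, long nowMillis) {
        Entry entry = entries.get(itemId);
        if (entry == null || nowMillis - entry.loadedAtMillis >= ttlMillis) {
            entry = new Entry(loader.applyAsLong(itemId), nowMillis);
            entries.put(itemId, entry);
        }
        return entry.value;
    }

    public static void main(String[] args) {
        // The loader here is a stand-in that fabricates a count from the ID.
        StatsCacheSketch cache = new StatsCacheSketch(60_000L, id -> id * 10L);
        System.out.println(cache.get(7, System.currentTimeMillis()));
    }
}
```

Passing the current time as a parameter keeps the sketch easy to test; a real page-rendering path would more likely read the clock internally.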

The XSLT code that pulls the individual data package and data file statistics from the page metadata for display on the page can be found in the DryadItemSummary.xsl file. The XSLT code that displays the site's overall statistical information can be found in the default Dryad.xsl file.

Configuration

Only a minimal amount of configuration is needed in the dspace.cfg file for the Dryad statistics to work. They rely on the locations of the Solr servers being set via the `solr.log.server` and `solr.search.server` variables. The Dryad statistics code also requires two additional variables to be set, `stats.datafiles.coll` and `stats.datapkgs.coll`, which indicate the collections that contain the data files and the data packages, respectively.
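A quick way to confirm that all four variables are present is to load the configuration as Java properties and list any that are missing. The sketch below is illustrative: the class name is hypothetical, and the sample values are placeholders, since the expected format of the collection values is not specified here.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Sketch only: checks that the variables named above are set and non-blank.
final class StatsConfigCheckSketch {

    static final String[] REQUIRED = {
        "solr.log.server",
        "solr.search.server",
        "stats.datafiles.coll",
        "stats.datapkgs.coll"
    };

    // Returns the required keys that are absent or blank, in declaration order.
    static List<String> missingKeys(Properties config) {
        List<String> missing = new ArrayList<>();
        for (String key : REQUIRED) {
            String value = config.getProperty(key);
            if (value == null || value.trim().isEmpty()) {
                missing.add(key);
            }
        }
        return missing;
    }

    public static void main(String[] args) throws IOException {
        // Placeholder values; stats.datapkgs.coll is left out on purpose.
        String sample =
                "solr.log.server = http://localhost:8080/solr/statistics\n"
              + "solr.search.server = http://localhost:8080/solr/search\n"
              + "stats.datafiles.coll = FILES_COLLECTION_PLACEHOLDER\n";
        Properties config = new Properties();
        config.load(new StringReader(sample));
        System.out.println("Missing: " + missingKeys(config));
    }
}
```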

Dryad sets the Solr server variables in the dspace.cfg file rather than the dspace-solr-search.cfg file so that we can use the standard Maven profiles mechanism to override these variables, depending on which Dryad instance we're running (demo, dev, production, etc.).

Statistics on the Statistics

All statistics are stored in Solr.

An easy way to process stats outside of DSpace is to construct a Solr query, save the response to a file with curl, and then process the file to extract the needed information.
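As an illustration of the processing step, the sketch below reads a response file saved with curl and pulls out the hit count. It assumes the query was sent with `wt=json`, in which case Solr reports the total under `numFound`. The class name is hypothetical, and the regular expression is a shortcut that avoids a JSON library rather than a robust parser.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.OptionalLong;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch only: extracts the total hit count from a saved Solr JSON response.
final class NumFoundSketch {

    private static final Pattern NUM_FOUND =
            Pattern.compile("\"numFound\"\\s*:\\s*(\\d+)");

    // Returns the first numFound value in the text, or empty if there is none.
    static OptionalLong numFound(String solrJson) {
        Matcher matcher = NUM_FOUND.matcher(solrJson);
        return matcher.find()
                ? OptionalLong.of(Long.parseLong(matcher.group(1)))
                : OptionalLong.empty();
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: java NumFoundSketch <saved-response-file>");
            return;
        }
        String body = new String(
                Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        OptionalLong count = numFound(body);
        System.out.println(count.isPresent()
                ? "numFound = " + count.getAsLong()
                : "no numFound in response");
    }
}
```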

File and package views can be found by searching the `id` field for the item's internal DSpace item_id. File downloads can be found by searching the `owningItem` field for the data file's internal DSpace item_id.

Sample requests

The statistics are based on Dryad's internal item_id for each item, not the DOI.
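A request of this kind can be sketched as follows. The helper builds URLs against the "statistics" index that could then be fetched with curl. The class name, the base URL, and the item ID are placeholders, and `rows=0` is used on the assumption that only the hit count is wanted.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Sketch only: builds select URLs keyed on the internal item_id, not the DOI.
final class SampleRequestSketch {

    // Views: records whose "id" is the item's internal item_id.
    static String viewsUrl(String statsSolrBase, int itemId) {
        return selectUrl(statsSolrBase, "id:" + itemId);
    }

    // Downloads: bitstream records whose "owningItem" is the data file's item_id.
    static String downloadsUrl(String statsSolrBase, int dataFileItemId) {
        return selectUrl(statsSolrBase, "owningItem:" + dataFileItemId);
    }

    private static String selectUrl(String solrBase, String query) {
        String encoded = URLEncoder.encode(query, StandardCharsets.UTF_8);
        return solrBase + "/select?q=" + encoded + "&rows=0&wt=json";
    }

    public static void main(String[] args) {
        // Placeholder URL; the real value comes from solr.log.server.
        String base = "http://localhost:8080/solr/statistics";
        System.out.println(viewsUrl(base, 1234));
        System.out.println(downloadsUrl(base, 1234));
    }
}
```

Stock DSpace statistics records also carry a `type` field that distinguishes object kinds, so a query on `id` alone may need an extra filter if different object types share the same numeric ID; whether Dryad's queries apply one is not stated here.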

Relation to DSpace

Dryad's statistical display relies on the statistics log index created by the standard DSpace Statistics module. It also relies on the "search" index created by the Discovery module. If Solr index fields change in the Discovery module, the custom Dryad statistical functionality may break (for instance, if the publication names are put into a different index or if the relationships between files and packages are stored differently).