Data Science Journal Latest Articles
https://datascience.codata.org/articles/
Latest articles published by Data Science Journal
en-us
Tue, 14 Aug 2018 22:06:42 -0000

Enhancing the Research Data Management of Computer-Based Educational Assessments in Switzerland
https://datascience.codata.org/article/10.5334/dsj-2018-018
<p class="p1">Since 2006, the education authorities in Switzerland have been obliged by the Constitution to harmonize important benchmarks in the educational system throughout Switzerland. The development of national educational objectives in four disciplines created an important basis for implementing this constitutional mandate. In 2013, the Swiss National Core Skills Assessment Program (in German: ÜGK – Überprüfung der Grundkompetenzen) was initiated to investigate students' skills, starting with three of the four domains: mathematics, language of teaching, and first foreign language in grades 2, 6, and 9. ÜGK uses a computer-based test and a sample of 25,000 students per year.</p><p class="p3">A major challenge for computer-based educational assessment is the research data management process. Data from several different systems and tools, existing in different formats, have to be merged to obtain data products that researchers can use. Long-term preservation has to be adapted as well. In this paper, we describe our current processes and data sources as well as our ideas for enhancing data management.</p>
Published on 2018-07-05 18:08:24
https://datascience.codata.org/article/10.5334/dsj-2018-018

Automatic Acquisition and Sustainable Use of Political-Ecological Data
https://datascience.codata.org/article/10.5334/dsj-2018-017
<p class="p1">The sustainable management of anthropogenically impacted ecosystems will require ongoing monitoring and advocacy by people across the globe. To this end, automatic methods are developed herein for acquiring several types of such political-ecological data.
On the political side, a method is developed for gathering news articles about human actions that affect the ecosystem, along with a method for identifying themes in social media that concern the consumption of an ecosystem’s products. On the ecosystem side, a method is derived for estimating wildlife abundance from purchasable high-resolution satellite images. A simple website architecture is described for holding these data and enabling their use in developing sustainable conservation policies; a rhino conservation website illustrates this architecture. A fundamental contradiction between the desire for open data on the locations of endangered flora and fauna and the need to hide these locations from poachers is addressed through a new security protocol that enables the secure distribution of sensitive ecosystem data to trusted data consumers.</p>
Published on 2018-07-03 17:33:58
https://datascience.codata.org/article/10.5334/dsj-2018-017

Managing Digital Research Objects in an Expanding Science Ecosystem: 2017 Conference Summary
https://datascience.codata.org/article/10.5334/dsj-2018-016
<p class="p1">Digital research objects are packets of information that scientists can use to organize and store their data. Many different methods are currently in use for optimizing digital objects for research purposes. These methods have been applied to many scientific disciplines but differ in architecture and approach. The goals of this joint digital research object (DRO) conference were to discuss the challenge of characterizing DROs at scale, both in volume and over time, and possible organizing principles that might connect current DRO architectures. One of the primary challenges is convincing scientists that these tools and practices will actually make the research process easier and more fruitful.
This conference included work from CENDI, the National Federal STI Managers Group, the National Federation of Advanced Information Services (NFAIS), the Research Data Alliance (RDA), and the National Academy of Sciences (NAS).</p>
Published on 2018-06-29 13:38:14
https://datascience.codata.org/article/10.5334/dsj-2018-016

A Conceptual Enterprise Framework for Managing Scientific Data Stewardship
https://datascience.codata.org/article/10.5334/dsj-2018-015
<p class="p1">Scientific data stewardship is an important part of the long-term preservation and use/reuse of digital research data. It is critical for ensuring the trustworthiness of data, products, and services, which is important for decision-making. Recent U.S. federal government directives and scientific organization guidelines have levied specific requirements, increasing the need for a more formal approach to ensuring that stewardship activities support compliance verification and reporting. However, many science data centers lack an integrated, systematic, and holistic framework to support such efforts. Current business- and process-oriented stewardship frameworks are too costly and lengthy for most data centers to implement, and they often do not explicitly address the federal stewardship requirements and/or the uniqueness of geospatial data. This work proposes a data-centric conceptual enterprise framework for managing stewardship activities, based on the philosophy behind the Plan-Do-Check-Act (PDCA) cycle, a proven industrial concept.
This framework, which includes the application of maturity assessment models, allows for quantitative evaluation of how organizations manage their stewardship activities and supports informed decision-making for continual improvement toward full compliance with federal, agency, and user requirements.</p>
Published on 2018-06-28 13:42:55
https://datascience.codata.org/article/10.5334/dsj-2018-015

Virtual Research Environment for Regional Climatic Processes Analysis: Ontological Approach to Spatial Data Systematization
https://datascience.codata.org/article/10.5334/dsj-2018-014
<p class="p1">This paper describes a Virtual Research Environment (VRE) based on the web GIS platform ‘Climate+’, which provides access to analytical tools for processing 19 collections of meteorological and climate data from several international organizations. This environment systematizes spatial data and related climate information and allows users to obtain analysis results using geoinformation technologies. An ontology-based approach to this systematization is described, which makes it possible to match the semantics of meteorological and climate parameters presented in different collections and used in solving various applied problems.</p>
Published on 2018-06-27 13:29:20
https://datascience.codata.org/article/10.5334/dsj-2018-014

Data Tracking Analysis of the Geomagnetic Fixed-Station Network in China
https://datascience.codata.org/article/10.5334/dsj-2018-013
<p class="p1">Data tracking analysis is an important mechanism for increasing data analysis capacity and eliminating interference from observational data. In this study, the technique was applied to the geomagnetic fixed-station network to improve the efficiency and accuracy of the analysis and to extract useful information. This paper introduces the scope, workflow, analysis platform, abnormal variation status, and results of the geomagnetic data tracking analysis.
We present some typical examples of abnormal variations, together with our proposals for future work.</p>
Published on 2018-06-25 14:41:29
https://datascience.codata.org/article/10.5334/dsj-2018-013

Text and Image Compression based on Data Mining Perspective
https://datascience.codata.org/article/10.5334/dsj-2018-012
<p class="p1">For decades, data compression has been one of the enabling technologies of the ongoing digital multimedia revolution, producing renowned algorithms such as Huffman encoding, LZ77, Gzip, RLE, and JPEG. Researchers have studied character- and word-based approaches to text and image compression but have overlooked the broader aspect of pattern mining from large databases. Our compression research centers on the compression perspective of data mining, as suggested by Naren Ramakrishnan et al., wherein efficient versions of seminal text/image compression algorithms are developed using various Frequent Pattern Mining (FPM) and clustering techniques. This paper proposes a suite of novel and hybrid efficient text and image compression algorithms employing efficient data structures such as hashes and graphs. We retrieve an optimal set of patterns through pruning, which is efficient in terms of database scans and storage space because it reduces the code table size. Moreover, a detailed analysis of time and space complexity is performed for some of our approaches, and various text structures are proposed. Simulation results over various sparse/dense benchmark text corpora indicate an 18% to 751% improvement in compression ratio over other state-of-the-art techniques.
In image compression, our results showed up to a 45% improvement in compression ratio and up to a 40% improvement in image quality efficiency.</p>
Published on 2018-06-07 12:12:05
https://datascience.codata.org/article/10.5334/dsj-2018-012

Marine Data Services at National Oceanographic Data Centre-India
https://datascience.codata.org/article/10.5334/dsj-2018-011
<p class="p1">In this paper, we describe the marine data archived at the Indian National Centre for Ocean Information Services (INCOIS), Ministry of Earth Sciences, India. Heterogeneous data from in situ observations, remote sensing, and ocean models are archived. In situ ocean observations include data from Lagrangian as well as Eulerian platforms, such as Argo floats and moored buoys, while remote sensing data include those from the NOAA satellite series, OceanScat, etc. The data are translated into ocean information services through analysis and modelling. Data are disseminated to users through a variety of means, such as the web with GIS features, ERDDAP, and Live Access Server, with facilities to search, visualize, and download.</p>
Published on 2018-05-10 12:44:43
https://datascience.codata.org/article/10.5334/dsj-2018-011

Ontology Usability Scale: Context-aware Metrics for the Effectiveness, Efficiency and Satisfaction of Ontology Uses
https://datascience.codata.org/article/10.5334/dsj-2018-010
<p class="p1">Both ontology builders and users need a way to evaluate ontologies in terms of usability, but existing ontology evaluation approaches do not fit this purpose. We propose the Ontology Usability Scale (OUS), a ten-item Likert scale derived from statements prepared according to a semiotic framework and from an online poll in the Semantic Web community, to provide a practical means of evaluating ontology usability.
Case studies were conducted to record current usability evaluation results for ontologies that are expected to be revised in the future, and the poll results are discussed to guide the proper use and customization of the OUS.</p>
Published on 2018-05-10 12:38:00
https://datascience.codata.org/article/10.5334/dsj-2018-010

Unpacking the ‘Black Box’ of Public Expenditure Data in Africa: Quantification of Agricultural Spending Using Mozambique’s Budget Reports
https://datascience.codata.org/article/10.5334/dsj-2018-009
<p class="p1">This paper undertakes a detailed examination of the availability and quality of data on public expenditures in agriculture in Africa. We consider the case of Mozambique, a country characterised by low income and low administrative capacity, but also by a policy environment that has turned a focused lens on public funding to agriculture. We explore the extent to which domestic analysts may be able to access and use such data to reliably quantify public resource allocation to the sector, and to unpack the ‘black box’ of what goes into country-level public expenditure statistics. We find that data are, surprisingly, freely available in great abundance. This has encouraging aspects but also pitfalls. On the one hand, data that are often out of public sight are openly accessible for Mozambican researchers to draw upon. On the other hand, this abundance takes the form of a proliferation of classification systems used to finely disaggregate public funds data; given Mozambique’s limited public sector capacity, each classification system leaves a lot to be desired, making it hard to use any single one to accurately and reliably reconstruct the amount of public resources going to agriculture.
Making the hard choice to eliminate some of the classification systems, and dedicating the freed-up capacity to being more thorough with the retained ones, would better serve domestic users of these data as well as the government, which is both a consumer and a producer of these data.</p>
Published on 2018-04-10 12:04:03
https://datascience.codata.org/article/10.5334/dsj-2018-009