Tools & Workbenches for Data Management

GFBio supports researchers with software for data management. Two German developer groups, Diversity Workbench (DWB) and BEXIS 2, provide open source platforms for biodiversity data management and extend their applications in the GFBio context. Using one of these platforms during the active phase of a research project facilitates data transfer for long-term archival and data publication via the GFBio portal.

Learn more about the scope of application, specific features and functionalities of DWB and BEXIS 2!

DWB in 10 Questions

Diversity Workbench (DWB) is a suite of relational SQL databases and tools to process bio- and geodiversity data. DWB tools address data generation, data management, quality assurance and basic data analysis issues in the following scientific domains: molecular and evolutionary biology, systematics, biogeography, ecological and environmental sciences, and geosciences. One of the databases has a generic data model. The desktop tools (client-server database applications, data processing and GIS tools) and the mobile app are independent of each other, but can be joined to build a linked data network. DWB offers central cloud services with terminologies, taxonomies, regional taxon checklists, gazetteers and GIS information. The software has been user-tested for more than 20 years, see DWB workshops.

DWB is free to use (GPL 2 license) and incurs no costs as long as MS SQL Server Express is used. The software is open source, apart from the underlying MS operating system and MS SQL Server.

DWB has a broad spectrum of users. It is primarily designed as a solution for individual scientists and small research groups who have no database manager available but have basic to intermediate skills in data management. Target users include those who are most familiar with spreadsheets and tabular data. DWB import wizards guide them in switching to managing their data in DWB databases with user-friendly graphical interfaces. To perform particular data analyses, users can adapt DWB data export wizards (e.g. for R, FASTA) and use external software solutions that are standard in their research community.
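To illustrate what a tabular-to-FASTA export involves, here is a minimal Python sketch. It is not the DWB export wizard and does not use any DWB interface; the column names `id` and `sequence` and the function `to_fasta` are assumptions made for this example only.

```python
from typing import Iterable, Mapping


def to_fasta(records: Iterable[Mapping[str, str]], width: int = 60) -> str:
    """Render rows with hypothetical 'id' and 'sequence' columns as FASTA text."""
    lines = []
    for rec in records:
        # Each FASTA entry starts with a header line beginning with '>'.
        lines.append(f">{rec['id']}")
        seq = rec["sequence"].replace(" ", "").upper()
        # Wrap the sequence into fixed-width lines, as is conventional in FASTA.
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


# Hypothetical rows, as they might come from a spreadsheet-like export.
rows = [
    {"id": "specimen_001", "sequence": "acgtacgtac"},
    {"id": "specimen_002", "sequence": "ttgaccgt"},
]
fasta_text = to_fasta(rows, width=8)
print(fasta_text)
```

The resulting text can be saved to a file and opened by external sequence analysis software.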

Because of its scalability and its tools for organizing data publication, DWB is increasingly used as a system for large data repositories and is installed at various data centers.

Windows authentication services can conveniently be set up for single sign-on, so that users log on to the Windows operating system and to DWB services with the same user name and password.

Individual researchers and collaborative research groups install DWB tools primarily for managing raw data. Thereby the tools address the early steps of the data life cycle: "Collect", "Assure" and "Analyze".

With network installations at large data centers, as realized at the Staatliche Naturwissenschaftliche Sammlungen Bayerns (SNSB), DWB tools like DiversityProjects and the new DWB filtering and transformation tools guide data archiving and data publication with steps like "Describe", "Submit", "Preserve" and "Publish".

DWB is appropriate for a broad spectrum of users who aim to manage diversity datasets from the very beginning of the data life cycle. Depending on their interests, researchers can install and configure one or more DWB databases and editing tools. DWB editing interfaces with several spreadsheet modes allow high flexibility in organizing diverse data entities, datasets and assets of various size and complex structure. Advanced wizards and graphical user interfaces facilitate the import of legacy data, data transformation (also from data matrices) and data exchange. DWB offers tools for GIS editing, geographic and geometric object storage and image processing. Advanced search options allow for retrieving free combinations of parameters on more than 800 data fields. Each user has open and free access to DWB cloud services (see above) and to several external machine-readable web services (like those of the Catalogue of Life) integrated in the DWB clients.

Guided by graphical user interfaces, scientific data managers can organize DWB user accounts and user rights with granular role-based permissions for data access and data processing. Data managers can administer networks of related datasets, projects and subprojects. DWB has built-in mechanisms for technical and content quality control. The version history of each single data entity can be managed at a granular level. Advanced functions to organize embargoes and withholds for various types of data elements and data objects at several levels are included. There are process pipelines for data transformation, data replication and data publication. DWB has functions for the management of documents and of legal issues like property rights and licenses, as well as for organizing metadata according to international standards. With a DWB network installation, data managers can strictly distinguish between an in-house master database environment and derived cache databases with selected information for data publication via online portals.
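The separation between a master database and a derived cache with embargoes and withheld fields can be pictured with a small Python sketch. This is only a conceptual illustration, not DWB code: the record layout, the `embargo_until` field and the function `build_cache` are all hypothetical.

```python
from datetime import date


def build_cache(master, today, withheld_fields):
    """Derive publishable records from hypothetical master records.

    Records still under embargo are skipped entirely; withheld fields
    are removed from the records that are published.
    """
    cache = []
    for record in master:
        embargo = record.get("embargo_until")
        if embargo is not None and embargo > today:
            continue  # still under embargo: keep it in the master only
        cache.append({
            key: value
            for key, value in record.items()
            if key not in withheld_fields and key != "embargo_until"
        })
    return cache


# Hypothetical master records; the second one is under embargo.
master = [
    {"id": 1, "taxon": "Quercus robur", "locality": "site A", "embargo_until": None},
    {"id": 2, "taxon": "Fagus sylvatica", "locality": "site B", "embargo_until": date(2999, 1, 1)},
]
published = build_cache(master, today=date(2024, 1, 1), withheld_fields={"locality"})
print(published)
```

In this sketch the master data stay untouched, and only the filtered copy would be exposed to an online portal.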

The exact technical prerequisites depend on the target (network) solution. In all cases, DWB tools run on a (virtualized) MS Windows operating system. For details see Technical documentation at a glance.

The installation might be done in a simple way without distinguishing between administrator role and user role. Alternatively, database administrators and scientific data managers might use a large portfolio of functions designed for them (see above).

The DWB database suite, installed by individual researchers and research groups, guides data producers in generating, managing and quality-controlling well-structured data and metadata following the FAIR data principles.

As soon as required, the results are most easily published via GFBio tools for data submission and publication. Three of the GFBio data centers, i.e. SMNS, SNSB and ZFMK, have DWB networks with GFBio ingest tools, GFBio filtering and transformation tools for data publication and DWB management systems with archiving solutions installed; see GFBio Data Centers – Technical documentations. They are ready to guide data producers in this context. DWB wizards and flexible services for data exchange also support data transfer to the other GFBio data centers without DWB installations.

BEXIS 2 in 10 Questions

BEXIS 2 is a data management software supporting researchers in documenting, finding, sharing, and publishing data during the active phase of a research project. BEXIS 2 is a modular, scalable, interoperable, free and open source system supporting large research consortia on all aspects of data life cycle management. The software is being developed based on requirements of the biodiversity and ecology domain that mostly deal with tabular data, but is generic enough to serve other domains and data types as well.

BEXIS 2 has been designed for collaborative research projects with up to several hundred researchers. Most of these projects include a dedicated central data management team to administer and maintain the system. However, BEXIS 2 may also serve smaller teams and working groups. BEXIS 2 is used in fields like biodiversity, ecology, forestry, or atmospheric science.

Following the data life cycle, researchers managing their data with BEXIS 2 are able to, for example, design a new dataset by specifying its variables (i.e. the data structure), enrich the data with metadata and supplementary files, add and update the dataset at any time, set fine-grained permissions (view, download, update, delete), and ultimately publish the dataset in renowned data journals (e.g. Pensoft Biodiversity Data Journal) or data portals (e.g. GFBio). Once a dataset is registered with BEXIS 2, the system takes care of different versions of the dataset, provides automatic backups, and ensures the data is findable and accessible. Users of BEXIS 2 are able to access datasets directly from other applications (e.g. R) through API calls (incl. selection, filtering, sorting). A key feature of BEXIS 2 is to re-use existing information whenever possible (e.g. terminologies, variables, data structures, metadata).
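The idea of a data structure defined by its variables can be sketched in Python as follows. This is a conceptual example under stated assumptions, not the BEXIS 2 data model or API: the `Variable` class, the `validate_row` function and the example variables are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variable:
    """One column of a hypothetical data structure."""
    name: str
    dtype: type
    unit: str = ""


def validate_row(structure, row):
    """Return a list of problems; the list is empty if the row fits the structure."""
    problems = []
    for var in structure:
        if var.name not in row:
            problems.append(f"missing variable: {var.name}")
        elif not isinstance(row[var.name], var.dtype):
            problems.append(f"wrong type for {var.name}")
    # Report columns that the data structure does not define.
    extra = set(row) - {var.name for var in structure}
    problems.extend(f"unexpected variable: {name}" for name in sorted(extra))
    return problems


# A hypothetical structure that could be re-used by several datasets.
structure = [Variable("plot_id", str), Variable("tree_height", float, "m")]

ok = validate_row(structure, {"plot_id": "P1", "tree_height": 12.5})
bad = validate_row(structure, {"plot_id": "P1", "tree_height": "tall", "note": "x"})
print(ok, bad)
```

Defining the structure once and checking every incoming row against it is one way to keep datasets consistent when variables and data structures are re-used.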

BEXIS 2 is a highly flexible and adaptable web-based system. For example, a data manager can incorporate different metadata schemas and provide mappings between them. A data manager may also set up different tenants providing individual projects with a custom look and feel of the application, customize the search interface, manage users and groups and their access to system features, monitor the system usage, or set up a single sign-on service (e.g. LDAP). BEXIS 2 has a modular system architecture that allows individual modules to be added and replaced while the system is running.
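A mapping between two metadata schemas can be thought of as a translation table between field names. The Python sketch below shows the principle only; it is not how BEXIS 2 stores or applies mappings. The source field names and the function `map_metadata` are hypothetical, and the target names are borrowed from Dublin Core element names for illustration.

```python
def map_metadata(record, mapping):
    """Translate the keys of a metadata record from a source to a target schema.

    Fields without an entry in the mapping are dropped.
    """
    return {mapping[key]: value for key, value in record.items() if key in mapping}


# Hypothetical mapping from an internal schema to Dublin Core-style element names.
INTERNAL_TO_DC = {
    "dataset_title": "title",
    "owner": "creator",
    "summary": "description",
}

source = {
    "dataset_title": "Forest plots 2020",
    "owner": "A. Researcher",
    "summary": "Tree inventory",
    "internal_code": "X-17",  # has no counterpart in the target schema
}
mapped = map_metadata(source, INTERNAL_TO_DC)
print(mapped)
```

Real schema mappings are usually richer than a flat key translation, for example when nested elements or value conversions are involved.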

BEXIS 2 can be downloaded from http://bexis2.uni-jena.de/. Every BEXIS 2 version comes with release notes that explain the new features in that version, an installation manual that includes a step-by-step installation guide, and user guides explaining the different modules of the application.

Within BEXIS 2 a user can initiate a submission to publish data through the GFBio Portal. A bundle including all relevant information and data is created and then submitted to GFBio for further processing and review.

To install BEXIS 2 you need a server running an MS Windows operating system. BEXIS 2 also needs a database management system available on your server. You are free to choose among PostgreSQL, SQL Server, MySQL or IBM DB2 Express-C. The installation manual in the software package contains a step-by-step installation guide.

BEXIS 2 is developed as a community-driven open source project. There is an active development community on GitHub, which is the place to post bug reports and feature requests. There is also a developers’ mailing list for technical discussions on concepts, software architecture and installation: https://lserv.uni-jena.de/mailman/listinfo/bexis2-dev

There is a community of active users and data managers of running instances who will help on data management questions, or provide hints on the usage and configuration of BEXIS 2. Feel free to join the public mailing list at: https://lserv.uni-jena.de/mailman/listinfo/bexis2-users

GFBio Consortium

The German Federation for Biological Data (GFBio) is a sustainable, service-oriented, national data infrastructure facilitating data sharing for biological and environmental research.