Abstract

In the metadata generation context, metadata extraction is the first and most important stage in the production chain, and it is highly complex because of the wide variety of storage formats for geospatial datasets. The authors analyze the current situation and importance of metadata in information systems, particularly in SDI. This chapter identifies and justifies the need to automate metadata generation. In this context, the different points of view on metadata, according to their functions and interoperability levels, are analyzed. Afterwards, different metadata generation methods and workflows, and various related tools, are reviewed. Finally, the authors introduce, as future work, topics related to automatic metadata generation that have been neither studied in depth nor prototypically implemented.

Introduction

The concept of metadata is hardly new. The most common definition of the term is “data about data,” and the first references to it in the context of geographic information appear in ANZLIC (1996) and Kildow (1996). The term has its roots in the Greek word “μετα,” “beyond,” and the word “data,” the plural of the Latin term datum-i, “piece of information” (RAE, 2011). Therefore, the meaning of the word may be explained as “beyond data.” However, according to Howe (1993), the term metadata did not appear in print until 1973, despite having been coined by Jack Myers in the 1960s to describe sets of data and products. In the literature on this subject, a good number of authors discuss the interpretation and scope of the practical and theoretical meaning of the term, among them Caplan (1995), Milstead and Feldman (1999), Ercegovac (1999), Sheldon (2001), Steinacker et al. (2001), Swick (2002), Duval et al. (2002), and Woodley et al. (2003). Summing up the contributions of all these authors, we may define the term eclectically as the structured set of data that describe other data and whose purpose is to improve our knowledge of the described information and help us answer such questions as ‘what,’ ‘who,’ ‘where,’ ‘when,’ ‘how much,’ and ‘how.’ Metadata may also be described as autonomous products that, linked to the data, allow us to keep an inventory of those data, enable their publication and reference through the catalogues kept in SDI and, finally, allow the data to be reused. The importance of metadata has been recognized in the EU’s INSPIRE Directive and endorsed by the GSDI initiative.

Moreover, Caplan (1995) acknowledges that the concept of metadata is used to avoid the prejudices developed by professionals in the field of information, who are closer than most to the world of libraries: computer technicians, software designers, and system engineers. Finally, metadata are used to describe the context, the quality, the condition or the characteristics of the data (Milstead & Feldman, 1999; Howe, 2003) in such a way that users can discover and understand their data sets, particularly in the context of Geographic Information (GI). For Zeigler et al. (2006), metadata is “a hierarchical concept in which metadata are a descriptive abstraction above the data it describes.”

Various experts are in favour of assigning the task of metadata creation to the owners of the geospatial datasets (geodata), in the belief that these owners are best suited to provide information about their data (Greenberg, 2004; Kolodney & Beard, 1996). In practice, metadata creation has occupied a secondary role within organizations, with metadata typically being created only after the data themselves have been produced. For this reason, some organizations have considered the creation of metadata an additional cost (Najar, 2006). This attitude has been criticized by several studies; for example, in the CGIAR-CSI (2004) study we find the following statement: “The creation of metadata to novel data producers might seem burdensome, but the long term advantages are far superior to the disadvantages of the initial burden of implementing a Metadata policy within an organization. The initial expense of documenting data clearly outweighs the potential costs of duplicated or redundant data generation.”