SEED: how to deal with vague and incomplete information

Transcription

SEED - A DBMS FOR SOFTWARE ENGINEERING APPLICATIONS BASED ON THE ENTITY-RELATIONSHIP APPROACH

Martin Glinz, Jochen Ludewig *)
Brown Boveri Research Center, CH-5405 Baden, Switzerland

ABSTRACT

SEED is a database system which supports the data engineering needs of a software engineering environment. It provides information structures that are not incorporated in conventional database systems, but are typical in the software engineering process. This paper describes two principal features of SEED: how to deal with vague and incomplete information without giving up consistency checking, and the management of database versions and variants. A prototype of SEED is used as the database for an existing specification and design tool.

INTRODUCTION

A software engineering environment uses information structures that are rather different from those provided by conventional database systems. Building a DBMS for software engineering applications therefore requires the development of new, engineering-oriented database concepts.

Existing work on semantic database modelling and on engineering databases (for an overview, see [12], [6] and the references cited there) provides solutions to many points in the problem space. However, we found no work addressing the full scope of database requirements when developing tools for software specification and design. On the other hand, software engineering is a field large enough to justify tailored solutions.

Against this background, we designed SEED (which stands for Software Engineering Environment Database System). A prototype of SEED was implemented in a straightforward manner, deriving the implementation concepts from the model. Our ultimate goal was not to invent a new database model, but to provide a DBMS that substantially eases the task of data engineering when building a software engineering environment consisting of a set of cooperating tools.
*) with ETH Zurich since January, 1986
CH2261-6/86/0000/0654 IEEE

CONCEPTS

Concepts for software design

The concept of SEED was strongly influenced by our work on the specification system SPADES [9] and its predecessor, ESPRESO. We therefore briefly outline our approach to software design.

We consider specification and design to be evolutionary, strongly intertwined processes. Their goal is to model the target software system. Such a model is a semiformal description of the objects and relationships that the target system is composed of. Development starts with informal, incomplete, and vague textual descriptions and evolves to a rather formal representation by objects and relationships of well defined sorts. Information is accepted independently of its formality and completeness. But at any stage, the collected information must be consistent (according to the semantics of a specification grammar). Eventually, the result must be sufficiently formal, complete, and precise to serve as a basis for implementation.

The state of the development is saved after every larger modification. Rollback to prior states or tracing alternatives allow for exploring the design space and for undoing errors.

Basic ideas of SEED

SEED is based on the entity-relationship approach [2]. This approach is especially suited for software development with semiformal models. However, the entity-relationship model lacks some features that are vitally needed in a DBMS for software engineering applications: object hierarchies, a sophisticated consistency concept, a way to deal with vague or incomplete information, and management of versions and variants. In SEED, these features are added to the entity-relationship model [7].

Figures 1 and 2 give a general idea of the basic concepts; Figure 1 shows some objects and relationships that are handled by SEED using the schema of figure 2. This schema describes the data model of a primitive specification system where actions, data, and data flow may be represented. We use modified entity-relationship diagrams for graphic representation. Notice the difference between figures 1 and 2: Figure 1 gives an example of data that may be stored in SEED. Figure 2 shows a schema that defines what kinds of data may be stored.

In this paper, we focus on two extensions that we consider most important: (1) the problem of admitting vague and incomplete data without losing consistency control and (2) management of versions and variants.

Explanation of figure 1: (1) is an independent object with name 'Alarms'. (2) is a relationship 'Read', relating objects 'AlarmHandler' and 'Alarms' in roles 'to' and 'from', respectively. All objects below 'Alarms' are dependent objects (sub-objects). The name of a dependent object is composed of the name of its parent and of its role in the context of the parent object. Thus, (3) is the object 'Alarms.Text' consisting of objects 'Alarms.Text.Body' and 'Alarms.Text.Selector'. The latter has the value "Representation". Finally, (4) is a dependent object with name 'Alarms.Text.Body.Keywords[1]' and with value "Display".

Figure 1: Sample object relationship structure (values shown in the figure: "Alarms are represented in an alarm display matrix", Selector "Representation", Keywords[0] "Alarmhandling", Keywords[1] "Display")

Explanation of figure 2: 'Data' is a hierarchically structured object class with class 'Data.Text' as a subclass, which again has the subclasses 'Data.Text.Body' and 'Data.Text.Selector'. The latter has objects of type STRING as instances. 'Data.Text' has the cardinality 0..16, specifying that any object of class 'Data' may have from zero up to 16 objects of class 'Data.Text'. Classes 'Data' and 'Action' are related by the associations (relationship classes) 'Read' and 'Write' with cardinalities 1..* and 0..*. '1..*' means that 'Data' must have at least one relationship with an instance of 'Action'; there is no upper bound for the number of such relationships. The roles 'from' and 'to' of the 'Read'-association express that reading is from instances of 'Data' to instances of 'Action'. The association 'Contained' imposes a tree structure on the objects that are instances of 'Action' by means of the attribute ACYCLIC and the cardinality for the role 'in'.

Figure 2: A sample SEED schema (legend: class; sub-class (dependent class); association; cardinality n..m, min to max, * = unlimited)
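The cardinality notation used in the schema explanation above can be illustrated with a small sketch. This is not SEED code: the names `Cardinality`, `parse_cardinality`, and the two dictionaries are hypothetical and only mirror the 'min..max' notation of figure 2, where '*' means unlimited.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Cardinality:
    """A 'min..max' cardinality; maximum=None stands for '*' (unlimited)."""
    minimum: int
    maximum: Optional[int] = None

    def allows(self, count: int) -> bool:
        """True if `count` items lie within the min..max bounds."""
        if count < self.minimum:
            return False
        return self.maximum is None or count <= self.maximum


def parse_cardinality(text: str) -> Cardinality:
    """Parse the notation of figure 2, e.g. '0..16' or '1..*'."""
    low, high = text.split("..")
    return Cardinality(int(low), None if high == "*" else int(high))


# Hypothetical encoding of two facts stated in the explanation of figure 2.
SUBCLASS_CARDINALITY = {"Data.Text": parse_cardinality("0..16")}
ROLE_CARDINALITY = {("Read", "Data"): parse_cardinality("1..*")}

# An object of class 'Data' may own at most 16 'Data.Text' sub-objects.
text_ok = SUBCLASS_CARDINALITY["Data.Text"].allows(16)
text_too_many = SUBCLASS_CARDINALITY["Data.Text"].allows(17)
```

Keeping the minimum and the maximum as separate fields is a deliberate choice in this sketch: it makes it easy to check the two bounds independently of each other.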

MANAGING VAGUE AND INCOMPLETE INFORMATION

The normal approach to database consistency is to require all data in the database to fully comply with the structures and constraints given in the schema. However, this approach prevents the entry of incomplete and vague information into the database. We use the schema of fig. 2 for two examples:

(1) We cannot store the information that there is a dataflow from 'AlarmHandler' to 'Alarms' unless we precisely know whether it is a read or a write, because there is no schema category which fits the vague information about the existence of a dataflow.

(2) We cannot enter 'Alarms' as an object of class 'Data' without also entering a 'Read'- and a 'Write'-relationship of 'Alarms' with objects of class 'Action', because the database would become inconsistent otherwise. This is due to the fact that the minimum cardinalities of the 'Read' and 'Write' associations require every object of class 'Data' to have at least one 'Read'- and one 'Write'-relationship with objects of class 'Action'.

Management of vague and incomplete data therefore requires extended schema structures as well as a modified consistency concept.

Vague data

Generalization is a well known principle for representing meta-classifications ('is-a'-relationships) [11]. This principle can be used to define categories in the schema that allow for dealing with vague data in a well defined manner. We extend generalization from object classes also to associations (relationship classes). Wherever we want to allow for vague information, we define a hierarchy of generalizations: generalized classes and associations provide categories to enter vague data. When the knowledge about these data becomes more precise, they are moved down in the generalization hierarchy to one of the specializations.

Figure 3 shows an example: The schema of fig. 2 is modified such that associations 'Write' and 'Read' are generalized to 'Access'.
Class 'Data' is specialized (inverse of generalization) to 'OutputData' and 'InputData'. Classes 'Data' and 'Action' are generalized to 'Thing'. This allows storage of vague information like "There is a thing with name 'Alarms'". When we know more about 'Alarms', e.g. that it is a data object which is accessed by action 'Sensor', we may make the previously stored information more precise by re-classifying 'Alarms' in class 'Data' and introducing an 'Access'-relationship with 'Sensor'. In a next step, we might learn that 'Alarms' is an output. Again, we can make the stored information more precise by specializing the 'Access'-relationship to a 'Write'-relationship. Finally, we could arrive at precise information like "'Alarms' is an output written twice by 'Sensor', and writing is repeated in case of error".

In generalization hierarchies of associations, different cardinalities may be used to express additional semantics. For example, the cardinality 1..* of 'Access' means that every object of class 'Action' eventually must access at least one object of class 'Data'. However, the cardinality 0..* of 'Read' and 'Write' allows either a write or a read access to satisfy this condition.

Figure 3: SEED schema with generalizations of classes and associations (labels recoverable from the figure include 'NumberOfWrites' 1..1 and 'ErrorHandling' 0..1 (abort, repeat))
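The stepwise refinement of 'Alarms' described above can be sketched in a few lines. The sketch is illustrative only: `GENERALIZES`, `is_a`, `Item`, and `count_as` are hypothetical names, and the rule that re-classification may only move an item downwards in the hierarchy is an assumption of this sketch, not a statement about the SEED prototype.

```python
# Child -> parent links of the generalization hierarchies named in the text
# (classes and associations alike).
GENERALIZES = {
    "Data": "Thing",
    "Action": "Thing",
    "InputData": "Data",
    "OutputData": "Data",
    "Read": "Access",
    "Write": "Access",
}


def is_a(category, ancestor):
    """True if `category` equals `ancestor` or is a specialization of it."""
    current = category
    while current is not None:
        if current == ancestor:
            return True
        current = GENERALIZES.get(current)
    return False


class Item:
    """An object or relationship stored under some schema category."""

    def __init__(self, name, category):
        self.name = name
        self.category = category

    def reclassify(self, new_category):
        """Make the item more precise (assumption: downward moves only)."""
        if not is_a(new_category, self.category):
            raise ValueError(
                f"{new_category} is not a specialization of {self.category}"
            )
        self.category = new_category


def count_as(items, category):
    """Count items whose category is `category` or one of its specializations."""
    return sum(1 for item in items if is_a(item.category, category))


# "There is a thing with name 'Alarms'", later refined step by step.
alarms = Item("Alarms", "Thing")
alarms.reclassify("Data")
alarms.reclassify("OutputData")

link = Item("Sensor accesses Alarms", "Access")
link.reclassify("Write")
accesses = count_as([link], "Access")  # a 'Write' still counts as an 'Access'
```

Counting specializations towards the generalized association is what lets either a read or a write satisfy the 1..* cardinality of 'Access'.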

Incomplete data

We already mentioned that minimum cardinalities restrict the treatment of incomplete information. However, we do not want to omit minimum cardinalities as they provide information about the desired final state of data that is being stored in the database. The problem is solved by partitioning the information that is provided by the schema into two categories: consistency and completeness information. Class and association membership, maximum cardinalities, ACYCLIC-conditions, and attached procedures are consistency information. Minimum cardinalities and covering conditions for generalizations represent completeness information.

(Attached procedures may be attached to any SEED schema element. They are executed when an item of the corresponding schema element is updated. Attached procedures are used to express complex integrity constraints. A generalization is covering if every data item must finally be specialized in a specialized class (or association) of this generalization.)

Manipulating vague and incomplete data

Manipulation of vague data requires an operation for re-classifying an existing data item within a generalization hierarchy.

As we allow for incomplete data, we may have objects with undefined sub-objects and not yet existing relationships. The semantics of such objects in database operations is simple: When the database is searched for data that meet certain selection criteria, an undefined object matches nothing. Taking joins or cartesian products is not affected by undefined items. This is due to the fact that entity-relationship based models define these operations on existing relationships only.

Whenever an update operation is executed, SEED checks all consistency rules that are derivable from the consistency information mentioned above and that apply to the data being updated. Thus SEED permanently ensures database consistency. Formal detection of incompleteness is provided by operations which check the rules that are derivable from the completeness conditions in the schema.

VERSIONS AND VARIANTS

Versions

The SEED version concept allows certain states of the database to be preserved. It aims at long term preservation, e.g. when a document has been finished or a product is released, as well as at short term logging, e.g. saving the database state before and after a session. However, SEED does not keep a log of every database update.

Versions are created explicitly by taking a snapshot of the database. Additionally, there is always a current version representing the current state of the database. Every update changes this state, replacing the current version with a new one. When a current version is to be saved, an explicit version generation must be performed prior to the update.

Versions are identified by a decimal classification. The classification tree reflects the version history. Versions cannot be modified, except for deletion. However, alternatives may be created by selecting a historical version to become the current version prior to the execution of a sequence of update operations. Work then continues on the basis of this version until it is saved with a version creation command and the original current version is selected again.

Retrieval of data from an old version is performed in the same way as retrieval from the current version. The version of interest is selected prior to the execution of retrieval operations (with the current version as a default). SEED defines additional operations for history retrieval and navigation, e.g. 'find all versions of object 'AlarmHandler', beginning with version 2.0'.

Figure 4a: Sample objects and relationships with versions (object 'AlarmHandler' with sub-objects 'Revised' and 'Description'; versions 1.0, 2.0, and Current; description values shown include "Handles alarms", "Handles alarms derived from process data", and "Generates alarms from process data, triggers Operator Alert")
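The version operations described in this section (explicit snapshots, an always-present current version, immutable stored versions, and selection of a historical version as the starting point of an alternative) can be sketched as follows. The class `VersionedDatabase` and its method names are hypothetical. For brevity the sketch copies the whole state for each version, and it lets the caller choose the decimal version identifier; neither is meant to describe how the SEED prototype works internally.

```python
import copy


class VersionedDatabase:
    """Toy model: one mutable current state plus frozen, named snapshots."""

    def __init__(self):
        self.current = {}    # state of the current version
        self._versions = {}  # version id (e.g. "1.0") -> snapshot

    def create_version(self, version_id):
        """Explicitly preserve the current state under `version_id`."""
        if version_id in self._versions:
            raise ValueError(f"version {version_id} exists and cannot be modified")
        self._versions[version_id] = copy.deepcopy(self.current)

    def select(self, version_id):
        """Make a historical version the current one (start of an alternative)."""
        self.current = copy.deepcopy(self._versions[version_id])

    def view(self, version_id=None):
        """Retrieve from a stored version; the current version is the default."""
        if version_id is None:
            return self.current
        return copy.deepcopy(self._versions[version_id])

    def delete_version(self, version_id):
        """Deletion is the only change allowed on a stored version."""
        del self._versions[version_id]


db = VersionedDatabase()
db.current["AlarmHandler.Description"] = "Handles alarms"
db.create_version("1.0")  # must happen before the next update
db.current["AlarmHandler.Description"] = "Handles alarms derived from process data"
db.create_version("2.0")
old_description = db.view("1.0")["AlarmHandler.Description"]
```

Returning copies from `view` keeps stored versions immutable even if the caller modifies the result.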

When creating a version we do not save the complete database. We only store those objects and relationships that have been changed after the creation of the previous version. Items that have been deleted in this interval must also be recorded. This is made easy by marking items as deleted instead of removing them physically.

Fig. 4a shows an example of objects with multiple versions. The stored versions of an object are represented as a cluster of ovals. The version of a hierarchically structured object is composed of the versions of its sub-objects. In this example, we have information about three versions: 1.0, 2.0, and Current. From this, we can build views to particular versions. The view to a version with number n consists of the objects and relationships having the greatest version number that is less than or equal to n (provided that they are not marked as deleted). Figures 4b and 4c show the corresponding current version and version 1.0, respectively.

When the schema is modified, the interpretation of versions that were created before this modification becomes a problem. Therefore, we must generate schema versions, too.

Figure 4b: Current version of data items of figure 4a ('AlarmHandler' with sub-objects 'Revised' and 'Description'; description "Generates alarms from process data, triggers Operator Alert")

Patterns and Variants

When entering information into the database, the user often wishes to express [...] data that are not reflected in the schema. For example, the schema may define a class of procedures that are to be specified. A subclass of this class may contain the deadline for the completion of every procedure specification. If a user wishes to express that some procedures have a common deadline and wants to maintain that deadline value consistently for these objects, he/she cannot do so.

In SEED, a pattern concept is provided for dealing with those situations: Any data item that is entered into the database can be marked as a pattern.
Patterns are invisible to any retrieval operation and are not checked for consistency unless they are inherited by a 'normal' data item. The semantics of patterns and inheritance is as follows: all retrieval operations view patterns as if they were inserted in the context of the inheritors. However, instead of a real insertion we establish a special inherits-relationship between a pattern and any of its inheritors. Thus pattern information cannot be updated in the context of the inheritors, but only in the pattern itself. Conversely, any update of a pattern automatically propagates to all inheritors of that pattern.

Returning to the example introduced above, the user may define a 'pattern' procedure object with a given deadline. Every 'real' procedure object that should share this deadline inherits the pattern. The deadline value will be maintained consistently, as it is not changeable in the real objects, whereas a change in the pattern affects all inheriting objects in the same way.

There are several other applications for patterns, e.g. for templates, user defined constraints, or standardized data environments.

Figure 4c: Version 1.0 of data items of figure 4a ('AlarmHandler' with sub-objects 'Revised' and 'Description'; description "Handles alarms")

Patterns also serve as a basis for managing variants: We define a variants family to be some sets of objects and relationships that have a part of their information in common, but differ in some other parts. This means that every variant shares a part of its objects and relationships with the other members of the family (the so called common part), but has also objects and relationships that differ from the other members (the variant part).
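The pattern semantics described above (patterns hidden from retrieval, an inherits-relationship instead of a real insertion, updates allowed only in the pattern and propagated to all inheritors) can be sketched with the deadline example. `DataItem`, `retrieve`, and the deadline values are hypothetical and serve only as an illustration; the same inherits mechanism is what the variants concept builds on.

```python
class DataItem:
    """A data item that may be marked as a pattern or inherit from patterns."""

    def __init__(self, name, is_pattern=False, **values):
        self.name = name
        self.is_pattern = is_pattern
        self._own = dict(values)
        self._patterns = []  # targets of the inherits-relationship

    def inherit(self, pattern):
        """Establish an inherits-relationship instead of copying values."""
        if not pattern.is_pattern:
            raise ValueError(f"{pattern.name} is not marked as a pattern")
        self._patterns.append(pattern)

    def _inherited(self):
        merged = {}
        for pattern in self._patterns:
            merged.update(pattern._own)
        return merged

    def get(self, key):
        """Retrieval sees pattern values as if inserted in the inheritor."""
        inherited = self._inherited()
        if key in inherited:
            return inherited[key]
        return self._own[key]

    def set(self, key, value):
        """Inherited values can be changed in the pattern itself only."""
        if key in self._inherited():
            raise PermissionError(f"{key} is inherited; update the pattern")
        self._own[key] = value


def retrieve(items):
    """Patterns themselves are invisible to retrieval operations."""
    return [item for item in items if not item.is_pattern]


common = DataItem("CommonDeadline", is_pattern=True, deadline="end of May")
proc_a = DataItem("ProcA")
proc_b = DataItem("ProcB")
proc_a.inherit(common)
proc_b.inherit(common)
common.set("deadline", "end of June")  # propagates to both inheritors
```

Because inheritors hold a reference to the pattern rather than a copy of its values, the shared deadline cannot drift apart between the procedures.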

Variants are different from alternatives: alternatives are coexisting versions of the database, whereas variants express that some information in the database consists of a common part and some varying parts. An example of variants is a set of system configurations that share most of the software modules, but differ in some hardware dependent modules.

Common and variant parts of a variants family are described by normal items. The connections between the common part and the several variant parts are established by pattern relationships with every variant inheriting these patterns. Pattern semantics now guarantee that all variant parts have the same relationships to the common part. This could not be assured with ordinary relationships. In fig. 5, the common part is connected to pattern objects PO1 and PO2 by pattern relationships PR1 and PR2, respectively. Both variants inherit these patterns. Thus, they both have inherited relationships to the common part, i.e. they have it in common.

RELATED WORK

We should like to acknowledge that SEED incorporates many ideas from work on engineering databases, semantic data models, and extensions of the entity-relationship model.

Smith and Smith [12] deal with a database approach to specification, focussing on formal specifications. Bever and Lockemann [1] also propose an entity-relationship database for a software engineering environment. They concentrate on the coding and compilation phase, where information is fully formalized. Katz and Lehman [8] and Tichy [13] deal with version and configuration management on the level of files. A semiformal approach to software development with emphasis on the specification and design phase, which SEED aims at, is not covered by this work. The version concept of SEED works on the database, not on files.

The numerous extensions of the entity-relationship model ([3], [4], [5], [10]) point out many solutions to particular problems and have been a valuable source for the design of SEED.
However, they reveal no concise solution to the problems of software engineering data management, which is the main goal of SEED.

Figure 5: Defining variants by means of patterns (labels: common part, variant part, pattern objects, pattern relationships, inherits-relationship)

DATA MANIPULATION IN SEED

SEED has been designed to support the data management tasks of software development tools. Hence, SEED has an operational interface that consists of a set of procedures. The SEED prototype provides the procedures for data creation, update, and simple retrieval by name. Retrieval with complex queries is not supported.

DISCUSSION

Open problems

SEED is currently a single user system only. The problems of concurrency control and version management in a multi-user environment have not yet been solved. We only have some rough ideas concerning a two level approach: One central server runs the complete database and several clients use the server for retrieval operations, but take local copies for making updates. Data that has been copied to a client for update has a write lock in the central database. When a client sends an updated copy back to the server, the server puts the modified data into the central database in a single transaction. Versions are kept both locally and globally under control of the user and the server, respectively.

In our version concept, we have not yet considered history sensitive consistency rules, i.e. rules that impose constraints for the transition from a given version to its successor.

State of work

A prototype of SEED is operational. It is currently being integrated into the specification system SPADES. Implementation concepts for versions and variants have been developed, but the implementation is not yet done.

The practical use of SEED will give us insight into its benefits and weaknesses. The experience gained ...
