Saturday, January 9, 2010

JDBC is an API for the Java programming language that defines how a client may access a database. It provides methods for querying and updating data in a database. JDBC is oriented towards relational databases.

JDBC was first introduced in the Java 2 Platform, Standard Edition, version 1.1 (J2SE), together with a reference implementation JDBC-to-ODBC bridge, enabling connections to any ODBC-accessible data source in the JVM host environment.

JDBC has been part of the Java Standard Edition since the release of JDK 1.1. The JDBC classes are contained in the Java package java.sql.

Starting with version 3.0, JDBC has been developed under the Java Community Process. JSR 54 specifies JDBC 3.0 (included in J2SE 1.4), JSR 114 specifies the JDBC Rowset additions, and JSR 221 is the specification of JDBC 4.0 (included in Java SE 6).

JDBC allows multiple implementations to exist and be used by the same application. The API provides a mechanism for dynamically loading the correct Java packages and registering them with the JDBC Driver Manager. The Driver Manager is used as a connection factory for creating JDBC connections.

JDBC connections support creating and executing statements. These may be update statements such as SQL's CREATE, INSERT, UPDATE and DELETE, or they may be query statements such as SELECT. Additionally, stored procedures may be invoked through a JDBC connection. JDBC represents statements using one of the following classes:

Statement – the statement is sent to the database server each and every time.

PreparedStatement – the statement is cached and then the execution path is pre determined on the database server allowing it to be executed multiple times in an efficient manner.

CallableStatement – used for executing stored procedures on the database.

The method Class.forName(String) is used to load the JDBC driver class. The line below causes the JDBC driver from some jdbc vendor to be loaded into the application. (Some JVMs also require the class to be instantiated with .newInstance().)

When a Driver class is loaded, it creates an instance of itself and registers it with the DriverManager. This can be done by including the needed code in the driver class's static block. e.g. DriverManager.registerDriver(Driver driver)

Now when a connection is needed, one of the DriverManager.getConnection() methods is used to create a JDBC connection.

The URL used is dependent upon the particular JDBC driver. It will always begin with the "jdbc:" protocol, but the rest is up to the particular vendor. Once a connection is established, a statement must be created.

Statement stmt = conn.createStatement();
try {
stmt.executeUpdate( "INSERT INTO MyTable( name ) VALUES ( 'my name' ) " );
} finally {
//It's important to close the statement when you are done with it
stmt.close();
}

Note that Connections, Statements, and ResultSets often tie up operating system resources such as sockets or file descriptors. In the case of Connections to remote database servers, further resources are tied up on the server, e.g., cursors for currently open ResultSets. It is vital to close() any JDBC object as soon as it has played its part; garbage collection should not be relied upon. Forgetting to close() things properly results in spurious errors and misbehaviour. The above try-finally construct is a recommended code pattern to use with JDBC objects.

Data is retrieved from the database using a database query mechanism. The example below shows creating a statement and executing a query.

Typically, however, it would be rare for a seasoned Java programmer to code in such a fashion. The usual practice would be to abstract the database logic into an entirely different class and to pass preprocessed strings (perhaps derived themselves from a further abstracted class) containing SQL statements and the connection to the required methods. Abstracting the data model from the application code makes it more likely that changes to the application and data model can be made independently.

An example of a PreparedStatement query, using conn and class from first example.

PreparedStatement ps = conn.prepareStatement( "SELECT i.*, j.* FROM Omega i, Zappa j WHERE i.name = ? AND j.num = ?" );
try {
// In the SQL statement being prepared, each question mark is a placeholder
// that must be replaced with a value you provide through a "set" method invocation.
// The following two method calls replace the two placeholders; the first is
// replaced by a string value, and the second by an integer value.
ps.setString(1, "Poor Yorick");
ps.setInt(2, 8008);

// The ResultSet, rs, conveys the result of executing the SQL statement.
// Each time you call rs.next(), an internal row pointer, or cursor,
// is advanced to the next row of the result. The cursor initially is
// positioned before the first row.
ResultSet rs = ps.executeQuery();
try {
while ( rs.next() ) {
int numColumns = rs.getMetaData().getColumnCount();
for ( int i = 1 ; i <= numColumns ; i++ ) {
// Column numbers start at 1.
// Also there are many methods on the result set to return
// the column as a particular type. Refer to the Sun documentation
// for the list of valid conversions.
System.out.println( "COLUMN " + i + " = " + rs.getObject(i) );
} // for
} // while
} finally {
rs.close();
}
} finally {
ps.close();
} // try

When a database operation fails, an SQLException is raised. There is typically very little one can do to recover from such an error, apart from logging it with as much detail as possible. It is recommended that the SQLException be translated into an application domain exception (an unchecked one) that eventually results in a transaction rollback and a notification to the user.

Here are examples of host database types which Java can convert to with a function.

There are commercial and free drivers available for most relational database servers. These drivers fall into one of the following types:

* Type 1 that calls native code of the locally available ODBC driver.
* Type 2 that calls database vendor native library on a client side. This code then talks to database over network.
* Type 3, the pure-java driver that talks with the server-side middleware that then talks to database
* Type 4, the pure-java driver that uses database native protocol

Internal JDBC driver, driver embedded with JRE in Java-enabled SQL databases. Used for Java stored procedures. This does not belong to the above classification, although it would likely be either a type 2 or type 4 driver (depending on whether the database itself is implemented in Java or not). An example of this is the KPRB driver supplied with Oracle RDBMS. "jdbc:default:connection" is a relatively standard way of referring making such a connection (at least Oracle and Apache Derby support it). The distinction here is that the JDBC client is actually running as part of the database being accessed, so access can be made directly rather than through network protocols.

Sources

SQLSummit.com publishes list of drivers, including JDBC drivers and vendors

Sun Microsystems provides a list of some JDBC drivers and vendors

Simba Technologies ships an SDK for building custom JDBC Drivers for any custom/proprietary relational data source

DataDirect Technologies provides a comprehensive suite of fast Type 4 JDBC drivers for all major database

Wednesday, January 6, 2010

A database system is a term that is typically used to encapsulate the constructs of a data model, database Management system (DBMS) and database.

A database is an organised pool of logically-related data. Data is stored within the data structures of the database. A DBMS is a suite of computer software providing the interface between users and a database or databases. A DBMS is a shell which surrounds a database or series of databases and through which all interactions take place with the database. The interactions catered for by most existing DBMS fall into 4 main groups:

Data Definition. Defining new data structures for a database, removing data structures from the database, modifying the structure of existing data.

Data Retrieval. Querying existing data by end-users and extracting data for use by application programs.

Data Control. Creating and monitoring users of the database, restricting access to data in the database and monitoring the performance of databases.

Both a database and its DBMS conform to the principles of a particular data model. Data models include the hierarchical data model, the network data model, the relational data model and the object-oriented data model.

A data model in software engineering is an abstract model that describes how data is represented and accessed. Data models formally define data elements and relationships among data elements for a domain of interest. According to Hoberman (2009), "A data model is a wayfinding tool for both business and IT professionals, which uses a set of symbols and text to precisely explain a subset of real information to improve communication within the organization and thereby lead to a more flexible and stable application environment."

A data model explicitly determines the meaning of data, which in this case is known as structured data (as opposed to unstructured data, for example an image, a binary file or a natural language text, where the meaning has to be elaborated). Typical applications of data models include database models, design of information systems, and enabling exchange of data. Usually data models are specified in a data modeling language.

Communication and precision are the two key benefits that make a data model important to applications that use and exchange data. A data model is the medium which project team members from different backgrounds and with different levels of experience can communicate with one another. Precision means that the terms and rules on a data model can be interpreted only one way and are not ambiguous.

A data model can be sometimes referred to as a data structure, especially in the context of programming languages. Data models are often complemented by function models, especially in the context of enterprise models.

A database model is a theory or specification describing how a database is structured and used. Several such models have been suggested. Common models include:

Flat model

Hierarchical model

Network model

Relational model

Concept-oriented model

Star schema

Flat model: This may not strictly qualify as a data model. The flat (or table) model consists of a single, two-dimensional array of data elements, where all members of a given column are assumed to be similar values, and all members of a row are assumed to be related to one another.

Hierarchical model: In this model data is organized into a tree-like structure, implying a single upward link in each record to describe the nesting, and a sort field to keep the records in a particular order in each same-level list.

Network model: This model organizes data using two fundamental constructs, called records and sets. Records contain fields, and sets define one-to-many relationships between records: one owner, many members.

Relational model: is a database model based on first-order predicate logic. Its core idea is to describe a database as a collection of predicates over a finite set of predicate variables, describing constraints on the possible values and combinations of values.

Object-relational model: Similar to a relational database model, but objects, classes and inheritance are directly supported in database schemas and in the query language.

Star schema is the simplest style of data warehouse schema. The star schema consists of a few "fact tables" (possibly only one, justifying the name) referencing any number of "dimension tables". The star schema is considered an important special case of the snowflake schema.

A Database Management System (DBMS) is a set of computer programs that controls the creation, maintenance, and the use of the database with computer as a platform or of an organization and its end users. It allows organizations to place control of organization-wide database development in the hands of database administrators (DBAs) and other specialists. A DBMS is a system software package that helps the use of integrated collection of data records and files known as databases. It allows different user application programs to easily access the same database. DBMSs may use any of a variety of database models, such as the network model or relational model. In large systems, a DBMS allows users and other software to store and retrieve data in a structured way. Instead of having to write computer programs to extract information, user can ask simple questions in a query language. Thus, many DBMS packages provide Fourth-generation programming language (4GLs) and other application development features. It helps to specify the logical organization for a database and access and use the information within a database. It provides facilities for controlling data access, enforcing data integrity, managing concurrency controlled, restoring database.

A DBMS is a set of software programs that controls the organization, storage, management, and retrieval of data in a database. DBMSs are categorized according to their data structures or types. The DBMS accepts requests for data from an application program and instructs the operating system to transfer the appropriate data. The queries and responses must be submitted and received according to a format that conforms to one or more applicable protocols. When a DBMS is used, information systems can be changed much more easily as the organization's information requirements change. New categories of data can be added to the database without disruption to the existing system.

Database servers are computers that hold the actual databases and run only the DBMS and related software. Database servers are usually multiprocessor computers, with generous memory and RAID disk arrays used for stable storage. Hardware database accelerators, connected to one or more servers via a high-speed channel, are also used in large volume transaction processing environments. DBMSs are found at the heart of most database applications. Sometimes DBMSs are built around a private multitasking kernel with built-in networking support although nowadays these functions are left to the operating system.

A DBMS includes four main parts: modeling language, data structure, database query language, and transaction mechanisms:

Components of DBMSDBMS Engine accepts logical request from the various other DBMS subsystems, converts them into physical equivalent, and actually accesses the database and data dictionary as they exist on a storage device.
Data Definition Subsystem helps user to create and maintain the data dictionary and define the structure of the files in a database.
Data Manipulation Subsystem helps user to add, change, and delete information in a database and query it for valuable information. Software tools within the data manipulation subsystem are most often the primary interface between user and the information contained in a database. It allows user to specify its logical information requirements.
Application Generation Subsystem contains facilities to help users to develop transactions-intensive applications. It usually requires that user perform a detailed series of tasks to process a transaction. It facilities easy-to-use data entry screens, programming languages, and interfaces.
Data Administration Subsystem helps users to manage the overall database environment by providing facilities for backup and recovery, security management, query optimization, concurrency control, and change management.

Modeling language

A data modeling language to define the schema of each database hosted in the DBMS, according to the DBMS database model. The four most common types of models are the:
hierarchical model,
network model,
relational model, and
object model.

Inverted lists and other methods are also used. A given database management system may provide one or more of the four models. The optimal structure depends on the natural organization of the application's data, and on the application's requirements (which include transaction rate (speed), reliability, maintainability, scalability, and cost).

The dominant model in use today is the ad hoc one embedded in SQL, despite the objections of purists who believe this model is a corruption of the relational model, since it violates several of its fundamental principles for the sake of practicality and performance. Many DBMSs also support the Open Database Connectivity API that supports a standard way for programmers to access the DBMS.

Before the database management approach, organizations relied on file processing systems to organize, store, and process data files. End users became aggravated with file processing because data is stored in many different files and each organized in a different way. Each file was specialized to be used with a specific application. Needless to say, file processing was bulky, costly and nonflexible when it came to supplying needed data accurately and promptly. Data redundancy is an issue with the file processing system because the independent data files produce duplicate data so when updates were needed each separate file would need to be updated. Another issue is the lack of data integration. The data is dependent on other data to organize and store it. Lastly, there was not any consistency or standardization of the data in a file processing system which makes maintenance difficult. For all these reasons, the database management approach was produced. Database management systems (DBMS) are designed to use one of five database structures to provide simplistic access to information stored in databases. The five database structures are hierarchical, network, relational, multidimensional and object-oriented models.

The hierarchical structure was used in early mainframe DBMS. Records’ relationships form a treelike model. This structure is simple but nonflexible because the relationship is confined to a one-to-many relationship. IBM’s IMS system and the RDM Mobile are examples of a hierarchical database system with multiple hierarchies over the same data. RDM Mobile is a newly designed embedded database for a mobile computer system. The hierarchical structure is used primary today for storing geographic information and file systems.

The network structure consists of more complex relationships. Unlike the hierarchical structure, it can relate to many records and accesses them by following one of several paths. In other words, this structure allows for many-to-many relationships.

The relational structure is the most commonly used today. It is used by mainframe, midrange and microcomputer systems. It uses two-dimensional rows and columns to store data. The tables of records can be connected by common key values. While working for IBM, E.F. Codd designed this structure in 1970. The model is not easy for the end user to run queries with because it may require a complex combination of many tables.

The multidimensional structure is similar to the relational model. The dimensions of the cube looking model have data relating to elements in each cell. This structure gives a spreadsheet like view of data. This structure is easy to maintain because records are stored as fundamental attributes, the same way they’re viewed and the structure is easy to understand. Its high performance has made it the most popular database structure when it comes to enabling online analytical processing (OLAP).

The object oriented structure has the ability to handle graphics, pictures, voice and text, types of data, without difficultly unlike the other database structures. This structure is popular for multimedia Web-based applications. It was designed to work with object-oriented programming languages such as Java.