Database Related – Job and Interview Questions and Answershttps://www.datafusion.net
Educational ResourcesTue, 17 Jul 2018 14:11:16 +0000en-UShourly1https://wordpress.org/?v=4.4.17Difference Between Prepared Statements And Parameterized Querieshttps://www.datafusion.net/difference-between-prepared-statements-and-parameterized-queries/
https://www.datafusion.net/difference-between-prepared-statements-and-parameterized-queries/#respondFri, 18 Mar 2016 20:51:35 +0000http://www.datafusion.net/?p=252Both prepared statements and parameterized queries are exactly the same thing. Basically, the word ‘prepared statement’ is most frequently used; however, there is no difference between these two terms.

Prepared statements and parameterized queries are database management system’s characteristics that actually act as templates in which the execution of SQL occurs. The parameters are the actual values that pass through into the SQL. This is why all these templates are known as parameterized queries. Moreover, the SQL within the template is also optimized, parsed and compiled before the SQL is sent off for the execution– in other words “prepared”. Therefore, all these templates are generally also known as prepared statements. Thus, always remember that prepared statements and parameterized queries are the two names given to the same thing.

Benefits Of Using Prepared Statements

The 2 main advantages provided by the prepared statements are as follows.

First of all, they provide efficient performance. Although, the execution of a prepared statement can happen several times, it is optimized and compiled just once using the database engine.

Due to the fact that a prepared statement does not need to be optimized and compiled each and every time there is a change in the values of the query, it offers a definite performance advantage. However, remember that when a prepared statement is compiled, all the query optimization cannot occur.

This reason behind this is that the best query plan may also rely on the certain values of the parameters being passed in. Overtime, the best query plan may also change due to the fact that the indices and database tables also change with time.

How SQL Is “Prepared”?

The creation of prepared SQL requires calling the respective prepare procedure in each language.

After this, the prepared SQL template is forwarded to the DBMS (it can be DB2, MySQL or whatever) having “?”(The placeholder values) left blank.

Now, the DBMS will perform, parse and compile query optimization on the template.

After this, the DBMS will store the result; however, it cannot execute the result as it does not have any values for the execution as there is no data in the parameters/placeholders.

Once the data is passed in for the parameters and respective execute function is called, the SQL is executed.

]]>https://www.datafusion.net/difference-between-prepared-statements-and-parameterized-queries/feed/0In SQL, What Is Three-Valued (Ternary) Logic?https://www.datafusion.net/in-sql-what-is-three-valued-ternary-logic/
https://www.datafusion.net/in-sql-what-is-three-valued-ternary-logic/#respondFri, 18 Mar 2016 20:39:04 +0000http://www.datafusion.net/?p=245The answer of this question can be best explained with an example. Just suppose we have the following SQL table having the columns printerrModel and modelNumber:

Computer {

modelNumber CHAR(20) NOT NULL,

printerModelCHAR(10),

}

Now assume that the table stores entries for all the ‘makes’ of printers and PC’s – and in case, it’s a printer the printerModel field is set. With the available information, let’s try to answer a question for explaining that what a three valued logic is:

How will you write a SQL statement that returns just the PC’s and no printers from the above table?

You might be thinking that the answer of this question is very simple, and the first thing that may come to mind is the following answer:

SELECT * FROM Computer WHERE printerModel = null

Three valued logic/Ternary used by SQL

Basically, the above SQL code is not going to return anything at all – which also include the PC’s that are actually present in the table! The reason behind this is the fact that the SQL uses three-valued logic or ternary. The ternary logic concept is significant to understand for writing effective SQL queries.

SQL Logical Operations using 3 possible values

An important fact to remember is: There are 3 possible values in SQL logical operations NOT 2. But, the question arises, what are those 3 possible values? They are UNKNOWN, TRUE and FALSE. The UNKNOWN value, like its name shows, simply means that a value which is unrepresentable or unknown. If we will run the above presented SQL code, this will return UNKNOWN for a value.

The Equality Operator

The issue with the above SQL statement is the fact that we used the “=” (equality operator) in order to test a NULL column value. A comparison to NULL returns UNKNOWN in the majority of databases. This is correct even when matching NULL with NULL. The appropriate way to check for a non-NULL column or NULL is to use the IS NOT NULL or the IS NULL syntax. Thus, the SQL query must be changed to the following:

SELECT * FROM Computer WHERE printerModel IS NULL

]]>https://www.datafusion.net/in-sql-what-is-three-valued-ternary-logic/feed/0What Is A Full Table Scan In Databases?https://www.datafusion.net/what-is-a-full-table-scan-in-databases/
https://www.datafusion.net/what-is-a-full-table-scan-in-databases/#respondFri, 18 Mar 2016 20:37:05 +0000http://www.datafusion.net/?p=243A full table scan, scans one by one all the rows present in a table for finding the data that a query is searching for. Apparently, this results in very slow SQL queries when you have a table having multiple rows.

Now you can even imagine that how performance-rigorous a complete table scan would be on a table having millions of rows. However, you can help prevent full table scans by using an index. Following are a few scenarios which result in a complete table scan:

When Statistics Are Not Updated

Generally, statistics are kept on indexes and tables. However, a full table scan will happen, if for any reason index or table statistics have not been updated. The major reason behind this is that most RDBMS’s comprises of query optimizers that utilize those statistics for figuring out whether using an index is useful. And when those statistics are not present, then the RDBMS may mistakenly determine that doing a complete table scan is more beneficial as compared to using an index.

A full table scan might be performed, in case; there is no WHERE clause in a query for filtering out the rows which are shown in the result set.

Full Table Scan Using An Index

There are a few scenarios in which a complete table scan will take place even in the presence of an index on that table, such as:

A full table scan will be performed when a query does have a WHERE clause, however, not a single column in that WHERE clause is identical to the index’s leading column on the table.

A full table scan can still occur even if a query comprises of a WHERE clause having a column that is identical to the index’s first column. Generally, this situation arises when the use of an index is prevented because comparison is being used by the WHERE clause. Following are some scenarios in which that could happen:

When the wildcard operator is utilized in the comparison string’s first position.

When the NOT operator is used.

When the “<>“(NOT EQUAL) operator is used.

]]>https://www.datafusion.net/what-is-a-full-table-scan-in-databases/feed/0What Is The Concurrent Update Problem In Databases?https://www.datafusion.net/what-is-the-concurrent-update-problem-in-databases/
https://www.datafusion.net/what-is-the-concurrent-update-problem-in-databases/#respondWed, 24 Feb 2016 23:58:16 +0000http://www.datafusion.net/?p=236Multiple sessions of databases are allowed to update the data similar data at one time. This gives rise to concurrent update problem. When the user for database connects to it the session formulated are new, not refreshed. For every user or even the same user, a separate session is lodged. In the complicated boundaries of databases of your application, the event of concurrent update problem is frequent.

Problems

To prevent the concurrent update problem, locking is suggested. If prevention is not taken into consideration, numerous users tend read the database concurrently. Consequently, when transactions access similar data at one time, following problems are home to the databases:

Off-track Updates

This is the most common problem which occurs when two or more transactions are carried out at the same time. A transaction is unaware of the other transactions in the same row. Ultimately, the latest update is overwritten by the transactions in the same row. This results in loss of original data.

Uncommitted Dependency

This problem occurs when a transaction is taking place and another transaction chooses the same row as the first one, which already is being upgraded. In this way, the second transaction accesses the data which has no liability because of the upgrading session committed by the transaction in the first place.

Fluctuating Analysis

The analysis of the second transaction becomes inconsistent when it gains access to read different sets of data every time in the same row. It is also termed as “non-repeatable read”.

Phantom Reads

When a row in the range of other rows of a transaction is deleted or inserted, that’s when the problem phantom reads occur. Transactions taking place at the same time either insert or delete a row, resulting into a miscommunication of original readings of the rows.

The problems can be prevented by using database locking. It would lock the database, permitting a single user/session of transaction for upgrading. All systems of databases are prone to such breakdowns but there has to be an efficient control over concurrent update problem.

]]>https://www.datafusion.net/what-is-the-concurrent-update-problem-in-databases/feed/0Differentiation Between The Having Clause And The Group By Statementhttps://www.datafusion.net/differentiation-between-the-having-clause-and-the-group-by-statement/
https://www.datafusion.net/differentiation-between-the-having-clause-and-the-group-by-statement/#respondWed, 24 Feb 2016 23:41:52 +0000http://www.datafusion.net/?p=226In SQL, the group by statement and having clause work together when using aggregate functions like MAX, SUM, AVG, etc. The difference between the ‘group by statement’ and ‘having’ clause is best demonstrated by an example. Just suppose a table named as lab_bonus as shown below. Note that the table has several entries for labors A and B.

Lab bonus

Labors

Bonus

A

2000

B

4000

A

1000

C

1400

B

2500

If you want to calculate the total bonus that each labor received, then you have to write a SQL statement like this

select labor, sum(bonus) from lab_bonus group by labor;

The Group By Clause

In the above SQL statement, you can see that we have used the “group by” clause with the labor column. The group by clause allows you to find the sum of the bonuses for each labor. When you use the ‘sum(bonus)’ in combination with the ‘group by’ statement, this will give you the sum of all the bonuses for labors A, B, and C.

After running the above SQL, you will get this:

Labor

Sum(Bonus)

A

3000

B

6500

C

1400

Now, suppose you wanted to find the labors who received more than $2,000 bonuses in 2007. You might be thinking to write a query like this:

BAD SQL:

select labor, sum(bonus) from lab_bonus

group by labor where sum(bonus) > 2000;

The above SQL will not work. This is because the where clause does not work with aggregates, such as max, avg, sum, etc. Rather, you have to use the having clause. The having clause was inserted to SQL so that you could match aggregates to other values and see that how the ‘where’ clause can be utilized with non-aggregates. Finally, the correct SQL will appear like this

GOOD SQL:

select labor, sum(bonus) from lab_bonus

group by labor having sum(bonus) > 2000;

After running the above SQL, you will get this:

Labor

Sum(Bonus)

A

3000

B

6500

Difference Between Group By Statement And Having Clause

You can clearly see from the above example that the group by clause is used with group column(s), thus aggregates (such as MAX, SUM) can be used for finding the necessary information.

However, the having clause is used with the group by clause while comparisons required to be made using those aggregate functions, for example, to check if the SUM is greater than 2,000, as in our above example. Therefore, group by statements and the having clause is not really alternatives to each other; however they are used alongside one another.

]]>https://www.datafusion.net/differentiation-between-the-having-clause-and-the-group-by-statement/feed/0Difference Between A Subquery And A Derived Tablehttps://www.datafusion.net/difference-between-a-subquery-and-a-derived-table/
https://www.datafusion.net/difference-between-a-subquery-and-a-derived-table/#respondThu, 18 Feb 2016 03:47:10 +0000http://www.datafusion.net/?p=220Both subqueries and derived tables can look same and most of the people think they are the same thing, however there are some major differences.

Subquery

A subquery is a SELECT statement which is nested within another statement. This is why it’s known as a subquery, because it’s similar to have a query within another query. Usually, subqueries are used in the WHERE clause as a way for filtering out specific rows returned in the result set of the outer query.

For example, we have a table called labor with columns labor_name, last_name, labor_salary, and labor_number. And we also have another table called branch that has columns called manager_labor_number and branch_name.

Using all these tables as our sample data, here is what a subquery looks like:

Select labor_name

from labor

where labor_salary>

— this is a subquery:

(select avg(labor_salary)

from labor)

The SQL above will find all labors that have above average salary.

Derived Tables

Basically, a derived table is a subquery, except it is in the FROM clause of a SQL statement always. It is known as a derived table because it fundamentally functions as a table as far as the whole query is concerned.

However, remember that a derived table just exists in the query in which it is made. Therefore, derived tables are not essentially part of the database schema as they are not real tables.

The below example will help in clarifying a derived table:

select max (qualification)

from (

— this portion of the query is a derived table:

select qualification from table

) as qualification — must give derived table an alias

In the above SQL, you can see that the derived table is in the FROM part of the SQL. The results of the “Select qualification from table” query are considered to be the derived table. Besides, it is quite obvious to you how the derived table basically acts as a table from which something else is chosen.

It is necessary to include the “as qualification” text on the bottom of the above SQL; otherwise you will get an error saying “Every derived TABLE should have its own alias”.

]]>https://www.datafusion.net/difference-between-a-subquery-and-a-derived-table/feed/0What Is The Definition Of A Secondary Key?https://www.datafusion.net/what-is-the-definition-of-a-secondary-key/
https://www.datafusion.net/what-is-the-definition-of-a-secondary-key/#respondThu, 18 Feb 2016 03:44:52 +0000http://www.datafusion.net/?p=218In SQL, the term secondary key is often misunderstood. In order to get a clear idea about the concept of secondary key, it is essential to know about the other kinds of keys in the database language. There are basically three kinds of keys in SQL:

Primary Key

Foreign Key

Secondary Key

Although there is a lot of confusion seen in defining secondary key by a lot of people, the concept is surprisingly simple. Database management systems (DBMS) use sets of rows and columns for specifying the information about certain groups of people. These rows and columns are arranged together in the form of tables. These tables consist of a number of parameters or attributes. These attributes specifying certain information about people are known as keys. Now, for the sake of convenience, one of the attributes or keys is assigned as top priority key. This key will be used as the first attribute to distinguish an individual. One example of this is the Personal ID of a person. This top priority key will be termed as the ‘Primary Key’.

Secondary key is NOT foreign key:

A vast population of people is never sure how to differentiate between the secondary key and the foreign key. A foreign key is simply a primary key from a given table which appears as an attribute in another as well, such that the two tables are inter-related. For example, in the table shown above (name it table A), Personal ID is the primary key. Now if another related table, say table B, is present where personal ID is listed as an attribute, then Personal ID would be a foreign key in table B with respect to table A.

In the end, we can simply define Secondary Key in terms of database management systems as follows:

“The keys or fields in a table which have not been selected to be the primary key, but are considered to be the candidate keys for the primary key are referred to as Secondary Keys.”

]]>https://www.datafusion.net/what-is-the-definition-of-a-secondary-key/feed/0Difference Between Uncorrelated Subqueries And Correlated Subquerieshttps://www.datafusion.net/difference-between-uncorrelated-subqueries-and-correlated-subqueries/
https://www.datafusion.net/difference-between-uncorrelated-subqueries-and-correlated-subqueries/#respondThu, 18 Feb 2016 03:37:48 +0000http://www.datafusion.net/?p=211Let’s begin out with an example of what exactly an uncorrelated subquery looks like. After this we will compare uncorrelated subquery with a correlated subquery.

Uncorrelated Subquery Illustration

Here is an example of few SQL that exemplifies an uncorrelated subquery:

Select Merchant.Name from Merchant

Where Merchant.ID NOT IN (

Select Requisitions.Merchant_id from Requisitions, Client

Where Requisitions.cust_id = Client.ID

And Client.Name = ‘Samsonic’)

Don’t worry if the above example of the SQL looks scary to you. Still, it is not difficult to understand for our purpose. The subquery part of the above SQL starts after the “NOT IN” statement. The above query is an uncorrelated subquery because the subquery can be run independently of the outer query. Essentially, the subquery has no association with the outer query.

However, a correlated subquery has the opposite property. A correlated subquery can never run independently of the outer query. In the below example of a correlated subquery, you can see the difference yourself:

Example of a correlated subquery

SELECT *

FROM Labor Lab1

WHERE (1) = (

SELECT COUNT(DISTINCT(Lab2.Salary))

FROM Labor Lab2

WHERE Lab2.Salary >Lab1.Salary)

In the above correlated subquery you will notice that the inner subquery uses Lab1.Salary, but the alias Lab1 is created in the outer query. Thus, it is known as a correlated subquery, because the subquery indicates a value in its WHERE clause (in the above example, it uses a column belonging to Lab1) that is used in the outer query.

Working of a correlated query

In a correlated subquery, it is significant to understand the sequence of operations. First, a row is processed in the outer query. After this, for that specific row the subquery is executed – thus for each row processed by the outer query, the subquery is also going to be processed.

In the above correlated subquery example, every time a row is processed for Lab1, the subquery will also select that row’s value for Lab1.Salary and run. Afterwards the outer query will move on to the subsequent row, and the subquery is going to execute for that row’s value of Lab1.Salary.

]]>https://www.datafusion.net/difference-between-uncorrelated-subqueries-and-correlated-subqueries/feed/0Differences Between Non-Clustered And Clustered Indexeshttps://www.datafusion.net/differences-between-non-clustered-and-clustered-indexes/
https://www.datafusion.net/differences-between-non-clustered-and-clustered-indexes/#respondThu, 18 Feb 2016 03:36:20 +0000http://www.datafusion.net/?p=209A clustered index helps in determining the order in which the rows of the table will be stored on disk. Beside this, it basically stores row level data in the index’s leaf nodes itself. However, a non-clustered index has no impact on which the order of the rows is going to be stored.

Advantage of Using a Clustered Index

When groups of data that can be clustered are recurrently accessed by some queries, then using a clustered index is an advantage. This is because; it speeds up retrieval as the data lives close to each other on the disk. Moreover, if the data is accessed in the similar order as the clustered index, then the retrieval will be much quicker because the physical data deposited on disk is kept in the same order as the index.

Disadvantage of Using a Clustered Index

The major disadvantage of using a clustered index is that whenever a change is made to a value of an indexed column, the following possibility to re-sort rows for maintaining order is a sure performance hit.

Single Clustered Index on a Table

A clustered index governs the order in which the rows will be deposited on disk, thus having more than one clustered index on one table is not possible. Suppose if we have two clustered indexes on one table – which index would control the order in which the rows will be deposited? Since the rows of a table can just be sorted to follow only one index, having multiple clustered index is not acceptable.

Several Non-Clustered Indexes on a Table

Since there is no impact on the order in which all the rows are deposited on disk, there can be several non-clustered indexes on a table.

Non-Clustered Indexes

Non clustered indexes store both a pointer and a value to the actual row that is holding that value. Clustered indexes don’t require storing a pointer to the actual row. This is because the rows in the table are deposited on disk in the similar order as the clustered index, whereas the clustered index actually deposits the row-level data in its leaf nodes.

]]>https://www.datafusion.net/differences-between-non-clustered-and-clustered-indexes/feed/0Natural Key In SQL – Definition And Examplehttps://www.datafusion.net/natural-key-in-sql-definition-and-example/
https://www.datafusion.net/natural-key-in-sql-definition-and-example/#respondThu, 21 Jan 2016 03:01:46 +0000http://www.datafusion.net/?p=203Within the context of data warehouses and SQL, you have probably come across the phrase natural key. Actually, a natural key is a key which is made up of columns having a logical relationship within a table with other columns. Here is an example to explain you natural key in simple English.

Example of Natural Key

Suppose a table named as People. When we use the columns Address, Last_Name and First_Name together to form a key, then this will be a natural key as those columns are something that are completely natural to people, and there is also a logical relationship between all the columns that may exist in the table.

Reason Behind the Name Natural Key

You may want to know that why is it called a natural key. The reason is that the columns that fit into the key are just a part of the table naturally and also have a relationship in the table with other columns. Therefore, a natural key already present within a table– and columns do not require to be added just for the purpose of creating an “artificial” key.

Natural keys versus domain keys and business keys

Natural keys are also known as the business keys or domain keys, while all these terms mean exactly the same thing.

Surrogate keys versus Natural keys

Generally, Natural keys are compared with surrogate keys. The term surrogate literally means replacement or a substitute. The main reason that why a surrogate key is like a substitute is because it is artificial. This means that there is no logical relationship in between the column used for the surrogate key in the table.

Applications of surrogate keys

Since a surrogate is an unnatural key, thus in the most databases, these keys are just used to act as a primary key. These keys are just simple sequential numbers that can be used for uniquely identifying a row. For instance, SQL Server and Sybase both have an identity column specially meant to hold a different sequential number for each row.