Parsing

Parsing is the first step for query processing. This module uses lexer to scan statements input by user and then creates parse tree to store the statements’ information. Parse tree is identified by parser_ID. At the beginning of procedure, the parser_ID is initialized to 1 and increased by 1 according to each statement’s commit. Parser tree is stored in PT_NODE structure which is located at parser_Node__free_lists. The stack is as bellow:

Ux_prepare()

Db_open_buffer()

Db_open_buffer_local()

Db_open_local()

Parser_create_parser()

Pt_register_parser()

‘Session’ is created at beginning of query processing. ‘Session’ saves all of processing information including parsing and semantic check. ‘Session’ includes ‘dimension’, ‘parser’, and ‘statements’ and other fields.

Dimension indicates the number of SQL statements.

Parser includes ‘buffer’, ‘statements’, ’statement_number’, ‘id’ and other fields.

Buffer points to SQL statement

and statements indicate SQL statement parsing info.

Statement_number is SQL statement count

and id stores parser_ID.

Bellow is ‘session’ whole picture.

Figure 1. Session.

We can see the detail ‘session’ structure. In our scenario, the ‘dimension’ is 1, ‘parser_id’ is 9, and ‘node_type’ is PT_SELECT. According to ‘node_type’, we can find ‘select’, ‘from’, ‘where’ segments which locate in statements.info.query.q.select.list (or from, where) as bellow.

Figure 2. Session structure.

Below is the detailed structures for ‘list’, ‘from’, and ‘where’:

Figure 3. Segment structure.

The first line is linked from ‘list’ point. Each attribute has PT_NODE structure and each attribute is connected by each other. Attribute includes ‘next’, ‘node_type’, ‘parser_id’, ‘info’ and other fields. In our scenario, the attribute ‘node_type’ is PT_DOT_ and attribute is located at ‘name’ of ‘info’ field.

The second line is ‘from’ segment. Each class also have PT_NODE structure and each class connected by next point. Class structure includes ‘next’, ‘node_type’, ‘parser_id’, ‘info’ and other fields. In our scenario, the class ‘node_type’ is PT_SPEC and class name is locate at ‘name’ of ‘parts’.

The third line is ‘where’ segment. It includes ‘next’, ‘node_type’, ‘info’ and other fields. In our scenario, the ‘node_type’ is PT_EXPR, and the ‘dot’ structure of ‘info’ indicates ‘where’ condition. ‘op’ is PT_LT which means that ‘less than’ and ‘arg1’ and ‘arg2’ are the comparative objects that gold of wish and gold of wish1.

After name resolution, the ‘statements’ structure has the fixed schema info. The ‘list’ structure has fixed attribute name such as ‘DB_year’ at ‘original’ field, ‘wish’ at ‘resolved’ field. The ‘from’ structure has fixed class info such as ‘oid_info’, ‘class_mop’, ‘object’, ‘lock’, and other info. In our scenario, ‘select’ operation needs to ‘IS_LOCK’ objects. Detail is as bellow:

Figure 4. Segment structure details.

Query Plan

After parsing and semantic checking the query statement input by user is changed to augmented parse tree which includes the catalog info. Based on augmented parse tree, query plan will be generated in this step. Before query plan generation, we need to mention some terminology for CUBRID. In CUBRID plan, there are several parts such as ‘join graph segments’, ‘join graph nodes’, ‘join graph equivalence classes’, ‘join graph edges’, ‘join graph partitions’, ‘query plan’, ‘query stmt’, etc.

Join graph segments

Segments indicate attributes and classes after ‘select’, ‘from’ and ‘where’. In our scenario, the segments are as bellow:

Join graph edges

‘term’ is a statement condition which appears after ‘where’. For only one class ‘term’, we call it ‘scan term’ and for the two or more class ‘term’, we call it ‘join term’. In our scenario, it is join term. ‘sel’ is selectivity which means that the probability of tuples to be scaned out.

Join graph terms

As mentioned in ‘join graph edges’, there are 2 types of terms: ‘scan term’ and ‘join term’.

Join graph partitions

This is for scan group. Classes in one partition will be scaned together. We will provide more detail further.

Query plan

The Query Plan is the best execution plan for the current query which was selected based on the lowest cost. In our scenario, the best query plan is as bellow: nl-join (inner join)

‘nl-join’ means nested loop join and ‘inner join’ is based on user input. In our scenario, there is no index file, so, it can just use a nested loop join. And the whole scan will execute outer loop scan for wish1 and inner loop scan for wish. In each scan, the scan method is sequential scan (sscan). For the outer loop scan, the cost can be split to ‘fixed cost’, ‘var cost’ and ‘card’. Each cost consist ‘CPU cost’ and ‘I/O cost’. ‘card’ means expected tuple count. Two class’s ‘I/O cost’ of ‘var cost’ are 1.0 according to the distribution of page count. And ‘wish1’ expects 2 ‘card’ and ‘wish’ expect 3 ‘card’. So set the less tuple class to outer loop considers of the less disk access cost.

‘QO_ENV’ is the repository of all optimizer data structures. ‘QO_ENV’ includes ‘PT_PARSER’ which contains parsing info, ‘PT_NODE, ‘QO_SEGMENT’, ‘QO_NODE’ which is related with query graph, ‘QO_EQCLASS’, ‘QO_TERM’, ‘QO_PLAN’ tree which is the final execution plan.

In QO_ENV’s init steps, parser point and pt_tree point will be set. Nnodes is the count of classes which indicated after ‘from’, nodes is assigned to classes indicated. Nsegs set to couple of Nnode and segs is assigned to QO_SEGMENT. Nterms is count of terms which are indicated after ‘where’, and terms are assigned to QO_TERM. After init the QO_ENV is as bellow:

Figure 5. After inited QO_ENV.

QO_PLAN

QO_PLAN is generated after query optimization. In QO_PLANsarged_terms is scan condition. Vtb1 is pointed to functions which handle temporary result tables. In QO_PLAN, ‘plan_type’ indicates QO_PLAN’s type which has union value of QO_PLANTYPE_SCAN, QO_PLANTYPE_JOIN, QO_PLANTYPE_SORT, QO_PLANTYPE_FOLLOW, QO_PLANTYPE_WORST. When plan_type is QO_PLANTYPE_SCAN, we call this QO_PLAN as scan node which indicates the way of one class scan. In our scenario ‘plan_type’ is ‘QO_PLANTYPE_JOIN’.

The plan_un is scan type and when the value is sequential scan we set it as QO_SCANTYPE_SEG_SCAN, and when the value is index scan we set it as QO_SCANTYPE_INDEX_SCAN. When the plan type is QO_PLANTYPE_JOIN, we call the QO_PLAN as join node. Join node’s plan_type is Join type which value is union of QO_JOINTYPE_NL_JOIN, QO_JOINTYPE_MERGE_JOIN. And outer is the pointer of outer loop class and inner is the pointer of inner loop. Join_term is scan condition for two QO_PLAN’s join result. The QO_PLAN tree structure for our scenario is as bellow:

Figure 6. QO_PLAN tree structure.

XASL Generation

XASL_NODE

The XASL_NODE tree structure for scenario is as bellow:

Figure 7. XASL_NODE tree structure.

Query Manager

Query Manager execute XASL tree from client. In query processing, client send XASL tree to server through socket. Server execute statement through XASL tree and in case of select statement, the execution result will be stored in the list file.

Qexec_execute_mainblock() is the key function to perform scan process. In this key function, the XASL tree content will be analyzed to generate SCAN_ID which is sent to storage function to read data.