Using UTL_XML.PARSEQUERY for SQL Dependency Analysis

The problem I described in chapter 4 was to find all view columns using the column UNIT_COST of the table COSTS in the SH schema. Other usages of this column (e.g. in where or order by clauses) have to be ignored. To solve this problem within the Oracle Database Server 11.2 a parser is necessary (at least I’m not aware of another solution). Even a DBA_DEPENDENCY_COLUMNS view as described in Rob van Wijk’s post is not enough to solve this problem.

However, in this particular case no custom or 3rd party parser is necessary. Oracle provides a procedure named PARSEQUERY in the PL/SQL package UTL_XML which is in fact well suited to solve this problem as I will show later. First, I’d like explain which columns should be found by a dependency analysis procedure based on some sample views.

Oracle’s sales history demo schema SH provides view named PROFITS, which is defined as follows:

View

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

CREATEORREPLACEVIEWPROFITSAS

SELECTs.channel_id,

s.cust_id,

s.prod_id,

s.promo_id,

s.time_id,

c.unit_cost,

c.unit_price,

s.amount_sold,

s.quantity_sold,

c.unit_cost*s.quantity_soldTOTAL_COST

FROMcostsc,saless

WHEREc.prod_id=s.prod_id

ANDc.time_id=s.time_id

ANDc.channel_id=s.channel_id

ANDc.promo_id=s.promo_id;

The columns using COSTS.UNIT_COSTS are highlighted.

The following view uses the column TOTAL_COSTS in GROSS_MARGIN (line 14) and GROSS_MARGIN_PERCENT (lines 14 and 15). The usage is not evident at the first glance since it is based on the column GROSS_MARGIN (line 4) of the named query GM and the column COST (line 8) in GM’s subquery. This kind of dependencies need to be identified.

View

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

CREATEORREPLACEVIEWGROSS_MARGINSAS

WITH

gmAS(

SELECTtime_id,revenue,revenue-costASgross_margin

FROM(

SELECTtime_id,

unit_price*quantity_soldASrevenue,

total_costAScost

FROMprofits

)

)

SELECTt.fiscal_year,

SUM(revenue)ASrevenue,

SUM(gross_margin)ASgross_margin,

round(100*SUM(gross_margin)/SUM(revenue),2)

ASgross_margin_percent

FROMgm

INNER JOINtimestONt.time_id=gm.time_id

GROUP BYt.fiscal_year

ORDERBYt.fiscal_year;

The next view does not present the data of COSTS.UNIT_COST as a column, even if the view depends on the table COSTS

View

1

2

3

CREATEORREPLACEVIEWREVENUESAS

SELECTfiscal_year,revenue

FROMgross_margins;

The last view uses COSTS.UNIT_COST but not as part of a column expression and therefore has not to be reported. The usage in the order by clause is considered save.

View

1

2

3

4

5

6

7

8

9

10

CREATEORREPLACEVIEWSALES_ORDERED_BY_GMAS

SELECTchannel_id,

cust_id,

prod_id,

promo_id,

time_id,

amount_sold,

quantity_sold

FROMprofits

ORDERBY(unit_price-unit_cost)DESC;

So, the following result of the dependency analysis is expected:

SCHEMA

VIEW

COLUMN

SH

PROFITS

UNIT_COST

SH

PROFITS

TOTAL_COST

SH

GROSS_MARGINS

GROSS_MARGIN

SH

GROSS_MARGINS

GROSS_MARGIN_PERCENT

Exactly this result is created by the following query

Column

1

2

SELECT*

FROMTABLE(coldep_pkg.get_dep('sh','costs','unit_cost'));

Now I just list all the code snippets I’ve written to create this result. Please note, that this is considered just a proof-of-concept code to show how UTL_XML.PARSEQUERY could be used for SQL dependency analysis in conjunction with Oracle dictionary views. This means that this is not a complete implementation. For example wild cards (*) is not handled which may lead to missing dependencies. Additionally table/view sources are not checked which may lead to false positives (in case a column is used in multiple view/table sources). – Please feel free to complete the code. However, an update is highly appreciated ;-)

Grants

1

GRANTEXECUTEONSYS.UTL_XMLTOSH;

Object

1

2

3

4

5

6

7

CREATEORREPLACETYPE"SH"."COLDEP_TYP"AS

OBJECT(schema_nameVARCHAR2(30),

view_namevarchar2(30),

column_nameVARCHAR2(30))

/

CREATEORREPLACETYPE"SH"."COLDEP_L"ISTABLEOFcoldep_typ

/

PL/SQL

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

21

22

23

24

25

26

27

28

29

30

31

32

33

34

35

36

37

38

39

40

41

42

43

44

45

46

47

48

49

50

51

52

53

54

55

56

57

58

59

60

61

62

63

64

65

66

67

68

69

70

71

72

73

74

75

76

77

78

79

80

81

82

83

84

85

86

87

88

89

90

91

92

93

94

95

96

97

98

99

100

101

102

103

104

105

106

107

108

109

110

111

112

113

114

115

116

117

118

119

120

121

122

123

124

125

126

127

128

129

130

131

132

133

134

135

136

137

138

139

140

CREATEORREPLACEPACKAGE"SH"."COLDEP_PKG"IS

FUNCTIONparse_query(p_queryINVARCHAR2)RETURNxmltype;

FUNCTIONget_dep(p_schema_nameINVARCHAR2,

p_object_nameINVARCHAR2,

p_column_nameINVARCHAR2)RETURNcoldep_l

PIPELINED;

FUNCTIONprocess_view(p_schema_nameINVARCHAR2,

p_view_nameINVARCHAR2,

p_column_nameINVARCHAR2,

p_queryINCLOB)RETURNcoldep_l;

ENDcoldep_pkg;

/

CREATEORREPLACEPACKAGEBODY"SH"."COLDEP_PKG"IS

FUNCTIONparse_query(p_queryINVARCHAR2)RETURNxmltypeIS

v_clobCLOB;

v_xmlxmltype;

BEGIN

dbms_lob.createtemporary(v_clob,TRUE);

-- parse query and get XML as CLOB

sys.utl_xml.parsequery(USER,p_query,v_clob);

-- create XMLTYPE from CLOB

v_xml:=xmltype.createxml(v_clob);

dbms_lob.freetemporary(v_clob);

RETURNv_xml;

ENDparse_query;

FUNCTIONget_dep(p_schema_nameINVARCHAR2,

p_object_nameINVARCHAR2,

p_column_nameINVARCHAR2)RETURNcoldep_l

PIPELINEDIS

BEGIN

-- query dictionary dependencies

FORv_depIN(SELECTd.ownerASschema_name,

d.nameASview_name,

v.textASquery_text

FROMall_dependenciesd

INNER JOINall_viewsv

ONv.owner=d.owner

ANDv.view_name=d.name

WHEREd.referenced_owner=upper(p_schema_name)

ANDd.referenced_name=upper(p_object_name)

ANDd.type='VIEW')

LOOP

-- process every fetched view

FORv_viewsIN(

SELECTVALUE(pv)coldep

FROMTABLE(process_view(v_dep.schema_name,

v_dep.view_name,

p_column_name,

v_dep.query_text))pv)

LOOP

-- return column usages in v_dep.view_name

PIPEROW(v_views.coldep);

-- get column usages of views using v_dep.view_name (recursive calls)

FORv_recursiveIN(

SELECTVALUE(dep)coldep

FROMTABLE(get_dep(v_views.coldep.schema_name,

v_views.coldep.view_name,

v_views.coldep.column_name))dep)

LOOP

-- return column usages of recursive call

PIPEROW(v_recursive.coldep);

END LOOP;

END LOOP;

END LOOP;

ENDget_dep;

FUNCTIONprocess_view(p_schema_nameINVARCHAR2,

p_view_nameINVARCHAR2,

p_column_nameINVARCHAR2,

p_queryINCLOB)RETURNcoldep_lIS

v_search_lcoldep_l:=coldep_l(coldep_typ(NULL,

NULL,

p_column_name));

v_xmlxmltype;

v_previous_countINTEGER:=0;

v_coldep_lcoldep_l:=coldep_l();

BEGIN

-- parse view query

v_xml:=parse_query(p_query);

-- get inline dependencies from secondary select lists

-- TODO: handle table/view source and wildcard properly

WHILEv_previous_count&lt;v_search_l.count

LOOP

v_previous_count:=v_search_l.count;

FORv_secondaryIN(

SELECTnvl(x.alias_name,x.column_reference)ASalias_name

FROM(SELECTt.select_list_item,

t.alias_name,

extractvalue(VALUE(c),'COLUMN')AScolumn_reference

FROMxmltable('//SELECT_LIST_ITEM[ancestor::FROM or ancestor::WITH]'

passingv_xml

columnsselect_list_itemxmltypepath'//SELECT_LIST_ITEM',

alias_nameVARCHAR2(30)path'//COLUMN_ALIAS')t,

TABLE(xmlsequence(extract(select_list_item,'//COLUMN')))c)x

WHEREupper(x.column_reference)IN(SELECTupper(column_name)

FROMTABLE(v_search_l))

ANDupper(alias_name)NOT IN(SELECTupper(column_name)

FROMTABLE(v_search_l)))

LOOP

-- add internal column usage

v_search_l.extend;

v_search_l(v_search_l.count):=coldep_typ(NULL,

NULL,

v_secondary.alias_name);

END LOOP;

END LOOP;

-- analyze primary select list

-- TODO: handle table/view source and wildcard properly

FORv_primaryIN(

SELECTx.column_id,atc.column_name

FROM(SELECTt.select_list_item,

t.column_id,

extractvalue(VALUE(c),'COLUMN')AScolumn_reference

FROMxmltable('//SELECT_LIST_ITEM[not (ancestor::FROM) and not (ancestor::WITH)]'

passingv_xml

columnscolumn_idFORordinality,

select_list_itemxmltypepath'//SELECT_LIST_ITEM')t,

TABLE(xmlsequence(extract(select_list_item,'//COLUMN')))c)x

INNER JOINall_tab_columnsatc

ONatc.owner=p_schema_name

ANDatc.table_name=p_view_name

ANDatc.column_id=x.column_id

WHEREupper(x.column_reference)IN(SELECTupper(column_name)

FROMTABLE(v_search_l))

ORDERBYx.column_id)

LOOP

-- add external column usage

v_coldep_l.extend;

v_coldep_l(v_coldep_l.count):=coldep_typ(p_schema_name,

p_view_name,

v_primary.column_name);

END LOOP;

-- return column dependencies

RETURNv_coldep_l;

ENDprocess_view;

ENDcoldep_pkg;

/

Below you find the XML parser output of the query defined in the view GROSS_MARGINS. The model becomes quite clear, even if I could not find a schema description.

XML

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

21

22

23

24

25

26

27

28

29

30

31

32

33

34

35

36

37

38

39

40

41

42

43

44

45

46

47

48

49

50

51

52

53

54

55

56

57

58

59

60

61

62

63

64

65

66

67

68

69

70

71

72

73

74

75

76

77

78

79

80

81

82

83

84

85

86

87

88

89

90

91

92

93

94

95

96

97

98

99

100

101

102

103

104

105

106

107

108

109

110

111

112

113

114

115

116

117

118

119

120

121

122

123

124

125

126

127

128

129

130

131

132

133

134

135

136

137

138

139

140

141

142

143

144

145

146

147

148

149

150

151

152

153

154

155

156

157

158

159

160

161

162

163

164

165

166

167

168

169

<QUERY>

<WITH>

<WITH_ITEM>

<QUERY_ALIAS>GM</QUERY_ALIAS>

<QUERY>

<SELECT>

<SELECT_LIST>

<SELECT_LIST_ITEM>

<COLUMN_REF>

<COLUMN>TIME_ID</COLUMN>

</COLUMN_REF>

</SELECT_LIST_ITEM>

<SELECT_LIST_ITEM>

<COLUMN_REF>

<COLUMN>REVENUE</COLUMN>

</COLUMN_REF>

</SELECT_LIST_ITEM>

<SELECT_LIST_ITEM>

<SUB>

<COLUMN_REF>

<COLUMN>REVENUE</COLUMN>

</COLUMN_REF>

<COLUMN_REF>

<COLUMN>COST</COLUMN>

</COLUMN_REF>

</SUB>

<COLUMN_ALIAS>GROSS_MARGIN</COLUMN_ALIAS>

</SELECT_LIST_ITEM>

</SELECT_LIST>

</SELECT>

<FROM>

<FROM_ITEM>

<QUERY>

<SELECT>

<SELECT_LIST>

<SELECT_LIST_ITEM>

<COLUMN_REF>

<TABLE>PROFITS</TABLE>

<COLUMN>TIME_ID</COLUMN>

</COLUMN_REF>

</SELECT_LIST_ITEM>

<SELECT_LIST_ITEM>

<MUL>

<COLUMN_REF>

<TABLE>PROFITS</TABLE>

<COLUMN>UNIT_PRICE</COLUMN>

</COLUMN_REF>

<COLUMN_REF>

<TABLE>PROFITS</TABLE>

<COLUMN>QUANTITY_SOLD</COLUMN>

</COLUMN_REF>

</MUL>

<COLUMN_ALIAS>REVENUE</COLUMN_ALIAS>

</SELECT_LIST_ITEM>

<SELECT_LIST_ITEM>

<COLUMN_REF>

<TABLE>PROFITS</TABLE>

<COLUMN>TOTAL_COST</COLUMN>

</COLUMN_REF>

<COLUMN_ALIAS>COST</COLUMN_ALIAS>

</SELECT_LIST_ITEM>

</SELECT_LIST>

</SELECT>

<FROM>

<FROM_ITEM>

<TABLE>PROFITS</TABLE>

</FROM_ITEM>

</FROM>

</QUERY>

</FROM_ITEM>

</FROM>

</QUERY>

</WITH_ITEM>

</WITH>

<SELECT>

<SELECT_LIST>

<SELECT_LIST_ITEM>

<COLUMN_REF>

<TABLE_ALIAS>T</TABLE_ALIAS>

<COLUMN>FISCAL_YEAR</COLUMN>

</COLUMN_REF>

</SELECT_LIST_ITEM>

<SELECT_LIST_ITEM>

<SUM>

<COLUMN_REF>

<COLUMN>REVENUE</COLUMN>

</COLUMN_REF>

</SUM>

<COLUMN_ALIAS>REVENUE</COLUMN_ALIAS>

</SELECT_LIST_ITEM>

<SELECT_LIST_ITEM>

<SUM>

<COLUMN_REF>

<COLUMN>GROSS_MARGIN</COLUMN>

</COLUMN_REF>

</SUM>

<COLUMN_ALIAS>GROSS_MARGIN</COLUMN_ALIAS>

</SELECT_LIST_ITEM>

<SELECT_LIST_ITEM>

<ROUND>

<DIV>

<MUL>

<LITERAL>100</LITERAL>

<SUM>

<COLUMN_REF>

<COLUMN>GROSS_MARGIN</COLUMN>

</COLUMN_REF>

</SUM>

</MUL>

<SUM>

<COLUMN_REF>

<COLUMN>REVENUE</COLUMN>

</COLUMN_REF>

</SUM>

</DIV>

<LITERAL>2</LITERAL>

</ROUND>

<COLUMN_ALIAS>GROSS_MARGIN_PERCENT</COLUMN_ALIAS>

</SELECT_LIST_ITEM>

</SELECT_LIST>

</SELECT>

<FROM>

<FROM_ITEM>

<JOIN>

<INNER/>

<JOIN_TABLE_1>

<QUERY_ALIAS>GM</QUERY_ALIAS>

</JOIN_TABLE_1>

<JOIN_TABLE_2>

<TABLE>TIMES</TABLE>

<TABLE_ALIAS>T</TABLE_ALIAS>

</JOIN_TABLE_2>

<ON>

<EQ>

<COLUMN_REF>

<TABLE>TIMES</TABLE>

<TABLE_ALIAS>T</TABLE_ALIAS>

<COLUMN>TIME_ID</COLUMN>

</COLUMN_REF>

<COLUMN_REF>

<TABLE_ALIAS>GM</TABLE_ALIAS>

<COLUMN>TIME_ID</COLUMN>

</COLUMN_REF>

</EQ>

</ON>

</JOIN>

</FROM_ITEM>

</FROM>

<GROUP_BY>

<EXPRESSION_LIST>

<EXPRESSION_LIST_ITEM>

<COLUMN_REF>

<TABLE_ALIAS>T</TABLE_ALIAS>

<COLUMN>FISCAL_YEAR</COLUMN>

</COLUMN_REF>

</EXPRESSION_LIST_ITEM>

</EXPRESSION_LIST>

</GROUP_BY>

<ORDER_BY>

<ORDER_BY_LIST>

<ORDER_BY_LIST_ITEM>

<COLUMN_REF>

<TABLE_ALIAS>T</TABLE_ALIAS>

<COLUMN>FISCAL_YEAR</COLUMN>

</COLUMN_REF>

</ORDER_BY_LIST_ITEM>

</ORDER_BY_LIST>

</ORDER_BY>

</QUERY>

Please note that UTL_XML.PARSEQUERY is suited for extended query dependency analysis only. DML may be parsed, but the resulting model is incomplete with 11.2.0.2 (e.g. clauses missing in the select statement are not included in the model, like the SET clause in an update statement). If you need to analyze PL/SQL beyond PL/Scope you still may need a 3rd party parser.

Yes, you may analyze or compare two SQL statements. The statements and their results. Based on your comparison requirements this can become a quite complex and time consuming task. Honestly I do not understand your requirements good enough to recommend how to proceed exactly to solve your problem. But I recommend to define some simple test cases with expected results (SQL Statement 1, SQL statement 2, exact comparison result) to allow others to provide an appropriate solution approach.

One way of solving the wildcard problem might be running the view sources through DBMS_UTILITY.EXPAND_SQL_TEXT() first – this will expand the wildcard columns and also recursively expand the underlying views. (The latter might not be desirable, though.)

EXPAND_SQL_TEXT() is a 12c feature, but is available on 11g too as DBMS_SQL2.EXPAND_SQL_TEXT() – undocumented, yet seems to work just fine.

Unfortunately, what coldep_pkg.get_dep() does is impact, while I discovered this blog post looking for a solution for lineage, which is the opposite problem: given the SQL query, find all table columns it depends on. Here, I’d definitely use EXPAND_SQL_TEXT() as the first step, and this post provides the techniques to do the rest – still not easy, but frankly until today I thought it was virtually impossible.