This email message may contain confidential and/or privileged information.
If you are not the intended recipient, please do not read, save, forward,
disclose or copy the contents of this email or open any file attached to
this email. We will be grateful if you could advise the sender immediately
by replying this email, and delete this email and any attachment or links
to this email completely and immediately from your computer system.


Re: How to get all input tables of a SPARK SQL 'select' statement

Could be a tangential idea, but it might help: why not use the queryExecution and logicalPlan objects that are available when you execute a query using SparkSession and get a DataFrame back? Their JSON representation contains almost all the info you need, and you don't need to go to Hive to get it.
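A minimal sketch of that approach in the spark-shell, assuming Spark 2.x (the table name foo is just an example):

```scala
import org.apache.spark.sql.catalyst.analysis.UnresolvedRelation

val df = spark.sql("select * from foo")

// Before analysis, the input tables appear in the logical plan as
// UnresolvedRelation nodes; collect their identifiers.
val tables = df.queryExecution.logical.collect {
  case r: UnresolvedRelation => r.tableIdentifier.unquotedString
}
// tables: Seq[String] containing "foo"

// The whole plan is also available as JSON:
println(df.queryExecution.logical.toJSON)
```

Note that after analysis the nodes are resolved relations rather than UnresolvedRelation, so collecting from queryExecution.analyzed would need different case patterns.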


On Wed, Jan 23, 2019 at 5:35 PM Ramandeep Singh Nanda
<ramannanda9@...> wrote:

Explain extended or explain would list the plan along with the tables. I'm not aware of any statements that explicitly list dependencies or tables directly.

On Wed, Jan 23, 2019 at 10:43, <[hidden email]> wrote:

Hi, All,
We need to get all input tables of several Spark SQL 'select' statements.
We can get that information for Hive SQL statements by using 'explain dependency select ...'.
But I can't find the equivalent command for Spark SQL.
Does anyone know how to get this information for a Spark SQL 'select' statement?
Thanks
Boying


Reply: Re: How to get all input tables of a SPARK SQL 'select' statement

I tried the suggested approach and it works, but it requires running the SQL statement first.

I just want to parse the SQL statement without running it, so that I can do this on my laptop without connecting to our production environment.

I tried to write a tool which uses the SqlBase.g4 bundled with Spark SQL to extract the names of the input tables, and it works as expected.
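A minimal version of such an extractor might look like the sketch below. SqlBaseLexer, SqlBaseParser, and SqlBaseBaseListener are the classes ANTLR generates from SqlBase.g4, and the rule names (singleStatement, tableIdentifier) are taken from the Spark 2.x grammar; treat the details as illustrative:

```scala
import org.antlr.v4.runtime.{CharStreams, CommonTokenStream}
import org.antlr.v4.runtime.tree.ParseTreeWalker

// Sketch: parse a statement with the classes ANTLR generates from
// SqlBase.g4 and collect every tableIdentifier in the parse tree.
// Note this picks up all table identifiers, not only input tables.
object TableExtractor {
  def tables(sql: String): Seq[String] = {
    val lexer = new SqlBaseLexer(CharStreams.fromString(sql))
    val parser = new SqlBaseParser(new CommonTokenStream(lexer))
    val found = scala.collection.mutable.ListBuffer[String]()
    val listener = new SqlBaseBaseListener {
      override def enterTableIdentifier(
          ctx: SqlBaseParser.TableIdentifierContext): Unit =
        found += ctx.getText
    }
    ParseTreeWalker.DEFAULT.walk(listener, parser.singleStatement())
    found.toList
  }
}
```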

But I have a question:

The parser generated by SqlBase.g4 only accepts a 'select' statement when all keywords such as 'SELECT' and 'FROM', and the table names, are capitalized. For example, it accepts 'SELECT * FROM FOO' but not 'select * from foo'.

But I can run spark.sql("select * from foo") in the spark2-shell without any problem.

Is there another 'layer' in Spark SQL that capitalizes those 'tokens' before invoking the parser?

If so, why not just modify SqlBase.g4 to accept lowercase keywords?
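The kind of 'layer' in question can be sketched as an ANTLR CharStream wrapper that upper-cases only what the lexer sees, so keyword rules match, while getText still returns the original characters, so identifiers and string literals keep their case. Spark 2.x's parser driver is believed to take this approach with a wrapper named UpperCaseCharStream; the code below is an illustrative sketch, not Spark's exact implementation:

```scala
import org.antlr.v4.runtime.{CharStream, CodePointCharStream, IntStream}
import org.antlr.v4.runtime.misc.Interval

// Sketch of a case-insensitivity layer: delegate everything to the
// wrapped stream, but upper-case the characters the lexer looks at
// via LA(). getText() still returns the original, unmodified text.
class UpperCaseCharStream(wrapped: CodePointCharStream) extends CharStream {
  override def consume(): Unit = wrapped.consume()
  override def getSourceName: String = wrapped.getSourceName
  override def index(): Int = wrapped.index()
  override def mark(): Int = wrapped.mark()
  override def release(marker: Int): Unit = wrapped.release(marker)
  override def seek(where: Int): Unit = wrapped.seek(where)
  override def size(): Int = wrapped.size()
  override def getText(interval: Interval): String = wrapped.getText(interval)
  override def LA(i: Int): Int = {
    val la = wrapped.LA(i)
    if (la == 0 || la == IntStream.EOF) la
    else Character.toUpperCase(la)
  }
}
```

With this wrapper in front of the lexer, the grammar only needs uppercase keyword rules while user input stays case-insensitive.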



