I have a broad question that is proving difficult to answer conclusively.

When you import from MSSQL, we (my co-workers and I) understand that the initial connector communicates on port 1433 by default. However, when the map tasks created by Sqoop import the data to the data nodes, are the data nodes connecting to MSSQL via port 1433, or are arbitrary ports opened between the data nodes and the SQL Server?

We need to know because we are interested in hosting data for a variety of clients, and need to be able to place firewall rules for our data center to manage access to our cluster while still connecting to various environments.
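For reference, here is a minimal sketch of how the port ends up in a Sqoop import, assuming the Microsoft JDBC driver's `jdbc:sqlserver://host:port;databaseName=...` URL form. The host, database, table, user, and target directory are hypothetical placeholders, and `build_sqoop_import` is just an illustrative helper, not part of Sqoop. As far as I understand the Sqoop user guide, the same connect string is handed to every map task, so each task would open its own JDBC connection to the port named in it.

```python
def build_sqoop_import(host, database, table, target_dir,
                       port=1433, mappers=4, username="etl_user"):
    """Return the argv list for a Sqoop import from SQL Server.

    All argument values are hypothetical; only the flag names are Sqoop's.
    """
    # The port is part of the JDBC URL, so it is explicit and predictable.
    connect = f"jdbc:sqlserver://{host}:{port};databaseName={database}"
    return [
        "sqoop", "import",
        "--connect", connect,            # reused by every map task
        "--username", username,
        "-P",                            # prompt for the password on the console
        "--table", table,
        "--target-dir", target_dir,      # HDFS destination directory
        "--num-mappers", str(mappers),   # number of parallel map tasks
    ]


if __name__ == "__main__":
    argv = build_sqoop_import("sqlserver.client.example.com", "SalesDB",
                              "Orders", "/data/client_a/orders")
    print(" ".join(argv))
```

With four mappers, this would mean up to four concurrent connections from cluster nodes to the one destination port in the URL.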


Yes, that is really the problem.

We need to be able to control, or at least predict, which ports are used, since we need to be able to supply credentials to a given client database and have the Sqoop import deliver it to our cluster behind our firewall.

The TaskNodes request the data from MSSQL Server listening on port 1433, but when the MSSQL Server sends the data back, is there a Sqoop argument or proxy method so we can control what port the data goes back to HDFS on?

According to Microsoft, Winsock client calls are answered via a 3-way handshake on a random port between 1024 and 5000 for data delivery. So: the TaskNode requests a connection to SQL Server on 1433, SQL Server acks on 1433, then opens a random port between 1024 and 5000 to the IP it just acknowledged to send the data. For a firewall, we cannot leave all ports 1024-5000 open and still pass vulnerability scans for compliance with our auditing bodies.

*Devin Suiter*
Jr. Data Solutions Software Engineer
100 Sandusky Street | 2nd Floor | Pittsburgh, PA 15212
Google Voice: 412-256-8556 | www.rdx.com

On Tue, Oct 1, 2013 at 1:21 PM, Abraham Elmahrek <[EMAIL PROTECTED]> wrote:

> Hey There,
>
> Your TaskNodes and JobTracker node will be contacting your RDBMS. Check out
> http://sqoop.apache.org/docs/1.4.4/SqoopUserGuide.html#_connecting_to_a_database_server
> for more information.
>
> -Abe

I'm not sure if you can communicate this over email, but what does your data center setup look like? I would think that the entire Hadoop cluster would be placed behind a firewall and then clients would simply start jobs? This means that you'll need to configure your firewall to allow clients to communicate with the job tracker (which means allowing traffic to the job tracker port). The rest should be taken care of for you?
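On the port question raised earlier in the thread, a generic TCP experiment may help when reasoning about firewall rules. This is a loopback sketch only, not a statement about how any particular SQL Server installation is configured: the listener stands in for the database port, and `serve_once` and `run_demo` are hypothetical helper names. It shows that, for an ordinary TCP connection, the reply comes back over the same connection, from the server's listening port to the client's own ephemeral source port, rather than over a second connection opened by the server.

```python
import socket
import threading


def serve_once(listener, payload):
    """Accept one connection and reply on that same connection."""
    conn, _ = listener.accept()
    with conn:
        conn.recv(1024)        # read the client's request
        conn.sendall(payload)  # the reply travels back over the same socket


def run_demo(payload=b"rows"):
    """Return (server_port, client_port, peer_port, data) for one exchange."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # OS-chosen port standing in for 1433
    listener.listen(1)
    server_port = listener.getsockname()[1]

    worker = threading.Thread(target=serve_once, args=(listener, payload))
    worker.start()

    client = socket.create_connection(("127.0.0.1", server_port))
    with client:
        client_port = client.getsockname()[1]  # ephemeral, chosen by client OS
        peer_port = client.getpeername()[1]    # where replies come from
        client.sendall(b"SELECT 1")
        data = b""
        while len(data) < len(payload):
            chunk = client.recv(1024)
            if not chunk:
                break
            data += chunk

    worker.join()
    listener.close()
    return server_port, client_port, peer_port, data


if __name__ == "__main__":
    server_port, client_port, peer_port, data = run_demo()
    print("server listens on:", server_port)
    print("client source port:", client_port)
    print("reply arrives from:", peer_port)
```

If the same holds for your database connections, a stateful firewall rule that permits outbound connections from the cluster nodes to the database host on its listening port would normally also cover the return traffic. Setups that use named instances or dynamic ports on the SQL Server side may behave differently, so that part is worth verifying against your own environment.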

