RE: What are the alternatives to nested DataFrames?

Shabad, I am not sure what you are trying to say. Could you please give me an example? The result of the query is a DataFrame that is created after iterating, so I am not sure how I could map that to a column without iterating and getting the values.

I have a DataFrame that contains a list of cities that I would like to iterate over and search for in Elasticsearch. The list is stored in a DataFrame because it contains hundreds of thousands of elements with multiple properties, which would not fit on a single machine.

The issue is that the elasticsearch-spark connector returns a DataFrame as well, which leads to creating a DataFrame within a DataFrame, and Spark does not support nested DataFrames.

The only solution I have found is to store the list of cities in a regular Scala Seq and iterate over that, but as far as I know this would make the Seq centralized on the driver instead of distributed (would the whole loop then run on the driver only?).
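For reference, a minimal sketch of the pattern I am describing, assuming a running SparkSession, a cities DataFrame with a `name` column, and the elasticsearch-spark connector on the classpath (the index name `cities-index` and the parquet path are hypothetical placeholders):

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

val spark = SparkSession.builder().appName("city-search").getOrCreate()

// Hypothetical source: a large, distributed DataFrame of cities.
val cities: DataFrame = spark.read.parquet("cities.parquet")

// Collecting to a Seq pulls every city name onto the driver, so the
// iteration below is centralized rather than distributed:
val cityNames: Seq[String] = cities.select("name").collect().map(_.getString(0))

// Each Elasticsearch query yields another DataFrame, which is why this
// cannot be written as a map over the cities DataFrame itself: Spark does
// not support a DataFrame column whose values are DataFrames.
val results: Seq[DataFrame] = cityNames.map { city =>
  spark.read
    .format("es") // short name registered by elasticsearch-spark
    .option("es.query", s"""{"query": {"match": {"city": "$city"}}}""")
    .load("cities-index")
}
```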