I have played around with different values for -yn and -ys, but they didn't perform well; the configuration given above is the best performance so far. I am not able to get the execution plan as JSON. I have added the image from the Flink UI.
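(In case it helps with getting the plan as JSON: the DataSet API can emit the optimized plan programmatically via ExecutionEnvironment.getExecutionPlan(). A minimal sketch, assuming the job uses the batch DataSet API; the S3 paths and job shape below are placeholders, not the actual application:)

```java
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;

public class PlanDump {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // Placeholder pipeline: the real job reads compressed sequence files
        // and writes compressed Parquet; any source/sink works for the dump.
        DataSet<String> lines = env.readTextFile("s3://example-bucket/input/");
        lines.writeAsText("s3://example-bucket/output/");

        // Returns the optimized execution plan as a JSON string.
        // Call it after the sinks are defined, before env.execute().
        System.out.println(env.getExecutionPlan());
    }
}
```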

While creating the cluster on AWS EMR, we are using the configuration below.

What have you tried so far to increase
performance? (Did you try different combinations of -yn and -ys?)
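(For reference, a hypothetical Flink-on-YARN submission showing how those flags combine; all sizes here are placeholders, and the main class/jar are made up:)

```shell
# -yn  : number of YARN containers (TaskManagers)
# -ys  : slots per TaskManager (total parallelism ~= yn * ys)
# -ytm : memory per TaskManager in MB
flink run -m yarn-cluster -yn 10 -ys 4 -ytm 8192 \
  -c com.example.HourlyBatchJob target/hourly-batch-job.jar
```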

Can you provide us with your application? What source/sink are you
using?

On 08.08.2018 07:59, Ravi Bhushan Ratnakar wrote:

Hi Everybody,

Currently I am working on a project where I need to write a
Flink batch application that has to process around 400 GB of
compressed sequence files of hourly data. After processing, it
has to write the output in compressed Parquet format to S3.

I have managed to write the application in Flink and am able
to successfully process a whole hour of data and write it in
Parquet format to S3. The problem is that it is not able to
meet the performance of the existing application, which is
written using Spark batch (and is running in production).