Second way: returning a UDF

Another way to write the UDF is to write a function that returns a UDF. Pay attention to rename_udf()("features"): the rename_udf function returns a UDF, so the first pair of parentheses calls rename_udf to get the UDF, and the second pair applies that UDF to the features column. That’s why we need ()("features").
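To make the double call concrete, here is a minimal sketch of the pattern. The body of rename_udf is not shown in this post, so the mapping, the make_renamer helper, and the demo() function are hypothetical names used only for illustration. The Spark imports sit inside the functions so the plain Python helper can be tried without a Spark installation.

```python
def make_renamer(mapping):
    """Return a plain Python function that looks a value up in `mapping`.

    Hypothetical helper: values missing from the mapping are returned unchanged.
    """
    def rename(value):
        return mapping.get(value, value)
    return rename


def rename_udf(mapping=None):
    """Build and return a Spark UDF. Calling rename_udf() gives back the UDF itself."""
    # Imported lazily so make_renamer() above works without Spark installed.
    from pyspark.sql.functions import udf
    from pyspark.sql.types import StringType

    # Assumed example mapping; the real one depends on your data.
    mapping = mapping or {"a": "alpha", "b": "beta"}
    return udf(make_renamer(mapping), StringType())


def demo():
    """Run the UDF on a tiny local DataFrame (requires pyspark)."""
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.master("local[1]").getOrCreate()
    df = spark.createDataFrame([("a",), ("b",), ("c",)], ["features"])
    # rename_udf() returns the UDF; ("features") applies it to that column.
    df.withColumn("renamed", rename_udf()("features")).show()
    spark.stop()
```

Calling demo() should add a renamed column next to features. The point of the factory is that any arguments given to rename_udf are captured by the returned UDF, which becomes useful once extra data has to be passed in.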

A more complicated example: passing a broadcast variable

one_hot_encoding() takes a single Row and transforms it into its one-hot-encoded value.

to_ohe() is a UDF: it takes every Row and calls the one_hot_encoding() function on it. The value returned for each row is then used to build a new DataFrame.
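The two functions described above could look roughly like the sketch below. It is written under several assumptions: the column is a single string column named color, the category list is made up, and to_ohe is written as a function that returns the UDF so the broadcast variable can be passed in. The original code may be organized differently.

```python
def one_hot_encoding(value, categories):
    """Return a 0/1 list with a 1 at the position of `value` in `categories`.

    A value that is not in `categories` encodes to all zeros.
    """
    return [1 if value == category else 0 for category in categories]


def to_ohe(broadcast_categories):
    """Return a UDF that one-hot encodes a column using a broadcast category list."""
    # Imported lazily so one_hot_encoding() above works without Spark installed.
    from pyspark.sql.functions import udf
    from pyspark.sql.types import ArrayType, IntegerType

    def encode(value):
        # .value reads the broadcast list where the UDF runs.
        return one_hot_encoding(value, broadcast_categories.value)

    return udf(encode, ArrayType(IntegerType()))


def demo():
    """Encode a tiny local DataFrame (requires pyspark)."""
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.master("local[1]").getOrCreate()
    df = spark.createDataFrame([("red",), ("green",), ("blue",)], ["color"])
    # Hypothetical category list, shipped to the executors once as a broadcast variable.
    categories = spark.sparkContext.broadcast(["red", "green", "blue"])
    # to_ohe(categories) returns the UDF; ("color") applies it to that column.
    df.withColumn("color_ohe", to_ohe(categories)("color")).show()
    spark.stop()
```

Broadcasting the category list avoids sending a copy of it with every task, which matters once the list is large.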