Re: Problems with broadcast large datastructure

I have run into the same problem as you.

I have a cluster of 20 machines, and I just ran the broadcast example. All I did was change the data size in the example to 400 MB, which is really a small data size, but I hit the same problem as you.

Re: Problems with broadcast large datastructure

400 MB isn't really that big. Broadcast is expected to work with several GB of data, even in larger clusters (100s of machines).

If you are using the default HttpBroadcast, then Akka isn't used to move the broadcast data. However, the block manager can run out of memory if you repeatedly broadcast large objects. Another scenario is that the master isn't receiving any heartbeats from the BlockManager because the control messages are getting dropped due to bulk data movement. Can you provide a bit more detail on your network setup?

Also, you can try doing a binary search over the size of the broadcast data to see at what size it breaks (i.e., try broadcasting 10, then 20, then 40, and so on). Also, limit each run to a single iteration of the example (right now, it tries to broadcast 3 consecutive times).
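A minimal sketch of that size-doubling probe, assuming a driver with an existing SparkContext `sc` (the function name, the size list, and the use of byte arrays are all illustrative, not from the example):

```scala
import org.apache.spark.SparkContext

// Sketch: broadcast progressively larger payloads until one hangs or fails,
// touching each broadcast exactly once (a single "iteration" per size).
def broadcastProbe(sc: SparkContext): Unit = {
  for (mb <- Seq(10, 20, 40, 80, 160, 320)) {
    val payload = new Array[Byte](mb * 1024 * 1024)
    val bc = sc.broadcast(payload)
    // Force the executors to actually fetch the broadcast data.
    val tasks = sc.parallelize(1 to sc.defaultParallelism)
                  .map(_ => bc.value.length)
                  .count()
    println(s"broadcast of ${mb} MB succeeded across ${tasks} tasks")
  }
}
```

The last size printed before the job gets stuck gives you a rough threshold to report back.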

If you are using a newer branch, you can also try the new TorrentBroadcast implementation.
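In Spark of that era the broadcast implementation was selected via a system property set on the driver before the SparkContext is created; a sketch, assuming the 0.8-era property name and factory class (double-check both against your branch):

```scala
// Hedged: property and class name as in 0.8-era Spark; verify for your version.
// Must run before the SparkContext is constructed.
System.setProperty("spark.broadcast.factory",
  "org.apache.spark.broadcast.TorrentBroadcastFactory")
```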

Re: Problems with broadcast large datastructure

Oh, I was misled by the following log lines; I thought the broadcast variable was being sent back to the driver. So the result sent to the driver has no relationship with the broadcast variable. But then what is it, since there seems to be no data to send back?

org.apache.spark.executor.Executor - Serialized size of result for 1901 is 446

org.apache.spark.executor.Executor - Sending result for 1901 directly to driver

BTW, for the 30 runs: every time, Spark gets stuck at the third or fourth iteration, and when I use TorrentBroadcast, it gets stuck in the second iteration. Can the BlockManager use the disk to store the data when there is no memory left? And is there any good way to see the memory usage in the BlockManager? Or maybe I could read the data from files to avoid this problem.