So I am wondering, is it safe to use Sender.Tell to reply to sharded entities? Would it be a problem if they sent a message expecting a reply at some point in the future and, before that reply arrived, they got rebalanced onto a different node?

Yeah, I have a saga that sends messages to ARs. Those are sharded entities and the messages go through a shard proxy, so that part is fine. But sagas are also sharded entities, so the ack might get lost. Then again, a) they probably don't have to be sharded entities, and b) I could fake the sender as some cluster singleton that forwards the replies to their shard proxy... interesting

Actually, I simply can't trust an IActorRef if it is remote (for critical things, anyway). So either (a) I make sure my entities get resurrected to resend the message, or (b) I extract information from the message that tells a local actor on the recipient's node where the reply should go, and that local actor figures out where to route it from there. I think (a) is much simpler in this case.
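For what it's worth, option (b) could look roughly like the sketch below: the command carries the saga's entity id, and the recipient replies through its local shard region (or proxy) for the saga type instead of through Sender. The message types DoWork and WorkDone, the actor name, and the "saga" type name are all made up for illustration. It also assumes a region or proxy for "saga" has already been started on the recipient's node and that its message extractor can read SagaId from WorkDone.

```csharp
using Akka.Actor;
using Akka.Cluster.Sharding;

// Hypothetical messages: the command carries the saga's entity id,
// so the reply can be routed by id rather than by a remote IActorRef.
public sealed record DoWork(string AggregateId, string SagaId);
public sealed record WorkDone(string SagaId, string AggregateId);

public sealed class AggregateActor : ReceiveActor
{
    public AggregateActor()
    {
        Receive<DoWork>(cmd =>
        {
            // ... handle the command here ...

            // Assumption: a shard region or proxy for the "saga" type
            // was started on this node before this actor runs.
            var sagaRegion = ClusterSharding.Get(Context.System).ShardRegion("saga");

            // Reply via sharding instead of Sender.Tell, so the reply
            // is routed to wherever the saga entity currently lives.
            sagaRegion.Tell(new WorkDone(cmd.SagaId, cmd.AggregateId));
        });
    }
}
```

This only changes how the reply is routed. Delivery is still at-most-once, so the resend in (a) is still needed for anything critical.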

Yeah, I am still not sure about the whole node lifecycle. We are running in Azure Service Fabric with a lot of custom code to have it running as a stateless service, and it does its own thing when things go bleak, so we will see how it behaves :)

Is it possible for sharded cluster nodes to get into a forward message loop?

For instance, on one node I see:
Forwarding request for shard [43] to [akka.tcp://ClusterSystem@10.233.69.239:2551/system/sharding/VehiclePositionActor#1483370170]
and on the other:
Forwarding request for shard [84] to [akka.tcp://ClusterSystem@10.233.79.102:2551/system/sharding/VehiclePositionActor#2080696550]

Now, these nodes have a singleton queue listener that is generating these messages, so it looks like they are hashing the message differently and getting stuck in a loop.
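One way to check the hashing theory is to confirm that every node registers the same message extractor with the same shard count, since the shard id is derived from the entity id and that count. A minimal sketch, assuming a made-up VehiclePosition message and a shard count of 100 (the real message type and count are not shown in the logs above):

```csharp
using Akka.Cluster.Sharding;

// Hypothetical message type carrying the vehicle id.
public sealed record VehiclePosition(string VehicleId, double Lat, double Lon);

public sealed class VehicleMessageExtractor : HashCodeMessageExtractor
{
    // Assumption: 100 shards. Whatever the value is, it has to be
    // identical on every node that hosts or proxies this entity type.
    public VehicleMessageExtractor() : base(100) { }

    // The entity id is taken from the message; the base class derives
    // the shard id from this id and the shard count.
    public override string EntityId(object message)
        => (message as VehiclePosition)?.VehicleId;
}
```

If the extractor is hand-written instead, it is worth checking that it does not rely on string.GetHashCode(), which on .NET Core is randomized per process and so can give different shard ids on different nodes.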

This is being hosted in k8s, and these nodes are deleted and recreated whenever we push out changes. That is when we tend to see this pop up.