2 Answers

Most of the time they communicate through memory, or through the nearest level of the memory hierarchy that they share. (System memory counts as a shared level on both SMP and NUMA systems, even if on NUMA it is accessed through the memory controller of another chip; "non-uniform" just means that such access is slower.)

2) How fast would two cores on the same chip communicate?

Cores on the same chip usually share an L2 or L3 cache. Cores on different chips communicate through memory or through cache-to-cache transfers driven by the cache coherency protocol.
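To check which cores actually share a cache on a given machine, one option on Linux is to read the `shared_cpu_list` files under `/sys/devices/system/cpu`. The following is a minimal sketch, not a definitive tool: the helper names `parse_cpu_list` and `cache_sharing` are made up for illustration, and it assumes a Linux system that exposes the sysfs cache directory (some containers and virtual machines do not).

```python
from pathlib import Path


def parse_cpu_list(text):
    """Parse a Linux cpulist string such as "0-3,8" into a set of CPU numbers."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


def cache_sharing(cpu=0, sysfs_root="/sys/devices/system/cpu"):
    """Return {(level, type): set_of_cpus} for each cache that `cpu` uses.

    Returns an empty dict when the sysfs cache directory is absent
    (non-Linux systems, some containers and virtual machines).
    """
    result = {}
    cache_dir = Path(sysfs_root) / f"cpu{cpu}" / "cache"
    if not cache_dir.is_dir():
        return result
    for index in sorted(cache_dir.glob("index*")):
        try:
            level = int((index / "level").read_text())
            kind = (index / "type").read_text().strip()
            shared = parse_cpu_list((index / "shared_cpu_list").read_text())
        except (OSError, ValueError):
            continue  # skip entries that are missing or unreadable
        result[(level, kind)] = shared
    return result


if __name__ == "__main__":
    for (level, kind), cpus in sorted(cache_sharing(0).items()):
        print(f"L{level} {kind}: shared with CPUs {sorted(cpus)}")
```

Two cores that appear together in the set for the highest cache level can exchange data through that cache; cores that never appear together have to go through memory or the coherency protocol.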

So in case 1 (different chips) the bandwidth for passing data between CPUs will be close to that of plain memory reads and writes. In case 2 (same chip) it can be higher, up to cache read/write speed.

Communication latency will be several hundred CPU cycles in case 1 and several dozen in case 2.
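To put those cycle counts into wall-clock terms, here is a small worked conversion. The 3 GHz clock and the concrete counts of 300 and 30 cycles are assumptions picked only to match "several hundred" and "several dozen"; they are not measurements.

```python
def cycles_to_ns(cycles, clock_ghz):
    """Convert a cycle count to nanoseconds (a core at f GHz runs f cycles per ns)."""
    return cycles / clock_ghz


CLOCK_GHZ = 3.0  # assumed clock frequency, for illustration only

# Illustrative cycle counts, not measured values.
for label, cycles in [("different chips", 300), ("same chip", 30)]:
    print(f"{label}: {cycles} cycles is about {cycles_to_ns(cycles, CLOCK_GHZ):.0f} ns")
```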

3) Are the four cores on the same chip equivalent in terms of communicating or memory accessing?

All four cores on the same chip usually have an equal distance to RAM, but this depends on the chip architecture and implementation; in some older Intel processors, for example, a multicore chip was really two chips packed into a single package.

How to schedule threads onto cores for close-to-optimum memory performance depends on the memory access pattern, and it is usually not worth the trouble. If your program is in Java, you are probably not going to have the level of control required to get close to optimum performance.
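For the cases where pinning is worth an experiment, Linux does expose CPU affinity to user programs. The sketch below uses Python's `os.sched_setaffinity`, which is only available on some platforms (Linux in particular); the wrapper name `pin_to_cpus` is hypothetical, and whether pinning helps at all still depends on the access pattern.

```python
import os


def pin_to_cpus(cpus):
    """Restrict the caller to `cpus`; return the resulting affinity set.

    Returns None where the call is unavailable (for example macOS or
    Windows) or where the operating system rejects the request.
    """
    if not hasattr(os, "sched_setaffinity"):
        return None
    try:
        os.sched_setaffinity(0, cpus)  # pid 0 means the caller
        return os.sched_getaffinity(0)
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    allowed = os.sched_getaffinity(0) if hasattr(os, "sched_getaffinity") else set()
    if allowed:
        print("pinned to:", pin_to_cpus({min(allowed)}))
        pin_to_cpus(allowed)  # restore the original affinity
```

Pinning two communicating threads to cores that share a cache keeps their traffic on one chip; pinning them to cores on different sockets forces the slower path.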

Modern CPUs have integrated memory controllers, and modern multi-socket systems have distributed memory. This is called

An access to memory that misses in the Level 1 data cache might be serviced by the Level 2 data cache (in the same socket), or it might be serviced by what Intel calls the "Last Level Cache (LLC)", which would be in the socket that has the memory controller for that memory address. A hit in the LLC of another socket could cost a few tens of processor cycles, which is still much faster than accessing DRAM (more than one hundred processor cycles).
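To see how memory is distributed across sockets on a particular machine, Linux reports a relative distance between NUMA nodes in `/sys/devices/system/node/nodeN/distance`. The sketch below reads that table; the function names are hypothetical, and it assumes a Linux system with NUMA information in sysfs. The distances are relative cost hints (10 conventionally means local), not cycle counts.

```python
import re
from pathlib import Path


def parse_distances(text):
    """Parse one sysfs distance line such as "10 21" into a list of ints."""
    return [int(value) for value in text.split()]


def numa_distances(sysfs_root="/sys/devices/system/node"):
    """Return {node: [distance to node 0, distance to node 1, ...]}.

    Returns an empty dict when the sysfs node directory is absent.
    """
    table = {}
    root = Path(sysfs_root)
    if not root.is_dir():
        return table
    for node_dir in root.glob("node[0-9]*"):
        match = re.fullmatch(r"node(\d+)", node_dir.name)
        if not match:
            continue
        try:
            distances = parse_distances((node_dir / "distance").read_text())
        except (OSError, ValueError):
            continue  # skip nodes whose distance file cannot be read
        table[int(match.group(1))] = distances
    return table


if __name__ == "__main__":
    for node, distances in sorted(numa_distances().items()):
        print(f"node {node}: {distances}")
```

A single-socket machine typically reports one node with a single entry, while a multi-socket machine shows larger values for the remote nodes.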