MemLoad (MemStore) is the maximum number of load (store) operations that the compiler is permitted to issue in the same instruction for the entire VEX target (all clusters).

Memory.<n> is the maximum number of memory operations (loads or stores) that the compiler is permitted to issue in the same instruction for cluster <n>.

So, in general, it makes sense to define architectures where, for each <n>,
MemLoad >= Memory.<n> or MemStore >= Memory.<n>

The number of memory ports for a cluster <n> is Memory.<n>, because that is the maximum number of memory operations it can issue. If you share a single cache across all clusters (numCaches=1), the cache needs that many ports per cluster. So, the total number of read ports (for numCaches=1) is
min(MemLoad, Memory.0 + Memory.1 + Memory.2 + Memory.3)
for a 4-cluster machine (and you can work out the other cases).

For example, let's say we have a 4-cluster machine, with Memory.<n> = 2 and MemLoad = 6. The number of memory ports in this case is 6, and we can issue 2 memory ops per cycle in each cluster (but only in up to 3 clusters at once). And so on.
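As a sanity check, the port count above can be computed directly. This is a minimal Python sketch; the helper is purely illustrative (it is not part of the VEX toolchain), and the argument names simply mirror the machine-model parameters:

```python
def total_read_ports(mem_load, memory_per_cluster):
    """Read ports needed for a single shared cache (numCaches=1):
    min(MemLoad, sum of Memory.<n> over all clusters)."""
    return min(mem_load, sum(memory_per_cluster))

# 4-cluster machine, Memory.<n> = 2 for every cluster, MemLoad = 6
ports = total_read_ports(6, [2, 2, 2, 2])
print(ports)  # 6: each cluster may issue 2 ops, but at most 3 clusters per cycle
```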


Previously you defined MemLoad as the total number of memory accesses by all clusters, but I still have some doubts. Let's assume a 4-cluster architecture with Memory.<n> = 2, MemLoad = 6 and numCaches = 2, so Cluster0 and Cluster2 are connected to Cache0, and Cluster1 and Cluster3 are connected to Cache1. In this case, can Cluster0 and Cluster2 issue 2 + 2 = 4 (< 6) memory operations on Cache0? Or 6 / 2 = 3 (MemLoad / numCaches)? In other words, is MemLoad a per-cache parameter? What is the number of memory ports of a cache?

The connection to the cache "bank" itself has no limitations; the limitations are the restrictions imposed on the compiler. In other words, in your example, each cluster could issue 2 operations, with a total maximum of 6. With a single cache bank (numCaches=1), you need 6 ports to the cache. With 2 cache banks (numCaches=2), given that each bank is connected to 2 clusters that together could issue up to 4 ops, you need 4 ports on each cache bank.
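Following the same reasoning, the ports each bank needs depend on which clusters are wired to it. A hedged sketch (the even/odd cluster-to-bank assignment below is just the one from the example above, not something the tools compute for you):

```python
def bank_read_ports(mem_load, memory_per_cluster, clusters_on_bank):
    """Ports one cache bank needs: the clusters wired to it could
    together issue up to sum(Memory.<n>) ops, capped by MemLoad."""
    return min(mem_load, sum(memory_per_cluster[c] for c in clusters_on_bank))

memory = [2, 2, 2, 2]  # Memory.<n> = 2 for all 4 clusters
# numCaches = 2: clusters 0 and 2 on bank 0; clusters 1 and 3 on bank 1
print(bank_read_ports(6, memory, [0, 2]))  # 4 ports on each bank
```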

In this last case, you might argue whether that configuration makes any sense, because you're artificially restricting the compiler from emitting 8 memory operations, even though you still need to provide the hardware for them because of the way the banks are connected.

Unfortunately, this choice of parameters doesn't cover every possible combination of caches and configurations that one can imagine. To explore a configuration that cannot be expressed, you'd have to write your own cache model to replace the VEX built-in cache model. Because the simulation API allows user-defined functions to be called at every memory operation, this is something that is supported.