# margo issues
https://xgitlab.cels.anl.gov/sds/margo/issues (feed updated 2020-05-17T11:13:05-05:00)

## Issue 62: Margo is not cleaning up Mercury properly
https://xgitlab.cels.anl.gov/sds/margo/issues/62 (2020-05-17, Matthieu Dorier)

`margo_cleanup`, which is used either in `margo_finalize` or `margo_wait_for_finalize`, is not destroying the instance's `hg_context` and `hg_class`. At best this causes a leak. At worst, when using a protocol like `na+sm`, it makes it impossible to finalize margo and re-initialize it later on.
## Issue 60: the margo_bulk_free() function doesn't have a mid (margo instance) argument
https://xgitlab.cels.anl.gov/sds/margo/issues/60 (2020-05-13, Philip Carns)

Unlike most other margo functions (most notably margo_bulk_create(), which is naturally paired with margo_bulk_free()), margo_bulk_free() does not have a margo instance argument. This makes it impossible to associate activity (e.g. diagnostic counters) in that function with the corresponding margo instance. There might not be any reasonable way to fix it; the API function is widely used.

## Issue 59: Enable remote shutdown from same process
https://xgitlab.cels.anl.gov/sds/margo/issues/59 (2020-03-30, Matthieu Dorier)

If a process initialized as a server has called `margo_enable_remote_shutdown`, and this same process then issues a `margo_shutdown_remote_instance` targeting itself, Mercury will complain about an `hg_addr_t` not being freed. The address in question is the `hg_addr_t` passed to the shutdown RPC itself, which can no longer be freed when the RPC returns because Mercury has already finalized. The fix should be to detect that `margo_shutdown_remote_instance` is being invoked on the calling process itself, and call `margo_finalize` instead of sending an RPC.
## Issue 56: Check overhead of breadcrumb
https://xgitlab.cels.anl.gov/sds/margo/issues/56 (2020-02-20, Matthieu Dorier; assignee: Srinivasan Ramesh)

I noticed here https://xgitlab.cels.anl.gov/sds/margo/blob/master/src/margo.c#L1249 that the breadcrumb mechanism requires converting an address into a string and hashing that string on the `margo_forward` path, which may add significant overhead per RPC. This overhead would need to be checked and, if necessary, an alternate mechanism put in place (or we should be able to disable breadcrumbs).

## Issue 55: margo-gen-profile not compatible with python3
https://xgitlab.cels.anl.gov/sds/margo/issues/55 (2020-02-19, Chris Hogan; assignee: Srinivasan Ramesh)

The file mixes tabs and spaces, which is fine for python2 but breaks for python3.

## Issue 54: busy-spin latency on Cooley really high
https://xgitlab.cels.anl.gov/sds/margo/issues/54 (2020-03-05, Rob Latham)

This command takes longer than 20 minutes to run on Cooley:
```
mpirun -f $COBALT_NODEFILE -n 2 numactl -N 1 -m 1 ./margo-p2p-latency -i 100000 -n "ofi+verbs;ofi_rxm://mlx5_0:3339" -t 0,0
```

The corresponding test in the no-spin case is shorter:

```
mpirun -f $COBALT_NODEFILE -n 2 numactl -N 1 -m 1 ./margo-p2p-latency -i 100000 -n "verbs://"
# <op> <iterations> <warmup_iterations> <size> <min> <q1> <med> <avg> <q3> <max>
noop 100000 100 0 0.000017643 0.000020266 0.000020742 0.000023233 0.000026464 0.000243187
```

Not sure when this first started happening, but it has been going on since the nightly tests got going again in November.
## Issue 53: sparkline_data_collection_fn() using ABT_thread_yield() instead of margo_thread_sleep()
https://xgitlab.cels.anl.gov/sds/margo/issues/53 (2020-03-06, Philip Carns)

Low priority; come back to this later. The ULT that runs sparkline data collection needs to idle most of the time, which seems like a sensible use of margo_thread_sleep(). There is a bug that prevents that from working correctly that we need to dig into. Right now the code uses ABT_thread_yield() instead, which could result in the sparkline ULT being scheduled too frequently in some cases. This issue is not relevant in master until the breadcrumb MR from @sramesh is merged.

## Issue 52: Margo bulk pool segfaulting at finalize time
https://xgitlab.cels.anl.gov/sds/margo/issues/52 (2020-04-09, Matthieu Dorier; assignee: Philip Carns)

I'm using Bake's pipelining mode (calling `bake_provider_set_config` to set "pipeline_enabled" to "1"). When the provider finalizes, I get a segfault with the following stack trace:
```
#0 0x00007fb8e1444b12 in margo_bulk_pool_destroy (pool=0x4) at src/margo-bulk-pool.c:127
#1 0x00007fb8e1444f2d in margo_bulk_poolset_destroy (poolset=0x561853aef040) at src/margo-bulk-pool.c:290
#2 0x00007fb8e14e3017 in bake_server_finalize_cb (data=data@entry=0x5618515896b0) at ../src/bake-server.c:1840
#3 0x00007fb8e14e3067 in bake_provider_destroy (provider=0x5618515896b0) at ../src/bake-server.c:376
```

## Issue 49: Multiple calls to margo_init
https://xgitlab.cels.anl.gov/sds/margo/issues/49 (2019-09-04, Matthieu Dorier)

We should be able to call margo_init multiple times, rather than relying on margo_init_pool when we need secondary margo instances.

## Issue 48: margo poolset does not free free buffers properly
https://xgitlab.cels.anl.gov/sds/margo/issues/48 (2019-03-15, Philip Carns; assignee: Philip Carns)

Reproducible with the bake-p2p-bw benchmark from the sds-tests repo when executed with the -i option. On InfiniBand systems it produces an error on shutdown, because verbs is stricter about resource leaks at shutdown.
When the poolset is destroyed, it calls margo_bulk_pool_destroy() for each pool in the set. That function exits early, before actually destroying any buffers, here:
https://xgitlab.cels.anl.gov/sds/margo/blob/master/src/margo-bulk-pool.c#L127
## Issue 47: Handle caching is incompatible with shared memory routing
https://xgitlab.cels.anl.gov/sds/margo/issues/47 (2019-02-12, Marc Vef)

Note: this issue only applies to situations where automatic shared memory routing is used.

tl;dr: Margo's handle cache is initialized with handles for the user-defined RPC protocol. When `margo_create()` is called with an na+sm address, `margo_handle_cache_get()` always fails, because `HG_Reset()` expects an na+sm handle, which margo's handle cache never provides. Therefore, local communication never benefits from margo's handle cache.

For the explanation below, we use ofi+tcp as the main RPC protocol with which the margo client was initialized, and `auto_sm` is enabled. The margo server is started on the same machine and accepts ofi+tcp connections as well as, implicitly, na+sm connections, because `auto_sm` is enabled there too. Since we only use one node in this example, na+sm is always chosen automatically.

Each time `margo_create()` is called, margo acquires from the handle cache a handle that was initialized with ofi+tcp in `margo_handle_cache_init()`. This handle is then passed to `HG_Reset()` together with an address that `margo_addr_lookup()` earlier resolved to na+sm due to Mercury's automatic shared memory routing. `HG_Reset()` is therefore called with an ofi+tcp handle and an na+sm address. Because `HG_Reset()` can only reuse a handle when its address type doesn't change, Mercury reports the following error, as the handle cannot be reset with the given address:

```
# HG -- Error -- /home/evie/adafs/git/mercury/src/mercury_core.c:4570
# HG_Core_reset(): Cannot reset handle to a different address NA class
# HG -- Error -- /home/evie/adafs/git/mercury/src/mercury.c:1966
# HG_Reset(): Could not reset core HG handle
```

Margo then puts the same handle back into the cache and calls `HG_Create()` to manually create a shared memory handle. When that handle is destroyed in `margo_destroy()`, it is discarded rather than cached, because it was manually created. This cycle repeats forever: no handle in margo's cache can ever be used for na+sm communication, and every call produces the error message above. A possible solution would be to use two handle caches when `auto_sm` is enabled, and then pick the correct cache based on the target address.
## Issue 46: make check failure on timeout.sh test
https://xgitlab.cels.anl.gov/sds/margo/issues/46 (2019-06-24, Philip Carns; label: low priority)

```
PASS: tests/sleep.sh
PASS: tests/basic.sh
PASS: tests/basic-ded-pool.sh
FAIL: tests/timeout.sh
============================================================================
Testsuite summary for margo 0.4
============================================================================
# TOTAL: 4
# PASS: 3
# SKIP: 0
# XFAIL: 0
# FAIL: 1
# XPASS: 0
# ERROR: 0
```
This is with current git master of margo and Mercury 1.0.0. Margo is using the na+sm transport. The test log shows the following:
```
FAIL: tests/timeout.sh
======================
+ '[' -z .. ']'
+ '[' -z mktemp ']'
+ source ../tests/test-util.sh
++ '[' -z timeout ']'
++ mktemp --tmpdir test-XXXXXX
+ TMPOUT=/tmp/test-wAbFfe
+ test_start_servers 1 2 8
+ nservers=1
+ startwait=2
+ maxtime=8s
+ repfactor=0
+ pid=8274
++ seq 1 1
+ for i in `seq 1 $nservers`
++ mktemp
+ hostfile=/tmp/tmp.6XB63Wt3oj
+ '[' 0 -ne 0 ']'
+ sleep 2
+ timeout --signal=9 8s tests/margo-test-server na+sm:// -s -f /tmp/tmp.6XB63Wt3oj
++ cat /tmp/tmp.6XB63Wt3oj
+ svr1=na+sm://8280/0
+ sleep 1
+ run_to 10 tests/margo-test-client-timeout na+sm://8280/0
+ '[' 134 -ne 0 ']'
+ wait
```

## Issue 45: Conflict between margo_ref_incr and handle cache
https://xgitlab.cels.anl.gov/sds/margo/issues/45 (2019-06-24, Matthieu Dorier)

Scenario:
- call `margo_create(&h)` to create a handle;
- call `margo_ref_incr(h)` to increment its internal refcount;
- call `margo_destroy(h)` to destroy it;
The call to `margo_destroy` should decrease the reference count, *NOT* put the handle back in the cache. Otherwise when the handle is pulled back from the cache to be reused, `HG_Reset` will fail.
This problem appears in Thallium, in particular. It is only somewhat harmless: when resetting fails, margo falls back to creating a new handle from scratch, so we just get some Mercury-related errors on stderr, but the cache quickly fills up with handles that cannot be reset.
The solution would be to have Jerome provide an `HG_Get_refcount` in Mercury, so that margo can decide whether `HG_Destroy` or `cache_put` should be called within `margo_destroy`.
## Issue 44: Expose new mercury init function
https://xgitlab.cels.anl.gov/sds/margo/issues/44 (2018-12-07, Tommaso Tocci)

Mercury added a new initialization function that allows passing more configuration. In particular, through the new `struct hg_init_info` parameter, automatic shared memory routing can be enabled.
At the moment margo doesn't expose this initialization method and thus doesn't provide any way of enabling automatic shared memory routing.

I cannot fork this repository, so I cannot open a merge request. In any case you can use this patch:
[0001-Add-hg_init_info-to-margo-init.patch](/uploads/e9cb4eb2e73ccc1f19b04176b32a064f/0001-Add-hg_init_info-to-margo-init.patch)

## Issue 43: Update argobots dependency
https://xgitlab.cels.anl.gov/sds/margo/issues/43 (2018-11-06, Tommaso Tocci)

Since https://github.com/pmodels/argobots/pull/54 has been merged into the official argobots repository, the README should be updated to point to the official repo instead of @carns's fork. Am I wrong?
## Issue 42: Checking for RPC name hash collision
https://xgitlab.cels.anl.gov/sds/margo/issues/42 (2018-04-05, Matthieu Dorier)

We should check for hash collisions when registering RPCs.
## Issue 41: Margo does not compile with recent Mercury changes
https://xgitlab.cels.anl.gov/sds/margo/issues/41 (2018-04-05, Marc Vef)

A recent [Mercury commit](https://github.com/mercury-hpc/mercury/commit/f43aa694e0d07a3288a8e2ee7067e0b3c28429ff) on the master branch changed `struct hg_info`, renaming the field `target_id` to `context_id`. As a result, Margo does not compile, since it uses the field `target_id`. I suggest this [patch](/uploads/df2e27f8328d0468011fdba711902282/mercury_target_id_fix.patch), which solves the issue.

## Issue 40: High memory usage of Margo servers that use dedicated progress threads
https://xgitlab.cels.anl.gov/sds/margo/issues/40 (2018-02-26, Marc Vef)

I am currently investigating an issue where we noticed unusually high memory consumption by the Margo server process. Essentially, if a dedicated progress thread is used for the Margo server process, i.e. `margo_init()`'s `use_progress_thread` is true, each thread will require up to ~500 MB of memory (though this upper bound seems to vary). Memory usage grows toward that maximum simply from handling incoming RPCs. Because the number of Margo server threads seems to be directly connected to the memory usage, I suspect Margo or Argobots to be the cause.
We noticed this behavior on our clusters, where we currently use a total of 32 threads and the server process consumed more than 14 GB of memory. The behavior can also be reproduced on a local desktop computer, and it seems to be independent of the Mercury NA layer used (we checked CCI+verbs, bmi+tcp, and na+sm). Interestingly, if no dedicated thread is used, with `margo_wait_for_finalize()`, the memory footprint is only a few megabytes.

My Margo server and client test applications are derived from the Margo examples (without the bulk transfer): the client sends a large number of minimal RPCs (one int back and forth) in a loop to the server. On that note, I also noticed that RPC throughput on a local machine increases by a factor of 3 if the progress loop of the Margo server runs in the caller's thread context rather than in a single dedicated thread. Perhaps these two observations are connected.

Are there any solutions or explanations for these observations?

Thanks,
Marc

## Issue 39: Handle cache not working properly
https://xgitlab.cels.anl.gov/sds/margo/issues/39 (2018-04-05, Matthieu Dorier; assignee: Shane Snyder)

This issue came up in testing sds-keyval. The scenario is as follows:
A client issues one "open" RPC, then X "put" RPCs, then one "shutdown" RPC. If X is 31 or greater, the "shutdown" RPC causes the following error on the client side:
```
# HG -- Warning -- /home/mdorier/spack/var/spack/stage/mercury-master-nreeb4qqqsnhpfwmrl22o7shh7vnv2es/mercury/src/mercury_core_header.c:293
# hg_core_header_response_verify(): Response return code: HG_INVALID_PARAM
```
And in the server:
```
# HG -- Error -- /home/mdorier/spack/var/spack/stage/mercury-master-nreeb4qqqsnhpfwmrl22o7shh7vnv2es/mercury/src/mercury_core.c:2229
# hg_core_process(): Error while executing RPC callback
```
To try the code in sds-keyval, call the server as follows:
```
bin/sdskv-server-daemon -f addr.txt na+sm mydb:bwt
```
and the client as follows:
```
test/sdskv-put-test `cat addr.txt` 1 mydb 40
```

## Issue 38: shutdown RPC
https://xgitlab.cels.anl.gov/sds/margo/issues/38 (2018-01-18, Matthieu Dorier; assignee: Matthieu Dorier)

Implement a __shutdown__ RPC internal to Margo so that a Margo instance can be finalized remotely.
The function to call to request a remote instance to shut down should be called margo_shutdown_remote_instance.