-
Notifications
You must be signed in to change notification settings - Fork 415
Lnx refactor main #11078
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Lnx refactor main #11078
Conversation
Support the FI_AV_USER_ID flag in the utility code. Use the correct fi_addr in the table lookup Signed-off-by: Amir Shehata <shehataa@ornl.gov>
Deadlock due to locking the rxc->ep_obj->lock in cxip_recv_req_init() which can be called on a path where the lock is already held. Move the locking to the callers of the function. Signed-off-by: Amir Shehata <shehataa@ornl.gov>
When an unexpected message is received and there is valid cq_data to extract from the message, extract this and assign it to the rx_entry. Particularly the rx_entry->flags and rx_entry->cq_data Signed-off-by: Amir Shehata <shehataa@ornl.gov>
Refactor LNX to allows for: - multiple domains per fabric - multiple endpoints per domain - multiple completion queues per domain - multiple address vectors per domain Remove all extra functionality except for tagged messages. Other functionality will be added in follow up patches. Add support for fi_mr_regattr. This patch doesn't support memory registration for RMA operations. Add support for FI_CLAIM, FI_DISCARD and FI_PEEK on the receive path. Signed-off-by: Amir Shehata <shehataa@ornl.gov> Signed-off-by: Thomas Fillers <fillersjt@ornl.gov> Signed-off-by: Thomas Naughton <naughtont@ornl.gov> Signed-off-by: Brian Smith <smithbe2@ornl.gov> Signed-off-by: Alexia Ingerson <alexia.ingerson@intel.com>
Added the ability to dump send statistics. This is important for the multi-rail feature to be able to view how the different domains are being used during the run time of the application. Also keep stats on max and average length of the SRX receive and unexpected queues. This will aid in characterizing the load on the system. Signed-off-by: Amir Shehata <shehataa@ornl.gov>
c8b2614
to
3215849
Compare
@@ -4193,8 +4194,10 @@ cxip_recv_common(struct cxip_rxc *rxc, void *buf, size_t len, void *desc, | |||
|
|||
assert(rxc_hpc->base.protocol == FI_PROTO_CXI); | |||
|
|||
ofi_genlock_lock(&rxc->ep_obj->lock); | |||
ret = cxip_recv_req_init(rxc, buf, len, src_addr, tag, ignore, flags, | |||
tagged, context, comp_cntr, &req); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The lock is acquired again right after this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
8000First pass of comments so we can get moving.
Overall looks good! We've done many internal passes of the code so we've done a lot of reviews already :)
this determination, the application requires the PCI information about the | ||
interface. For this reason LNX forwards the PCI information for the | ||
inter-node provider in the link to the application. | ||
environment variable to the linked providers. The syntax used is: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would remove this more detailed section and say "see Runtime Parameters for details" to not duplicate it in case we update it then it's only in one spot.
info->ep_attr->tx_ctx_cnt; | ||
ep->lpe_fi_info->ep_attr->rx_ctx_cnt = | ||
info->ep_attr->rx_ctx_cnt; | ||
cep->cep_txc = calloc(cd->cd_info->ep_attr->tx_ctx_cnt, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does lnx support scalable EPs and transmit/receive contexts? If you haven't tested it then I would remove it if it's not too invasive. We can add it later if that's a use case we can target. I'd like to think more about lnx + scalable EPs
prov/lnx/include/lnx.h
Outdated
uint64_t cep_num_sends; | ||
uint64_t cep_num_posted_recvs; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not necessarily a blocker for now but I would suggest making the stats debug variables that are optimized out on non-debug builds
struct local_prov_ep *local_ep; | ||
int addr_count; | ||
fi_addr_t peer_addrs[LNX_MAX_LOCAL_EPS]; | ||
struct lnx_peer2core_map { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would prefer a simpler struct name that doesn't have a number in it. maybe lnx_peer_map?
rc = ofi_av_insert_addr(&lav->lav_av, &lp, &lp->lp_addr); | ||
if (rc) { | ||
ofi_genlock_unlock(&lav->lav_av.lock); | ||
free(lp); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also free lp->lp_avs
rx_entry->rx_entry.flags, | ||
rx_entry->rx_entry.msg_size, NULL, | ||
rx_entry->rx_entry.cq_data, | ||
rx_entry->rx_entry.tag); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just a little off in the alignment
rx_entry->rx_entry.flags, | ||
rx_entry->rx_entry.msg_size, NULL, | ||
rx_entry->rx_entry.cq_data, | ||
rx_entry->rx_entry.tag); |
There was a problem hiding this comment.
Choose a reason for hiding this 8000 comment
The reason will be displayed to describe this comment to others. Learn more.
same here
if (!rx_entry) { | ||
rc = -FI_ENOMEM; | ||
goto out; | ||
} | ||
rx_entry->rx_peer = lp; | ||
if (iov) | ||
rx_entry->rx_entry.msg_size = ofi_total_iov_len(iov, count); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is necessary since the msg size refers to the total size of the incoming message (provided by the peer), not the total available size of the posted receive.
if (rc) | ||
return rc; | ||
|
||
FI_DBG(&lnx_prov, FI_LOG_CORE, | ||
"sending to %lx tag %lx buf %p len %ld\n", | ||
core_addr, tag, buf, len); | ||
|
||
rc = fi_tsenddata(cep->lpe_ep, buf, len, mem_desc, | ||
data, core_addr, tag, context); | ||
/* do memory registration with the core domain */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think any of these comments are necessary since the function is pretty self explanatory.
lnx_rma_read(struct fid_ep *ep, void *buf, size_t len, void *desc, | ||
fi_addr_t src_addr, uint64_t addr, uint64_t key, void *context) | ||
{ | ||
int rc; | ||
struct lnx_ep *lep; | ||
struct lnx_peer *lp; | ||
struct fid_ep *core_ep; | ||
struct lnx_ctx *ctx; | ||
struct local_prov_ep *cep; | ||
fi_addr_t core_addr; | ||
struct lnx_peer_table *peer_tbl; | ||
void *mem_desc; | ||
uint64_t rkey; | ||
struct ofi_mr_entry *mre = NULL; | ||
struct iovec iov = {.iov_base = (void*)buf, .iov_len = len}; | ||
|
||
lep = lnx_get_lep(ep, &ctx); | ||
if (!lep) | ||
return -FI_ENOSYS; | ||
|
||
peer_tbl = lep->le_peer_tbl; | ||
|
||
lp = lnx_av_lookup_addr(peer_tbl, src_addr); | ||
rc = lnx_select_send_pathway(lp, lep->le_domain, desc, &cep, | ||
&core_addr, &iov, 1, &mre, &mem_desc, &rkey); | ||
if (rc) | ||
goto out; | ||
|
||
FI_DBG(&lnx_prov, FI_LOG_CORE, | ||
"rma read from %lx key %lx buf %p len %ld\n", | ||
core_addr, key, buf, len); | ||
|
||
core_ep = lnx_get_core_ep(cep, ctx->ctx_idx, ep->fid.fclass); | ||
|
||
rc = fi_read(core_ep, buf, len, mem_desc, | ||
core_addr, addr, key, context); | ||
|
||
if (mre) | ||
ofi_mr_cache_delete(&lep->le_domain->ld_mr_cache, mre); | ||
out: | ||
return rc; | ||
return -FI_ENOSYS; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove function and set read op to fi_no_read or whatever it's called
Refactor LNX to allows for:
Remove all extra functionality except for tagged messages. Other
functionality will be added in follow up patches.
Add support for fi_mr_regattr. This patch doesn't support memory
registration for RMA operations.
Add support for FI_CLAIM, FI_DISCARD and FI_PEEK on the receive path.
Beside the above changes. The PR addresses some bugs found in the lnx FI_PEER
path in the rxm and cxi providers