8000 Lnx refactor main by amirshehataornl · Pull Request #11078 · ofiwg/libfabric · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

Lnx refactor main #11078

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

amirshehataornl
Copy link
Contributor

Refactor LNX to allows for:

  • multiple domains per fabric
  • multiple endpoints per domain
  • multiple completion queues per domain
  • multiple address vectors per domain

Remove all extra functionality except for tagged messages. Other
functionality will be added in follow up patches.

Add support for fi_mr_regattr. This patch doesn't support memory
registration for RMA operations.

Add support for FI_CLAIM, FI_DISCARD and FI_PEEK on the receive path.

Beside the above changes. The PR addresses some bugs found in the lnx FI_PEER
path in the rxm and cxi providers

Support the FI_AV_USER_ID flag in the utility code.
Use the correct fi_addr in the table lookup

Signed-off-by: Amir Shehata <shehataa@ornl.gov>
Deadlock due to locking the rxc->ep_obj->lock in cxip_recv_req_init()
which can be called on a path where the lock is already held.
Move the locking to the callers of the function.

Signed-off-by: Amir Shehata <shehataa@ornl.gov>
When an unexpected message is received and there is valid cq_data to
extract from the message, extract this and assign it to the rx_entry.
Particularly the rx_entry->flags and rx_entry->cq_data

Signed-off-by: Amir Shehata <shehataa@ornl.gov>
Refactor LNX to allows for:
 - multiple domains per fabric
 - multiple endpoints per domain
 - multiple completion queues per domain
 - multiple address vectors per domain

Remove all extra functionality except for tagged messages. Other
functionality will be added in follow up patches.

Add support for fi_mr_regattr. This patch doesn't support memory
registration for RMA operations.

Add support for FI_CLAIM, FI_DISCARD and FI_PEEK on the receive path.

Signed-off-by: Amir Shehata <shehataa@ornl.gov>
Signed-off-by: Thomas Fillers <fillersjt@ornl.gov>
Signed-off-by: Thomas Naughton <naughtont@ornl.gov>
Signed-off-by: Brian Smith <smithbe2@ornl.gov>
Signed-off-by: Alexia Ingerson <alexia.ingerson@intel.com>
Added the ability to dump send statistics. This is important for the
multi-rail feature to be able to view how the different domains are
being used during the run time of the application.

Also keep stats on max and average length of the SRX receive and
unexpected queues. This will aid in characterizing the load on the system.

Signed-off-by: Amir Shehata <shehataa@ornl.gov>
@@ -4193,8 +4194,10 @@ cxip_recv_common(struct cxip_rxc *rxc, void *buf, size_t len, void *desc,

assert(rxc_hpc->base.protocol == FI_PROTO_CXI);

ofi_genlock_lock(&rxc->ep_obj->lock);
ret = cxip_recv_req_init(rxc, buf, len, src_addr, tag, ignore, flags,
tagged, context, comp_cntr, &req);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The lock is acquired again right after this.

Copy link
Contributor
@aingerson aingerson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

8000

First pass of comments so we can get moving.
Overall looks good! We've done many internal passes of the code so we've done a lot of reviews already :)

this determination, the application requires the PCI information about the
interface. For this reason LNX forwards the PCI information for the
inter-node provider in the link to the application.
environment variable to the linked providers. The syntax used is:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would remove this more detailed section and say "see Runtime Parameters for details" to not duplicate it in case we update it then it's only in one spot.

info->ep_attr->tx_ctx_cnt;
ep->lpe_fi_info->ep_attr->rx_ctx_cnt =
info->ep_attr->rx_ctx_cnt;
cep->cep_txc = calloc(cd->cd_info->ep_attr->tx_ctx_cnt,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does lnx support scalable EPs and transmit/receive contexts? If you haven't tested it then I would remove it if it's not too invasive. We can add it later if that's a use case we can target. I'd like to think more about lnx + scalable EPs

Comment on lines 96 to 97
uint64_t cep_num_sends;
uint64_t cep_num_posted_recvs;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not necessarily a blocker for now but I would suggest making the stats debug variables that are optimized out on non-debug builds

struct local_prov_ep *local_ep;
int addr_count;
fi_addr_t peer_addrs[LNX_MAX_LOCAL_EPS];
struct lnx_peer2core_map {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would prefer a simpler struct name that doesn't have a number in it. maybe lnx_peer_map?

rc = ofi_av_insert_addr(&lav->lav_av, &lp, &lp->lp_addr);
if (rc) {
ofi_genlock_unlock(&lav->lav_av.lock);
free(lp);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also free lp->lp_avs

Comment on lines +263 to +266
rx_entry->rx_entry.flags,
rx_entry->rx_entry.msg_size, NULL,
rx_entry->rx_entry.cq_data,
rx_entry->rx_entry.tag);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just a little off in the alignment

Comment on lines +293 to +296
rx_entry->rx_entry.flags,
rx_entry->rx_entry.msg_size, NULL,
rx_entry->rx_entry.cq_data,
rx_entry->rx_entry.tag);
Copy link
Contributor

Choose a reason for hiding this 8000 comment

The reason will be displayed to describe this comment to others. Learn more.

same here

if (!rx_entry) {
rc = -FI_ENOMEM;
goto out;
}
rx_entry->rx_peer = lp;
if (iov)
rx_entry->rx_entry.msg_size = ofi_total_iov_len(iov, count);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is necessary since the msg size refers to the total size of the incoming message (provided by the peer), not the total available size of the posted receive.

if (rc)
return rc;

FI_DBG(&lnx_prov, FI_LOG_CORE,
"sending to %lx tag %lx buf %p len %ld\n",
core_addr, tag, buf, len);

rc = fi_tsenddata(cep->lpe_ep, buf, len, mem_desc,
data, core_addr, tag, context);
/* do memory registration with the core domain */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think any of these comments are necessary since the function is pretty self explanatory.

Comment on lines 695 to +698
lnx_rma_read(struct fid_ep *ep, void *buf, size_t len, void *desc,
fi_addr_t src_addr, uint64_t addr, uint64_t key, void *context)
{
int rc;
struct lnx_ep *lep;
struct lnx_peer *lp;
struct fid_ep *core_ep;
struct lnx_ctx *ctx;
struct local_prov_ep *cep;
fi_addr_t core_addr;
struct lnx_peer_table *peer_tbl;
void *mem_desc;
uint64_t rkey;
struct ofi_mr_entry *mre = NULL;
struct iovec iov = {.iov_base = (void*)buf, .iov_len = len};

lep = lnx_get_lep(ep, &ctx);
if (!lep)
return -FI_ENOSYS;

peer_tbl = lep->le_peer_tbl;

lp = lnx_av_lookup_addr(peer_tbl, src_addr);
rc = lnx_select_send_pathway(lp, lep->le_domain, desc, &cep,
&core_addr, &iov, 1, &mre, &mem_desc, &rkey);
if (rc)
goto out;

FI_DBG(&lnx_prov, FI_LOG_CORE,
"rma read from %lx key %lx buf %p len %ld\n",
core_addr, key, buf, len);

core_ep = lnx_get_core_ep(cep, ctx->ctx_idx, ep->fid.fclass);

rc = fi_read(core_ep, buf, len, mem_desc,
core_addr, addr, key, context);

if (mre)
ofi_mr_cache_delete(&lep->le_domain->ld_mr_cache, mre);
out:
return rc;
return -FI_ENOSYS;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove function and set read op to fi_no_read or whatever it's called

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants
0