8000 Merge Igalia/lwaftr upstream by wingo · Pull Request #1133 · snabbco/snabb · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

Merge Igalia/lwaftr upstream #1133

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 778 commits into from
Jun 23, 2017
Merged

Merge Igalia/lwaftr upstream #1133

merged 778 commits into from
Jun 23, 2017

Conversation

wingo
Copy link
Contributor
@wingo wingo commented Apr 24, 2017

Hello kind Snabb hackers!

This is a PR that is manifestly too large, however it's what I got. It merges hacking on the lwaftr to master, which hasn't been done for a while now (6 months or more). Notably it merges in the YANG support. Specific changes:

  • Add support for YANG schemas, configuration, and state data, allowing users to configure Snabb programs with pleasant textual schema-constrained configuration formats and have Snabb automatically translate that configuration into efficient native Lua data that is usable directly by a data plane. See the lib.yang documentation for full details.

  • New lwAFTR configuration format based on YANG. To migrate old configurations, run snabb lwaftr migrate-configuration old.conf on the old configuration. See the snabb-softwire-v1.yang schema or configuration.md for full details on the new configuration format.

  • Add ability to reconfigure an entire lwAFTR process while it is running, including changes that can add or remove ingresss or egress filters, change NIC settings, or the like. Pass the --reconfigurable argument to snabb lwaftr run, then interact with the lwAFTR instance via snabb config. This --reconfigurable facility is also available for other Snabb programs to use. See the apps.config documentation

  • Add snabb config set of commands, to replace snabb lwaftr control. These applications can query state or configuration of a running Snabb instance as well as update that configuration in an efficient way. See the snabb config documentation for full details.

  • Add support for named Snabb instances (and specifically named lwAFTR instances). Pass --name foo to the snabb lwaftr run command to have it claim a name on a machine. snabb config can identify the remote Snabb instance by name, which is often much more convenient than using the instance's PID. The snabb lwaftr query, snabb lwaftr monitor, snabbvmx query, snabbvmx top, and snabb top tools can locate Snabb instances by name, and snabb ps can show all Snabb processes with their names.

  • Add support to the lwAFTR for fast binding-table updates to individual binding-table entries without causing the whole table to reload, via generic snabb config commands like snabb config add /softwire-config/binding-table/softwire. See snabb config documentation for more on how to use snabb config add and snabb config remove. This facility is also generic and can be used by other Snabb data planes.

  • Add support to lwAFTR for ietf-softwire alternate YANG schema. Pass the -s ietf-softwire to snabb config invocations to use this schema. Data planes expose a primary YANG model but there is a generic facility to allow the data plane to define transformations from other models to the primary model.

  • Adapt snabbvmx to configuration format changes.

  • Incorporate all PodHashMap functionality into lib.ctable and switch the lwaftr over to use lib.ctable, removing PodHashMap.

  • Incorporate multi-process work from @lukego in Engine support for multiprocess orchestration #1021.

Review is very welcome. My personal opinion is that all of this is "safe" and good to merge with the exception of the changes to src/core/app.lua: master...Igalia:lwaftr-2017-04-24#diff-e2bf611b7cf61fb539b42bf778b8fc9c. These changes were designed and implemented here: #1067. I think that's the hard review. I would suggest that we do perf testing on this merge and see if the regression found on that PR is still present in this one, and just see what happens.

wingo and others added 30 commits December 9, 2016 15:06
This change makes the follower apply some pacing so that it doesn't
stall packet processing.  Also, the follower now waits for a "commit"
action to actually apply its changes.  The code that creates actions has
been changed to add "commit" actions where appropriate.
Removed README. prefix from docs
Before, each ctable creation would result in a fresh struct allocation.
This would eventually cause LuaJIT's 65536-entry ctype table to overflow.
This lowers the effective rate that the follower processes messages
to Hz*4, which is really Hz*2 if you consider commit messages.  The
goal was to reduce packet drop.
Wait for complete change sets before committing in follower
Print out PID name, if defined, on a second column.
Remove useless pcap files, clarify docs and code.
Hard word wrap at 80 chars for filter perf doc
Sketch out Igalia's testing strategy
Minor improvements to snabb config listen
Print out worker info in 'snabb ps'
Add basic error reporting to snabb-softwire-v1
- Get rid of `screen` command.
- Connect to VM via telnet instead of ssh.
- Change syntax to "verb command".
- Remove "lwaftr" command.
- Enable extra tap network interface to have external internet access
  from the guest.
Also run selftest in dasl files
@dpino
Copy link
Contributor
dpino commented May 2, 2017

@wingo: OK, sounds good to me. I overlooked the results of lib_pmu tests, although I still got the error I pasted above if I run the test in my laptop (4.X kernel, AVX2 support). Since CI passes that's all we should care about.

I'd look deeper into the failing SnabbVMX test (program.snabbvmx.tests.selftest.sh). It could be the case that a Linux tool is missing in the VM. That would be an easy fix. But if that's not the cause, fixing the test could take longer. I'm OK with disabling it temporarily.

To disable it, I believe the easiest thing to do is to rename the test from "selftest.sh" to "selftest.disabled.sh", for example. This way the test won't be run on "make test".

wingo and others added 3 commits May 2, 2017 16:51
The snabbvmx selftest currently passes only when run on a system with a
3.x kernel on the host.  We will fix this but in the interests of
getting the lwaftr in a mergeable state, temporarily disabling this test
by renaming it.
@wingo
Copy link
Contributor Author
wingo commented May 3, 2017

@eugeneia the snabb-bot run has been going overnight, since around 18h yesterday. Is that normal?

@eugeneia
Copy link
Member
eugeneia commented May 3, 2017

@wingo No it died :-/ Still working on sorting out davos.

@lukego
Copy link
Member
lukego commented May 3, 2017

Hydra seems happy! Looks like this branch has improved some configurations/setups of the iperf benchmark for snabbnfv. Could be relatively obscure cases (e.g. running a patched QEMU with a larger vring) but always nice to move forward anyway.

@lukego
Copy link
Member
lukego commented May 4, 2017

@wingo Do you want to do more work on this branch before merge?

@wingo
Copy link
Contributor Author
wingo commented May 10, 2017

@lukego Sorry for the delay; I was on a short holiday from when you wrote. I would like to fix snabb-bot before merging but I have no other functional changes queued. I think the snabb-bot failures are related to configuration rather than to functionality, fwiw.

@wingo
Copy link
Contributor Author
wingo commented May 10, 2017

Actually lookin at things again, I think the intel_mp test failure is spurious. We certainly didn't change that code. @eugeneia would you mind running the bot again on this PR?

@wingo
Copy link
Contributor Author
wingo commented May 11, 2017

Just pushed a trivial 8000 commit to force the bot to run again.

@eugeneia
Copy link
Member

My guess is that the intel_mp selftest is somehow affected by the changes to core, given that it seems reproducible. Sorry for the SnabbBot delays, seems like we hit some new docker related bugs since I redeployed davos. Quite the pandora’s box I opened there…

@wingo
Copy link
Contributor Author
wingo commented May 16, 2017

Well! That was horrible. I spent days ignorantly trying to understand things from a constructive perspective and then gave up and did a bisect. It's wasn't quite a real bisect as the intel_mp driver only came in with v2017.04, but at each step between master and lwaftr I merged in #1086 and ran a test that just did two testup.snabb invocations on different queues on the same device. If both complete with true, then great, otherwise one would hang and that would be the result. Apparently the culprit is 9fe91ac from the multi-process worker-and-supervisor code, and indeed it makes sense; the intel_mp driver expects bus mastering to not be tied to any particular process, I think. Gotta knock off for today but will poke tomorrow. /cc @lukego and @petebristow who will find this interesting I think.

wingo and others added 2 commits May 17, 2017 09:21

-- Run the engine with continuous configuration updates
local current_config
local child_path = "group/config/..name"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the " in the wrong place?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question, it certainly looks like it. Perhaps given that Luke is proposing #1021 separately, we comment there?

@wingo
Copy link
Contributor Author
wingo commented May 18, 2017

The bot ran, yay! From what I can tell, the failure noted in the bot run was a perf regression:

ERROR snabbnfv-iperf-1500-crypto -> 0.805118 of 1.016 (SD: 0.102489 )

However a previous run on this branch before this unrelated bus-mastering patch succeeded with 1.0, so I am inclined to think, no problem.

@wingo
Copy link
Contributor Author
wingo commented May 19, 2017

This one's all ready to go, I think! WDYT the next step is @lukego? Do you want to do #1021 first and separately? Should I just merge this to wingo-next ?

lukego added a commit to lukego/snabb that referenced this pull request May 22, 2017
@lukego
Copy link
Member
lukego commented May 22, 2017

Great hacking @wingo! This is merged onto #1148 with next hop next.

B6DC @lukego lukego added the merged label May 22, 2017
@wingo
Copy link
Contributor Author
wingo commented May 23, 2017

Thank you for the review and the merge, yay!!!!

@eugeneia eugeneia merged commit 5e72f50 into snabbco:master Jun 23, 2017
@wingo wingo deleted the lwaftr-2017-04-24 branch July 31, 2017 12:36
dpino added a commit to dpino/snabb that referenced this pull request Sep 10, 2018
Leader process apparently running on wrong NUMA node
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants
0