8000 Migrate json parser to nlohmann_json by fwinkl · Pull Request #149 · lwtnn/lwtnn · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

Migrate json parser to nlohmann_json #149

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

fwinkl
Copy link
@fwinkl fwinkl commented Aug 11, 2021

Migrate the json parser from boost to nlohmann_json, which provides
a better JSON API, is signifcantly faster and results in much smaller
binaries (up to 30x smaller).

In particular, for an ATLAS electron DNN:

  • 4x faster reading of the JSON
  • mallocs reduced from ~5M to 2k
  • 10 times smaller resident memory size

If this is a welcome change, this PR needs a bit more work, i.e. to validate the changes on all possible network configurations. I would need some advice on how to do that (the unit test coverage seems not very good so far, test-GRU.sh passes but it does not seem to test all code paths).

@@ -1,6 +1,6 @@

# Set range of valid CMake versions to build the project.
cmake_minimum_required(VERSION 3.1...3.20)
cmake_minimum_required(VERSION 3.11...3.20)
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I bumped the minimum to 3.11 in order to use FetchContent below. But that can certainly be changed if you want to keep the minimum at 3.1.

@@ -63,7 +63,7 @@ namespace lwt {
struct NodeConfig
{
enum class Type {
INPUT, INPUT_SEQUENCE, FEED_FORWARD, CONCATENATE, SEQUENCE,
NONE, INPUT, INPUT_SEQUENCE, FEED_FORWARD, CONCATENATE, SEQUENCE,
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added NONE here as the nlohmann parser falls back to the first enum element in case of an unknown string. All other enums already had NONE as their first element, so this should hopefully be a safe change (I assume nobody uses the bare int values anywhere).

for (const auto& sub: j.at("sublayers")) {
lwt::EmbeddingConfig emb;
sub.at("weights").get_to(emb.weights);
//emb.index = sub.second.get<int>("index"); FIXME (need an example NN file)
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was not able to test this part of the code yet. Need a NN example.

break;
case Architecture::MAXOUT:
add_maxout_info(layer, j);
layer.activation.function = Activation::NONE; // FIXME: to make throw_if_not_maxout() happy
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The previous code set function to NONE but the NN file I tested with actually contains "linear":

    {
      "activation": "linear",
      "architecture": "maxout",
      "sublayers": [

Could we just leave the value on LINEAR and get rid of this check?

@dguest
Copy link
Collaborator
dguest commented Aug 17, 2021

I really like this change (was never a huge fan of boost::property_tree), but I'm thinking it would be something like a major version bump given how much we've emphasized minimal dependencies and that this is swapping one of only two dependencies.

I'm also mostly offline until September, is this something that someone in ATLAS wants urgently?

@fwinkl
Copy link
Author
fwinkl commented Aug 18, 2021

Yes, this can (and probably should) wait for a major version bump. In the meantime I can work on some of the remaining issues. But some instructions on how to more extensively test this would be needed at some point.

@dguest
Copy link
Collaborator
dguest commented Aug 18, 2021

I had to approve the pipelines, but workflows/ci.yml hasn't been modified to install nlohmann, which means all the ci jobs can't find the json parser. We should be able to add nlohmann where libboost-dev is now (and remove boost, since we only used it for ptree).

8000

Migrate the json parser from `boost` to `nlohmann_json`, which provides
a better JSON API, is signifcantly faster and results in much smaller
binaries (up to 30x smaller).

In particular, for an ATLAS electron DNN:
- 4x faster reading of the JSON
- `malloc`s reduced from ~5M to 2k
- 10 times smaller resident memory size
@fwinkl
Copy link
Author
fwinkl commented Sep 20, 2021

I had to approve the pipelines, but workflows/ci.yml hasn't been modified to install nlohmann, which means all the ci jobs can't find the json parser. We should be able to add nlohmann where libboost-dev is now (and remove boost, since we only used it for ptree).

I tried that but the nlohmann-json-dev on bionic is hopelessly outdated. Any objections to just get nlohmann from github also in the CI?

@dguest
Copy link
Collaborator
dguest commented Sep 22, 2021

Hi @fwinkl, pulling from github sounds fine to me.

I'm a bit worried about relying on any bleeding-edge dependencies, but it's reasonable to think nlohmann has come a long way since 2018. Since we're going to bump the major release anyway I'm just wondering if we should also clean a few other things up...

@dguest dguest mentioned this pull request Sep 22, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants
0