DYN-5975 Indexing Package Nodes. #14120

RobertGlobant20 · 2023-06-29T18:14:34Z

Purpose

Indexing installed package nodes using LuceneNET
This fix is adding the packages nodes already installed or if during Dynamo execution the user install more packages, the nodes will be also indexed so they can be shown as results in the InCanvasSearch functionality.

Declarations

Check these if you believe they are true

The codebase is in a better state after this PR
Is documented according to the standards
The level of testing this PR includes is appropriate
User facing strings, if any, are extracted into *.resx files
All tests pass using the self-service CI.
Snapshot of UI changes, if any.
Changes to the API follow Semantic Versioning and are documented in the API Changes document.
This PR modifies some build requirements and the readme is updated
This PR contains no files larger than 50 MB

Release Notes

Indexing installed package nodes using LuceneNET

Reviewers

@QilongTang @reddyashish

FYIs

This fix is adding the packages nodes already installed or if during Dynamo execution the user install more packages, the nodes will be also indexed so they can be shown as results in the InCanvasSearch functionality.

RobertGlobant20 · 2023-06-29T18:15:28Z

GIF showing that after installing a package the nodes are available in InCanvasSearch.

RobertGlobant20 · 2023-06-29T18:16:10Z

GIF showing that when starting Dynamo all the node package are loaded and shown in the InCanvasSearch.

src/DynamoCore/Models/DynamoModel.cs

QilongTang

LGTM but I request some re-arrange of code

Code Review changes related to create methods for commit the changes and dispose the Lucene writer, so we can reuse the same code in several places.

src/DynamoCore/Utilities/LuceneSearchUtility.cs

QilongTang

LGTM with one comment

reddyashish · 2023-06-30T00:13:21Z

src/DynamoCore/Models/DynamoModel.cs

@@ -1416,6 +1419,16 @@ private void InitializeCustomNodeManager()
                customNodeSearchRegistry.Add(info.FunctionId);
                var searchElement = new CustomNodeSearchElement(CustomNodeManager, info);
                SearchModel.Add(searchElement);
+
+                //Indexing node packages installed using PackageManagerSearch
+                var iDoc = LuceneSearchUtility.InitializeIndexDocumentForNodes();


After committing these changes, will it overwrite the old index files or create new ones?

This will create new ones.
The only way to delete/update indexed files (Documents) is by using the functions:

writer.updateDocument() writer.deleteDocument()

But this is not the case.

Why dont we use writer.updateDocument()?

@QilongTang
In this case we are adding new nodes added when installing a package, so right now we can say that we are representing each node as one document, then just supposing that if in total we have 800 out of the box nodes we will have 800 documents, then if a package is installed more nodes will be added so more documents needs to be added (using writer.AddDocument()).
I think writer.updateDocument() will be needed in case any node is renamed or change it's category (only in Dynamo execution time - I think this case is not possible right now), also for packages when a package info is updated, but again this only should be used during dynamo execution time (also I think this is not possible now).
Please let me know if I explain myself correclty othewise we can create a more detailed explanation of this.
Thanks

Thank you @RobertGlobant20 good explanation. I am not sure if I confused myself, calling of the InitializeIndexDocumentForNodes() will recreate the docs in memory and then we append new nodes and then commit, so the index files are overwritten, correct? I guess previously I imagined something like reading the written index files on disk and append, then re-write. Do you think that is not possible, I worry about the initialization performance cost only

@QilongTang well, right now we are not using the files created in the index directory:
C:\Users\tellro\AppData\Roaming\Dynamo\Dynamo Core\2.19\Index
Lucene is just using them for searching purposes (use this files for fetch data quickly), if we want to use this files (as it should be) for reading the information indexed previously (and not indexing the info again) that's a whole different story, surely we will need a Jira task for implementing this changes.
But consider that we will face several challenges: if the index directory already exists, we will need to iterate all the documents (around 800) and check if something changed for each node (field: category, description, keywords, etc...) so depending of the analysis (if nodes were added, deleted or updated) we should execute the instructions:

writer.addDocument()

writer.updateDocument()

writer.deleteDocument()

The same for package and node packages, if new packages were installed we will continue to use writer.addDocument() but if the package was deleted manually then we should execute writer.deleteDocument().

I just tested the behavior in a external test app, I ran the app and indexed the info, then closed the app the re-open it so I was able to read all the info (previously indexed) using DirectoryReader.Open(index) and the results.

Also I think we should explore the functionality of using ByteBuffersDirectory(RAMDirectory) instead of using FSDirectory (File System Directory), we are using just 800 nodes with 4 or 5 fields that is very few information (the index directory weight is 688 KB ) so I don't think it will impact too much the RAM memory performance.

Sorry, I was not aware of this functionality, please let me know what you think about it. Thanks

Sure, we can make another task for it, what you described would be the way I preferred

Fixing comment

QilongTang · 2023-06-30T20:45:50Z

@RobertGlobant20 Please check if the reported regressions are real then merge

Testing for Fixing Regressions

RobertGlobant20 · 2023-07-04T01:04:53Z

@RobertGlobant20 Please check if the reported regressions are real then merge

@QilongTang seems that is a problem with disposing the writer at DynamoModel level (in DynamoModel.Dispose() method) but if is disposed after indexing the info (as previously was done) then we cannot index more info unless we create the writer again, I will check in detail just to see if there is another solution.

Some tests are failing in the parallels job due that DynamoModel.IsTestMode = false when running some tests. Then I'm catching the exception and setting the writer to null, as a consequence for this tests no info will be indexed and the search won't be returning anything.

RobertGlobant20 · 2023-07-04T20:21:51Z

@RobertGlobant20 Please check if the reported regressions are real then merge

@QilongTang seems that is a problem with disposing the writer at DynamoModel level (in DynamoModel.Dispose() method) but if is disposed after indexing the info (as previously was done) then we cannot index more info unless we create the writer again, I will check in detail just to see if there is another solution.

@QilongTang I've made a fix that will be preventing that parallel tests send exceptions when trying to create the IndexWriter (after this fix all the tests are passing), basically the fix was adding a try-catch statement when creating the IndexWriter and if the catch section is reached we dispose the writer and set the writer to null.
So with this fix all the parallel tests that set the DynamoModel.IsTestMode = false (when running tests) will not have the nodes indexed info and the search results will not retrieve the results.
Please let me know if this fix sounds good to you or if we should explore another solution. Thanks

QilongTang · 2023-07-05T04:46:31Z

hi @RobertGlobant20 I think this is fine for now

DYN-5975 Indexing Package Nodes.

6c9fe0f

This fix is adding the packages nodes already installed or if during Dynamo execution the user install more packages, the nodes will be also indexed so they can be shown as results in the InCanvasSearch functionality.

RobertGlobant20 requested review from QilongTang and reddyashish June 29, 2023 18:14

QilongTang added this to the 2.19.0 milestone Jun 29, 2023