Enhance instruction validation and model management #8

hawkeyexl · 2024-12-06T22:37:00Z

Improve instruction validation by modularizing model preparation, enhancing error reporting, and adding author guidance. Implement temporary directory management for model handling and refine markdown parsing for better content extraction.

Summary by CodeRabbit

Release Notes

New Features
- Enhanced documentation with new sections: "How to Use Our Amazing Product," "Prerequisites," "Setup," "Usage," "Advanced Features," "Troubleshooting," and "Next steps."
- Added instructions property in the schema for better content guidance.
- Introduced Llama model preloading functionality for improved performance.
- New template api-operation added with detailed sections for better API documentation.
Improvements
- Updated validation functions to support asynchronous operations, enhancing error handling and validation logic.
- Refined Markdown parsing to better accumulate raw content.
- Enhanced the structure of the "Sample" template with clearer guidance for content creators.
- Expanded usage instructions in the README for improved user guidance, including a new --json output flag.
Bug Fixes
- Resolved issues with previous synchronous calls in validation functions.
Documentation
- Improved clarity and usability of templates and documentation structures, including detailed explanations of template definitions and content validation rules.

…logic

…rting

…tions and integrate instruction validation

…and refining grammar validation. Update template instructions for clarity.

…on validation

coderabbitai · 2024-12-06T22:37:07Z

Walkthrough

This pull request introduces extensive modifications across several files, primarily focusing on enhancing the structure, content organization, and functionality of the documentation and validation processes. Key changes include the addition of new sections and properties in the Markdown JSON, updates to the package version and scripts, asynchronous handling in validation functions, and the introduction of new utility functions for managing temporary directories. These adjustments aim to improve the clarity, usability, and validation capabilities of the documentation framework.

Changes

File	Change Summary
artifacts/sample_markdown.json	Significant structural modifications, including new sections ("How to Use Our Amazing Product", "Prerequisites", "Setup", etc.), updated IDs, and enhanced content organization.
package.json	Updated version from `0.0.3` to `0.0.4`, added `postinstall` and `clean` scripts, updated dependency `@apidevtools/json-schema-ref-parser`, and added `node-llama-cpp`.
src/index.js	Updated `lintDocument` to use `await` for `validateStructure`, removed commented-out AsciiDoc checks, and retained error handling.
src/parsers/markdown.js	Enhanced `parseMarkdown` with new helper function `getNodeRawContent` for accumulating raw content from various node types.
src/rules/instructionValidator.js	Introduced functions for validating instructions against content using a Llama model, including `prepareModel` and `validateInstructions`.
src/rules/instructionValidator.test.js	New test suite for `validateInstructions` with three test cases covering various validation scenarios.
src/rules/structureValidator.js	Updated `validateStructure` and `validateSection` to be asynchronous, added instruction validation.
src/rules/structureValidator.test.js	Modified tests to handle asynchronous validation for `validateStructure` and `validateSection`.
src/rules/subsectionValidator.js	Updated `validateAdditionalSections`, `validateRequiredSubsections`, and `validateSubsections` to be asynchronous, improving error handling and logic.
src/rules/subsectionValidator.test.js	Made tests for `validateSubsections` asynchronous, updating all test cases accordingly.
src/schema.js	Added `instructions` property to the `section` definition in the schema for enhanced content guidance.
src/util/preloadModel.js	Introduced functionality to preload a Llama model based on environment configuration.
src/util/tempDir.js	New file for managing a temporary directory with functions `getTempDir` and `cleanTempDir`.
templates.yaml	Enhanced "Sample" template structure with a new `instructions` key under the "Introduction" section.

Possibly related PRs

Enhance template structure and sequence validation #2: The changes in the main PR involve significant modifications to the structure and content of a markdown JSON document, which aligns with the enhancements made to the markdown parser and the validation of sequences in the retrieved PR. Both PRs focus on improving the organization and validation of document structures.

🐰 "In the garden of code, we hop with delight,
New sections and scripts, oh what a sight!
With Llama to guide and temp files to keep,
Our documentation grows, no longer a heap.
So let’s bounce with joy, as we tidy our space,
For clarity reigns in this coding race!" 🥕

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

Review comments: Directly reply to a review comment made by CodeRabbit. Example:
- I pushed a fix in commit <commit_id>, please review it.
- Generate unit testing code for this file.
- Open a follow-up GitHub issue for this discussion.
Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
- @coderabbitai generate unit testing code for this file.
- @coderabbitai modularize this function.
PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
- @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
- @coderabbitai read src/utils.ts and generate unit testing code.
- @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
- @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

@coderabbitai pause to pause the reviews on a PR.
@coderabbitai resume to resume the paused reviews.
@coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
@coderabbitai full review to do a full review from scratch and review all the files again.
@coderabbitai summary to regenerate the summary of the PR.
@coderabbitai resolve resolve all the CodeRabbit review comments.
@coderabbitai configuration to show the current CodeRabbit configuration for the repository.
@coderabbitai help to get help.

Other keywords and placeholders

Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (`.coderabbit.yaml`)

You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
Please see the configuration documentation for more information.
If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

Visit our Documentation for detailed information on how to use CodeRabbit.
Join our Discord Community to get help, request features, and share feedback.
Follow us on X/Twitter for updates and announcements.

Copilot

Copilot reviewed 7 out of 15 changed files in this pull request and generated 1 suggestion.

Files not reviewed (8)

package.json: Language not supported
src/rules/instructionValidator.js: Evaluated as low risk
src/rules/subsectionValidator.test.js: Evaluated as low risk
src/rules/structureValidator.test.js: Evaluated as low risk
src/index.js: Evaluated as low risk
src/rules/structureValidator.js: Evaluated as low risk
src/rules/subsectionValidator.js: Evaluated as low risk
templates.yaml: Evaluated as low risk

Comments skipped due to low confidence (2)

src/rules/instructionValidator.test.js:0

Add tests to cover cases where the llama instance or model preparation fails.

import { expect } from "chai";

src/util/tempDir.js:1

The shebang line is unnecessary for a module file and should be removed.

#!/usr/bin/env node

src/util/tempDir.js

coderabbitai

Actionable comments posted: 7

🧹 Outside diff range and nitpick comments (13)

src/rules/structureValidator.js (1)
31-31: Consider parallelizing section validations for improved performance

Currently, the validateSection calls within the loop are awaited sequentially, which may lead to performance bottlenecks if there are many sections to validate. Consider using Promise.all to run these validations in parallel.

Apply this diff to modify the loop:
-      for (let i = 0; i < Object.keys(template.sections).length; i++) {
-        const templateKey = Object.keys(template.sections)[i];
-        const templateSection = template.sections[templateKey];
-        const structureSection = structure.sections[i];

-        errors = errors.concat(
-          await validateSection(structureSection, templateSection)
-        );
-      }
+      const sectionPromises = Object.keys(template.sections).map(async (templateKey, i) => {
+        const templateSection = template.sections[templateKey];
+        const structureSection = structure.sections[i];
+        return validateSection(structureSection, templateSection);
+      });
+      const results = await Promise.all(sectionPromises);
+      results.forEach(result => {
+        errors = errors.concat(result);
+      });
src/rules/instructionValidator.js (1)

98-98: Address the TODO to silence terminal output when initializing Llama

The TODO comment indicates a need to figure out how to silence the terminal output from getLlama(). Resolving this will allow you to move the llama initialization to the top of the file, enhancing code organization.

Would you like assistance in implementing a solution to silence the terminal output?
src/rules/subsectionValidator.js (1)
Line range hint 79-98: Ensure proper handling of asynchronous calls in validateAdditionalSections

The function validateAdditionalSections is now asynchronous, and you're awaiting validateSection, which is good. However, consider refactoring to improve readability and maintainability. Also, ensure that all potential errors are properly handled.

Apply this diff to refactor the loop:
  async function validateAdditionalSections(structureSection, templateSection) {
    const errors = [];
    const sectionMap = {};

-   for (let j = 0; j < Object.keys(templateSection.sections).length; j++) {
-     const templateKey = Object.keys(templateSection.sections)[j];
+   const templateKeys = Object.keys(templateSection.sections);
+   for (const templateKey of templateKeys) {
      const templateSubsection = templateSection.sections[templateKey];

      let sectionFound = false;
      // Iterate through each subsection in the structure section
      for (let k = 0; k < structureSection.sections.length; k++) {
        // Validate the current subsection against the template subsection
        const structureSubsection = structureSection.sections[k];
        const sectionErrors = await validateSection(structureSubsection, templateSubsection);
        if (sectionErrors.length === 0) {
          // If the subsection is valid, add it to the section map
          if (!sectionMap[templateKey]) sectionMap[templateKey] = [];
          sectionMap[templateKey].push(k);
          sectionFound = true;
        }
      }

      if (!sectionFound && templateSubsection.required) {
        errors.push(
          new ValidationError(
            "missing_section",
            structureSection.heading?.content,
            `Missing section ${templateKey}`,
            structureSection.position
          )
        );
      }
    }

    return errors;
  }
src/rules/instructionValidator.test.js (2)
16-16: Consider reducing test timeouts and adding edge cases

The 20-second timeout seems excessive for unit tests. Consider:

Reducing the timeout to a more reasonable value (e.g., 5s)

Adding edge cases such as:

Empty or malformed instructions

Multiple instructions with mixed results

Unicode content validation

Also applies to: 31-31, 45-45

19-26: Enhance test data with realistic validation scenarios

The current test uses a simplified instruction that may not reflect real-world validation complexity.

Consider using more realistic test data:
 const section = { 
     rawContent: "Invalid content that will fail validation", 
     heading: { content: "Test heading" }, 
     position: { start: { line: 1, column: 1 } }
 };
 const template = { 
-    instructions: ["Content must mention all three primary colors"] 
+    instructions: [
+      "Content must include a code example",
+      "Must have at least one reference link",
+      "Should explain the purpose of the feature"
+    ] 
 };
src/rules/subsectionValidator.test.js (2)
28-31: Enhance error message validation

The test verifies the error instance but doesn't thoroughly validate the error message content.

Consider adding specific error message assertions:
 const result = await validateSubsections(structureSection, templateSection);
 expect(result).to.be.an("array").that.is.not.empty;
 expect(result[0]).to.be.instanceOf(ValidationError);
-expect(result[0].message).to.include("Expected between");
+expect(result[0].message).to.equal(
+  "Expected between 2 and 2 sections, but found 3 in section 'Heading' at line 1"
+);
34-40: Improve test data with meaningful section content

The test uses empty objects for sections, which doesn't represent real-world scenarios.

Consider using more realistic test data:
-const structureSection = { sections: [{}, {}, {}], heading: { content: "Heading" }, position: { line: 1, column: 1 } };
+const structureSection = {
+  sections: [
+    { heading: { content: "Installation" }, content: "Installation steps..." },
+    { heading: { content: "Usage" }, content: "Usage instructions..." },
+    { heading: { content: "API" }, content: "API documentation..." }
+  ],
+  heading: { content: "Getting Started" },
+  position: { line: 1, column: 1 }
+};
templates.yaml (1)
40-41: Consider expanding author instructions with examples

While the instruction "Mention the intent of the guide" is clear, providing an example of a good intent statement would help authors better understand the expectation.

Consider expanding the instructions like this:
 instructions:
   - Mention the intent of the guide
+  - Example: "This guide helps you set up continuous integration for your Python project"
src/index.js (1)
Line range hint 13-23: Clean up commented AsciiDoc-related code

Since AsciiDoc support is commented out, it should either be removed entirely or tracked as a TODO for future implementation.

Apply this diff to clean up the code:
const inferFileType = (filePath, content) => {
  const extension = path.extname(filePath).toLowerCase();
  if ([".md", ".markdown"].includes(extension)) {
    return "markdown";
-    // } else if ([".adoc", ".asciidoc"].includes(extension)) {
-    //   return "asciidoc";
  }

-  // If extension is not conclusive, check content
-  // if (content.trim().startsWith("= ")) {
-  //   return "asciidoc";
-  // }

  // Default to markdown if unable to determine
  return "markdown";
};
artifacts/sample_markdown.json (2)

441-441: Standardize line endings in rawContent

The rawContent strings contain inconsistent line endings. Some sections use \n while others have \r\n.

Standardize all line endings to use Unix-style \n for consistency.

Also applies to: 919-919, 1351-1351, 1720-1720, 2074-2074, 2430-2430

Line range hint 485-505: Clean up Prerequisites list items

The Prerequisites section contains some placeholder-like items (foo, bar, x, y) that should be replaced with meaningful requirements.

Replace placeholder items with actual prerequisites that users need to know about.
src/parsers/markdown.js (2)
70-70: Initialize rawContent properly.

The rawContent property is initialized in both processSection and createDefaultSection, but there's potential for undefined behavior.

Consider these improvements:
 const processSection = (node) => {
   const newSection = {
     id: uuid(),
     position: node.position,
     content: [],
-    rawContent: getNodeRawContent(node),
+    rawContent: getNodeRawContent(node) || '',
     // ... rest of the properties
   };
   return newSection;
 };

 const createDefaultSection = (node) => {
   return {
     id: uuid(),
     position: node.position,
     content: [],
-    rawContent: getNodeRawContent(node),
+    rawContent: getNodeRawContent(node) || '',
     // ... rest of the properties
   };
   return newSection;
 };
Also applies to: 135-135

Line range hint 1-229: Consider performance implications.

The current implementation rebuilds raw content strings frequently, which could impact performance with large documents.

Consider these optimizations:

Use a string buffer or array to accumulate content

Defer raw content generation until needed

Cache generated raw content

Would you like me to provide a detailed implementation for any of these optimizations?

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between db69b6d and c0b8e0b.

⛔ Files ignored due to path filters (1)

package-lock.json is excluded by !**/package-lock.json

📒 Files selected for processing (14)

artifacts/sample_markdown.json (112 hunks)
package.json (2 hunks)
src/index.js (1 hunks)
src/parsers/markdown.js (4 hunks)
src/rules/instructionValidator.js (1 hunks)
src/rules/instructionValidator.test.js (1 hunks)
src/rules/structureValidator.js (5 hunks)
src/rules/structureValidator.test.js (8 hunks)
src/rules/subsectionValidator.js (5 hunks)
src/rules/subsectionValidator.test.js (1 hunks)
src/schema.js (1 hunks)
src/util/preloadModel.js (1 hunks)
src/util/tempDir.js (1 hunks)
templates.yaml (1 hunks)

🧰 Additional context used

📓 Learnings (2)

src/rules/structureValidator.test.js (1)

Learnt from: hawkeyexl
PR: hawkeyexl/doc-structure-lint#3
File: src/rules/structureValidator.js:23-33
Timestamp: 2024-12-05T22:42:48.443Z
Learning: In the `validateStructure` function in `src/rules/structureValidator.js`, the sections are expected to be in the same order between `template.sections` and `structure.sections`.

src/rules/structureValidator.js (1)

Learnt from: hawkeyexl
PR: hawkeyexl/doc-structure-lint#3
File: src/rules/structureValidator.js:23-33
Timestamp: 2024-12-05T22:42:48.443Z
Learning: In the `validateStructure` function in `src/rules/structureValidator.js`, the sections are expected to be in the same order between `template.sections` and `structure.sections`.

🔇 Additional comments (13)

src/rules/structureValidator.js (5)

6-6: LGTM!

The import statement correctly brings in validateInstructions for instruction validation.

18-18: Function validateStructure correctly updated to be asynchronous

Changing validateStructure to an async function allows the use of await inside the function, which is necessary for awaiting asynchronous validation steps.

46-46: Function validateSection correctly updated to be asynchronous

Updating validateSection to be an async function allows it to perform asynchronous operations, such as awaiting the results of validateInstructions.

78-84: LGTM!

The addition of instruction validation is correctly integrated. The code checks if templateSection.instructions exist and, if so, awaits the result of validateInstructions, concatenating any errors returned.

88-88: Awaiting validateSubsections appropriately

The call to validateSubsections is correctly updated to use await, ensuring any asynchronous operations within are properly handled.

src/rules/instructionValidator.js (1)

123-123: ⚠️ Potential issue

Ensure the model is properly disposed to free resources

Consider disposing of the model instance after you are done using it, to prevent potential memory leaks or resource issues.

Apply this diff to dispose of the model:

  llama.dispose(); // Free up resources
+ await model.dispose();
  return errors;

Likely invalid or redundant comment.

src/rules/subsectionValidator.js (2)

125-132: Await asynchronous validation in validateRequiredSubsections

In validateRequiredSubsections, you're correctly awaiting validateSection, but ensure that any potential errors are properly caught and handled.

Line range hint 145-174: Consistent use of await in validateSubsections

The function validateSubsections correctly awaits the results of validateAdditionalSections and validateRequiredSubsections. Ensure that any errors from these functions are appropriately managed.

package.json (2)

3-3: LGTM: Version bump and new scripts align with feature additions

The version bump and new scripts (postinstall and clean) appropriately reflect the addition of model management features.

Also applies to: 10-11

23-23: Verify node-llama-cpp compatibility and requirements

The addition of node-llama-cpp is a significant dependency that may have system-level requirements.

Let's verify the dependency's requirements and compatibility:

✅ Verification successful

Node-llama-cpp integration appears well-implemented with proper resource management

The dependency is used appropriately with:

Proper model loading and cleanup through prepareModel
Resource management via dispose() calls
Environment variable control (DOC_STRUCTURE_LINT_PRELOAD)
Graceful model downloading and temporary directory handling
Clear error handling and validation logic

The implementation follows best practices and includes necessary safeguards. The package is used for its intended purpose of running LLM inference with proper resource management.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check package.json files in the repository for potential conflicts
rg -l "node-llama-cpp" 

# Check for system requirements in README or documentation
rg -i "requirement|prerequisite|depend" README.md

Length of output: 344

Script:

#!/bin/bash
# Check how node-llama-cpp is being used in the codebase
rg -A 5 "node-llama-cpp" src/

# Check for any setup or installation instructions in the repository
fd -t f -e md | xargs rg -i "setup|install|build"

# Look for any configuration files that might contain system requirements
fd -g "*config*" -t f | xargs cat

# Check the actual implementation files for any system-specific code
cat src/rules/instructionValidator.js
cat src/util/preloadModel.js

Length of output: 6549

src/index.js (1)

70-70: Verify async operation handling

The change to async operation for validateStructure suggests integration with a language model. Ensure proper error handling for potential model-related failures.

src/schema.js (1)

136-142: Verify language model integration

The schema indicates that instructions will be evaluated by a local language model. This is a significant architectural decision.

Consider the following aspects:

Model deployment and versioning strategy
Performance impact of local model evaluation
Fallback mechanisms for model failures

Run this script to check for model-related configurations:

✅ Verification successful

Let me check the content of the instruction validator to better understand the implementation.

Local language model integration is properly implemented

The implementation shows a well-structured local language model integration using node-llama-cpp with:

Proper model deployment using a quantized 3B Llama model (Q4_K_M)
Efficient resource management with temp directory cleanup and context disposal
Clear fallback handling through ValidationError
Structured response parsing with JSON grammar
Performance optimization through model reuse across multiple instruction validations

The architectural concerns mentioned in the review comment are already addressed in the implementation:

Model deployment: Uses HuggingFace hosted model with automatic downloading
Performance: Uses quantized model and efficient context management
Error handling: Implements proper error reporting and resource cleanup

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Look for model configuration files and integration points

# Search for model configuration files
fd -e yml -e yaml -e json | rg -i "model|llm|language"

# Search for model integration code
rg -l "model.*validate|validate.*model"

Length of output: 132

Script:

#!/bin/bash
# Check the implementation of instruction validation
cat src/rules/instructionValidator.js

Length of output: 4546

src/rules/structureValidator.test.js (1)

Line range hint 5-149: LGTM! Async/await changes are consistently applied.

The conversion of test cases to use async/await is thorough and correctly implemented across all test scenarios.

However, I notice that instruction validation tests are missing, which should be added to align with the PR objectives.

Let's check if instruction validation tests exist elsewhere:

src/rules/instructionValidator.js

src/util/preloadModel.js

src/util/tempDir.js

src/schema.js

artifacts/sample_markdown.json

src/parsers/markdown.js

… validator, and temporary directory management

coderabbitai

Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (6)

src/rules/instructionValidator.js (5)

22-28: Consider making model configuration more flexible.

The model name and path are currently hardcoded. Consider extracting these values to a configuration file for better maintainability and flexibility.

+import config from '../config/model.js';
+
 export async function prepareModel(llama) {
   try {
     const dir = getTempDir();
 
-    const expectedFiles = ["Llama-3.2-3B-Instruct-Q4_K_M.gguf"];
+    const expectedFiles = [config.modelFileName];
     cleanTempDir(expectedFiles);
 
     await resolveModelFile(
-      "hf:bartowski/Llama-3.2-3B-Instruct-GGUF/Llama-3.2-3B-Instruct-Q4_K_M.gguf",
-      { directory: dir, fileName: "Llama-3.2-3B-Instruct-Q4_K_M.gguf" }
+      config.modelPath,
+      { directory: dir, fileName: config.modelFileName }
     );

49-64: Enhance JSON schema validation.

Consider adding additional validation constraints to ensure better data quality:

   return await llama.createGrammarForJsonSchema({
     type: "object",
     required: ["assessment"],
     properties: {
       assessment: {
         description: "The result of the evaluation",
         type: "string",
         enum: ["pass", "fail"],
       },
       explanation: {
         description: "Suggestions for improvement, if any.",
         type: "string",
+        minLength: 1,
         maxLength: 500,
+        pattern: "^[A-Z].*[.]$"  // Ensure starts with capital letter and ends with period
       },
     },
   });

80-81: Enhance system prompt for more specific guidance.

The current system prompt could be more specific about evaluation criteria and expected output format.

     systemPrompt:
-      "You are a technical editor who evaluates instructions against sections of a document. Evaluate if the supplied content follows the specified instruction. Output your results as JSON.",
+      "You are a technical editor who evaluates if document sections follow specific instructions. Assess: 1) Does the content implement the instruction completely? 2) Is the implementation correct? 3) Are there any inconsistencies? Output your assessment as JSON with 'pass' or 'fail' and detailed explanation if failing.",

103-103: Address TODO comment regarding terminal output.

The TODO comment indicates a need to silence terminal output from the Llama model.

Would you like me to help implement a solution for silencing the terminal output or create an issue to track this task?

107-127: Consider batch processing for better performance.

Processing instructions sequentially might not be optimal for large documents. Consider implementing batch processing.

-  for (const index in template.instructions) {
-    const instruction = template.instructions[index];
-    const error = await validateInstruction(
-      model,
-      grammar,
-      instruction,
-      section.rawContent
-    );
+  const BATCH_SIZE = 5;
+  const instructions = template.instructions;
+  for (let i = 0; i < instructions.length; i += BATCH_SIZE) {
+    const batch = instructions.slice(i, i + BATCH_SIZE);
+    const batchErrors = await Promise.all(
+      batch.map(instruction =>
+        validateInstruction(model, grammar, instruction, section.rawContent)
+      )
+    );
+    batchErrors.forEach((error, index) => {
     if (error) {
       errors.push(
         new ValidationError(
           "instruction_error",
           section.heading.content,
-          `Instruction: ${instruction}${
-            !instruction.endsWith(".") ? "." : ""
+          `Instruction: ${batch[index]}${
+            !batch[index].endsWith(".") ? "." : ""
           } Explanation: ${error.explanation}`,
           section.position
         )
       );
     }
-  }
+    });
+  }

src/parsers/markdown.js (1)

224-237: Ensure consistent newline handling.

The content accumulation adds newlines consistently, but consider extracting the logic to a helper function to maintain consistency.

+  const appendRawContent = (section, node) => {
+    if (!section.rawContent) return;
+    const content = getNodeRawContent(node);
+    if (!content) return;
+    section.rawContent += section.rawContent.endsWith('\n') ? content : '\n' + content;
+  };
+
   // In processNode function:
-  currentSection.rawContent += '\n' + getNodeRawContent(node);
+  appendRawContent(currentSection, node);

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between c0b8e0b and ec0258b.

📒 Files selected for processing (5)

src/parsers/markdown.js (4 hunks)
src/rules/instructionValidator.js (1 hunks)
src/schema.js (1 hunks)
src/util/preloadModel.js (1 hunks)
src/util/tempDir.js (1 hunks)

🚧 Files skipped from review as they are similar to previous changes (3)

src/schema.js
src/util/preloadModel.js
src/util/tempDir.js

🔇 Additional comments (1)

src/parsers/markdown.js (1)

148-176: LGTM! Well-implemented node content extraction.

The implementation handles edge cases well and follows the suggestions from previous reviews:

Proper null checks
Handles nested content in lists
Handles empty nodes
Uses optional chaining

src/rules/instructionValidator.js

…option

…template definitions and usage instructions

coderabbitai

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (3)

templates.yaml (2)
38-66: Consider requiring error examples

The API operation template is well-structured. However, consider making the error response section required to ensure comprehensive API documentation.
   error:
-    required: false
+    required: true
     heading:
       const: "Error Response"
120-128: Fix YAML formatting issues

The parameters component is well-structured, but there are formatting issues to address:

Add a newline at the end of file

Remove trailing spaces on line 128
   items:
-    min: 1
-      
+    min: 1
+
🧰 Tools

🪛 yamllint (1.35.1)

[error] 128-128: no new line character at the end of file

(new-line-at-end-of-file)

[error] 128-128: trailing spaces

(trailing-spaces)
README.md (1)

44-44: Consider adding JSON output example

While the --json option is documented, consider adding an example of the JSON output format to help users understand the structure.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between ec0258b and 4cdabd4.

📒 Files selected for processing (2)

README.md (2 hunks)
templates.yaml (2 hunks)

🧰 Additional context used

🪛 yamllint (1.35.1)

templates.yaml

[error] 128-128: no new line character at the end of file

(new-line-at-end-of-file)

[error] 128-128: trailing spaces

(trailing-spaces)

🔇 Additional comments (4)

templates.yaml (2)

7-37: LGTM! Case consistency improvements

The conversion to lowercase keys and addition of instructions improves template consistency and validation capabilities.

71-72: LGTM! Clear author guidance

The addition of instructions provides clear guidance for authors to specify document intent.

README.md (2)

33-37: LGTM! Clear model management documentation

The documentation clearly explains model requirements, storage needs, and configuration options.

70-245: LGTM! Comprehensive template documentation

The documentation provides clear explanations of template structure, properties, and validation rules with matching examples.

hawkeyexl added 12 commits December 5, 2024 19:43

first pass custom rule testing

1cf7a5b

Handle model downloading and temp dir cleaning

6b8c4a0

Refactor tdwValidator to modularize model preparation and validation …

eacd99a

…logic

Enhance tdwValidator to improve instruction evaluation and error repo…

aff54b7

…rting

Renamed file

3972ab3

Add raw content extraction for markdown nodes in parseMarkdown function

e551d59

Updated object structure

4fa3642

Add instructions field to schema and templates for author guidance

7c1c1a7

Refactor structure validation functions to support asynchronous opera…

3577f60

…tions and integrate instruction validation

Enhance instruction validation by adding model preparation functions …

0dbd612

…and refining grammar validation. Update template instructions for clarity.

Add model preloading and temporary directory management for instructi…

7124e38

…on validation

Remove commented-out code from instruction validation module

c0b8e0b

hawkeyexl requested a review from Copilot December 6, 2024 22:37

Copilot AI reviewed Dec 6, 2024

View reviewed changes

src/util/tempDir.js Outdated Show resolved Hide resolved

coderabbitai bot reviewed Dec 6, 2024

View reviewed changes

Improve error handling and validation in markdown parser, instruction…

ec0258b

… validator, and temporary directory management

coderabbitai bot reviewed Dec 6, 2024

View reviewed changes

src/rules/instructionValidator.js Show resolved Hide resolved

hawkeyexl added 2 commits December 6, 2024 15:19

Update README.md to clarify language model usage and add JSON output …

2d6b975

…option

Refactor README.md and templates.yaml for consistency and clarity in …

4cdabd4

…template definitions and usage instructions

hawkeyexl linked an issue Dec 6, 2024 that may be closed by this pull request

Evaluate tests for Test-Driven-Writing #5

Closed

coderabbitai bot reviewed Dec 6, 2024

View reviewed changes

hawkeyexl merged commit bd20826 into main Dec 6, 2024
2 checks passed

coderabbitai bot mentioned this pull request Dec 7, 2024

Add VS Code extension for real-time linting #10

Closed

hawkeyexl deleted the llm branch December 9, 2024 21:01

coderabbitai bot mentioned this pull request Dec 13, 2024

Let file-path and template-path arguments accept URLs #17

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Enhance instruction validation and model management #8

Enhance instruction validation and model management #8

Chat

CodeRabbit Commands (Invoked using PR comments)

Other keywords and placeholders

CodeRabbit Configuration File (`.coderabbit.yaml`)

Documentation and Community

Enhance instruction validation and model management #8

Enhance instruction validation and model management #8

Conversation

Summary by CodeRabbit

Release Notes

Walkthrough

Changes

Possibly related PRs

Chat

CodeRabbit Commands (Invoked using PR comments)

Other keywords and placeholders

CodeRabbit Configuration File (.coderabbit.yaml)

Documentation and Community

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

CodeRabbit Configuration File (`.coderabbit.yaml`)