GitHub - unit-mesh/unit-gen at v0.3.3

UnitGen is a code fine-tuning data framework that generates fine-tuning data directly from your existing codebase: code completion, test generation, documentation generation, and more.


Unit Eval - Data Framework


Unit Eval is a code fine-tuning data framework that generates data from your existing codebase. It also includes benchmarking and evaluation tools for code LLMs.

Docs: https://eval.unitmesh.cc/

Thanks to OpenBayes for providing computing resources.

Examples:

| name | model download (HuggingFace) | finetune Notebook | jupyter notebook Log | model download (OpenBayes) |
|------|------------------------------|-------------------|----------------------|----------------------------|
| DeepSeek 6.7B | unit-mesh/autodev-deepseek-6.7b-finetunes | finetune.ipynb | OpenBayes | deepseek-coder-6.7b-instruct-finetune-100steps |
| CodeGeeX2 6B | TODO | TODO | TODO | |

Features:

Design Philosophy

  • Unique prompt. One prompt structure shared across fine-tuning, evaluation, and IDE tooling.
  • Code quality pipeline. Filters and scores code using rules for code complexity, bad smells, test bad smells, and more.
  • Extendable, customizable quality thresholds. Custom rules, custom thresholds, custom quality types, and more.

Unique Prompt

Unit Eval Overview

Keep the same prompt: AutoDev <-> Unit Picker <-> Unit Eval

AutoDev prompt

AutoDev prompt template example:

Write unit test for following code.

${context.coc}

${context.framework}

${context.related_model}

```${context.language}
${context.selection}
```

Unit Picker prompt

Unit Picker prompt should keep the same structure as the AutoDev prompt. Prompt example:

Instruction(
    instruction = "Complete ${it.language} code, return rest code, no explaining",
    output = it.output,
    input = """
    |```${it.language}
    |${it.relatedCode}
    |```
    |
    |Code:
    |```${it.language}
    |${it.beforeCursor}
    |```""".trimMargin()
)

Unit Eval prompt

Unit Eval prompt should keep the same structure as the AutoDev prompt. Prompt example:

Complete ${language} code, return rest code, no explaining

```${language}
${relatedCode}
```

Code:
```${language}
${beforeCursor}
```
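Because the three prompts share one structure, the same template can be rendered at each stage by substituting the `${...}` placeholders. The sketch below shows one way to do that in plain Java; the class and method names are hypothetical and not part of Unit Eval's actual API:

```java
import java.util.Map;

public class PromptTemplate {
    // Hypothetical helper: fill a shared prompt template by simple
    // ${placeholder} substitution, one variable at a time.
    static String render(String template, Map<String, String> vars) {
        String result = template;
        for (Map.Entry<String, String> e : vars.entrySet()) {
            result = result.replace("${" + e.getKey() + "}", e.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        // The Unit Eval completion prompt from above, as a template string.
        String template = "Complete ${language} code, return rest code, no explaining\n\n"
                + "```${language}\n${relatedCode}\n```\n\n"
                + "Code:\n```${language}\n${beforeCursor}\n```";

        String prompt = render(template, Map.of(
                "language", "java",
                "relatedCode", "class Foo { int bar() { return 1; } }",
                "beforeCursor", "class FooTest {"
        ));
        System.out.println(prompt);
    }
}
```

Keeping all three stages on one template like this is what lets a model fine-tuned on Unit Picker output answer AutoDev's prompts without any prompt translation layer.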

Code quality pipeline

Before check (inputs exceeding these limits are filtered out):

  • FileSize: 64k
  • Complexity: 1000
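The pre-check above can be sketched as a simple gate in front of the picker. This is an illustrative assumption, not Unit Eval's actual implementation: it reads the limits as a 64 KB file-size cap and a complexity cap of 1000, and all names are hypothetical:

```java
public class BeforeCheck {
    // Thresholds taken from the docs above; the KB interpretation of "64k"
    // is an assumption.
    static final int MAX_FILE_SIZE_BYTES = 64 * 1024;
    static final int MAX_COMPLEXITY = 1000;

    // Hypothetical gate: a file is only handed to the quality pipeline
    // if it is within both limits.
    static boolean shouldPick(int fileSizeBytes, int complexity) {
        return fileSizeBytes <= MAX_FILE_SIZE_BYTES && complexity <= MAX_COMPLEXITY;
    }

    public static void main(String[] args) {
        System.out.println(shouldPick(10_000, 12));   // small, simple file -> true
        System.out.println(shouldPick(200_000, 12));  // over the 64 KB cap -> false
    }
}
```

Running cheap checks like these first avoids spending bad-smell analysis on generated or minified files that would be rejected anyway.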

Code Quality Workflow

Extendable, customizable quality thresholds

Optional quality types:

enum class CodeQualityType {
    BadSmell,
    TestBadSmell,
    JavaController,
    JavaRepository,
    JavaService,
}

Custom threshold config:

data class BsThresholds(
    val bsLongParasLength: Int = 5,
    val bsIfSwitchLength: Int = 8,
    val bsLargeLength: Int = 20,
    val bsMethodLength: Int = 30,
    val bsIfLinesLength: Int = 3,
)

Custom rules:

val apis = apiAnalyser.toContainerServices()
val ruleset = RuleSet(
    RuleType.SQL_SMELL,
    "normal",
    UnknownColumnSizeRule(),
    LimitTableNameLengthRule()
    // more rules
)

val issues = WebApiRuleVisitor(apis).visitor(listOf(ruleset))
// a non-empty issue list means the code has bad smells

Quick Start

For examples, see the examples folder.

Use the CLI

See config-examples for sample configuration files.

Download the latest version from GitHub Releases.

Step 1. Generate Instructions

  1. Configure the project via processor.yml
  2. Run the picker: java -jar unit-cli.jar

Step 2. Run the Evaluate CLI (Optional)

  1. Configure the unit-eval.yml and connection.yml files.
  2. Run eval: java -jar unit-eval.jar

Note: connection config reference: https://framework.unitmesh.cc/prompt-script/connection-config

Use the Java API

See config-example for a sample setup.

1. Add dependencies:

dependencies {
    implementation("cc.unitmesh:unit-picker:0.1.5")
    implementation("cc.unitmesh:code-quality:0.1.5")
}

2. Configure the unit-eval.yml and connection.yml files.

3. Write code:

public class App {
    public static void main(String[] args) {
        List<InstructionType> builderTypes = new ArrayList<>();
        builderTypes.add(InstructionType.RELATED_CODE_COMPLETION);

        List<CodeQualityType> codeQualityTypes = new ArrayList<>();
        codeQualityTypes.add(CodeQualityType.BadSmell);
        codeQualityTypes.add(CodeQualityType.JavaService);

        PickerOption pickerOption = new PickerOption(
                "https://github.com/unit-mesh/unit-eval-testing", "master", "java",
                ".", builderTypes, codeQualityTypes, new BuilderConfig()
        );

        SimpleCodePicker simpleCodePicker = new SimpleCodePicker(pickerOption);
        List<Instruction> output = simpleCodePicker.blockingExecute();

        // handle the output here
    }
} 

Thanks to

  • Abstract syntax tree: Chapi. Used features: multiple languages mapped to the same data structure.
  • Legacy system analysis: Coca. Inspired: bad smell and test bad smell rules.
  • Architecture governance tool: ArchGuard. Used features: estimation, rule lint (API, SQL).
  • Code database: CodeDB. Used features: code analysis pipeline.

LICENSE

This code is distributed under the MPL 2.0 license. See LICENSE in this directory.
