GitHub - lixinsu/DSSM: representation-based duplicated question identification

lixinsu/DSSM

basic text matching model

representation-based method

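A basic text matching model for duplicate question detection. The model uses embeddings at both the word level and the character level, encodes each of the two texts with an LSTM, concatenates the resulting hidden vectors, and performs binary classification on the result.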
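
The description above can be sketched roughly as follows. This is a minimal illustration in PyTorch, not the repository's actual code: the class name `SiameseLSTMMatcher`, the layer sizes, the choice of sharing one encoder across both questions, and the use of the final hidden state as the text representation are all assumptions made for the example.

```python
import torch
import torch.nn as nn


class SiameseLSTMMatcher(nn.Module):
    """Hypothetical sketch: encode two texts with LSTMs, concatenate, classify."""

    def __init__(self, word_vocab, char_vocab, emb_dim=64, hidden=32):
        super().__init__()
        # In practice these tables would be initialised from pretrained vectors.
        self.word_emb = nn.Embedding(word_vocab, emb_dim, padding_idx=0)
        self.char_emb = nn.Embedding(char_vocab, emb_dim, padding_idx=0)
        # One encoder per level, shared by both questions (an assumption).
        self.word_lstm = nn.LSTM(emb_dim, hidden, batch_first=True)
        self.char_lstm = nn.LSTM(emb_dim, hidden, batch_first=True)
        # Two questions x two levels = four hidden vectors, concatenated.
        self.classifier = nn.Linear(4 * hidden, 2)

    @staticmethod
    def encode(ids, emb, lstm):
        # Final hidden state of the last layer: shape (batch, hidden).
        # Padding is not masked here; a fuller version would pack the sequences.
        _, (h_n, _) = lstm(emb(ids))
        return h_n[-1]

    def forward(self, q1_word, q1_char, q2_word, q2_char):
        parts = [
            self.encode(q1_word, self.word_emb, self.word_lstm),
            self.encode(q2_word, self.word_emb, self.word_lstm),
            self.encode(q1_char, self.char_emb, self.char_lstm),
            self.encode(q2_char, self.char_emb, self.char_lstm),
        ]
        # Logits over the two classes: duplicate / not duplicate.
        return self.classifier(torch.cat(parts, dim=-1))


# Toy batch of 3 question pairs with random token ids (no padding).
model = SiameseLSTMMatcher(word_vocab=100, char_vocab=50)
q1_word = torch.randint(1, 100, (3, 7))
q2_word = torch.randint(1, 100, (3, 9))
q1_char = torch.randint(1, 50, (3, 20))
q2_char = torch.randint(1, 50, (3, 25))
logits = model(q1_word, q1_char, q2_word, q2_char)
```

Training such a model would typically apply `nn.CrossEntropyLoss` to `logits` against 0/1 labels.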

features

  • pretrained word and char embeddings
  • combines word-level and char-level matching signals
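
To illustrate what feeding both levels into a model involves, the sketch below turns one question into a word-id sequence and a char-id sequence. It is only an illustration under assumptions: the helper names, the `<pad>`/`<unk>` ids, and the idea that the text is already word-segmented with spaces are hypothetical and not taken from the repository.

```python
def build_vocab(sequences):
    """Map each distinct token to an id; 0 is reserved for padding, 1 for unknown."""
    vocab = {"<pad>": 0, "<unk>": 1}
    for seq in sequences:
        for tok in seq:
            vocab.setdefault(tok, len(vocab))
    return vocab


def to_ids(tokens, vocab):
    """Look up each token, falling back to the unknown id."""
    return [vocab.get(tok, vocab["<unk>"]) for tok in tokens]


def word_tokens(text):
    # Assumption: words are already separated by spaces. Chinese text would
    # first need a word segmenter, which is outside the scope of this sketch.
    return text.split()


def char_tokens(text):
    # Character level: every non-whitespace character is a token.
    return [ch for ch in text if not ch.isspace()]


questions = ["how do I repay early", "how to repay early"]
word_vocab = build_vocab(word_tokens(q) for q in questions)
char_vocab = build_vocab(char_tokens(q) for q in questions)

word_ids = to_ids(word_tokens(questions[1]), word_vocab)
char_ids = to_ids(char_tokens(questions[1]), char_vocab)
```

The two id sequences index separate embedding tables, so a word missing from the word vocabulary can still contribute a signal through its characters.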

training

  • clone the repository and run sh scripts/setup.sh
  • cd data/atec and download the atec data
  • split the original csv file into train.csv, dev.csv and test.csv
  • run sh scripts/train.sh
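
The split step is not scripted in the instructions above, so here is one possible way to do it with the Python standard library. The 80/10/10 ratio, the fixed seed, the comma delimiter, and the assumption that the file has no header row are all choices made for this sketch; adjust them to match the downloaded file.

```python
import csv
import os
import random


def split_rows(rows, dev_frac=0.1, test_frac=0.1, seed=0):
    """Shuffle rows reproducibly and cut them into train / dev / test lists."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_dev = int(len(rows) * dev_frac)
    n_test = int(len(rows) * test_frac)
    dev = rows[:n_dev]
    test = rows[n_dev:n_dev + n_test]
    train = rows[n_dev + n_test:]
    return train, dev, test


def split_csv(src_path, out_dir=".", delimiter=",", **kwargs):
    """Write train.csv, dev.csv and test.csv next to each other in out_dir."""
    # Assumption: no header row; every line is one labelled question pair.
    with open(src_path, newline="", encoding="utf-8") as f:
        rows = list(csv.reader(f, delimiter=delimiter))
    names = ("train.csv", "dev.csv", "test.csv")
    for name, part in zip(names, split_rows(rows, **kwargs)):
        path = os.path.join(out_dir, name)
        with open(path, "w", newline="", encoding="utf-8") as f:
            csv.writer(f, delimiter=delimiter).writerows(part)
```

Calling `split_csv` with the path of the downloaded file and `out_dir="data/atec"` would produce the three files the training script expects, assuming the script reads them from that directory.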
