arxiv-sanity-lite

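A much lighter-weight arxiv-sanity, rewritten from scratch. It periodically polls the arxiv API for new papers. Users can then tag papers of interest, and the site recommends new papers for each tag using SVMs over tfidf features of the paper abstracts. The results can be searched, ranked, sorted, sliced and diced in a nice web UI. Lastly, arxiv-sanity-lite can send you daily emails with recommendations of new papers based on your tags. Curate your tags, track recent papers in your area, and don't miss out!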
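To make the recommendation idea concrete, here is a minimal pure-Python sketch of ranking papers with a linear SVM over tfidf features of their abstracts. It is only an illustration, not necessarily how compute.py or serve.py do it: the function names (`tfidf_vectors`, `train_linear_svm`, `recommend`), the whitespace tokenizer, the hyperparameters and the toy abstracts are all assumptions, and a real deployment would more likely rely on an existing machine-learning library.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return one L2-normalised sparse tf-idf vector (a dict) per document."""
    tokenised = [doc.lower().split() for doc in docs]
    df = Counter(term for tokens in tokenised for term in set(tokens))
    n = len(docs)
    vectors = []
    for tokens in tokenised:
        tf = Counter(tokens)
        vec = {t: c * (math.log((1 + n) / (1 + df[t])) + 1.0) for t, c in tf.items()}
        norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
        vectors.append({t: v / norm for t, v in vec.items()})
    return vectors

def dot(w, x):
    """Sparse dot product between a weight dict and a feature dict."""
    return sum(w.get(t, 0.0) * v for t, v in x.items())

def train_linear_svm(xs, ys, epochs=200, lr=0.1, lam=0.01):
    """Hinge-loss linear SVM trained by plain subgradient descent (toy scale only)."""
    w, b = {}, 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            violated = y * (dot(w, x) + b) < 1.0
            for t in list(w):
                w[t] *= 1.0 - lr * lam  # shrink step from the L2 regulariser
            if violated:
                for t, v in x.items():
                    w[t] = w.get(t, 0.0) + lr * y * v
                b += lr * y
    return w, b

def recommend(abstracts, tagged_ids, candidate_ids):
    """Train on known papers (tagged = +1, others = -1) and rank the candidates."""
    vecs = tfidf_vectors(abstracts)
    held_out = set(candidate_ids)
    train_ids = [i for i in range(len(abstracts)) if i not in held_out]
    xs = [vecs[i] for i in train_ids]
    ys = [1 if i in tagged_ids else -1 for i in train_ids]
    w, b = train_linear_svm(xs, ys)
    scores = {i: dot(w, vecs[i]) + b for i in candidate_ids}
    return sorted(candidate_ids, key=lambda i: scores[i], reverse=True)

# Made-up abstracts: 0 and 1 carry the user's tag, 4 and 5 are new papers.
abstracts = [
    "transformer attention language model",
    "attention language model pretraining",
    "protein folding structure prediction",
    "galaxy cluster dark matter survey",
    "sparse attention transformer language model",
    "protein structure survey",
]
ranking = recommend(abstracts, tagged_ids={0, 1}, candidate_ids=[5, 4])
print(ranking)
```

In this toy setup the new paper that shares vocabulary with the tagged ones is ranked first. One classifier per tag, as described above, would simply repeat the same procedure with a different set of tagged papers.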

I am running a live version of this code on arxiv-sanity-lite.com.

To run

To run this locally I usually run the following script to update the database with any new papers. I typically schedule this via a periodic cron job:

#!/bin/bash

python3 arxiv_daemon.py --num 2000

if [ $? -eq 0 ]; then
    echo "New papers detected! Running compute.py"
    python3 compute.py
else
    echo "No new papers were added, skipping feature computation"
fi

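You can see that updating the database is a matter of first downloading the new papers via the arxiv API using arxiv_daemon.py, and then running compute.py to compute the tfidf features of the papers. Finally, to serve the Flask server locally we'd run something like: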

export FLASK_APP=serve.py; flask run

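All of the database will be stored inside the data directory. Finally, if you'd like to run your own instance on the internet, I recommend simply running the above on a Linode. For example, I am currently running this code on the smallest "Nanode 1 GB" instance, indexing about 30K papers, which costs $5/month.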
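If you would rather drive the update step from Python than from bash, the update script earlier in this section can be sketched roughly as follows. This is an illustration under assumptions: the function name `update_database` and its injectable `run` parameter are hypothetical and not part of the repository, and `sys.executable` stands in for `python3`. The only behavior taken from the script is that an exit status of 0 from arxiv_daemon.py is treated as "new papers were added".

```python
import subprocess
import sys

def update_database(num=2000, run=subprocess.call):
    """Fetch new papers, then recompute features only if the fetch reported new papers.

    `run` takes an argument list and returns an exit status; it is injectable
    (an assumption of this sketch) so the control flow can be exercised
    without touching the network.
    """
    fetch_status = run([sys.executable, "arxiv_daemon.py", "--num", str(num)])
    if fetch_status == 0:
        print("New papers detected! Running compute.py")
        return run([sys.executable, "compute.py"])
    print("No new papers were added, skipping feature computation")
    return fetch_status
```

Calling `update_database()` from the repository root would then play the same role as the bash script, and the function could be scheduled from cron in the same way.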

(Optional) Finally, if you'd like to send periodic emails to users about new papers, see the send_emails.py script. You'll also have to pip install sendgrid. I run this script in a daily cron job.

Requirements

Install via requirements:

pip install -r requirements.txt

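Todo

Make the website mobile friendly with media queries in CSS etc.
The metas table should not be a sqlitedict but a proper sqlite table, for efficiency
Build a reverse index to support faster search; right now we iterate through the entire database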
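As a rough illustration of the reverse (inverted) index item above, the sketch below maps each word to the set of papers containing it, so a query only touches the posting sets of its own words instead of scanning every paper. Everything here is hypothetical: the function names, the whitespace tokenizer, the AND-only query semantics and the paper ids are assumptions for the example, not code from this repository.

```python
from collections import defaultdict

def build_inverted_index(papers):
    """Map each lowercase word to the set of paper ids whose text contains it."""
    index = defaultdict(set)
    for paper_id, text in papers.items():
        for word in text.lower().split():
            index[word].add(paper_id)
    return index

def search(index, query):
    """Return the ids of papers that contain every word of the query."""
    words = query.lower().split()
    if not words:
        return set()
    postings = [index.get(word, set()) for word in words]
    return set.intersection(*postings)

# Made-up paper ids and texts, for illustration only.
papers = {
    "paper-a": "Attention is all you need",
    "paper-b": "Sparse attention for long documents",
    "paper-c": "Protein structure prediction",
}
index = build_inverted_index(papers)
print(search(index, "sparse attention"))
```

The index would have to be rebuilt or updated whenever new papers are added, which fits naturally next to the feature computation step.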

License

MIT

