pjw-cmd / Starred · GitHub

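🚀🚀🚀feapder is an easy-to-use, powerful Python crawler framework. It has four built-in spiders (AirSpider, Spider, TaskSpider, BatchSpider) to cover different scenarios, and supports resumable crawling, monitoring and alerting, browser rendering, and large-scale data deduplication. The powerful crawler management system feaplat also provides…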

Python 3,265 509 Updated Mar 17, 2025

AgentQL is a suite of tools for connecting your AI to the web. Featuring a query language and Playwright integrations for interacting with elements and extracting data quickly, precisely, and at sc…

Python 826 106 Updated May 22, 2025
HTML 28 10 Updated Oct 17, 2024

Official implementation of the paper "AutoScraper: A Progressive Understanding Web Agent for Web Scraper Generation" [EMNLP '24]

Python 467 42 Updated Jan 3, 2025

Python scraper based on AI

Python 19,840 1,682 Updated May 30, 2025

RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding.

TypeScript 54,051 5,213 Updated May 30, 2025

Formula recognition based on LaTeX-OCR and ONNXRuntime.

Python 349 33 Updated Nov 3, 2024

[ACM'MM 2024 Oral] Official code for "OneChart: Purify the Chart Structural Extraction via One Auxiliary Token"

Python 223 15 Updated Apr 14, 2025

Convert PDF to markdown + JSON quickly with high accuracy

Python 25,505 1,636 Updated May 30, 2025

🌳CED: Catalog Extraction from Documents

Python 16 1 Updated Jul 30, 2023

Prompt-learning methods using BERT4Keras (PET, EFL and NSP-BERT), for both Chinese and English.

Python 29 Updated Oct 12, 2022

A universal scalable machine learning model deployment solution

Java 217 74 Updated May 31, 2025

Table Transformer (TATR) is a deep learning model for extracting tables from unstructured documents (PDFs and images). This is also the official repository for the PubTables-1M dataset and GriTS ev…

Python 2,615 288 Updated Jun 24, 2024

Awesome multilingual OCR toolkits based on PaddlePaddle (practical ultra-lightweight OCR system, supports recognition of 80+ languages, provides data annotation and synthesis tools, supports training and…

Python 49,868 8,271 Updated Jun 1, 2025

2nd solution of ICDAR 2021 Competition on Scientific Literature Parsing, Task B.

Python 459 109 Updated Jul 4, 2022

An implementation of the Splitting and Merging table recognition method.

Python 79 30 Updated Feb 17, 2020

Mirror of Apache PDFBox

Java 2,836 901 Updated Jun 1, 2025

CDLA: A Chinese document layout analysis (CDLA) dataset

Python 267 32 Updated Sep 13, 2021

OCR toolbox from Davar-Lab

Python 750 156 Updated Nov 16, 2023

An implementation of the BERT model and its related downstream tasks based on the PyTorch framework. @月来客栈

Python 597 110 Updated Mar 14, 2025

Chinese text classification using BERT and ERNIE

Python 4,266 912 Updated Jun 28, 2024

PyTorch implementation of BERT in "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding"

Python 106 29 Updated Nov 1, 2018

Object Detection toolkit based on PaddlePaddle. It supports object detection, instance segmentation, multiple object tracking and real-time multi-person keypoint detection.

Python 13,456 2,941 Updated May 14, 2025

A JVM wrapper for the popular SLSQP optimizer

Java 28 7 Updated Apr 26, 2022

Pdf2Dom is a PDF parser that converts documents to an HTML DOM representation. The resulting DOM tree may then be serialized to an HTML file or processed further. A command-line utility for conve…

Java 184 75 Updated Oct 17, 2022

Demo and other Python3 code

Python 684 354 Updated Jan 6, 2024

Non-local Neural Networks for Video Classification

Python 1 Updated Jan 16, 2019
1 Updated Aug 20, 2019
1 Updated Jul 29, 2020