Best Open Source BSD Web Scrapers 2025

Web Scrapers for BSD


Browse free open source Web Scrapers and projects for BSD below.

1

Scrapy

A fast, high-level web crawling and web scraping framework

Scrapy is a fast, open source, high-level framework for crawling websites and extracting structured data from them. Portable and written in Python, it runs on Windows, Linux, macOS and BSD. Scrapy is powerful yet simple, and easily extensible: write the rules to extract the data, add new functionality if you wish without touching the core, and Scrapy does the rest. It can be used in a number of applications, including data mining, monitoring and automated testing.

Downloads: 25 This Week

Last Update: 2024-11-18
2

WebHarvest - web data extraction tool

Web data extraction (web data mining, web scraping) tool. It leverages well-proven XML and text processing technologies to easily extract useful data from arbitrary web pages.

14 Reviews

Downloads: 18 This Week

Last Update: 2017-07-25
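WebHarvest's own configuration language is not reproduced here. As a rough illustration of the underlying idea (treat a well-formed page as XML and pull fields out with path expressions), the sketch below uses only Python's `xml.etree.ElementTree` and its XPath subset. It assumes the page is already well-formed XHTML, which real-world HTML often is not; the sample markup and `extract_products` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed XHTML snippet standing in for a downloaded page.
PAGE = """
<html>
  <body>
    <ul id="products">
      <li class="item"><span class="name">Widget</span><span class="price">9.99</span></li>
      <li class="item"><span class="name">Gadget</span><span class="price">24.50</span></li>
    </ul>
  </body>
</html>
"""


def extract_products(xhtml: str) -> list[dict]:
    """Return one dict per <li class="item">, using ElementTree's XPath subset."""
    root = ET.fromstring(xhtml.strip())
    products = []
    # ".//li[@class='item']" matches every <li class="item"> at any depth.
    for item in root.findall(".//li[@class='item']"):
        name = item.find("span[@class='name']").text
        price = float(item.find("span[@class='price']").text)
        products.append({"name": name, "price": price})
    return products
```

Calling `extract_products(PAGE)` yields the name and price of both list items. For messy HTML, a tolerant parser would have to clean the page into well-formed markup first.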
3

Heritrix: Internet Archive Web Crawler

The archive-crawler project is building Heritrix: a flexible, extensible, robust and scalable web crawler capable of fetching, archiving and analyzing the full diversity and breadth of internet-accessible content.

21 Reviews

Downloads: 10 This Week

Last Update: 2013-06-05
4

OpenWebSpider

OpenWebSpider is an Open Source multi-threaded Web Spider (robot, crawler) and search engine with a lot of interesting features!

4 Reviews

Downloads: 11 This Week

Last Update: 2017-03-12
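OpenWebSpider's internals are not described here. The sketch below only illustrates what "multi-threaded web spider" generally means: a breadth-first crawl in which worker threads fetch pages concurrently. To stay self-contained it crawls a hypothetical in-memory link graph (`TOY_WEB`), and `fetch_links` is a hypothetical stand-in for downloading and parsing a page.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory "web": each URL maps to the links found on that page.
# A real spider would download and parse every page instead.
TOY_WEB = {
    "http://example.com/": ["http://example.com/a", "http://example.com/b"],
    "http://example.com/a": ["http://example.com/c", "http://example.com/"],
    "http://example.com/b": ["http://example.com/c"],
    "http://example.com/c": [],
}


def fetch_links(url: str) -> list[str]:
    """Hypothetical stand-in for 'download the page and extract its links'."""
    return TOY_WEB.get(url, [])


def crawl(seed: str, max_workers: int = 4, max_pages: int = 100) -> set[str]:
    """Breadth-first crawl; each frontier level is fetched by a pool of worker threads."""
    seen = {seed}
    frontier = [seed]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while frontier:
            next_frontier = []
            # Worker threads only fetch; the bookkeeping below runs in the main
            # thread, so `seen` needs no lock.
            for links in pool.map(fetch_links, frontier):
                for link in links:
                    if link not in seen and len(seen) < max_pages:
                        seen.add(link)
                        next_frontier.append(link)
            frontier = next_frontier
    return seen
```

Keeping the shared `seen` set in one thread is a deliberate simplification; a crawler whose workers also enqueue URLs themselves would need a lock or a thread-safe queue.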
5

larbin

Larbin is an HTTP Web crawler with an easy interface that runs under Linux. It can fetch more than 5 million pages a day on a standard PC (with a good network).

Downloads: 13 This Week

Last Update: 2013-04-08
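For scale, a rough back-of-the-envelope figure: assuming the load is spread evenly over 24 hours, 5 million pages a day averages out to about 58 pages per second.

```latex
\frac{5\,000\,000\ \text{pages/day}}{86\,400\ \text{s/day}} \approx 57.9\ \text{pages/s}
```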
6

Pavuk Web Spider and Performance Measure

A widely portable web spider for functional testing, performance measurement and site mirroring, capable of using scenarios to process a wide range of web transactions, including SSL and forms.

2 Reviews

Downloads: 3 This Week

Last Update: 2013-04-24
7

twitch-batch-downloader

Automate the download of entire Twitch.tv channels

Automate the download of entire Twitch.tv channels with their metadata. Each Twitch video is saved into its own folder, with date and time values, video ID, stream metadata, a frame screenshot, the .ts parts list and a SHA-256 hash. The original .ts files are kept, and MP4 files are generated from them. It requires a shell and some command-line utilities; see README.md in the Code/git section for details.

Downloads: 4 This Week

Last Update: 1 day ago
8

Domain Analyzer Security Tool

Finds all the security information for a given domain name

Domain analyzer is a security analysis tool which automatically discovers and reports information about the given domain. Its main purpose is to analyze domains in an unattended way.

Downloads: 1 This Week

Last Update: 2016-11-26
9

APC Anti Crawler

APC Anti Crawler is a PHP 5 class, based on APC, that limits the number of HTTP requests per IP address. It stops web crawlers from downloading your entire website.

Downloads: 0 This Week

Last Update: 2013-04-01
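The PHP class itself is not shown here. The sketch below is a minimal Python illustration of per-IP request limiting with a fixed-window counter, assuming a plain dict in place of APC's shared cache; `FixedWindowLimiter` and its default limits are hypothetical. The injectable `clock` makes the behaviour easy to test.

```python
import time


class FixedWindowLimiter:
    """Allow at most `limit` requests per client IP in each `window`-second window.

    The dict stands in for a shared cache such as APC; each value is a
    (window_start, request_count) pair.
    """

    def __init__(self, limit: int = 60, window: float = 60.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock
        self.counters: dict[str, tuple[float, int]] = {}

    def allow(self, ip: str) -> bool:
        """Record one request from `ip` and report whether it is within the limit."""
        now = self.clock()
        start, count = self.counters.get(ip, (now, 0))
        if now - start >= self.window:
            # The previous window has expired: start a fresh one for this IP.
            start, count = now, 0
        if count >= self.limit:
            return False  # over the limit, e.g. answer with HTTP 429 Too Many Requests
        self.counters[ip] = (start, count + 1)
        return True
```

A fixed window is the simplest design; a crawler can briefly double its rate across a window boundary, which sliding-window or token-bucket limiters avoid at the cost of more state.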
10

ApeSmit

ApeSmit is a very simple Python module for creating XML sitemaps as defined at http://www.sitemaps.org. ApeSmit doesn't contain a web spider; it just writes the data you provide to a file using the proper syntax.

Downloads: 0 This Week

Last Update: 2014-06-09
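ApeSmit's own API is not documented here, so the following is not its interface. It is a minimal sketch of the output format the sitemaps.org protocol describes (a `<urlset>` of `<url>` entries with a required `<loc>` and optional `<lastmod>`, `<changefreq>` and `<priority>`), built with `xml.etree.ElementTree`. The function names and the dict-based input are assumptions.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries: list[dict]) -> ET.Element:
    """Turn entries such as {"loc": ..., "lastmod": ...} into a sitemaps.org <urlset>."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for entry in entries:
        url = ET.SubElement(urlset, "url")
        # <loc> is the only required child of <url>.
        ET.SubElement(url, "loc").text = entry["loc"]
        for optional in ("lastmod", "changefreq", "priority"):
            if optional in entry:
                ET.SubElement(url, optional).text = str(entry[optional])
    return urlset


def write_sitemap(entries: list[dict], path: str) -> None:
    """Write the sitemap to `path` as UTF-8 with an XML declaration."""
    ET.ElementTree(build_sitemap(entries)).write(path, encoding="utf-8", xml_declaration=True)
```

ElementTree escapes characters such as `&` in URLs automatically, which the sitemap protocol requires.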
11

Arachnid Web Spider Framework

Arachnid is a Java-based web spider framework. It includes a simple HTML parser object that parses an input stream containing HTML content. Simple web spiders can be created by subclassing Arachnid and adding a few lines of code that are called after each page.

Downloads: 0 This Week

Last Update: 2013-03-08
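Arachnid is a Java framework and its classes are not reproduced here. The same "subclass a parser, add a few lines per page" pattern can be sketched in Python with the standard library's `html.parser.HTMLParser`; `LinkCollector` and `links_on_page` are hypothetical names, and this is not Arachnid's API.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect absolute link targets from one page's <a href> tags."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page's own URL.
                self.links.append(urljoin(self.base_url, href))


def links_on_page(base_url: str, html: str) -> list[str]:
    """Parse one page and return the links found on it, in document order."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.links
```

A spider would call `links_on_page` after fetching each page and feed the results into its queue of URLs to visit.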
12

Aracnis

Aracnis is a Java based framework for building distributed web spiders. These spiders can be used to accomplish a variety of tasks, for example, screen-scraping and link integrity checking.

Downloads: 0 This Week

Last Update: 2015-07-13
13

Arn0lD

A new web crawler featuring a sophisticated search process specialized by language.

Downloads: 0 This Week

Last Update: 2013-03-07
14

C++ web crawler library

arachne is a C++ library for HTTP crawling, link, text and metadata extraction designed to run in a distributed environment.

Downloads: 0 This Week

Last Update: 2014-02-28
15

Constellio Enterprise Search engine

Open source Search Engine and Enterprise Search

Constellio is an enterprise search engine that allows companies to search all their organization's information (Web, CRM, ERP, ECM, mail, etc.) through a single interface. Constellio is based on Apache Solr and Google Search Appliance's connector, and includes a powerful web crawler.

Downloads: 0 This Week

Last Update: 2015-03-31
16

Crawler.NET

Crawler.NET is a component-based, distributed framework for web traversal, intended for the .NET platform. It consists of loosely coupled units, each realizing a specific web crawler task. The main design goals are efficiency and flexibility.

1 Review

Downloads: 0 This Week

Last Update: 2013-03-22
17

DeDuplicator (Heritrix add-on)

The DeDuplicator is an add-on module (plug-in) for the web crawler Heritrix. It offers a means to reduce the amount of duplicate data collected in a series of snapshot crawls.

Downloads: 0 This Week

Last Update: 2013-04-02
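The DeDuplicator's actual index format and matching rules are not described here. The general idea behind reducing duplicate data across snapshot crawls can be sketched as follows: keep a content digest for everything already stored, and when a new fetch produces the same digest, record a reference instead of storing the bytes again. `DigestIndex` is hypothetical and uses SHA-256 as an assumed digest.

```python
import hashlib


class DigestIndex:
    """Remember content digests from earlier crawls so unchanged content is not stored twice."""

    def __init__(self):
        self.first_seen: dict[str, str] = {}  # digest -> URL of the first stored copy

    def process(self, url: str, body: bytes) -> str:
        """Return 'store' for new content, or 'duplicate' if identical bytes were stored before."""
        digest = hashlib.sha256(body).hexdigest()
        if digest in self.first_seen:
            # A real archiver would write a small pointer record here instead of the body.
            return "duplicate"
        self.first_seen[digest] = url
        return "store"
```

Keying only on the digest also catches identical content served under different URLs; matching on URL plus digest is a stricter alternative.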
18

Distributed Webhunter

Webhunter is a distributed, multi-threaded web crawler designed for both general indexing and crawling the web for focused content.

Downloads: 0 This Week

Last Update: 2013-04-05
19

Easyspider - Distributed Web Crawler

Easy Spider is a distributed Perl Web Crawler Project from 2006

Easy Spider is a distributed Perl web crawler project from 2006. It includes code for crawling webpages, distributing the results to a server and generating XML files from them. The client side can be any computer (Windows or Linux), and the server stores all data. Websites that use EasySpider crawling for article-writing software: https://www.artikelschreiber.com/en/ https://www.unaique.net/en/ https://www.unaique.com/ https://www.artikelschreiber.com/marketing/ https://www.paraphrasingtool1.com/ https://www.artikelschreiben.com/ https://buzzerstar.com/ https://easyperlspider.sourceforge.io/ http://artikelschreiber.net/ http://sebastianenger.com/ http://unaique.de/ http://unaique.org/ It is fun to look at code from a few years ago and see how much one has improved. If you want to write text automatically, try https://www.artikelschreiber.com/en/ or https://www.unaique.net/en/!

1 Review

Downloads: 0 This Week

Last Update: 2023-06-24
20

FWebSpider

FWebSpider is a web crawler application written in Perl. It crawls a chosen site and features a response cache, URL storage, URL exclusion rules and more. It is designed to serve as the core of a local or global site search engine.

Downloads: 0 This Week

Last Update: 2013-03-26
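FWebSpider is written in Perl and its rule syntax is not shown here. To illustrate how URL storage and URL exclusion rules typically fit together in a crawler, here is a minimal Python sketch of a crawl frontier: a FIFO queue that refuses URLs it has already seen or that match a regular-expression exclusion rule. The three patterns in `EXCLUDE_RULES` and the `Frontier` class are hypothetical.

```python
import re
from collections import deque
from typing import Optional

# Hypothetical exclusion rules: skip logout links, binary downloads and URLs with query strings.
EXCLUDE_RULES = [re.compile(p) for p in (r"/logout", r"\.(zip|exe|pdf)$", r"\?")]


class Frontier:
    """FIFO queue of URLs still to crawl, with URL storage (de-duplication) and exclusion rules."""

    def __init__(self, exclude_rules=EXCLUDE_RULES):
        self.exclude_rules = exclude_rules
        self.seen = set()     # every URL ever accepted, so nothing is queued twice
        self.queue = deque()  # URLs waiting to be fetched, oldest first

    def add(self, url: str) -> bool:
        """Queue `url` unless it was seen before or matches an exclusion rule."""
        if url in self.seen or any(rule.search(url) for rule in self.exclude_rules):
            return False
        self.seen.add(url)
        self.queue.append(url)
        return True

    def pop_next(self) -> Optional[str]:
        """Return the oldest queued URL, or None when the crawl is finished."""
        return self.queue.popleft() if self.queue else None
```

A FIFO queue gives breadth-first order, which tends to reach a site's top-level pages before deep ones.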
21

Fetchgals

A multi-threaded web spider that finds free porn thumbnail galleries by visiting a list of known TGPs (Thumbnail Gallery Posts). It optionally downloads the located pictures and movies. A TGP list is included. Public domain Perl script running on Linux.

2 Reviews

Downloads: 0 This Week

Last Update: 2013-03-12
22

Funnel - Web Spider

Funnel is a project for use on intranets, or selected sites on the Internet to gather together and index information from several different sources and make it available through a sane, usable interface.

Downloads: 0 This Week

Last Update: 2013-04-19
23

Harvest Web Indexing

Harvest is a web indexing package. Originally designed for distributed indexing, it can form a powerful system for indexing both large and small websites. It now also includes Harvest-NG, a highly efficient, modular, Perl-based web crawler.

Downloads: 0 This Week

Last Update: 2013-04-09
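Harvest's own index format is not described here. The data structure at the heart of most web indexing is an inverted index, which maps each term to the pages containing it. The minimal Python sketch below builds one from already-fetched page text and answers boolean AND queries. The lower-case alphanumeric tokenizer is a deliberate simplification, and all names are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lower-case alphanumeric words; a deliberately simple tokenizer."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages: dict[str, str]) -> dict[str, set[str]]:
    """Map each term to the set of URLs whose text contains it."""
    index: dict[str, set[str]] = defaultdict(set)
    for url, text in pages.items():
        for term in tokenize(text):
            index[term].add(url)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return the URLs that contain every query term (boolean AND)."""
    terms = tokenize(query)
    if not terms:
        return set()
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())
    return results
```

Production indexers also store term positions and frequencies for phrase search and ranking; this sketch only records which URLs contain each term.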
24

HtmlClient

HtmlClient provides an SGML/HTML/XHTML parser and connection client making web-spidering as easy for developers as actually surfing the web with a premade browser. Based on Apache's HttpClient.

Downloads: 0 This Week

Last Update: 2013-03-08
25

ItSucks

This project is a Java web spider (web crawler) that can download files and resume interrupted downloads. It is highly customizable with regular expressions and download templates, and all backend functionality is also available as a separate library.

3 Reviews

Downloads: 0 This Week

Last Update: 2013-04-29
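How ItSucks implements resuming is not stated here. The usual HTTP mechanism is the `Range` request header: ask the server only for the bytes after the local partial file's size and append them if the server replies `206 Partial Content`. The Python sketch below shows that mechanism with `urllib.request`. `resume_request` and `resume_download` are hypothetical helpers, not ItSucks's API.

```python
import os
import urllib.request


def resume_request(url: str, partial_path: str) -> urllib.request.Request:
    """Build a GET request asking only for the bytes that are not on disk yet."""
    offset = os.path.getsize(partial_path) if os.path.exists(partial_path) else 0
    headers = {"Range": f"bytes={offset}-"} if offset else {}
    return urllib.request.Request(url, headers=headers)


def resume_download(url: str, partial_path: str, chunk_size: int = 64 * 1024) -> None:
    """Download `url` into `partial_path`, continuing where a previous run stopped."""
    req = resume_request(url, partial_path)
    with urllib.request.urlopen(req) as resp:
        # 206 Partial Content: the server honoured the Range header, so append.
        # 200 OK: the server ignored it and is sending the whole file, so overwrite.
        mode = "ab" if resp.status == 206 else "wb"
        with open(partial_path, mode) as out:
            while chunk := resp.read(chunk_size):
                out.write(chunk)
```

If the local file is already complete, many servers answer `416 Range Not Satisfiable`, which `urlopen` raises as an `HTTPError`; a real downloader would treat that case as "already done".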