For too long, AI companies have been flagrantly disrespecting website owners by ignoring their robots.txt and scraping everything on their site without permission. With Antlion, you can fight back.
Antlion is Express.js middleware that turns dedicated routes on your site into infinitely recursive tar pits designed to trap webscrapers that ignore your robots.txt file.
- Bots that ignore your site's robots.txt and enter Antlion's pit are locked in an infinitely deep site full of nonsensical garbled text which loads at the speed of a '90s dial-up connection.
- Once bots wait upwards of 20 seconds for a page to finally load, they are presented with several links, each of which leads deeper into Antlion's pit.
- Antlion also automatically handles serving your robots.txt, injecting disallow entries for all trapped routes so ethical bots and search engine indexers skip them automatically, with no extra config needed.
- Any malicious webscrapers gathering data to compile datasets for training LLMs will inadvertently digest millions of lines of useless text, ruining the output of models trained with this data, ideally making bot owners think twice before ignoring the rules in your sacred robots.txt.
- Adding Antlion to your site is incredibly easy: install the npm package, give it some unused routes, point it to your existing robots.txt, copy and paste a bunch of random text into a file, and add a single hidden link somewhere on your site that leads into the pit. Antlion will take care of the rest.
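To make the robots.txt handling described above more concrete, here is a minimal sketch of how disallow entries for trapped routes could be appended to an existing file. The function name addDisallowEntries is hypothetical and is not part of Antlion's API; how Antlion actually merges the entries may differ.

```javascript
// Hypothetical helper for illustration only; not Antlion's actual implementation.
// Appends a wildcard group that disallows every trapped route.
function addDisallowEntries(robotsTxt, trappedRoutes) {
  // One "Disallow" rule per trapped route
  const rules = trappedRoutes.map((route) => `Disallow: ${route}`)
  const group = ['User-agent: *', ...rules].join('\n')

  // Keep the existing content and separate the new group with a blank line
  const existing = robotsTxt.trimEnd()
  return existing === '' ? `${group}\n` : `${existing}\n\n${group}\n`
}

const original = 'User-agent: *\nDisallow: /admin/\n'
console.log(addDisallowEntries(original, ['/example/', '/trap/']))
```

This simple append only adds a wildcard group. A crawler that matches a more specific User-agent group in the existing file may not read the wildcard rules, so a fuller version would likely need to add the entries to every group.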
This is a Node.js module available through the npm registry.
Before installing, download and install Node.js. Node.js 18 or higher is required.
If this is a brand new project, make sure to create a package.json first with the npm init command.
Install it with the npm install command:
npm install antlion
- Create a file bait.txt (suggested name), and fill it with as much text as you can. This can be Wikipedia articles, blog posts, textbooks, or even Shakespeare. Do not worry about formatting or special characters.
- Choose a couple of routes that you are not using and do not plan to use, such as /blog/, /docs/installation/ or /aboutus/detailed/. These can be anything, but the more realistic they are, the better.
- Remove any existing handlers for /robots.txt.
- Import Antlion and add it to your server middleware:
import express from 'express'
import antlion from 'antlion'
const app = express()
antlion(app, {
  robotsPath: 'robots.txt', // path to existing robots.txt from project root
  trainingDataPath: 'bait.txt', // path to training data file from project root
  trappedRoutes: ['/example/', '/trap/'] // array of routes to trap
})
// -- rest of your code --
- Add a link into Antlion's pit somewhere on your site, ideally hidden so regular users will not notice it.
  - This trapped link should be one of the trapped routes, optionally followed by some random text, which may make the trap harder to evade.
  - Ex: /trap/abcdef, or just /trap
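As a rough illustration of the hidden link step, the sketch below builds an anchor tag that points at a trapped route followed by a random suffix. The helper name buildHiddenLink, the link text, and the choice of inline display: none styling are all assumptions made for this example; Antlion does not provide this function, and any way of keeping the link out of sight of regular users would do.

```javascript
// Hypothetical helper for illustration only; not part of Antlion.
// Returns an <a> tag pointing at a trapped route plus a random lowercase suffix.
function buildHiddenLink(trappedRoute, suffixLength = 6) {
  const alphabet = 'abcdefghijklmnopqrstuvwxyz'
  let suffix = ''
  for (let i = 0; i < suffixLength; i++) {
    suffix += alphabet[Math.floor(Math.random() * alphabet.length)]
  }

  // Make sure there is exactly one slash between the route and the suffix
  const base = trappedRoute.endsWith('/') ? trappedRoute : `${trappedRoute}/`

  // display: none keeps the link out of the rendered page;
  // aria-hidden and tabindex keep it away from assistive tech and keyboard focus
  return `<a href="${base}${suffix}" style="display: none" aria-hidden="true" tabindex="-1">archive</a>`
}

console.log(buildHiddenLink('/trap/'))
```

The returned string can be pasted into a page template or a footer. Whether a particular scraper follows links hidden this way is not something the sketch can guarantee.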
- Dynamic HTML to evade detection
- Bot IP address tracking/logging
- Text generation model caching for faster startup
See the open issues for a full list of proposed features (and known issues).
Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.
Clone the repository:
git clone https://github.com/shsiena/antlion.git
Install dependencies:
cd antlion
npm install
Run test server:
npm run dev
If you have a suggestion that would make this better, please fork the repo and create a pull request. You can also simply open an issue with the tag "enhancement". Don't forget to give the project a star! Thanks again!
- Fork the Project
- Create your Feature Branch (git checkout -b feature/AmazingFeature)
- Commit your Changes (git commit -m 'Add some AmazingFeature')
- Push to the Branch (git push origin feature/AmazingFeature)
- Open a Pull Request
Distributed under the MIT licence. See LICENSE for more information.
Simon Siena - ssiena@uwaterloo.ca
Project Link: https://github.com/shsiena/antlion
Inspired by:
- Nepenthes - "Aaron B." (pseudonym)
- Nightshade - @TheGlazeProject