Note: As of September 2024, this package is no longer maintained. All books available on the Internet Archive, plus many more, can now be downloaded directly from Anna’s Archive. For rudimentary integration with Emacs, see my package annas-archive.
This simple Emacs package lets you download any PDF file from the Internet Archive seamlessly, as long as it can be borrowed for at least one hour.
If you are on macOS and use Homebrew, you can install all of the above requirements by running
brew install internetarchive wget adobe-digital-editions calibre
Clone this repository and add this to your init.el file:
(add-to-list 'load-path "path/to/internet-archive")
Here, "path/to/internet-archive" is the path to the local repository you just cloned.
If you use the elpaca package manager, you just need to add this to your init.el file:
(use-package internet-archive
  :ensure (internet-archive
           :host github
           :repo "benthamite/internet-archive")
  :demand t)
If you use straight, just replace :ensure with :straight in the declaration above.
When running for the first time:
- Configure the ia program. Run ia configure and follow the instructions.
- Export your IA cookies file. You can export the cookies by installing the Get cookies.txt LOCALLY browser extension. Then go to https://archive.org/, click on the extension, click on ‘export’, and save the file to ~/.config/cookies.txt. (If you would like to save it to a different location, you need to set internet-archive-cookies-file manually.)
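If you keep the cookies file somewhere other than the default location, a minimal sketch of the corresponding setting in init.el might look like this. The path shown is purely hypothetical; substitute wherever you actually saved cookies.txt.

```emacs-lisp
;; Hypothetical location: replace with the path you exported cookies.txt to.
;; `expand-file-name' resolves the leading "~" to your home directory.
(setq internet-archive-cookies-file
      (expand-file-name "~/Documents/internet-archive/cookies.txt"))
```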
Run M-x internet-archive, followed by a URL or a title, depending on whether you already know the URL of the book you would like to download or need to search for it. If you choose to search for a book, you will also be prompted to enter an author (labelled creator). These fields are each, but not jointly, optional. (The fields are customizable; see below.)
If you have org-protocol installed and configured on your system, you can also trigger the function directly from your web browser, by creating a bookmark with the following JavaScript code:
javascript:location.href='org-protocol://internet-archive?url=' + encodeURIComponent(location.href);
Then simply click on this bookmark after clicking ‘Borrow for 1 hour’.
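For reference, the Emacs side of a typical org-protocol setup can be sketched as follows. This is only the generic part: it assumes the package registers its own internet-archive protocol handler once loaded (which the text implies but does not show), and it does not cover registering the org-protocol:// URL scheme with your operating system or browser, which varies by platform.

```emacs-lisp
;; org-protocol URLs are delivered through emacsclient, so the Emacs
;; server must be running in the session that should receive them.
(require 'server)
(require 'org-protocol)

(unless (server-running-p)
  (server-start))
```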
- For running search queries, the fields title and creator are used by default. If you would like to use different fields, you can set internet-archive-query-fields. (The full list of admissible fields is here.) Note that the first of the fields in the variable will be used for the initial prompt upon invocation of internet-archive. For example, if you set internet-archive-query-fields to ("author" "title" "language"), you will initially be prompted to enter a URL or an author (rather than a URL or a title), and this will be followed by prompts to enter a title and a language.
- When returning results, the fields title and creator are also used by default. If you would like to use different fields, you can set internet-archive-metadata-fields.
- Depending on where in your file system the relevant Calibre and Adobe Digital Editions are found, you may need to set the values of internet-archive-calibre-directory and internet-archive-ade-directory accordingly.
- Emacs should be able to find the ia, wget and calibredb executables. But if it doesn’t, you can specify their location manually by setting the value of internet-archive-cli-file, internet-archive-wget-file and internet-archive-calibredb-file, respectively.
- If you want Adobe Digital Editions to close once it is done downloading the PDF from the Internet Archive, set internet-archive-ade-close-when-done to t. If you want Adobe Digital Editions to open in the background, set internet-archive-ade-open-in-background to t. Note that it seems like ADE will start downloading the file only when it is in the foreground, so this option may be less useful than it appears.
- For the full list of user options, M-x customize-group RET internet-archive.
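The options above can also be set directly in init.el. The following is a rough sketch rather than a recommended configuration: the variable names come from the list above, but the metadata field list and all executable paths are illustrative assumptions (the paths happen to be typical Homebrew locations on Apple Silicon and may well differ on your machine).

```emacs-lisp
;; Prompt for an author first, then a title and a language
;; (the example value given in the list above).
(setq internet-archive-query-fields '("author" "title" "language"))

;; Illustrative assumption: also show the date next to the default
;; title and creator fields when results are returned.
(setq internet-archive-metadata-fields '("title" "creator" "date"))

;; Hypothetical paths: only needed if Emacs cannot find the executables.
(setq internet-archive-cli-file "/opt/homebrew/bin/ia"
      internet-archive-wget-file "/opt/homebrew/bin/wget"
      internet-archive-calibredb-file "/opt/homebrew/bin/calibredb")

;; Quit Adobe Digital Editions once the PDF has been downloaded.
(setq internet-archive-ade-close-when-done t)
```

If you install the package with use-package, the same values could instead go under its :custom keyword.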
I see more results when I run a search on the Internet Archive website than when I use your package.
The search results we display are deliberately restricted to books available for borrowing. In addition, the search terms will only match results in the associated fields (author or title). So, e.g., if you enter “borges” when prompted for an author, this will match El aleph, but will not match Borges (because Borges is not the author), Borges por él mismo (because it is not a book) or Ficciones (because it is not currently borrowable).
“This book is not available to borrow at this time. Please try again later.”
This sometimes happens with books that are available for borrowing. It appears to be a limitation of the Internet Archive CLI. If this happens, please go to the website, borrow the book manually, then run internet-archive again with the URL.
The package has not been extensively tested. If you encounter any problems, feel free to open an issue.