arxiv.py

Python wrapper for the arXiv API.

About arXiv

arXiv is a project by the Cornell University Library that provides open access to 1,000,000+ articles in Physics, Mathematics, Computer Science, Quantitative Biology, Quantitative Finance, and Statistics.

Usage

Installation

$ pip install arxiv

Verify the installation by running the tests from the repository root (the directory containing setup.py):

$ python setup.py test

In your Python script, include the line

import arxiv

Query

arxiv.query(query="",
            id_list=[],
            max_results=None,
            start=0,
            sort_by="relevance",
            sort_order="descending",
            prune=True,
            iterative=False,
            max_chunk_results=1000)

Argument	Type	Default
`query`	string	`""`
`id_list`	list of strings	`[]`
`max_results`	int	`None`
`start`	int	0
`sort_by`	string	`"relevance"`
`sort_order`	string	`"descending"`
`prune`	boolean	`True`
`iterative`	boolean	`False`
`max_chunk_results`	int	1000

query: an arXiv query string. Format documented here.
- Note: multi-field queries must be space-delimited. au:balents_leon AND cat:cond-mat.str-el is valid; au:balents_leon+AND+cat:cond-mat.str-el is not valid.
id_list: list of arXiv record IDs (typically of the format "0710.5765v1").
max_results: the maximum number of results returned by the query.
start: the offset of the first returned object from the arXiv query results.
sort_by: the arXiv field by which the result should be sorted.
sort_order: the sorting order, i.e. "ascending", "descending" or None.
prune: when True, received abstract objects will be simplified.
iterative: when True, query() will return an iterator. Otherwise, query() iterates internally and returns the full list of results.
max_chunk_results: the maximum number of abstracts to be retrieved by a single internal request to the arXiv API.
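
To make the relationship between start, max_results, and max_chunk_results concrete, the sketch below computes the (offset, size) pairs that a chunked download could request. It is an illustration only: plan_chunks is a hypothetical helper, not part of arxiv.py, and the library's internal paging may differ.

```python
def plan_chunks(start, max_results, max_chunk_results):
    """Split a request for max_results records into per-request chunks.

    Hypothetical helper for illustration; not part of arxiv.py.
    Returns a list of (offset, size) tuples.
    """
    if max_chunk_results <= 0:
        raise ValueError("max_chunk_results must be positive")
    chunks = []
    remaining = max_results
    offset = start
    while remaining > 0:
        # Never ask for more than one chunk's worth in a single request.
        size = min(remaining, max_chunk_results)
        chunks.append((offset, size))
        offset += size
        remaining -= size
    return chunks


print(plan_chunks(start=0, max_results=25, max_chunk_results=10))
# [(0, 10), (10, 10), (20, 5)]
```

A smaller max_chunk_results means more, lighter requests; a larger one means fewer, heavier requests.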

Query examples:

import arxiv

# Keyword queries
arxiv.query(query="quantum", max_results=100)
# Multi-field queries
arxiv.query(query="au:balents_leon AND cat:cond-mat.str-el")
# Get single record by ID
arxiv.query(id_list=["1707.08567"])
# Get multiple records by ID
arxiv.query(id_list=["1707.08567", "0710.5765v1"])

# Get iterator over query results
result = arxiv.query(query="quantum", max_chunk_results=10, iterative=True)
for paper in result():
    print(paper)

For a more detailed description of the interaction between query and id_list, see this section of the arXiv documentation.
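
The space-delimited rule for multi-field queries can be enforced by building the string programmatically. The sketch below assumes a hypothetical helper, build_query, which is not part of arxiv.py; it only joins field:value pairs with " AND ".

```python
def build_query(**fields):
    """Join field:value pairs with ' AND ' (space-delimited, never '+').

    Hypothetical helper for illustration; not part of arxiv.py.
    """
    parts = [f"{prefix}:{value}" for prefix, value in fields.items()]
    return " AND ".join(parts)


query = build_query(au="balents_leon", cat="cond-mat.str-el")
print(query)  # au:balents_leon AND cat:cond-mat.str-el
# The resulting string can then be passed as arxiv.query(query=query).
```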

Download article PDF or source tarfile

arxiv.arxiv.download(obj, dirpath='./', slugify=slugify, prefer_source_tarfile=False)

Argument	Type	Default	Required?
`obj`	dict	N/A	Yes
`dirpath`	string	`"./"`	No
`slugify`	function	`arxiv.slugify`	No
`prefer_source_tarfile`	bool	`False`	No

obj is a result object, one of a list returned by query(). obj must at minimum contain values corresponding to pdf_url and title.
dirpath is the relative directory path to which the downloaded file will be saved. It defaults to the current working directory.
slugify is a function that processes obj into a filename. By default, arxiv.download(obj) prepends the object ID to the object title.
If prefer_source_tarfile is True, this function will download the source files for obj in .tar.gz format, rather than the rendered PDF.

import arxiv
# Query for a paper of interest, then download
paper = arxiv.query(id_list=["1707.08567"])[0]
arxiv.download(paper)
# You can skip the query step if you have the paper info!
paper2 = {"pdf_url": "http://arxiv.org/pdf/1707.08567v1",
          "title": "The Paper Title"}
arxiv.download(paper2)

# Download the gzipped tar file
arxiv.download(paper, prefer_source_tarfile=True)

# Returns the object id
def custom_slugify(obj):
    return obj.get('id').split('/')[-1]

# Download with a specified slugifier function
arxiv.download(paper, slugify=custom_slugify)
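
A slugifier can also combine the record ID with a cleaned-up title, in the spirit of the default behavior described above. The sketch below is an assumption about what such a function might look like: id_title_slugify is a hypothetical name, and the library's own default slugify may format filenames differently.

```python
import re


def id_title_slugify(obj):
    """Build a filename stem of the form '<id>.<cleaned title>'.

    Hypothetical helper for illustration; not the library's default slugify.
    """
    # Use the last path segment of 'id', falling back to 'pdf_url'.
    source = obj.get("id") or obj["pdf_url"]
    short_id = source.split("/")[-1]
    # Collapse every run of non-alphanumeric characters into one underscore.
    title = re.sub(r"[^A-Za-z0-9]+", "_", obj["title"]).strip("_")
    return f"{short_id}.{title}"


sample = {"pdf_url": "http://arxiv.org/pdf/1707.08567v1",
          "title": "The Paper Title"}
print(id_title_slugify(sample))  # 1707.08567v1.The_Paper_Title
```

Passing it as arxiv.download(paper, slugify=id_title_slugify) would follow the same pattern as custom_slugify above.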
