LexMachinaInc/html2text: Convert HTML to Markdown-formatted text.

html2text is a Python script that converts a page of HTML into clean, easy-to-read plain ASCII text. Better yet, that ASCII also happens to be valid Markdown (a text-to-HTML format).

Why does this fork exist?

  • better build process
  • (less) disgusting code
  • maintainable

If you use this software

Please take a moment to pay your respects to Aaron.

Usage

From within Python:

import html2text
print(html2text.html2text("<p>Hello, world.</p>"))

Or with some configuration options:

import html2text
h = html2text.HTML2Text()
h.ignore_links = True
print(h.handle("<p>Hello, <a href='http://earth.google.com/'>world</a>!"))
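
To make the effect of ignore_links concrete, here is a minimal, standard-library-only sketch of the same idea. It is not html2text's implementation and does not reproduce its exact output; the names TinyMarkdownConverter and to_markdown are hypothetical and exist only for this illustration, and only a handful of tags are handled.

```python
from html.parser import HTMLParser


class TinyMarkdownConverter(HTMLParser):
    """Toy HTML-to-Markdown converter (hypothetical; NOT html2text's code)."""

    def __init__(self, ignore_links=False):
        super().__init__(convert_charrefs=True)
        # Assumption: mirrors the option name from the usage example above.
        self.ignore_links = ignore_links
        self._out = []    # collected output fragments
        self._hrefs = []  # hrefs of the <a> tags that are currently open

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._out.append("\n\n")
        elif tag in ("em", "i"):
            self._out.append("_")
        elif tag in ("strong", "b"):
            self._out.append("**")
        elif tag == "a":
            self._hrefs.append(dict(attrs).get("href"))
            if not self.ignore_links:
                self._out.append("[")

    def handle_endtag(self, tag):
        if tag in ("em", "i"):
            self._out.append("_")
        elif tag in ("strong", "b"):
            self._out.append("**")
        elif tag == "a" and self._hrefs:
            href = self._hrefs.pop()
            if not self.ignore_links:
                self._out.append("](%s)" % (href or ""))

    def handle_data(self, data):
        self._out.append(data)

    def convert(self, html):
        self.feed(html)
        self.close()  # flush any text still buffered by the parser
        return "".join(self._out).strip()


def to_markdown(html, ignore_links=False):
    # A fresh parser per call keeps the sketch free of shared state.
    return TinyMarkdownConverter(ignore_links=ignore_links).convert(html)


html = "<p>Hello, <a href='http://earth.google.com/'>world</a>!"
print(to_markdown(html))                     # Hello, [world](http://earth.google.com/)!
print(to_markdown(html, ignore_links=True))  # Hello, world!
```

With ignore_links set, the link text is kept and only the URL is dropped. The real library handles far more markup (lists, headings, images, wrapping) than this sketch does.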

Originally written by Aaron Swartz. This code is distributed under the GPLv3.

Getting started (developers)

This project uses PyBuilder.

sudo pip install pyb_init
pyb-init github mriehl : html2text

Further building (coverage, pep8 linting, building a release) can be done with:

source venv/bin/activate
pyb
