GitHub - catoncat/defuddle-cli: Command line utility to extract clean HTML, Markdown and metadata from web pages.

catoncat/defuddle-cli

 
 


Defuddle CLI

Command-line interface for Defuddle. Extracts clean HTML or Markdown from web pages.

Installation

npm install -g defuddle-cli

Usage

defuddle parse <source> [options]

Arguments

  • source: HTML file path or URL to parse

Options

  • -o, --output <file>: Output file path (default: stdout)
  • -m, --markdown, --md: Convert content to Markdown
  • -j, --json: Output as JSON with both HTML and Markdown content
  • -p, --property <name>: Extract a specific property (e.g., title, description, domain)
  • --debug: Enable debug mode
  • -h, --help: Display help for command
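
The options above map directly onto an argument list, which can be handy when calling the CLI from a Node script. The following is a minimal sketch, not part of defuddle-cli: `ParseOptions` and `buildParseArgs` are hypothetical names introduced for illustration, and only the flags themselves come from the list above.

```typescript
// Hypothetical helper: ParseOptions and buildParseArgs are illustrative names,
// not part of defuddle-cli. Only the flags are taken from the options list.
interface ParseOptions {
  output?: string;    // -o, --output <file>
  markdown?: boolean; // -m, --markdown, --md
  json?: boolean;     // -j, --json
  property?: string;  // -p, --property <name>
  debug?: boolean;    // --debug
}

// Build the argument list for `defuddle parse <source> [options]`.
function buildParseArgs(source: string, opts: ParseOptions = {}): string[] {
  const args: string[] = ["parse", source];
  if (opts.markdown) args.push("--markdown");
  if (opts.json) args.push("--json");
  if (opts.property !== undefined) args.push("--property", opts.property);
  if (opts.output !== undefined) args.push("--output", opts.output);
  if (opts.debug) args.push("--debug");
  return args;
}

// Prints: parse article.html --markdown --output output.md
console.log(
  buildParseArgs("article.html", { markdown: true, output: "output.md" }).join(" ")
);
```

The resulting array can be handed to a process launcher such as Node's `child_process.spawnSync("defuddle", args)`, assuming the `defuddle` binary is on your PATH after the global install.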

Examples

Parse a local HTML file (outputs HTML):

defuddle parse article.html

Parse a URL and convert to markdown:

defuddle parse https://example.com/article --md

Parse and get the full JSON response from Defuddle:

defuddle parse article.html --json
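
The exact fields of the JSON response are not listed here, so one cautious way to explore it is to parse the output and list its top-level keys before relying on any of them. A minimal sketch, assuming the `--json` output is a single JSON object; `topLevelKeys` is a hypothetical helper and the sample string is a placeholder, not real Defuddle output.

```typescript
// Hypothetical helper, not part of defuddle-cli: lists the top-level keys of
// a JSON document so the response shape can be inspected before use.
function topLevelKeys(jsonText: string): string[] {
  const parsed: unknown = JSON.parse(jsonText);
  if (parsed === null || typeof parsed !== "object" || Array.isArray(parsed)) {
    throw new Error("expected a JSON object at the top level");
  }
  return Object.keys(parsed).sort();
}

// Placeholder input: these field names are made up for the example.
const sample = '{"fieldB": "...", "fieldA": "..."}';
console.log(topLevelKeys(sample).join(", ")); // fieldA, fieldB
```

In practice the input string would be the text the CLI writes to stdout (the default output), or the contents of the file given with `-o`.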

Save markdown output to a file:

defuddle parse article.html --md -o output.md

Extract specific properties:

# Get just the title
defuddle parse article.html --property title

# Get the description
defuddle parse article.html -p description

# Get the domain
defuddle parse article.html --property domain
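
When several properties are needed, the same command can be repeated once per property from a script. The sketch below is one possible way to do that under stated assumptions: `Runner`, `runDefuddle` and `extractProperties` are hypothetical names, the `defuddle` binary is assumed to be on your PATH, and the property value is assumed to be printed to stdout as plain text (trailing whitespace is trimmed).

```typescript
import { spawnSync } from "node:child_process";

// A runner takes CLI arguments and returns whatever the command printed.
type Runner = (args: string[]) => string;

// Default runner (assumption: `defuddle` is installed globally and on PATH).
const runDefuddle: Runner = (args) => {
  const result = spawnSync("defuddle", args, { encoding: "utf8" });
  if (result.error) throw result.error;
  if (result.status !== 0) {
    throw new Error(`defuddle exited with status ${result.status}`);
  }
  return result.stdout;
};

// Call `defuddle parse <source> --property <name>` once per requested name
// and collect the trimmed output keyed by property name.
function extractProperties(
  source: string,
  names: string[],
  run: Runner = runDefuddle
): Record<string, string> {
  const values: Record<string, string> = {};
  for (const name of names) {
    values[name] = run(["parse", source, "--property", name]).trim();
  }
  return values;
}
```

For example, `extractProperties("article.html", ["title", "description", "domain"])` would run the three commands shown above in sequence. The `run` parameter is injectable so the function can be exercised without the CLI installed.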

Development

# Install dependencies
npm install

# Build
npm run build

# Run in development mode
npm run dev
