8000 GitHub - dmich2/ScrapeVSXCom: ScrapeVSXCom is a command line program written in F# that scrapes VisualStudioExtensibility.com and generates a tab-separated-values file with all the article meta information.
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

ScrapeVSXCom is a command line program written in F# that scrapes VisualStudioExtensibility.com and generates a tab-separated-values file with all the article meta information.

License

Notifications You must be signed in to change notification settings

dmich2/ScrapeVSXCom

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

8 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

VisualStudioExtensibility.com Article Meta-Information Scraper


VisualStudioExtensibility.com is a great website that contains a lot information on writing extensions for Microsoft's Visual Studio. Unfortunately it isn't very easy to get a quick overview of all the articles (490+ at the time of this writing).

ScrapeVSXCom is a command line program written in F# that scrapes VisualStudioExtensibility.com and generates a tab-separated-values, file where each article line contains:

  • Category
  • Post date
  • Title
  • Type (HowTo, Bug, or Other)
  • URL
  • CategoryURL

This file can then be imported into Microsoft Excel, Google Sheet, etc. and the articles filtered/sorted via any column.

ScrapeVSXCom.csv imported into Microsoft Excel

This repo contains ScrapeVSXCom.csv and ScrapeVSXCom.xlsx files if you want to see an example of the generated scraped data.

Usage

To use, open ScrapeVSXCom.sln in Visual Studio 2013 and Build. Then open a Command Prompt window and navigate to the ScrapeVSXCom project sub-folder. Enter the following command:

    bin\Release\ScrapeVSXCom.exe

or

    bin\Debug\ScrapeVSXCom.exe

Various informational messages will be printed in the Command Prompt window to show the progress of the program.

Two files are also generated:

  • ScrapeVSXCom.log contains everything printed to the console, summary information on the articles (Category, Year, and Type), and all the articles listed by Category, Type, Month, and Title.

  • ScrapeVSXCom.csv is the tab-separated-values file discussed above.

These two files are versioned so if you run the program multiple times an existing ScrapeVSXCom.log will be renamed to ScrapeVSXCom.1.log, ScrapeVSXCom.1.log to ScrapeVSXCom.2.log, etc up to ScrapeVSXCom.10.log.

ScrapeVSXCom stores the downloaded pages in <SolutionDir>\ScrapeVSXCom\cache. If you re-run the program it will read from the local cache instead of downloading from VisualStudioExtensibility.com again. To avoid this behavior you can simply delete the entire cache directory or just <SolutionDir>\ScrapeVSXCom\cache\www.visualstudioextensibility.com\index.html and the months you want to re-download (Usually the current month is enough).

About

ScrapeVSXCom is a command line program written in F# that scrapes VisualStudioExtensibility.com and generates a tab-separated-values file with all the article meta information.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

0