added various bots and services to excluded agents by mackuba · Pull Request #239 · matomo-org/matomo-log-analytics · GitHub

added various bots and services to excluded agents #239


Closed
wants to merge 1 commit into from

Conversation

mackuba
Contributor
@mackuba mackuba commented Jan 27, 2019

I'm currently in the process of setting up Matomo on my server (in log analytics mode). I saw in the visitor log that some entries that didn't look human were sneaking through, so I started analysing the logs I have from the last 12 months in order to filter out some more bots that the script currently misses. I think I managed to find some good keywords that catch quite a lot of bot requests and don't seem to have any false positives.

Here's the list of keywords I've added, and for each keyword the specific unique user agents it catches (from a sample of ~2.5M lines).

I've also removed adsbot-google from the existing list, since it's already covered by the bot- pattern.
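The redundancy of adsbot-google can be illustrated with a minimal sketch. It assumes the filter is a case-insensitive substring test against a keyword list; the list and both function names below are hypothetical illustrations, not the script's actual code.

```python
# Hypothetical excerpt of an excluded-agents list; the real list in the
# log analytics script is longer and may differ.
EXCLUDED_KEYWORDS = ["bot-", "favicon", "thumb", "fetch"]


def is_excluded(user_agent, keywords=EXCLUDED_KEYWORDS):
    """Return True if any keyword occurs in the user agent, ignoring case."""
    lowered = user_agent.lower()
    return any(keyword in lowered for keyword in keywords)


def redundant_keywords(keywords):
    """Return keywords that contain another keyword from the same list.

    Such an entry can never match a user agent that the shorter
    keyword would miss, so it can be dropped safely.
    """
    return [
        keyword
        for keyword in keywords
        if any(other != keyword and other in keyword for other in keywords)
    ]


# "adsbot-google" contains "bot-", so it is reported as redundant.
print(redundant_keywords(["bot-", "adsbot-google"]))
```

Under that assumption, any user agent containing adsbot-google also contains bot-, so removing the longer entry does not change which requests are filtered.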


Bots that Matomo currently doesn't catch

These are currently tracked as normal visits. I've tested this with --regex-group-to-visit-cvar="user_agent=UserAgent" and then checked in the custom variables table that they do appear in the results:
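The lists that follow group the unique user agents under each new keyword and separate out those that an existing filter would already catch. A rough sketch of how such a report could be produced is shown here; the function name, the report layout, and the "bot/" existing keyword are assumptions made for illustration, and matching is assumed to be a case-insensitive substring test.

```python
from collections import defaultdict


def classify_agents(user_agents, existing_keywords, new_keywords):
    """Group unique user agents by the new keyword that matches them.

    Returns {keyword: {"new": [...], "already_matched": [...]}}, where
    "already_matched" holds agents an existing keyword catches anyway.
    """
    report = defaultdict(lambda: {"new": [], "already_matched": []})
    for user_agent in sorted(set(user_agents)):
        lowered = user_agent.lower()
        covered = any(keyword in lowered for keyword in existing_keywords)
        for keyword in new_keywords:
            if keyword in lowered:
                bucket = "already_matched" if covered else "new"
                report[keyword][bucket].append(user_agent)
    return dict(report)


# Two user agents taken from the favicon list in this pull request.
agents = [
    "Hatena-Favicon2 (http://www.hatena.ne.jp/faq/)",
    "Mozilla/5.0 (compatible; DuckDuckGo-Favicons-Bot/1.0; +http://duckduckgo.com)",
]
# "bot/" stands in for an existing filter entry (an assumption).
report = classify_agents(agents, existing_keywords=["bot/"], new_keywords=["favicon"])
for keyword, buckets in report.items():
    print(keyword, len(buckets["new"]), len(buckets["already_matched"]))
```

In practice the input would be the user agent column extracted from the access logs rather than a hard-coded list.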

favicon

BoardReader Favicon Fetcher /1.0 info@boardreader.com
Hatena-Favicon2 (http://www.hatena.ne.jp/faq/)
Mozilla/5.0 (X11; Linux x86_64; rv:20.0; Favicon; +https://github.com/ArthurHoaro/favicon) Gecko/20100101 Firefox/32.0

Already matched by existing filters:

Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.75 Safari/537.36 Google Favicon
Mozilla/5.0 (compatible; DuckDuckGo-Favicons-Bot/1.0; +http://duckduckgo.com)
NewsBlur Favicon Fetcher - 4 subscribers - http://www.newsblur.com/site/7060334/mackubaeu (Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_1) AppleWebKit/534.48.3 (KHTML, like Gecko) Version/5.1 Safari/534.48.3)
NewsBlur Favicon Fetcher - 3 subscribers - http://www.newsblur.com/site/7060334/mackubaeu (Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_1) AppleWebKit/534.48.3 (KHTML, like Gecko) Version/5.1 Safari/534.48.3)
com.ddeville.llwebkit.favicon/158 CFNetwork/902.1 Darwin/17.7.0 (x86_64)
NewsBlur Favicon Fetcher - 5 subscribers - http://www.newsblur.com/site/7060334/mackubaeu (Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_1) AppleWebKit/534.48.3 (KHTML, like Gecko) Version/5.1 Safari/534.48.3)
NewsBlur Favicon Fetcher - 1 subscriber - http://www.newsblur.com/site/7060334/mackubaeu (Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_1) AppleWebKit/534.48.3 (KHTML, like Gecko) Version/5.1 Safari/534.48.3)
com.ddeville.llwebkit.favicon/158 CFNetwork/974.1 Darwin/18.0.0 (x86_64)

thumb

link_thumbnailer
Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/33.0.1750.170 Safari/537.36 Thumbshots/10.0.0.0

Already matched by existing filters:

Thumbor/6.4.2
rethumb/v1 (http://rethumb.com)

fetch

BoardReader Favicon Fetcher /1.0 info@boardreader.com
FetchStream
Hatena::Fetcher/0.01 (master) Furl/3.13
Hatena::Fetcher/0.01 (master) Furl/3.06
node-fetch/1.0 (+https://github.com/bitinn/node-fetch)

Already matched by existing filters:

NewsBlur Page Fetcher - 3 subscribers - http://www.newsblur.com/site/7060334/mackubaeu (Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_1) AppleWebKit/534.48.3 (KHTML, like Gecko) Version/5.1 Safari/534.48.3)
NewsBlur Page Fetcher - 4 subscribers - http://www.newsblur.com/site/7060334/mackubaeu (Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_1) AppleWebKit/534.48.3 (KHTML, like Gecko) Version/5.1 Safari/534.48.3)
NewsBlur Page Fetcher - 5 subscribers - http://www.newsblur.com/site/7060334/mackubaeu (Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_1) AppleWebKit/534.48.3 (KHTML, like Gecko) Version/5.1 Safari/534.48.3)
NewsBlur Favicon Fetcher - 4 subscribers - http://www.newsblur.com/site/7060334/mackubaeu (Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_1) AppleWebKit/534.48.3 (KHTML, like Gecko) Version/5.1 Safari/534.48.3)
NewsBlur Favicon Fetcher - 3 subscribers - http://www.newsblur.com/site/7060334/mackubaeu (Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_1) AppleWebKit/534.48.3 (KHTML, like Gecko) Version/5.1 Safari/534.48.3)
NewsBlur Favicon Fetcher - 5 subscribers - http://www.newsblur.com/site/7060334/mackubaeu (Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_1) AppleWebKit/534.48.3 (KHTML, like Gecko) Version/5.1 Safari/534.48.3)
NewsBlur Content Fetcher - 66 subscribers - http://www.newsblur.com/site/6124077/hacker-news (Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_1) AppleWebKit/534.48.3 (KHTML, like Gecko) Version/5.1 Safari/534.48.3)
NewsBlur Content Fetcher - 3 subscribers - http://www.newsblur.com/site/7060334/mackubaeu (Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_1) AppleWebKit/534.48.3 (KHTML, like Gecko) Version/5.1 Safari/534.48.3)
...etc.

safarifetcherd/604.1 CFNetwork/974.2.1 Darwin/18.0.0
safarifetcherd/604.1 CFNetwork/975.0.3 Darwin/18.2.0
safarifetcherd/604.1 CFNetwork/901.1 Darwin/17.6.0
safarifetcherd/604.1 CFNetwork/976 Darwin/18.2.0
safarifetcherd/604.1 CFNetwork/958.1 Darwin/18.0.0
...etc.

FeedHQ/2018.04.26.1524757137 (https://github.com/feedhq/feedhq; ping; https://github.com/feedhq/feedhq/wiki/fetcher; like FeedFetcher-Google)
FeedHQ/2018.11.03.1541230374 (https://github.com/feedhq/feedhq; ping; https://github.com/feedhq/feedhq/wiki/fetcher; like FeedFetcher-Google)
Feedly/1.0 (+http://www.feedly.com/fetcher.html; like FeedFetcher-Google)
Fetchbot (https://github.com/PuerkitoBio/fetchbot)
HatenaBookmark/0.03 (compatible; entryimage-fetcher)
Mediumbot-MetaTagFetcher/0.3 (+https://medium.com/)
Mozilla/5.0 (compatible; BazQux/2.4; +https://bazqux.com/fetcher; 1 subscribers)
Mozilla/5.0 (compatible; Cloudflare-AMP/1.0; +https://amp.cloudflare.com/doc/fetcher.html) AppleWebKit/534.34
Mozilla/5.0 (compatible; ImageFetcher/7.0; +http://images.weserv.nl/)
Mozilla/5.0 (compatible; ImageFetcher/8.0; +http://images.weserv.nl/)
R6_FeedFetcher(www.radian6.com/crawler)

backlink

Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0 ; BacklinkHttpStatus)

Already matched by existing filters:

Mozilla/5.0 (compatible; SiteExplorer/1.1b; +http://siteexplorer.info/Backlink-Checker-Spider/)
MBCrawler/1.0 (https://monitorbacklinks.com)
BacklinkCrawler (http://www.backlinktest.com/crawler.html)

hatena

Hatena Star UserAgent/2
Hatena-Favicon2 (http://www.hatena.ne.jp/faq/)
Hatena::Fetcher/0.01 (master) Furl/3.13
Hatena::Fetcher/0.01 (master) Furl/3.06
HatenaBookmark/0.03 (Hatena::Bookmark; master;) Furl/3.13
HatenaBookmark/4.0 (Hatena::Bookmark; Analyzer)

Already matched by existing filters:

Hatena::Russia::Crawler/0.01
Hatena::Scissors/0.01
HatenaBookmark/4.0 (Hatena::Bookmark; Scissors)
HatenaBookmark/0.03 (compatible; entryimage-fetcher)

python

python-requests/1.2.3 CPython/2.7.12 Linux/4.9.76-3.78.amzn1.x86_64
python-requests/2.10.0
python-requests/2.11.0
python-requests/2.11.1
... etc.

python-requests/2.2.1 CPython/2.7.12 Linux/4.18.16-x86_64-linode118
python-requests/2.2.1 CPython/2.7.12 Linux/4.18.8-x86_64-linode117
python-requests/2.2.1 CPython/2.7.6 Linux/3.13.0-042stab127.2
... etc.

python-requests/2.7.0 CPython/2.7.0 Windows/2008ServerR2
python-requests/2.7.0 CPython/2.7.14 Windows/2008ServerR2
python-requests/2.7.0 CPython/2.7.15 Windows/2012ServerR2

Python-urllib/1.17
Python-urllib/2.7
Python-urllib/3.4
Python-urllib/3.5
Python-urllib/3.6

Python/3.5 aiohttp/2.0.2
Python/3.5 aiohttp/2.2.0
Python/3.5 aiohttp/2.3.3
Python/3.5 aiohttp/3.1.3
Python/3.5 aiohttp/3.3.2
... etc.

request

Elytra/0.10.0 (Macintosh; Ubuntu/14.06) GCDHTTPRequest
Mozilla/4.0 (compatible; Win32; WinHttp.WinHttpRequest.5)
Yeti/0.10.0 (Macintosh; Ubuntu/14.06) GCDHTTPRequest
request.js
request

python-requests/2.18.4
python-requests/2.19.1
python-requests/2.10.0
python-requests/2.20.0
python-requests/2.18.1
...etc.

phantomjs

Mozilla/5.0 (Macintosh; Intel Mac OS X) AppleWebKit/538.1 (KHTML, like Gecko) PhantomJS/2.1.1 Safari/538.1
Mozilla/5.0 (Unknown; Linux x86_64) AppleWebKit/534.34 (KHTML, like Gecko) PhantomJS/1.9.8 Safari/534.34
Mozilla/5.0 (Unknown; Linux x86_64) AppleWebKit/538.1 (KHTML, like Gecko) PhantomJS/2.0.0 Safari/538.1
Mozilla/5.0 (Unknown; Linux x86_64) AppleWebKit/538.1 (KHTML, like Gecko) PhantomJS/2.1.1 Safari/538.1
Mozilla/5.0 (Unknown; Linux x86_64) AppleWebKit/538.1 (KHTML, like Gecko) PhantomJS/2.1.1 Version/7.0 Safari/538.1
Mozilla/5.0 (Windows NT 6.1) AppleWebKit/538.1 (KHTML, like Gecko) PhantomJS/2.1.1 Safari/538.1
Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/602.1 (KHTML, like Gecko) PhantomJS/2.1.1 Version/9.0 Safari/602.1

Already matched by existing filters:

Mozilla/5.0 PhantomJS (compatible; Seznam screenshot-generator 2.1; +http://fulltext.sblog.cz/screenshot/)

okhttp

okhttp/3.2.0
okhttp/3.9.1
okhttp/3.10.0
okhttp/3.9.0
okhttp/3.8.1
...etc.

headless

Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/70.0.3508.0 Safari/537.36
Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/64.0.3282.119 Safari/537.36
Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/69.0.3452.0 Safari/537.36
Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/63.0.3239.132 Safari/537.36
Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/65.0.3312.0 Safari/537.36
Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/63.0.3205.0 Safari/537.36
...etc.

Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/64.0.3282.167 HeadlessChrome/64.0.3282.167 Safari/537.36
Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/66.0.3359.181 HeadlessChrome/66.0.3359.181 Safari/537.36

Already matched by existing filters:

Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/69.0.3494.0 Safari/537.36 WordPress.com mShots
Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/67.0.3372.0 Safari/537.36 WordPress.com mShots
Mozilla/5.0 (compatible; DnyzBot/1.0) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/64.0.3282.167 
Mozilla/5.0 (compatible; DnyzBot/1.0) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/64.0.3264.0 Safari/537.36

http-client

Go-http-client/1.1

Already matched by existing filters:

Mozilla/5.0 (compatible; Go-http-client/1.1; +centurybot9@gmail.com)

http_client

Zend_Http_Client

httpclient

Apache-HttpClient/4.5.2 (Java/1.8.0_151)
Apache-HttpClient/4.5.2 (Java/1.8.0_181)
Apache-HttpClient/4.4.1 (Java/1.8.0_65)
Apache-HttpClient/4.5.2 (Java/1.8.0_65)
...etc.

EventMachine HttpClient
HTTPClient/1.0 (2.6.0.1, ruby 2.0.0 (2015-12-16))
HTTPClient/1.0 (2.8.3, ruby 2.2.1 (2015-02-26))
Jakarta Commons-HttpClient/3.1
Mozilla/5.0 (compatible; Funnelback) RPT-HTTPClient/0.3-3E

Already matched by existing filters:

LinkedInBot/1.0 (compatible; Mozilla/5.0; Jakarta Commons-HttpClient/3.1 +http://www.linkedin.com)
LinkedInBot/1.0 (compatible; Mozilla/5.0; Jakarta Commons-HttpClient/4.3 +http://www.linkedin.com)
LinkedInBot/1.0 (compatible; Mozilla/5.0; Apache-HttpClient +http://www.linkedin.com)
Mozilla/5.0 (X11; compatible; semantic-visions.com crawler; HTTPClient 3.1)
Java 1.7 Apache HttpClient (Linux x86_64) / GnowitNewsbot / Contact information at http://www.gnowit.com

appengine

AppEngine-Google; (+http://code.google.com/appengine; appid: s~readability-api-hrd)
AppEngine-Google; (+http://code.google.com/appengine; appid: e~finscience-1253)
AppEngine-Google; (+http://code.google.com/appengine; appid: s~cdn-dinoia)

GAE AppEngine-Google; (+http://code.google.com/appengine; appid: s~ga-mozilla-org-prod-001)

Mozilla/5.0 AppEngine-Google; (+http://code.google.com/appengine; appid: s~mendoapp1)
Mozilla/5.0 AppEngine-Google; (+http://code.google.com/appengine; appid: s~vodio-app)
Mozilla/5.0 AppEngine-Google; (+http://code.google.com/appengine; appid: e~finscience-1253)

Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Win64; x64; Trident/5.0) AppEngine-Google; (+http://code.google.com/appengine; appid: s~xiaohe18675)
Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Win64; x64; Trident/5.0) AppEngine-Google; (+http://code.google.com/appengine; appid: s~thgntk2)
Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Win64; x64; Trident/5.0) AppEngine-Google; (+http://code.google.com/appengine; appid: s~kindle-11)
Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Win64; x64; Trident/5.0) AppEngine-Google; (+http://code.google.com/appengine; appid: s~feedkin)
Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Win64; x64; Trident/5.0) AppEngine-Google; (+http://code.google.com/appengine; appid: s~ebook-rexdf)

Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.181 Safari/537.36 AppEngine-Google; (+http://code.google.com/appengine; appid: s~feedly-nikon3)
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_2) AppleWebKit/535.7 (KHTML, like Gecko) Chrome/23.0.912.77 Safari/535.7 AppEngine-Google; (+http://code.google.com/appengine; appid: s~feedly-nikon3)
Mozilla/5.0 (Windows; U; MSIE 9.0; Windows NT 9.0; en-US) AppEngine-Google; (+http://code.google.com/appengine; appid: s~virustotalcloud)
Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.186 Safari/537.36 AppEngine-Google; (+http://code.google.com/appengine; appid: s~simple-rss-proxy)
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36 AppEngine-Google; (+http://code.google.com/appengine; appid: s~feedly-nikon3)
Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.1.2) Gecko/20090729 Firefox/3.5.2 (.NET CLR 3.5.30729) AppEngine-Google; (+http://code.google.com/appengine; appid: s~theajaxpost)
Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36 AppEngine-Google; (+http://code.google.com/appengine; appid: s~twilinks123)
Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Win64; x64; Trident/5.0) AppEngine-Google; (+http://code.google.com/appengine; appid: b~thatkindleear)

evergreen

Evergreen (macOS; RSS Reader; https://ranchero.com/evergreen/)

netnewswire

Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_2) AppleWebKit/602.3.12 (KHTML, like Gecko) NetNewsWire/3.3.2
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_5) AppleWebKit/534.57.7 (KHTML, like Gecko) NetNewsWire/3.3.2
NetNewsWire (macOS; RSS Reader; https://ranchero.com/netnewswire/)
NetNewsWire/3.3.2 (Mac OS X; http://netnewswireapp.com/mac/; gzip-happy)
NetNewsWire/4.1.0 (Mac OS X; http://netnewswireapp.com/mac/; gzip-happy)

rss

Evergreen (macOS; RSS Reader; https://ranchero.com/evergreen/)
Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.186 Safari/537.36 AppEngine-Google; ( http://code.google.com/appengine; appid: s~simple-rss-proxy)
Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/538.1 (KHTML, like Gecko) QuiteRss/0.18.12 Safari/538.1
NetNewsWire (macOS; RSS Reader; https://ranchero.com/netnewswire/)
rss/10 CFNetwork/976 Darwin/18.2.0
rss/5 CFNetwork/897.15 Darwin/17.5.0
rss/5 CFNetwork/902.2 Darwin/17.7.0
rss/5 CFNetwork/974.2.1 Darwin/17.7.0
rss/8 CFNetwork/974.2.1 Darwin/18.0.0
rss/8 CFNetwork/975.0.3 Darwin/17.7.0
ruby:net.bhaak.rss2html:1.0
Visor News Reader RSS/1.0 (admin@visorco.com)
Winds: Open Source RSS & Podcast app: https://getstream.io/winds/

Already matched by existing filters:

Tiny Tiny RSS/17.12 (0cd4a88) (http://tt-rss.org/)
Tiny Tiny RSS/17.4 (http://tt-rss.org/)
Tiny Tiny RSS/17.12 (8702ded) (http://tt-rss.org/)
RSS Bot/2.6 (Mac OS X Version 10.13.5 (Build 17F77))
RSS Bot/2.7 (Mac OS X Version 10.13.4 (Build 17E199))
RSS Bot/2.6 (Mac OS X Version 10.14 (Build 18A314h))
RSSOwl/2.2.1.201312301314 (Windows; U; en)

ruby

Ruby
Ruby, Twurly v1.1 (https://twurly.org)
ruby:net.bhaak.rss2html:1.0
rest-client/2.0.2 (linux-gnu x86_64) ruby/2.3.4p301
rest-client/2.0.0.rc2 (linux-gnu x86_64) ruby/2.1.3p242
rest-client/2.0.0 (linux-gnu x86_64) ruby/2.5.1p57
HTTPClient/1.0 (2.6.0.1, ruby 2.0.0 (2015-12-16))
HTTPClient/1.0 (2.8.3, ruby 2.2.1 (2015-02-26))

Already matched by existing filters:

Mechanize/2.7.5 Ruby/2.5.3p105 (http://github.com/sparklemotion/mechanize/)

node (or could be limited to node-, node.)

Mozilla/5.0 (Mixnode) AppleWebKit/537.36 (KHTML, like Gecko)

node-fetch/1.0 (+https://github.com/bitinn/node-fetch)
node-superagent/3.8.2
node-superagent/2.3.0
node-superagent/0.18.2
node.io
node.js
Node.js (linux; U; rv:v6.9.1) AppleWebKit/537.36 (KHTML, like Gecko)

python-requests/2.2.1 CPython/2.7.12 Linux/4.18.16-x86_64-linode118
python-requests/2.2.1 CPython/2.7.12 Linux/4.18.8-x86_64-linode117
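The difference between the broad node keyword and the narrower node- and node. variants can be checked against the user agents listed above. This is a small sketch assuming case-insensitive substring matching; the helper name is hypothetical.

```python
# A subset of the user agents listed under "node" in this pull request.
SAMPLE_AGENTS = [
    "Mozilla/5.0 (Mixnode) AppleWebKit/537.36 (KHTML, like Gecko)",
    "node-fetch/1.0 (+https://github.com/bitinn/node-fetch)",
    "node-superagent/3.8.2",
    "node.js",
    "python-requests/2.2.1 CPython/2.7.12 Linux/4.18.16-x86_64-linode118",
]


def matches(keywords, user_agent):
    """Return True if any keyword is a substring of the lowercased agent."""
    lowered = user_agent.lower()
    return any(keyword in lowered for keyword in keywords)


broad = [ua for ua in SAMPLE_AGENTS if matches(["node"], ua)]
narrow = [ua for ua in SAMPLE_AGENTS if matches(["node-", "node."], ua)]

# Agents that only the broad keyword catches: "Mixnode" and "linode118"
# contain "node" incidentally, as part of a longer word.
only_broad = [ua for ua in broad if ua not in narrow]
for ua in only_broad:
    print(ua)
```

Both incidental matches here happen to be automated clients, but the same substring could in principle appear in an unrelated user agent, which is the trade-off behind limiting the keyword to node- and node.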

http.rb

http.rb/4.0.0
http.rb/3.3.0
http.rb/3.0.0

http.rb/3.2.0 (Mastodon/2.4.3rc2; +https://mastodon.social/)
http.rb/3.2.0 (Mastodon/2.4.1; +https://mastodon.xyz/)
http.rb/3.2.0 (Mastodon/2.4.0; +https://mastodon.starrevolution.org/)
http.rb/2.2.2 (Mastodon/2.0.0; +https://retro.social/)
http.rb/2.2.2 (Mastodon/1.4.1; +http://mastodon.club/)
... etc. - about a hundred other Mastodon instances

scanner

Mozilla/5.0 (compatible; adscanner/)
Mozilla/5.0 (compatible; DNS SSL/TLS HTTP HTML Website Security Scanner/0.2 beta; +https://www.htmlyse.com/)

Already matched by existing filters:

Mozilla/5.0 (compatible; seoscanners.net/1; +spider@seoscanners.net)

datanyze

Mozilla/5.0 (X11; Datanyze; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.181 Safari/537.36

check

httpscheck (unknown version) CFNetwork/897.15 Darwin/17.5.0 (x86_64)
httpscheck (unknown version) CFNetwork/893.13.1 Darwin/17.4.0 (x86_64)
httpscheck (unknown version) CFNetwork/976 Darwin/18.2.0 (x86_64)
Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36 (compatible; linkCheckV3.0)

Already matched by existing filters:

com.ddeville.llwebkit.linkchecker/1 CFNetwork/893.13.1 Darwin/17.4.0 (x86_64)
com.ddeville.llwebkit.linkchecker/1 CFNetwork/975.0.3 Darwin/18.2.0 (x86_64)
com.ddeville.llwebkit.linkchecker/1 CFNetwork/897.15 Darwin/17.5.0 (x86_64)
com.ddeville.llwebkit.linkchecker/1 CFNetwork/902.1 Darwin/17.7.0 (x86_64)
com.ddeville.llwebkit.linkchecker/1 CFNetwork/974.1 Darwin/18.0.0 (x86_64)
...etc.

COMODO SSL Checker
DomainCheck.io Crawler/1.3 (https://domaincheck.io)
FeedChecker-Zocle/1.0 (+https://zocle.com/zoclechecker)
IABot: Checking if link from Wikipedia is broken and needs removal - See https://meta.wikimedia.org/wiki/InternetArchiveBot/FAQ_for_sysadmins
inboundli link checker
linkdex.com/2.0 Client Check v2.0
Mozilla/5.0 (compatible; LinkChecker/9.4; +http://wummel.github.io/linkchecker/)
Mozilla/5.0 (compatible; LinkChecker/9.4.0; +http://wummel.github.io/linkchecker/)
Mozilla/5.0 (compatible; SiteExplorer/1.1b; +http://siteexplorer.info/Backlink-Checker-Spider/)

wkhtmlto

Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/538.1 (KHTML, like Gecko) wkhtmltoimage Safari/538.1
Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/534.34 (KHTML, like Gecko) wkhtmltopdf Safari/534.34
Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/534.34 (KHTML, like Gecko) wkhtmltoimage Safari/534.34
Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/538.1 (KHTML, like Gecko) wkhtmltoimage Version/7.0 Safari/538.1
Mozilla/5.0 (X11; BSD Four) AppleWebKit/534.34 (KHTML, like Gecko) wkhtmltoimage Safari/534.34

sweeper

Houdini%20Sweeper/3000 CFNetwork/893.13.1 Darwin/17.4.0 (x86_64)
Houdini%20Sweeper/3000 CFNetwork/897.15 Darwin/17.5.0 (x86_64)

scrap

GoScraper
MetadataScraper

Already matched by existing filters:

Scrapy/1.5.0 (+https://scrapy.org)
Scrapy/1.5.1 (+https://scrapy.org)
Scrapy/1.4.0 (+http://scrapy.org)
Scrapy/1.0.5 (+http://scrapy.org)
facebookexternalhit/1.1;kakaotalk-scrap/1.0; +https://devtalk.kakao.com/t/scrap/33984
Mozilla/5.0 (compatible; ScraperHut/1.0; +https://www.scraperhut.com/robot/)

embedly

Mozilla/5.0 (compatible; Embedly/0.2; +http://support.embed.ly/)

Already matched by existing filters:

Embedly +support@embed.ly

embed php

Embed PHP Library
Embed PHP library

tracemyfile

Mozilla/5.0 (compatible; tracemyfile/1.0)

Already matched by existing filters:

Mozilla/5.0 (compatible; tracemyfile/1.0; +bot@tracemyfile.com)

wget

Wget/1.17.1 (linux-gnu)
Wget/1.19.4 (linux-gnu)
Wget/1.14 (linux-gnu)
Wget/1.12 (linux-gnu)
Wget/1.13.4 (linux-gnu)
Wget/1.18 (linux-gnu)
Wget/1.15 (linux-gnu)
Wget/1.19.1 (linux-gnu)
Wget/1.15 (darwin13.1.0)

snowhaze

SnowHaze Search/1.0 support@snowhaze.com

wordup

WordupInfoSearch/1.0
WordupinfoSearch/1.0

monitor

Buck/2.2; (+https://app.hypefactors.com/media-monitoring/about.html)
Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/43.0.2357.134 Safari/537.36 http://notifyninja.com/monitoring

Already matched by existing filters:

Uptimebot.org - Free website monitoring
MBCrawler/1.0 (https://monitorbacklinks.com)

iframely

Iframely/0.8.5 (+http://iframely.com/;)
Iframely/1.2.2 (+http://iframely.com/;)
Iframely/1.2.5 (+http://iframely.com/;)
Iframely/1.2.7 (+http://iframely.com/;)
Iframely/1.0.4 (+http://iframely.com/;)
Iframely/1.2.7 (+https://iframely.com/;)

b-o-t

G-i-g-a-b-o-t
B-l-i-t-z-B-O-T/6.3 (Ubuntu 7.0; zh_TW;)
B-l-i-t-z-B-O-T/4.3 (AmigaOS 2.3; fr_LU;)
B-l-i-t-z-B-O-T/1.9 (Windows Vista 5.7; fr_CA;)

parser

inbound.li parser

Already matched by existing filters:

PocketParser/2.0 (+https://getpocket.com/pocketparser_ua)
RobotsTxtParser-VIPnytt/2.0 (+https://github.com/VIPnytt/RobotsTxtParser/blob/master/README.md)
Fever/1.39 (Feed Parser; http://feedafever.com; Allow like Gecko)
UniversalFeedParser/5.1.3 +https://code.google.com/p/feedparser/
SimplePie/1.3.1 (Feed Parser; http://simplepie.org; Allow like Gecko) Build/20121030175911
SimplePie/1.3.1 (Feed Parser; http://simplepie.org; Allow like Gecko) Build/20121030175911DNT: 1
metadataparser/1.1.0 (https://github.com/bloglovin/metadataparser)

stats

Mozilla/5.0 (compatible; WebDataStats/1.0 ;  https://webdatastats.com/policy.html)

Already matched by existing filters:

DomainStatsBot/1.0 (https://domainstats.com/pages/our-bot)

statistics

NetTrack Anonymous Web Statistics https://nettrack.info/support.php

cakephp

CakePHP

Bots that Matomo catches in the server (but not the log analyzer)

These seem to be caught successfully on the core server, even though the analyzer filters don't match these user agents. I think it might be worth adding those anyway, even if just to avoid sending useless data that will be filtered out on the other side - or in some cases to potentially catch new bots of a similar kind.

jobboersebot

Mozilla/5.0 (X11; U; Linux Core i7-4980HQ; de; rv:32.0; compatible; JobboerseBot; http://www.jobboerse.com/bot.htm) Gecko/20100101 Firefox/38.0

ltx71

ltx71 - (http://ltx71.com/)

facebookexternalhit

facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_1) AppleWebKit/601.2.4 (KHTML, like Gecko) Version/9.0.1 Safari/601.2.4 facebookexternalhit/1.1 Facebot Twitterbot/1.0
facebookexternalhit/1.1
facebookexternalhit/1.1;kakaotalk-scrap/1.0; +https://devtalk.kakao.com/t/scrap/33984
facebookexternalhit/1.1;line-poker/1.0
facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php) Facebot

bingpreview

Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/534+ (KHTML, like Gecko) BingPreview/1.0b
Mozilla/5.0 (iPhone; CPU iPhone OS 7_0 like Mac OS X) AppleWebKit/537.51.1 (KHTML, like Gecko) Version/7.0 Mobile/11A465 Safari/9537.53 BingPreview/1.0b

zgrab

Mozilla/5.0 zgrab/0.x
Mozilla/5.0 zgrab/0.x (compatible; Researchscan/t13rl; +http://researchscan.comsys.rwth-aachen.de)
Mozilla/5.0 zgrab/0.x (compatible; Researchscan/t13l; +http://researchscan.comsys.rwth-aachen.de)
Mozilla/5.0 zgrab/0.x (compatible; Researchscan/t12sns; +http://researchscan.comsys.rwth-aachen.de)
Mozilla/5.0 zgrab/0.x (compatible; Researchscan/t12ca; +http://researchscan.comsys.rwth-aachen.de)
Mozilla/5.0 zgrab/0.x (compatible; Researchscan/t12l; +http://researchscan.comsys.rwth-aachen.de)
Mozilla/5.0 zgrab/0.x (compatible; Researchscan/t13a; +http://researchscan.comsys.rwth-aachen.de)

bubing

BUbiNG (+http://law.di.unimi.it/BUbiNG.html)
BUbiNG (+http://law.di.unimi.it/BUbiNG.html#wc)

qwantify

Mozilla/5.0 (compatible; Qwantify/2.4w; +https://www.qwant.com/)/2.4w
Qwantify/1.0
Mozilla/5.0 (compatible; Qwantify/vR; +https://www.qwant.com/)
Mozilla/5.0 (compatible; Qwantify/Mermoz/0.1; +https://www.qwant.com/; +https://www.github.com/QwantResearch/mermoz)
Mozilla/5.0 (compatible; Qwantify/2.4w; +https://www.qwant.com/)

skypeuripreview

Mozilla/5.0 (Windows NT 6.1; WOW64) SkypeUriPreview Preview/0.5

archive

Mozilla/5.0 (compatible; archive.org_bot; Wayback Machine Live Record; +http://archive.org/details/archive.org_bot)
Mozilla/5.0 (compatible; special_archiver/3.1.1 +http://www.archive.org/details/archive.org_bot)
Mozilla/5.0 (compatible; archive.org_bot +http://www.archive.org/details/archive.org_bot)
\x22Bookmark Archiver\x22
Mozilla/5.0 (compatible; ia_archiver/1.0; +http://www.alexa.com/help/webmasters; crawler@alexa.com)
Mozilla/5.0 (compatible; special_archiver; Archive-It; +http://archive-it.org/files/site-owners-special.html)
Mozilla/5.0 (compatible; archive.org_bot; Archive-It; +http://archive-it.org/files/site-owners.html)
Mozilla/5.0 (compatible; archive.org_bot +http://archive.org/details/archive.org_bot)
IABot: Checking if link from Wikipedia is broken and needs removal - See https://meta.wikimedia.org/wiki/InternetArchiveBot/FAQ_for_sysadmins
ia_archiver (+http://www.alexa.com/site/help/webmasters; crawler@alexa.com)
@LinkArchiver twitter bot

googleimageproxy

Mozilla/5.0 (Windows NT 5.1; rv:11.0) Gecko Firefox/11.0 (via ggpht.com GoogleImageProxy)

google web preview

Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko; Google Web Preview) Chrome/41.0.2272.118 Safari/537.36

epicbot

Mozilla/5.0 (compatible; epicbot; +http://www.epictions.com/epicbot)

developers.google.com

Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36 Google (+https://developers.google.com/+/web/snippet/)

chrome-lighthouse

Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36(KHTML, like Gecko) Chrome/69.0.3464.0 Safari/537.36 Chrome-Lighthouse
Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5 Build/MRA58N) AppleWebKit/537.36(KHTML, like Gecko) Chrome/69.0.3464.0 Mobile Safari/537.36 Chrome-Lighthouse

wordpress

Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/69.0.3494.0 Safari/537.36 WordPress.com mShots
Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/67.0.3372.0 Safari/537.36 WordPress.com mShots

WordPress/4.4.1; https://mjtsai.com/blog
WordPress/4.2.20; http://lynis.cn/blog
WordPress/4.9.8; http://www.alexcurylo.com
WordPress/4.9.5; http://www.alexcurylo.com
WordPress/4.9.6; https://blog.iconfactory.com
...etc.

daum

Mozilla/5.0 (compatible; Daum/4.1; +http://cs.daum.net/faq/15/4118.html?faqId=28966)
Mozilla/5.0 (Unknown; Linux x86_64) AppleWebKit/538.1 (KHTML, like Gecko) Safari/538.1 Daum/4.1

google page speed

Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko; Google Page Speed Insights) Chrome/41.0.2272.118 Safari/537.36
Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko; Google Page Speed Insights) Chrome/41.0.2272.118 Mobile Safari/537.36

naver.me

Mozilla/5.0 (compatible; Yeti/1.1; +http://naver.me/spd)
Mozilla/5.0 (compatible; Yeti/1.1; +http://naver.me/bot)

newsbot

AppleNewsBot
Mozilla/5.0 (compatible; jpg-newsbot/2.0; +https://vipnytt.no/bots/)
Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:49.0) Gecko/20100101 Firefox/49.0 / GnowitNewsbot / Contact information at http://www.gnowit.com
Java 1.7 Apache HttpClient (Linux x86_64) / GnowitNewsbot / Contact information at http://www.gnowit.com

onalyticabot

OnalyticaBot

ips-agent

Mozilla/5.0 (compatible; ips-agent)

searchbot

SafeDNSBot (https://www.safedns.com/searchbot)

yacybot

yacybot (-global; amd64 Windows 2003 5.2; java 1.8.0_131; Europe/en) http://yacy.net/bot.html
yacybot (/global; amd64 Linux 4.4.0-112-generic; java 1.8.0_151; Europe/pl) http://yacy.net/bot.html
yacybot (/global; x86 Windows 10 10.0; java 1.8.0_121; America/en) http://yacy.net/bot.html
yacybot (/global; amd64 Windows 10 10.0; java 1.8.0_112; Europe/de) http://yacy.net/bot.html
yacybot (/global; amd64 Linux 4.19.0-3-MANJARO; java 1.8.0_192; Europe/en) http://yacy.net/bot.html
yacybot (freeworld/global; amd64 Linux 4.9.0-3-amd64; java 1.8.0_151; Europe/en) http://yacy.net/bot.html
yacybot (/global; ppc64 OS/400 V7R3M0; java 1.8.0; Europe/de) http://yacy.net/bot.html
yacybot (/global; arm Linux 4.9.80-v7+; java 9-Raspbian; Europe/de) http://yacy.net/bot.html
...etc.

dataprovider

Mozilla/5.0 (compatible; Dataprovider.com;)
Mozilla/5.0 (compatible; Dataprovider.com)

semrushbot

SEMrushBot

googledocs

Mozilla/5.0 (compatible; GoogleDocs; documents; +http://docs.google.com)
Mozilla/5.0 (compatible; GoogleDocs;  +http://docs.google.com; +Google-Document-Conversion)
Mozilla/5.0 (compatible; GoogleDocs; apps-presentations; +http://docs.google.com)

hackerfall

Mozilla/5.0 (compatible; HackerfallBot; +http://hackerfall.com/help/bot)

tigerbot

tigerbot

telegrambot

TelegramBot (like TwitterBot)

cloudflare

Mozilla/5.0 (compatible; CloudFlare-AlwaysOnline/1.0; +http://www.cloudflare.com/always-online) AppleWebKit/534.34
Mozilla/5.0 (compatible; Cloudflare-AMP/1.0; +https://amp.cloudflare.com/doc/fetcher.html) AppleWebKit/534.34

ssllabs

SSL Labs (https://www.ssllabs.com/about/assessment.html)

validator

W3C_Validator/1.3 http://validator.w3.org/services
Validator.nu/LV

verification

Mozilla/5.0 (compatible; Google-Site-Verification/1.0)

Member
@sgiehl sgiehl left a comment


Actually, I'm not sure it makes sense to maintain another list of possible bots here at all. We already have one in matomo-org/device-detector, which Matomo uses to ignore all requests coming from bots. We should try to improve the list in device-detector if something is missing, and maybe remove the list here completely.

'robot',
'rss',
Member


There are various mobile apps that include RSS in their name but should not be considered bots.

@tsteur tsteur changed the base branch from master to 3.x-dev January 13, 2020 22:45
@sgiehl
Member
sgiehl commented Jun 23, 2020

Closing this now. We do not want to maintain another big list of bots in log analytics. It's fine to have some basic detection, so that the majority of bot requests won't be sent to Matomo at all, but all others should be detected by Matomo.

@sgiehl sgiehl closed this Jun 23, 2020
@innocraft-automation innocraft-automation removed this from the Current sprint milestone Jan 24, 2023
4 participants