added various bots and services to excluded agents #239
I'm currently setting up Matomo on my server (in log analytics mode). I noticed in the visitor log that some entries that didn't look human were getting through, so I analysed my logs from the last 12 months to find more bots that the script currently misses. I think I found some good keywords that catch quite a lot of bot requests and, as far as I can tell, don't produce any false positives.
Here's the list of keywords I've added, and for each keyword the specific unique user agents it catches (from a sample of ~2.5M lines).
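To show how such a per-keyword breakdown can be produced, here is a minimal sketch. It assumes the filter is a case-insensitive substring match on the user agent. The keyword subset and the sample user agents below are hypothetical, chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical subset of keywords; the full proposed list is given in this PR.
KEYWORDS = ["favicon", "python", "headless"]


def group_by_keyword(user_agents, keywords=KEYWORDS):
    """Map each keyword to the sorted unique user agents that contain it.

    Assumes case-insensitive substring matching (an assumption, not a
    statement about the analyzer's exact implementation).
    """
    caught = defaultdict(set)
    for ua in user_agents:
        lowered = ua.lower()
        for kw in keywords:
            if kw in lowered:
                caught[kw].add(ua)
    # Keywords that matched nothing are simply absent from the result.
    return {kw: sorted(uas) for kw, uas in caught.items()}


# Made-up sample user agents, including one duplicate.
sample = [
    "python-requests/2.22.0",
    "Mozilla/5.0 (X11; Linux x86_64) HeadlessChrome/79.0",
    "python-requests/2.22.0",
    "Mozilla/5.0 (Windows NT 10.0) Firefox/72.0",
]
report = group_by_keyword(sample)
```

Each user agent is counted once per keyword, so the result can be scanned by eye for false positives.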
I've also removed adsbot-google from the existing list, since it's already covered by the bot- pattern.
Bots that Matomo currently doesn't catch
These are currently tracked as normal visits. I tested this with --regex-group-to-visit-cvar="user_agent=UserAgent" and then checked in the custom variables table that they do appear in the results:
favicon
Already matched by existing filters:
thumb
Already matched by existing filters:
fetch
Already matched by existing filters:
backlink
Already matched by existing filters:
hatena
Already matched by existing filters:
python
request
phantomjs
Already matched by existing filters:
okhttp
headless
Already matched by existing filters:
http-client
Already matched by existing filters:
http_client
httpclient
Already matched by existing filters:
appengine
evergreen
netnewswire
rss
Already matched by existing filters:
ruby
Already matched by existing filters:
node (or could be limited to node-, node.)
http.rb
scanner
Already matched by existing filters:
datanyze
check
Already matched by existing filters:
wkhtmlto
sweeper
scrap
Already matched by existing filters:
embedly
Already matched by existing filters:
embed php
tracemyfile
Already matched by existing filters:
wget
snowhaze
wordup
monitor
Already matched by existing filters:
iframely
b-o-t
parser
Already matched by existing filters:
stats
Already matched by existing filters:
statistics
cakephp
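The adsbot-google removal mentioned earlier can be checked mechanically. This is a sketch that again assumes substring matching, where a keyword is redundant if another keyword in the list is contained in it. The function name and the short keyword list are illustrative only.

```python
def redundant_keywords(keywords):
    """Return {keyword: another keyword that already covers it}.

    Under substring matching, any user agent containing `kw` also
    contains every substring of `kw`, so `kw` adds nothing if one of
    those substrings is itself in the list.
    """
    redundant = {}
    for kw in keywords:
        for other in keywords:
            if other != kw and other in kw:
                redundant[kw] = other
                break
    return redundant


# Illustrative subset of a keyword list, not the full one.
existing = ["adsbot-google", "bot-", "crawl", "spider"]
result = redundant_keywords(existing)
print(result)  # {'adsbot-google': 'bot-'}
```

Running the same check over a combined list of old and new keywords would flag any other entries that a shorter keyword already covers.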
Bots that Matomo catches on the server (but not in the log analyzer)
These seem to be caught successfully on the core server, even though the analyzer filters don't match these user agents. I think it's worth adding them anyway, if only to avoid sending useless data that will be filtered out on the other side, and in some cases to catch new bots of a similar kind.
jobboersebot
ltx71
facebookexternalhit
bingpreview
zgrab
bubing
qwantify
skypeuripreview
archive
googleimageproxy
google web preview
epicbot
developers.google.com
chrome-lighthouse
wordpress
daum
google page speed
naver.me
newsbot
onalyticabot
ips-agent
searchbot
yacybot
dataprovider
semrushbot
googledocs
hackerfall
tigerbot
telegrambot
cloudflare
ssllabs
validator
verification
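The idea of dropping these hits before they are sent to the server can be sketched as follows. This is not the analyzer's actual code: the hit dictionaries, the function names, and the three-keyword subset are assumptions made for illustration.

```python
# Hypothetical subset of the keywords listed above.
EXCLUDED = ("facebookexternalhit", "bingpreview", "semrushbot")


def is_excluded(user_agent, excluded=EXCLUDED):
    """True if any excluded keyword occurs in the lowercased user agent."""
    ua = user_agent.lower()
    return any(kw in ua for kw in excluded)


def hits_to_send(hits):
    """Yield only the hits whose user agent is not excluded."""
    for hit in hits:
        if not is_excluded(hit.get("user_agent", "")):
            yield hit


# Made-up parsed log entries.
hits = [
    {"path": "/", "user_agent": "Mozilla/5.0 (Windows NT 10.0) Firefox/72.0"},
    {"path": "/a", "user_agent": "facebookexternalhit/1.1"},
    {"path": "/b", "user_agent": "Mozilla/5.0 (compatible; SemrushBot/6~bl)"},
]
kept = list(hits_to_send(hits))
```

Only the first entry survives here, so the other two would never be transmitted for the server to discard.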