new crawler bots overloading the site :-(
Author:  pnm  [ Tue Sep 12, 2023 1:21 pm ]
Post subject:  new crawler bots overloading the site :-(

A while after this issue: viewtopic.php?f=18&t=1475,

I'm finding that "bing" is _still_ near the top amongst the bots crawling the site [thumbdn]

There is this "crawl delay" parameter I can use to lower the rate, but only up to a max of 20 seconds (it is now set at 10).
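To put the delay values in perspective, here is a rough sketch. It assumes the setting corresponds to the `Crawl-delay` directive in robots.txt (the post does not say where the parameter lives) and that the crawler honors it. The `bingbot` group and the `weekly_cap` helper are illustrative only.

```python
from urllib.robotparser import RobotFileParser

SECONDS_PER_WEEK = 7 * 24 * 3600  # 604800

# Hypothetical robots.txt fragment, not the site's actual file.
ROBOTS_TXT = """\
User-agent: bingbot
Crawl-delay: 10
"""


def weekly_cap(delay_seconds: int) -> int:
    """Upper bound on requests per week for one crawler that waits
    delay_seconds between requests."""
    return SECONDS_PER_WEEK // delay_seconds


# Parse the fragment with the standard-library robots.txt parser.
parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())
delay = parser.crawl_delay("bingbot")

if __name__ == "__main__":
    print("delay:", delay, "-> at most", weekly_cap(delay), "requests/week")
    print("delay: 20 -> at most", weekly_cap(20), "requests/week")
```

Under these assumptions a 10 second delay still allows about 60000 requests a week, and the 20 second maximum about 30000, so the delay alone does not bring a busy crawler down very far.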

A newcomer is the "GPTBot" crawler (from OpenAI, of ChatGPT fame), with ~74000 visits in a week [mad]

The current top 10, approximate number of visits in a week:

1. GPTBot, 74000
2. Ahrefs, 56000
3. Bing, 48000
4. dotbot (Moz), 29000
5. Yandex, 18000
6. comscore, Bytespider (ByteDance/TikTok), 5000
7. Grapeshotcrawler (Oracle), 4000
8. Googlebot, Amazon bot, 3000
9. Yahoo Japan, 2500
10. peer39, 2000
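As a rough illustration of what these weekly totals mean for the server, the sketch below converts the top five figures from the list into an average time between requests. It assumes the visits are spread evenly over the week, which real crawlers are unlikely to do; `mean_interval_seconds` is just an illustrative helper.

```python
SECONDS_PER_WEEK = 7 * 24 * 3600  # 604800

# Approximate weekly visits, copied from the top of the list above.
weekly_visits = {
    "GPTbot": 74000,
    "Ahrefs": 56000,
    "Bing": 48000,
    "dotbot": 29000,
    "Yandex": 18000,
}


def mean_interval_seconds(visits_per_week: int) -> float:
    """Average number of seconds between two requests,
    assuming an even spread over the whole week."""
    return SECONDS_PER_WEEK / visits_per_week


if __name__ == "__main__":
    for bot, visits in weekly_visits.items():
        print(f"{bot}: one request every {mean_interval_seconds(visits):.1f} s")
    combined = sum(weekly_visits.values())
    print(f"top five combined: one request every {mean_interval_seconds(combined):.1f} s")
```

By this estimate GPTbot alone averages roughly one request every 8 seconds, and the top five together one request every 2 to 3 seconds.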

I'll most likely block some of these to reduce server load.

edit: I blocked a number of them (GPTBot, Ahrefs, dotbot, Yandex, comscore, Bytespider, Grapeshot), so this should help [smile]
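The post does not say how the blocking was done. One common approach is robots.txt, sketched below under that assumption. The user-agent tokens are the ones these crawlers are generally known to match (robots.txt matching is case-insensitive). comscore and Grapeshot are left out because their exact tokens would have to be taken from the server logs. `build_robots_txt` is a hypothetical helper.

```python
from urllib.robotparser import RobotFileParser

# Assumed user-agent tokens; check them against the actual access logs.
BLOCKED = ["GPTBot", "AhrefsBot", "dotbot", "Yandex", "Bytespider"]


def build_robots_txt(blocked):
    """Return robots.txt text with one 'Disallow: /' group per crawler."""
    groups = [f"User-agent: {name}\nDisallow: /\n" for name in blocked]
    return "\n".join(groups)  # blank line between groups


robots_txt = build_robots_txt(BLOCKED)

# Check the result with the standard-library parser before deploying it.
parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

if __name__ == "__main__":
    print(robots_txt)
    for agent in ["GPTBot", "YandexBot", "Googlebot", "bingbot"]:
        allowed = parser.can_fetch(agent, "/viewtopic.php")
        print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

robots.txt is only a request: it works for crawlers that choose to honor it. A bot that ignores it would have to be blocked at the web server or firewall level instead.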

With Yandex blocked, the site will probably drop out of its index over time, so at some point it may no longer be found from Russia.

edit 2: the actual Google crawl rate is closer to 25000 than to the 3000 listed above (judging from the stats in their "search console").
You can manually set a lower crawl rate there (although the page with that setting is hard to find).
The lower rate automatically expires in 3 months (?!) [huh]

