Bot & Crawler List – Download IPv4 Addresses and User Agents
Websites have long been scanned by automated systems for search engine indexing, content extraction (scraping), vulnerability detection and, increasingly, the collection of training data for AI models.
In many cases, this happens without the operators' consent and in violation of technical restrictions such as the robots.txt file or of applicable legal regulations. Service companies and larger corporations often use data and content whenever and however it suits them, usually without transparency or accountability.
Such activity not only generates unnecessary server load but can also pose security risks, distort analytics and statistical data, and negatively impact a site's visibility in search engines.
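To illustrate the robots.txt restriction mentioned above, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content, the helper name `allowed` and the example.com URLs are assumptions made for illustration. The sketch only shows which rule a cooperating crawler would apply; robots.txt is advisory and cannot stop a crawler that ignores it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: bar CCBot from the whole site,
# and keep every other crawler out of /private/ only.
ROBOTS_TXT = """\
User-agent: CCBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def allowed(robots_txt, user_agent, url):
    """Return True if a crawler that honors robots.txt may fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("CCBot", "Googlebot"):
        for url in ("https://example.com/page", "https://example.com/private/report"):
            print(agent, url, allowed(ROBOTS_TXT, agent, url))
```

Pass the crawler's product token (for example `CCBot`) rather than a full browser-style user agent string, since the parser matches on the part before the first slash.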
Take advantage of:
Structured IP and user agent lists available for download
Information on origin, company, identification and activity
Clear tables for fast filtering and blocking
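As a rough sketch of how such lists could be used for filtering, the following Python example checks a request against an IPv4 list and a user agent list. The file format is an assumption (one IPv4 address or CIDR range per line, one user agent token per line), as are the function names and the sample addresses, which come from the reserved documentation ranges.

```python
import ipaddress


def load_networks(lines):
    """Parse one IPv4 address or CIDR range per line, skipping blanks and comments."""
    networks = []
    for line in lines:
        entry = line.strip()
        if not entry or entry.startswith("#"):
            continue
        # strict=False tolerates entries with host bits set, e.g. 192.0.2.5/24
        networks.append(ipaddress.ip_network(entry, strict=False))
    return networks


def load_agent_tokens(lines):
    """Read one user agent token per line, lowercased for case-insensitive matching."""
    return [line.strip().lower() for line in lines if line.strip()]


def is_listed_bot(remote_ip, user_agent, networks, agent_tokens):
    """Return True if the IP falls in a listed range or the user agent contains a listed token."""
    address = ipaddress.ip_address(remote_ip)
    if any(address in network for network in networks):
        return True
    ua = user_agent.lower()
    return any(token in ua for token in agent_tokens)


if __name__ == "__main__":
    # Assumed sample data; real lists would be read from the downloaded files.
    networks = load_networks(["# sample", "192.0.2.0/24", "", "198.51.100.7"])
    tokens = load_agent_tokens(["Amazonbot", "CCBot"])
    print(is_listed_bot("192.0.2.44", "Mozilla/5.0", networks, tokens))
    print(is_listed_bot("203.0.113.9", "Mozilla/5.0 (compatible; CCBot/2.0)", networks, tokens))
```

Checking the IP address first matters because a user agent string is trivial to forge, while the source address of a completed TCP connection is much harder to fake.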
Detailed Bot Catalog: Company Logos, Identifications, User Agent Details and IP Information
The "Artificial Intelligence" category lists bots and crawlers that automatically collect website content. These systems are used to train AI models and frequently operate outside legal or ethical boundaries.
| Webcrawler / Bot | User-Agent(s) Lines | IPv4 Lines | Description |
|---|---|---|---|
| Amazonbot | 3 | 5 | Amazonbot is Amazon's official web crawler; it captures publicly accessible content for services such as Alexa and to improve search results. |
| Applebot | 3 | 19 | Apple's crawler; collects public web data for Siri and Spotlight and for training Apple's AI models. |
| Bytespider, TikTokSpider | 4 | 5121 | Web crawler from ByteDance (TikTok); collects training data for large language models (e.g. Doubao). |
| CCBot | 3 | 6 | Crawler of Common Crawl (an open web data archive); collects open web data for research and ML projects. |
| VelenPublicWebCrawler | 3 | 8 | The crawler analyzes millions of publicly accessible websites to collect structured business data. |
| YaK | 3 | 5 | Linkfluence's web crawler automatically collects online data, primarily from social media and websites, for use in AI-supported analysis platforms such as “Radarly”. The data feeds applications for sentiment analysis, image recognition, trend forecasting and target group analysis, and supports companies in brand and consumer research. |
| AdIdxBot, BingBot, BingPreview | 6 | 515 | Microsoft uses several specialized crawlers: BingBot is the main web crawler of the Bing search engine and indexes websites for search. AdIdxBot checks the landing pages of advertisements for quality control. BingPreview generates page previews for display in Bing and related Microsoft services. |
| Dotbot | 4 | 6 | Moz's SEO crawler (“DotBot”); collects link and website data for Moz tools (e.g. Link Explorer). |
| Neevabot | 3 | 4 | Crawler of the now-discontinued Neeva search engine; “downloads the Internet” to build its own search index. The company is now part of Snowflake Inc. |
| ChatGPT-User | 3 | 449 | The ChatGPT-User user agent is used when users of ChatGPT or custom GPTs actively open websites via integrated functions or GPT actions. It is used exclusively for user-initiated access, not for automatic crawling or for collecting AI training data. |
Search engine bots automatically scan the web to index website content. They are essential for web visibility but can strain server resources under high load.
| Webcrawler / Bot | User-Agent(s) Lines | IPv4 Lines | Description |
|---|---|---|---|
| RainBot | 3 | 5 | RainBot is a web crawler that identifies itself exclusively with the RainBot user agent and is classified as a web scraper. The operator is not publicly known; observed activity originates from IP ranges of Amazon Web Services (AWS). |
| AhrefsBot | 3 | 304 | SEO crawler from Ahrefs; crawls the web for backlinks to feed its SEO database. |
| PetalBot | 5 | 227 | Crawler of the Huawei search engine Petal; indexes websites for Petal Search and Huawei Assistant. |
| Barkrowler | 3 | 10 | Crawler from Babbar.tech; builds a web graph for SEO tools (indexes links and pages). |
| Cốc Cốc | 4 | 112 | Vietnamese search engine crawler; indexes web content for local searches. |
| Initdex-Bot | 3 | 6 | Lesser-known bot of Cyberserver Ltd.; its exact purpose is unclear. |
| DataForSeoBot | 3 | 6 | SEO data crawler (DataForSEO); collects ranking, keyword and competitor data for SEO tools. |
| DuckDuckGo-Favicons-Bot | 3 | 14 | Bot from DuckDuckGo that retrieves website favicons (small logo icons) for display in search results. |
| CheckMarkNetwork | 3 | 3 | Web crawler from CheckMark Network; collects data for search indexing and for competitor and market analyses. |
| Googlebot, Google Favicon, Googlebot-Image | 9 | 170 | Google uses several crawlers to index web content: Googlebot is the central web crawler that indexes websites for Google Search and services such as Discover. Googlebot-Image indexes images for Google Images and image-related search services. The now-discontinued Google Favicon bot was used to load favicons. |
| InfoTigerBot | 3 | 3 | InfoTigerBot is the web crawler of the search engine infotiger; it automatically visits websites in German and English and prepares them for indexing. Texts are pre-processed, tokenized, cleaned of stop words and analyzed for effective full-text search. |
| MJ12bot | 3 | 59 | SEO crawler from Majestic (Majestic-12); crawls the web for links to build a comprehensive backlink index. |
| MojeekBot | 3 | 3 | Crawler of the independent search engine Mojeek; crawls and indexes websites for the Mojeek search results. |
| Qwantify | 5 | 54 | Web crawler of the privacy-oriented search engine Qwant; indexes the web for Qwant's search index. |
| fluid | 3 | 3 | “Fluid” bot from leak.info; sophisticated scraper that imitates human browsing behavior in order to access data automatically (e.g. prices, content). |
| SemrushBot, SemrushBot-BA | 5 | 51 | Dedicated Semrush crawler for backlink analyses (backlink audit); checks websites for inbound links. |
| SeznamBot | 5 | 54 | Crawler of the Czech search engine Seznam; indexes websites for Seznam.cz search results. |
| SeekportBot, Seekport Crawler | 6 | 521 | SeekportBot is a web crawler operated by SISTRIX GmbH, a German company based in Bonn that specializes in SEO analysis tools. SISTRIX has operated the domain seekport.de since December 2014 and uses SeekportBot to index websites for an independent search engine. |
| SurdotlyBot | 3 | 3 | Crawler from Sur.ly; crawls and checks outgoing links and pages. |
| AwarioSmartBot | 4 | 8 | AwarioSmartBot is a web crawler from the social media monitoring service Awario that searches online content such as websites, blogs and forums for brand-related mentions. The service is operated by Techfusion Ltd., based in Nicosia, Cyprus. |
| BLEXBot | 3 | 4 | BLEXBot is a web crawler originally developed by WebMeUp to index website content for SEO analysis and data aggregation. It is currently operated by Techfusion Ltd., a company based in Nicosia, Cyprus, which is also behind the SEO software SEO PowerSuite. |
| YandexBot, YandexFavicons, YandexImages | 6 | 579 | Yandex operates several specialized crawlers: the main crawler indexes websites for the general search results, another fetches favicons for display in the search results, and a third crawls images for Yandex image search. |
| ZoominfoBot | 3 | 30 | Crawler from ZoomInfo; scans millions of company websites, press releases and similar sources to collect business and personal data for sales and marketing purposes. |
Crawlers automatically scan websites to collect data. While useful, they may heavily burden server resources.
| Webcrawler / Bot | User-Agent(s) Lines | IPv4 Lines | Description |
|---|---|---|---|
| AdsTxtCrawlerTP | 3 | 4 | AdsTxtCrawlerTP is a web crawler that specifically retrieves ads.txt files from websites using the AdsTxtCrawlerTP/1.2 user agent. The operator of the service is not publicly known. |
| MegaIndex.ru/2.0 | 3 | 3 | SEO crawler (ALTWeb/MegaIndex); scans websites for keywords, backlinks and structure for SEO analyses. |
| ev-crawler, e.ventures Investment Crawler | 4 | 8 | Crawler from e.ventures (Headline); indexes and analyzes websites. |
| EBID AG Crawler | 3 | 3 | Crawler from EBID Service AG; collects company data from websites for a company directory. |
| Linespider | 3 | 5 | Crawler from LINE (Japan); indexes websites for the LINE search services. |
| Netestate Ne Crawler | 3 | 4 | German crawler (netEstate GmbH) for data collection (market research, SEO). |
| Birdcrawlerbot | 3 | 3 | Web crawler that indexes website content for SEO and analysis. |
Miscellaneous refers to services that have not yet been assigned to a specific category.
| Webcrawler / Bot | User-Agent(s) Lines | IPv4 Lines | Description |
|---|---|---|---|
| Facebook External User Agent | 3 | 10 | The “Facebook External User Agent” is used when users tap links in the Facebook or Instagram app and the content loads in the in-app browser. According to Meta, it is used to generate link previews, cache content, analyze websites for display in social feeds and check compliance with content guidelines. |
| GitHub Camo | 4 | 57 | GitHub's image proxy service; retrieves embedded images via SSL to protect user IPs. |
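Some of the IPv4 lists above run to hundreds or thousands of lines, which can be unwieldy as firewall rules. The following sketch shows one way to merge adjacent and overlapping entries into fewer CIDR ranges with Python's standard `ipaddress` module. It assumes one IPv4 address or CIDR range per line; the function name and sample addresses are made up for illustration, and whether the downloaded lists actually contain mergeable entries is not stated here.

```python
import ipaddress


def collapse_ipv4(lines):
    """Merge adjacent or overlapping IPv4 entries into the smallest set of CIDR ranges."""
    networks = []
    for line in lines:
        entry = line.strip()
        if entry:
            # Single addresses become /32 networks; strict=False tolerates host bits.
            networks.append(ipaddress.ip_network(entry, strict=False))
    return [str(network) for network in ipaddress.collapse_addresses(networks)]


if __name__ == "__main__":
    # Assumed sample data from the reserved documentation ranges.
    sample = [
        "192.0.2.0/25",
        "192.0.2.128/25",
        "192.0.2.7",
        "198.51.100.4",
        "198.51.100.5",
    ]
    for cidr in collapse_ipv4(sample):
        print(cidr)
```

The five sample entries collapse to two ranges, `192.0.2.0/24` and `198.51.100.4/31`, which block exactly the same addresses with fewer rules.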