AI Training Crawlers
These bots crawl websites to collect training data for AI models. Many respect robots.txt, which lets you control whether your content is used for AI training.
AI training crawlers download web pages so that their operators can use the content to train large language models and other AI systems. Knowing which bots are active on your website, and whether they honor your access rules, is the first step toward controlling how your content is used.
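As a minimal sketch of how a robots.txt policy for such crawlers can be checked, the example below uses Python's standard `urllib.robotparser` module. The robots.txt content, the helper name `allowed_agents`, and the URL `https://example.com/article` are illustrative assumptions, not rules recommended by this directory; the user-agent tokens come from the crawler names listed here.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: blocks two AI training crawlers by their
# user-agent tokens and allows every other crawler.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: *
Allow: /
"""


def allowed_agents(robots_txt, agents, url):
    """Return a dict mapping each user-agent token to whether it may fetch url."""
    parser = RobotFileParser()
    # parse() accepts an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


result = allowed_agents(
    ROBOTS_TXT,
    ["GPTBot", "ClaudeBot", "Googlebot"],
    "https://example.com/article",  # placeholder URL
)
print(result)
```

A check like this only shows what the file asks for. Whether a given crawler actually honors the rules depends on its operator.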
26 AI training crawlers in our directory
- Amazon Kendra by Amazon
Amazon Kendra is an intelligent search service operated by Amazon, using natural language processing for accurate search results.
- Amazonbot by Amazon
Amazonbot is Amazon's web crawler, used to improve products and services, for example by training AI models.
- Anchor Browser by Anchor
The Anchor Browser is a web browser for AI agents, operated by Anchor, used for automating workflows and web interactions.
- AwarioSmartBot by Awario
AwarioSmartBot is a web crawler operated by Awario, used for collecting new and updated web data for Internet marketers.
- Big Sur AI by Big Sur AI
Big Sur AI Crawler, operated by Big Sur AI, crawls user websites for AI-infused experiences.
- Brandwatch by Brandwatch
Brandwatch's Magpie Crawler indexes web pages for social media monitoring and analysis.
- Ceramic TerraCotta by Ceramic
Ceramic TerraCotta is a web crawler operated by Ceramic, an AI training infrastructure company. It crawls websites to support Ceramic's AI model training optimization platform.
- ClaudeBot by Anthropic
ClaudeBot is Anthropic's web crawler that collects web content for training Claude AI models. It respects robots.txt directives and supports Crawl-delay.
- Cotoyogi by Research Organization of Information and Systems
Cotoyogi is a bot operated by the Research Organization of Information and Systems for AI training purposes.
- Echobot Bot by Echobox
Echobot is a web scraping bot operated by Echobox for AI training purposes, specifically for automating content distribution for digital publishers.
- Factset_spyderbot by Factset
Factset Spyderbot is a web scraping bot operated by Factset for delivering reliable financial data.
- Google NotebookLM by Google
Google NotebookLM is a bot operated by Google for AI training.
- Google-CloudVertexBot by Google
Google-CloudVertexBot is a crawler operated by Google that crawls site owners' own sites for targeted AI training.
- GoogleOther by Google
GoogleOther is a generic crawler operated by Google for fetching publicly accessible content from sites, used for internal research and development.
- GPTBot by OpenAI
GPTBot is used to crawl content that may be used in training OpenAI's generative AI foundation models.
- ICC Crawler by NICT
ICC Crawler is a web crawler operated by NICT, collecting web pages for AI training.
- LINER Bot by Liner Bot
LINER Bot is a web crawler operated by Liner Bot that collects data from the internet for AI training.
- Meta-ExternalAgent by Meta
Meta-ExternalAgent is a bot operated by Meta that crawls content for training AI models or improving products by indexing content directly.
- netEstate Imprint Crawler by netEstate
The netEstate Imprint Crawler crawls websites for public contact information.
- Novellum AI Crawl by Novellum
Novellum.ai is building tools for creating agents. Agents will use this MCP tool to crawl sites.
- PetalBot by Huawei
PetalBot is a search engine crawler operated by Huawei, used for indexing websites and providing content recommendations in Petal Search engine, Huawei Assistant, and AI Search services.
- QualifiedBot by Qualified.com, Inc.
QualifiedBot is a crawler operated by Qualified.com, Inc. to power AI products by crawling customer websites.
- SemrushBot-OCOB by Semrush
SemrushBot-OCOB is a bot operated by Semrush for AI training, specifically for its Content Toolkit.
- SemrushBotSwa by Semrush
SemrushBotSwa is a bot operated by Semrush for AI training purposes, collecting data for the SEO Writing Assistant tool.
- ShapBot by Parallel
Not officially documented
- WARDBot by WEBSPARK
WARDBot is a monitoring bot operated by WEBSPARK that tracks URL status codes for users.
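To see which of the crawlers listed above actually visit a site, one option is to match the user-agent strings in your access logs against known crawler tokens. The sketch below is an assumption-laden illustration: the token list is a small subset taken from the names in this directory, the function name `count_crawler_hits` is hypothetical, and the sample user-agent strings are made up rather than copied from real traffic.

```python
from collections import Counter

# Tokens assumed from crawler names in the directory; confirm the exact
# string each crawler sends against its operator's documentation.
CRAWLER_TOKENS = ["GPTBot", "ClaudeBot", "Amazonbot", "PetalBot"]


def count_crawler_hits(user_agents):
    """Count requests per crawler token using case-insensitive substring matching."""
    hits = Counter()
    for user_agent in user_agents:
        lowered = user_agent.lower()
        for token in CRAWLER_TOKENS:
            if token.lower() in lowered:
                hits[token] += 1
                break  # count each request at most once
    return hits


# Made-up sample user-agent strings, as they might be extracted from a log.
SAMPLE_USER_AGENTS = [
    "Mozilla/5.0 (compatible; GPTBot/1.0)",
    "Mozilla/5.0 (compatible; ClaudeBot/1.0)",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Firefox/128.0",
    "Mozilla/5.0 (compatible; GPTBot/1.0)",
]

hits = count_crawler_hits(SAMPLE_USER_AGENTS)
print(dict(hits))
```

User-agent strings are self-reported, so a match is a hint rather than proof of a crawler's identity.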