AI Training Crawlers
These bots crawl websites to collect training data for AI models. Many of them respect robots.txt, giving you control over whether your content is used for AI training.
AI training crawlers fetch web pages so that their operators can use the content to train large language models and other AI systems. Understanding which bots are active on your site, and whether they respect your access rules, is the first step in controlling how your content is used.
26 AI training crawlers in our catalog
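As a rough illustration of how robots.txt rules apply to individual crawlers, the sketch below uses Python's standard `urllib.robotparser` to check which user agents a rule set would admit. The robots.txt content, the `allowed_agents` helper, and the example.com URL are all hypothetical; this only shows what a compliant crawler should conclude, not what any given bot actually does.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block two AI training crawlers, allow everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: *
Allow: /
"""


def allowed_agents(robots_txt: str, agents: list[str], url: str) -> dict[str, bool]:
    """Return, for each user-agent token, whether the rules permit fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


# Illustrative URL and agent list (assumptions, not taken from any real site).
result = allowed_agents(
    ROBOTS_TXT,
    ["GPTBot", "ClaudeBot", "Googlebot"],
    "https://example.com/articles/1",
)
print(result)  # {'GPTBot': False, 'ClaudeBot': False, 'Googlebot': True}
```

Keep in mind that robots.txt is advisory: it only affects crawlers that choose to honor it.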
- Amazon Kendra by Amazon
Amazon Kendra is an intelligent search service operated by Amazon, using natural language processing for accurate search results.
- Amazonbot by Amazon
Amazonbot is Amazon's web crawler, used to improve Amazon's products and services, including by training AI models.
- Anchor Browser by Anchor
The Anchor Browser is a web browser for AI agents, operated by Anchor, used for automating workflows and web interactions.
- AwarioSmartBot by Awario
AwarioSmartBot is a web crawler operated by Awario, used for collecting new and updated web data for Internet marketers.
- Big Sur AI by Big Sur AI
Big Sur AI Crawler, operated by Big Sur AI, crawls user websites for AI-infused experiences.
- Brandwatch by Brandwatch
Brandwatch's Magpie Crawler indexes web pages for social media monitoring and analysis.
- Ceramic TerraCotta by Ceramic
Ceramic TerraCotta is a web crawler operated by Ceramic, an AI training infrastructure company. It crawls websites to support Ceramic's AI model training optimization platform.
- ClaudeBot by Anthropic
ClaudeBot is Anthropic's web crawler that collects web content for training Claude AI models. It respects robots.txt directives and supports Crawl-delay.
- Cotoyogi by Research Organization of Information and Systems
Cotoyogi is a bot operated by the Research Organization of Information and Systems for AI training purposes.
- Echobot Bot by Echobox
Echobot is a web scraping bot operated by Echobox for AI training purposes, specifically for automating content distribution for digital publishers.
- Factset_spyderbot by Factset
Factset Spyderbot is a web scraping bot operated by Factset for delivering reliable financial data.
- Google NotebookLM by Google
Google NotebookLM is a bot operated by Google for AI training.
- Google-CloudVertexBot by Google
Google-CloudVertexBot is a crawler operated by Google for targeted AI training on site owners' own sites.
- GoogleOther by Google
GoogleOther is a generic crawler operated by Google for fetching publicly accessible content from sites, used for internal research and development.
- GPTBot by OpenAI
GPTBot is used to crawl content that may be used in training OpenAI's generative AI foundation models.
- ICC Crawler by NICT
ICC Crawler is a web crawler operated by NICT, collecting web pages for AI training.
- LINER Bot by Liner Bot
LINER Bot is a web crawler operated by Liner Bot, used for AI training by collecting data from the internet.
- Meta-ExternalAgent by Meta
Meta-ExternalAgent is a bot operated by Meta for AI training purposes, specifically for training AI models or improving products by indexing content directly.
- netEstate Imprint Crawler by netEstate
The netEstate Imprint Crawler crawls websites for public contact information.
- Novellum AI Crawl by Novellum
Novellum.ai is building tools for creating agents. This MCP tool will be used by agents to crawl sites.
- PetalBot by Huawei
PetalBot is a search engine crawler operated by Huawei, used for indexing websites and providing content recommendations in the Petal Search engine, Huawei Assistant, and AI Search services.
- QualifiedBot by Qualified.com, Inc.
QualifiedBot is a crawler operated by Qualified.com, Inc. to power AI products by crawling customer websites.
- SemrushBot-OCOB by Semrush
SemrushBot-OCOB is a bot operated by Semrush for AI training, specifically for the Content Toolkit.
- SemrushBotSwa by Semrush
SemrushBotSwa is a bot operated by Semrush for AI training purposes, collecting data for the SEO Writing Assistant tool.
- ShapBot by Parallel
Not officially documented
- WARDBot by WEBSPARK
WARDBot is a monitoring bot operated by WEBSPARK that tracks URL status codes for users.
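To get a first impression of which of the crawlers listed above actually visit a site, one option is to scan the web server's access log for their user-agent tokens. The sketch below is a minimal example under several assumptions: the log uses a combined-style format with the user agent as the last quoted field, the token list is a small hand-picked subset of the catalog, and the sample log lines (including their user-agent strings) are invented for illustration. User agents can be spoofed, so substring matches are a hint rather than proof.

```python
import re
from collections import Counter

# Hand-picked subset of bot names from the catalog; matching them as
# case-insensitive substrings of the user agent is an assumption.
AI_CRAWLER_TOKENS = ["GPTBot", "ClaudeBot", "Amazonbot", "Meta-ExternalAgent", "PetalBot"]

# Assumes the user agent is the last double-quoted field on each log line.
UA_PATTERN = re.compile(r'"([^"]*)"\s*$')


def count_ai_crawler_hits(log_lines):
    """Count requests per crawler token found in the user-agent field."""
    hits = Counter()
    for line in log_lines:
        match = UA_PATTERN.search(line)
        if not match:
            continue  # line does not end with a quoted field; skip it
        user_agent = match.group(1).lower()
        for token in AI_CRAWLER_TOKENS:
            if token.lower() in user_agent:
                hits[token] += 1
                break  # count each request once
    return hits


# Invented sample lines; real user-agent strings will differ.
SAMPLE_LOG = [
    '203.0.113.5 - - [01/Jan/2025:00:00:01 +0000] "GET / HTTP/1.1" 200 512 "-" "ExampleAgent/1.0 (compatible; GPTBot/1.0)"',
    '203.0.113.6 - - [01/Jan/2025:00:00:02 +0000] "GET /a HTTP/1.1" 200 128 "-" "ExampleAgent/1.0 (compatible; ClaudeBot/1.0)"',
    '203.0.113.5 - - [01/Jan/2025:00:00:03 +0000] "GET /b HTTP/1.1" 200 256 "-" "ExampleAgent/1.0 (compatible; GPTBot/1.0)"',
    '198.51.100.7 - - [01/Jan/2025:00:00:04 +0000] "GET / HTTP/1.1" 200 512 "-" "SomeBrowser/5.0"',
]

hits = count_ai_crawler_hits(SAMPLE_LOG)
for token, count in hits.most_common():
    print(f"{token}: {count}")
```

Extending `AI_CRAWLER_TOKENS` with further names from the catalog follows the same pattern.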