What is Ceramic TerraCotta?
Direct Answer: Ceramic TerraCotta is a web crawler operated by Ceramic, an AI training infrastructure company. It crawls websites to support Ceramic's AI model training optimization platform.
Ceramic (ceramic.ai) focuses on optimizing large-scale AI model training. Its crawler systematically fetches and indexes web content and identifies itself as 'TerraCotta' in the User-Agent header recorded in server logs. Ceramic states that the crawler is part of an upcoming product that aims to 'drive valuable traffic' to websites, and that it respects robots.txt, so it can be controlled with standard rules addressed to User-agent: TerraCotta. Ceramic was founded by Anna Patterson, who has over 20 years of AI experience, including roles at Google and Stanford.
User-Agent Identification
Ceramic TerraCotta identifies itself in your live traffic data with the following user-agent token (Ceramic does not document the full user-agent string):
TerraCotta
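Because only the TerraCotta token is documented, a log filter has to match that token rather than an exact string. The Python sketch below assumes the token can appear anywhere in the User-Agent header and matches it case-insensitively as a whole word; the function name is_terracotta and the sample strings are illustrative. A User-Agent header can be spoofed, so a match only means the request claims to be TerraCotta.

```python
import re

# Assumption: the "TerraCotta" token may appear anywhere in the User-Agent
# header (the full string is undocumented). Match it as a whole word,
# case-insensitively, so e.g. "TerraCottaFan" is not counted.
_TERRACOTTA_TOKEN = re.compile(r"\bterracotta\b", re.IGNORECASE)


def is_terracotta(user_agent: str) -> bool:
    """Return True if a User-Agent header claims to be TerraCotta."""
    return bool(_TERRACOTTA_TOKEN.search(user_agent or ""))


if __name__ == "__main__":
    # Illustrative values only; the second string is a hypothetical full UA.
    for ua in ["TerraCotta",
               "Mozilla/5.0 (compatible; TerraCotta/1.0)",
               "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"]:
        print(is_terracotta(ua), ua)
```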
robots.txt Rules for Ceramic TerraCotta
Respects robots.txt: Yes
Use the following robots.txt rules to control Ceramic TerraCotta access:
# Block Ceramic TerraCotta
User-agent: TerraCotta
Disallow: /

# Or allow Ceramic TerraCotta (use one group, not both)
User-agent: TerraCotta
Allow: /

Robots.txt is a directive, not a barrier
Ceramic states that Ceramic TerraCotta respects robots.txt. However, robots.txt is advisory rather than enforced, and configuration mistakes, caching delays, and edge cases can mean your directives are not followed as expected. Verifying against live traffic shows whether Ceramic TerraCotta actually obeys your rules in practice.
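One way to run that check is to replay the paths TerraCotta requested against your robots.txt. The sketch below uses Python's standard urllib.robotparser; the rules in ROBOTS_TXT and the logged paths are hypothetical examples, and disallowed_hits is an illustrative helper, not part of any Ceramic tooling. Python's parser applies the first matching rule within a group rather than the longest match described in RFC 9309, so results can differ for files that mix Allow and Disallow lines. Requests that arrive shortly after a robots.txt change may also reflect a cached copy, so compare timestamps before treating a hit as a violation.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that keeps TerraCotta out of two sections.
ROBOTS_TXT = """\
User-agent: TerraCotta
Disallow: /premium/
Disallow: /research/

User-agent: *
Allow: /
"""


def disallowed_hits(robots_txt: str, paths: list[str],
                    agent: str = "TerraCotta") -> list[str]:
    """Return the logged paths that robots.txt disallows for `agent`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse() takes an iterable of lines
    return [p for p in paths if not parser.can_fetch(agent, p)]


if __name__ == "__main__":
    # Paths taken from TerraCotta requests in an access log (illustrative).
    logged = ["/blog/post-1", "/premium/report-2024", "/research/whitepaper.pdf"]
    for path in disallowed_hits(ROBOTS_TXT, logged):
        print("TerraCotta fetched a disallowed path:", path)
```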
Crawl Behavior
Frequency: Not Documented
Request Pattern: Not Documented
Official Documentation Quotes
"In our upcoming product, we aim to drive valuable traffic to your websites—stay tuned for more details!"
"I'm a responsible web crawler that respects robots.txt, the standard mechanism for webmasters to control which parts of a site bots can access."
Why track Ceramic TerraCotta traffic?
Measure what Ceramic gives back. Ceramic TerraCotta collects your content for AI training, but does Ceramic send any traffic in return through other products? Track whether the trade-off is worth it before deciding to block.
Understand what content is being collected for AI training. Ceramic TerraCotta crawls your site to gather data that may train AI models. Tracking its activity reveals which pages are selected — and which are skipped.
Make an informed block-or-allow decision. Blocking Ceramic TerraCotta stops it from collecting your pages in future crawls, but it does not remove content that has already been collected. Before deciding, measure the volume: how many pages does it fetch, and how often?
Detect content harvesting patterns. If Ceramic TerraCotta is systematically crawling your highest-value content (product pages, proprietary research, premium articles), you may want to restrict access using robots.txt or server-side rules.
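If you choose server-side rules, one option in a Python web stack is a small WSGI middleware that refuses self-identified TerraCotta requests to selected paths. This is a sketch under assumptions: the class name BlockTerraCotta and the /premium/ and /research/ prefixes are hypothetical, and matching on the User-Agent only stops requests that identify themselves as TerraCotta.

```python
# Hypothetical prefixes to protect; adjust to your own high-value sections.
PROTECTED_PREFIXES = ("/premium/", "/research/")


class BlockTerraCotta:
    """WSGI middleware: 403 for TerraCotta requests to protected prefixes."""

    def __init__(self, app, prefixes=PROTECTED_PREFIXES):
        self.app = app
        self.prefixes = tuple(prefixes)

    def __call__(self, environ, start_response):
        ua = environ.get("HTTP_USER_AGENT", "").lower()
        path = environ.get("PATH_INFO", "")
        # Only requests that self-identify as TerraCotta are affected.
        if "terracotta" in ua and path.startswith(self.prefixes):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        return self.app(environ, start_response)
```

Wrapping an existing WSGI application is a one-liner, for example application = BlockTerraCotta(application). Unlike a robots.txt Disallow, this is enforced by your server, so it also covers the caching and misconfiguration cases mentioned earlier.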
What does Ceramic TerraCotta crawling actually cost you?
AI training bots like Ceramic TerraCotta collect your content to improve future AI models. Unlike AI search bots, they have no direct referral pipeline: Ceramic TerraCotta does not cite sources or send traffic back to your site. Ceramic says an upcoming product aims to drive traffic to websites, but it has not yet described how.
What you give
- Server resources for every crawl request
- Your content, expertise, and original research
- Data that improves a competing AI product
What you get back
- No direct referral traffic from Ceramic TerraCotta
- No attribution in AI model outputs
- No revenue share from model usage
This doesn't automatically mean you should block Ceramic TerraCotta, but you should measure the real cost before deciding. Ceramic may send traffic through its other AI products, so blocking the training bot might not affect referrals at all, or it might. Only log data can tell you.
What Can AI See It measures for AI training bots
- How many pages Ceramic TerraCotta fetches from your site
- Which pages and sections Ceramic TerraCotta prioritizes
- Whether Ceramic's other products send you traffic
- Whether Ceramic TerraCotta actually respects your robots.txt
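The first two measurements can be approximated from a standard access log. This Python sketch assumes the Apache/Nginx combined log format and treats the first path segment as a site section; the regular expression, the summarize function, and the sample lines are illustrative assumptions, not Can AI See It's method. Dates are taken in the log's own timezone offset, and %b in strptime expects English month abbreviations.

```python
import re
from collections import Counter
from datetime import datetime

# Combined log format (assumed):
# ip ident user [time] "METHOD path PROTO" status size "referer" "user-agent"
LOG_LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<ua>[^"]*)"'
)


def summarize(lines):
    """Summarize requests whose User-Agent contains 'TerraCotta'."""
    total, pages = 0, set()
    by_section, by_day = Counter(), Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if not m or "terracotta" not in m["ua"].lower():
            continue  # unparseable line or a different client
        total += 1
        path = m["path"].split("?", 1)[0]  # ignore query strings
        pages.add(path)
        # First path segment as the section: /blog/a -> /blog, / -> /
        by_section["/" + path.strip("/").split("/", 1)[0]] += 1
        day = datetime.strptime(m["time"], "%d/%b/%Y:%H:%M:%S %z").date()
        by_day[day.isoformat()] += 1
    return {"requests": total, "unique_pages": len(pages),
            "by_section": by_section, "by_day": by_day}


if __name__ == "__main__":
    sample = [
        '203.0.113.5 - - [10/Mar/2025:13:55:36 +0000] "GET /blog/post-1 HTTP/1.1" 200 5123 "-" "TerraCotta"',
        '203.0.113.5 - - [11/Mar/2025:09:00:00 +0000] "GET /products/widget HTTP/1.1" 200 900 "-" "TerraCotta"',
        '198.51.100.7 - - [11/Mar/2025:09:05:00 +0000] "GET /blog/post-1 HTTP/1.1" 200 5123 "-" "Mozilla/5.0"',
    ]
    print(summarize(sample))
```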
How is this different from prompt testing tools? Prompt testing checks if AI mentions your brand in simulated queries. Can AI See It measures what actually happens: real crawls, real referrals, real conversions — from your live traffic data.
Log Verification
To verify Ceramic TerraCotta traffic in your live traffic data:
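- Search access logs for the user-agent token listed above
- Check whether the requesting IP addresses fall within ranges published by Ceramic (none are documented at present)
- Compare the observed crawl pattern with Ceramic's documented behavior (crawl frequency and request pattern are not yet documented)
- Use forward-confirmed reverse DNS (a reverse lookup of the IP, then a forward lookup of the returned hostname that must resolve back to the same IP) for additional verification, if Ceramic publishes its crawler hostnames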
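Ceramic does not document its crawler's hostnames, so this check cannot be completed against real TerraCotta traffic yet; the Python sketch below only shows the shape of forward-confirmed reverse DNS. The suffix list you pass in is an assumption, and verify_crawler_ip is a hypothetical helper. By default it uses socket.gethostbyaddr for the reverse (PTR) lookup and socket.gethostbyname_ex for the forward lookup, which covers IPv4 only; the resolvers are parameters so the logic can be tested without network access.

```python
import socket
from typing import Callable, Iterable


def verify_crawler_ip(
    ip: str,
    allowed_suffixes: Iterable[str],
    reverse: Callable[[str], tuple] = socket.gethostbyaddr,
    forward: Callable[[str], tuple] = socket.gethostbyname_ex,
) -> bool:
    """Forward-confirmed reverse DNS (IPv4 only with the default resolvers).

    1. Reverse lookup: ip -> hostname.
    2. The hostname must equal, or be a subdomain of, an allowed suffix.
    3. Forward lookup: hostname -> addresses, which must include ip.
    """
    suffixes = [s.lower().strip(".") for s in allowed_suffixes]
    try:
        hostname, _aliases, _addrs = reverse(ip)
    except OSError:  # socket.herror and socket.gaierror subclass OSError
        return False
    host = hostname.lower().rstrip(".")
    if not any(host == s or host.endswith("." + s) for s in suffixes):
        return False
    try:
        _name, _aliases, addresses = forward(host)
    except OSError:
        return False
    return ip in addresses
```

A call would look like verify_crawler_ip(ip, ["crawler.example"]), where crawler.example is a placeholder to replace once Ceramic documents real hostnames. Checking the suffix before the forward lookup avoids trusting a PTR record that an attacker controls for their own IP range.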
Note: Observed behavior in production environments may differ from official documentation. Live traffic monitoring provides the only reliable verification of actual bot behavior.
Undocumented Information
The following information is not officially documented for Ceramic TerraCotta:
- crawl frequency
- request pattern
- full user-agent string
- IP ranges
- JavaScript rendering
Measure your Crawl-to-Referral Ratio for Ceramic TerraCotta
See how much traffic Ceramic actually sends back to your site relative to how much content Ceramic TerraCotta takes.
- Connect Ceramic TerraCotta crawls in your logs with referral sessions in analytics
- Calculate your CRR — the metric prompt testing tools can't provide
- Make data-driven block/allow decisions for every AI bot
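The ratio itself is simple arithmetic once both counts cover the same time window. A minimal Python sketch, assuming CRR is defined as TerraCotta crawl requests divided by referral sessions from Ceramic's products (the function name is illustrative):

```python
import math


def crawl_to_referral_ratio(crawl_requests: int, referral_sessions: int) -> float:
    """Crawl requests per referral session over the same time window.

    Assumed definition: CRR = crawls / referrals. Returns math.inf when the
    bot crawled but no referrals arrived, and math.nan when both are zero.
    """
    if crawl_requests < 0 or referral_sessions < 0:
        raise ValueError("counts must be non-negative")
    if referral_sessions == 0:
        return math.inf if crawl_requests > 0 else math.nan
    return crawl_requests / referral_sessions


if __name__ == "__main__":
    # Illustrative counts, not real data: 40 crawled pages per referral visit.
    print(crawl_to_referral_ratio(1200, 30))
```

A lower ratio means Ceramic sends more visits per page it crawls. Returning infinity rather than raising on zero referrals keeps the common "crawls but no referrals" case sortable alongside other bots.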
Measure business impact from Ceramic TerraCotta
The question isn't just whether to block Ceramic TerraCotta — it's what you lose or gain from its crawling activity.
- Crawl volume: how many pages Ceramic TerraCotta collects from your site
- Content value: which content categories are targeted most
- Cross-platform CRR: does Ceramic send traffic through other products?
- Referral tracking: Ceramic TerraCotta takes your content, so measure what Ceramic gives back by tracking actual visits arriving at your site from Ceramic's products.
Based on your live traffic data and analytics — not synthetic prompt tests.
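Referral tracking needs a list of the domains Ceramic's products would send visitors from, and Ceramic has not documented one. The Python sketch below therefore uses ceramic.ai purely as a placeholder in CERAMIC_REFERRER_DOMAINS; replace it with hosts you actually observe in your referrer data. The helpers referring_host and count_ceramic_referrals are hypothetical; they count visits per referring host, matching each listed domain and its subdomains.

```python
from collections import Counter
from urllib.parse import urlsplit

# Placeholder only: Ceramic has not documented which domains its products
# send visitors from. Replace with the hosts you actually see.
CERAMIC_REFERRER_DOMAINS = frozenset({"ceramic.ai"})


def referring_host(referrer: str, domains=CERAMIC_REFERRER_DOMAINS):
    """Return the referrer's host if it is a listed domain or subdomain, else None."""
    try:
        host = (urlsplit(referrer).hostname or "").lower()
    except ValueError:  # e.g. malformed IPv6 literal in the URL
        return None
    if host and any(host == d or host.endswith("." + d) for d in domains):
        return host
    return None


def count_ceramic_referrals(referrers, domains=CERAMIC_REFERRER_DOMAINS) -> Counter:
    """Count referral visits per matching referring host."""
    counts = Counter()
    for ref in referrers:
        host = referring_host(ref, domains)
        if host:
            counts[host] += 1
    return counts


if __name__ == "__main__":
    # Referrer values as they might appear in a log or analytics export.
    refs = ["https://ceramic.ai/", "https://www.google.com/", "-"]
    print(count_ceramic_referrals(refs))
```

The total of these counts is the referral side of the crawl-to-referral ratio for the same time window.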
Official Documentation
View Official Ceramic TerraCotta Documentation →
Information sourced from official documentation. Content generated with AI assistance.