Can AI see it

Know what AI sees. Measure what it's worth.

What is ICC Crawler?

Direct Answer: ICC Crawler is a web crawler operated by NICT, collecting web pages for AI training.

Operator: NICT
Type: AI Training Crawler
Purpose: AI training data collection

The ICC Crawler, operated by the Universal Communication Research Institute at the National Institute of Information and Communications Technology (NICT), automatically crawls the Internet to collect web pages. This activity is part of NICT's efforts to construct AI research foundations and develop core technologies, including multilingual communication and smart data utilization.

User-Agent Identification

The following user-agent strings identify ICC Crawler in your live traffic data:

  • ICC-Crawler/3.0 (Mozilla-compatible; ; https://ucri.nict.go.jp/en/icccrawler.html)

robots.txt Rules for ICC Crawler

Respects robots.txt: Yes

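Add one of the following groups to your robots.txt, depending on whether you want to block or allow ICC Crawler. Under RFC 9309, the User-agent line takes the crawler's product token without a version number (tokens may contain only letters, underscores, and hyphens), so use ICC-Crawler rather than the full ICC-Crawler/3.0 string:

# Block ICC Crawler
User-agent: ICC-Crawler
Disallow: /

# Allow ICC Crawler
User-agent: ICC-Crawler
Allow: /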
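To see why the version suffix matters, here is a quick check using Python's standard-library robots.txt parser (urllib.robotparser). This only illustrates one parser's behavior; how NICT's crawler matches groups is not documented. The stdlib parser compares the part of the crawler's user-agent before the first "/" against each group's User-agent value, so a group written as ICC-Crawler/3.0 never matches and the crawler falls through to "allowed". The helper name is_blocked and the example URL are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# User-agent string as listed on this page.
UA = "ICC-Crawler/3.0 (Mozilla-compatible; ; https://ucri.nict.go.jp/en/icccrawler.html)"


def is_blocked(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text disallows user_agent from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse() also marks the rules as loaded
    return not parser.can_fetch(user_agent, url)


# Versioned token: the stdlib parser does not match it, so nothing is blocked.
versioned = "User-agent: ICC-Crawler/3.0\nDisallow: /\n"
# Bare product token: matches the crawler, so the whole site is disallowed.
bare = "User-agent: ICC-Crawler\nDisallow: /\n"

print(is_blocked(versioned, UA, "https://example.com/page"))  # False
print(is_blocked(bare, UA, "https://example.com/page"))       # True
```

Other crawlers may match more leniently, but the bare token is the form RFC 9309 describes, so it is the safer choice.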

Robots.txt is a directive, not a barrier

NICT states that ICC Crawler respects robots.txt. However, configuration mistakes, caching delays, and edge cases mean your directives may not always be followed as expected. Live traffic verification confirms whether ICC Crawler actually obeys your rules in practice.

Need continuous verification across 500+ bots? Can AI See It automates this.

Crawl Behavior

Frequency: Not documented

Request pattern: Not documented

Official Documentation Quotes

"ICC-Crawler automatically crawls the Internet and collects web pages."

Crawl Activity Index

Relative crawl activity for ICC Crawler over the past 28 days. Higher values indicate increased crawling intensity compared to the period baseline.

Recent activity data (last 7 days):
Date Activity Index
Mar 26, 2026 88.0
Mar 27, 2026 82.7
Mar 28, 2026 83.1
Mar 29, 2026 81.8
Mar 30, 2026 87.3
Mar 31, 2026 90.2
Apr 1, 2026 88.9

Source: Cloudflare Radar

Why track ICC Crawler traffic?

Measure what NICT gives back. ICC Crawler takes your content for AI training, but does NICT send any traffic in return through its other services? Check whether the trade-off is worth it before deciding to block.

Understand what content is being collected for AI training. ICC Crawler crawls your site to gather data that may train AI models. Tracking its activity reveals which pages are selected — and which are skipped.

Make an informed block-or-allow decision. Blocking ICC Crawler stops it from collecting your pages going forward; it does not remove content that has already been collected. Before deciding, measure the volume: how many pages it fetches, how often, and whether NICT sends any referral traffic through its other services.

Detect content harvesting patterns. If ICC Crawler is systematically crawling your highest-value content (product pages, proprietary research, premium articles), you may want to restrict access using robots.txt or server-side rules.
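As one example of a server-side rule, the sketch below wraps a Python WSGI application and returns 403 Forbidden to any request whose User-Agent header contains "ICC-Crawler" (case-insensitive). It assumes a WSGI-based site; block_user_agents and BLOCKED_UA_TOKENS are hypothetical names, and most sites would express the same rule in their web server or CDN configuration instead. Unlike robots.txt, this is enforced on your side, but it only catches clients that send an honest user-agent string.

```python
from typing import Callable, Iterable

# Hypothetical list of lowercase substrings to refuse; extend as needed.
BLOCKED_UA_TOKENS = ("icc-crawler",)


def block_user_agents(app: Callable) -> Callable:
    """Wrap a WSGI app so requests from the listed crawlers get 403 Forbidden."""

    def middleware(environ: dict, start_response: Callable) -> Iterable[bytes]:
        user_agent = environ.get("HTTP_USER_AGENT", "").lower()
        if any(token in user_agent for token in BLOCKED_UA_TOKENS):
            start_response("403 Forbidden", [("Content-Type", "text/plain; charset=utf-8")])
            return [b"Forbidden\n"]
        # Any other client is passed through to the wrapped application unchanged.
        return app(environ, start_response)

    return middleware


def hello_app(environ: dict, start_response: Callable) -> Iterable[bytes]:
    """Minimal stand-in application used for the example below."""
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [b"hello\n"]


app = block_user_agents(hello_app)
```

For example, app({"HTTP_USER_AGENT": "ICC-Crawler/3.0 (Mozilla-compatible; ; https://ucri.nict.go.jp/en/icccrawler.html)"}, start_response) responds with 403, while a request with a browser user agent reaches hello_app.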

What does ICC Crawler crawling actually cost you?

AI training bots like ICC Crawler collect your content to improve future AI models. Unlike AI search bots, there's no direct referral pipeline — ICC Crawler doesn't cite sources or send traffic back to your site.

What you give

  • Server resources for every crawl request
  • Your content, expertise, and original research
  • Data that feeds NICT's AI research and technology development

What you get back

  • No direct referral traffic from ICC Crawler
  • No attribution in AI model outputs
  • No revenue share from model usage

This doesn't automatically mean you should block ICC Crawler, but you should measure the real cost before deciding. NICT may send some traffic through its other services; blocking the training crawler may or may not affect those referrals, and only your log and analytics data can tell you which.

What Can AI See It measures for AI training bots

Crawl volume

How many pages ICC Crawler fetches from your site

Content targeting

Which pages and sections ICC Crawler prioritizes

Cross-platform CRR (crawl-to-referral ratio)

Do NICT's other services send you traffic?

Compliance check

Does ICC Crawler actually respect your robots.txt?

How is this different from prompt testing tools? Prompt testing checks if AI mentions your brand in simulated queries. Can AI See It measures what actually happens: real crawls, real referrals, real conversions — from your live traffic data.


Log Verification

To verify ICC Crawler traffic in your live traffic data:

  1. Search access logs for the user-agent strings listed above
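  2. Check whether the requesting IP addresses fall within published ranges, if NICT provides any (no IP verification method is currently documented)
  3. Compare the crawl pattern with documented behavior, where available (crawl frequency and request pattern are not currently documented)
  4. Use a reverse DNS lookup for additional verification, if available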
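Step 1 can be scripted. The sketch below assumes your server writes the common Apache/Nginx "combined" log format (adjust the regular expression otherwise) and keeps only lines whose user agent contains "ICC-Crawler", then counts requests per day, which gives you a crawl frequency that the official documentation does not. icc_requests and COMBINED are hypothetical names, and the sample log lines are made up for illustration.

```python
import re
from collections import Counter
from typing import Iterable, Iterator, Tuple

# Assumed "combined" format: ip ident user [date:time zone] "METHOD path PROTO" status size "referer" "user-agent"
COMBINED = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<day>[^:]+):[^\]]*\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<ua>[^"]*)"'
)


def icc_requests(lines: Iterable[str]) -> Iterator[Tuple[str, str, str, str]]:
    """Yield (ip, day, path, status) for log lines whose user agent names ICC-Crawler."""
    for line in lines:
        m = COMBINED.match(line)
        if m and "icc-crawler" in m.group("ua").lower():
            yield m.group("ip"), m.group("day"), m.group("path"), m.group("status")


sample = [
    '192.0.2.10 - - [01/Apr/2026:10:00:00 +0000] "GET /blog/post-1 HTTP/1.1" 200 5120 "-" '
    '"ICC-Crawler/3.0 (Mozilla-compatible; ; https://ucri.nict.go.jp/en/icccrawler.html)"',
    '198.51.100.7 - - [01/Apr/2026:10:00:05 +0000] "GET / HTTP/1.1" 200 812 "-" "Mozilla/5.0"',
]

hits = list(icc_requests(sample))
per_day = Counter(day for _ip, day, _path, _status in hits)
print(dict(per_day))  # {'01/Apr/2026': 1}
```

The IP column in each result is what you would feed into steps 2 and 4, since a matching user-agent string on its own can be spoofed.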

Note: Observed behavior in production environments may differ from official documentation. Live traffic monitoring provides the only reliable verification of actual bot behavior.

Undocumented Information

The following information is not officially documented for ICC Crawler:

  • crawl frequency
  • request pattern
  • IP verification method
  • JavaScript rendering
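Because NICT does not document an IP verification method, the best available generic check is forward-confirmed reverse DNS: look up the PTR hostname for the requesting IP, check that it ends in a domain you expect, then resolve that hostname and confirm it maps back to the same IP. The sketch below is that generic technique, not an NICT-specified procedure. The suffix you pass in (for example ".nict.go.jp") is an assumption you would need to confirm from your own logs, and the resolver functions are injectable so the logic can be tested offline. socket.gethostbyname_ex covers IPv4 only.

```python
import socket
from typing import Callable, Iterable


def forward_confirmed_rdns(
    ip: str,
    allowed_suffixes: Iterable[str],
    reverse: Callable = socket.gethostbyaddr,
    forward: Callable = socket.gethostbyname_ex,
) -> bool:
    """Return True if ip's PTR hostname ends with an allowed suffix and resolves back to ip."""
    try:
        hostname, _aliases, _addrs = reverse(ip)  # PTR lookup
    except OSError:  # socket.herror and socket.gaierror are OSError subclasses
        return False
    hostname = hostname.rstrip(".").lower()
    if not any(hostname.endswith(suffix.lower()) for suffix in allowed_suffixes):
        return False
    try:
        _name, _aliases, addresses = forward(hostname)  # A-record lookup (IPv4)
    except OSError:
        return False
    # The hostname must resolve back to the original IP, otherwise the PTR could be forged.
    return ip in addresses
```

A call such as forward_confirmed_rdns("192.0.2.10", [".nict.go.jp"]) would perform live DNS lookups; treat a False result as "unverified" rather than proof of impersonation, since the expected domain itself is an assumption here.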

Measure your Crawl-to-Referral Ratio for ICC Crawler

See how much traffic NICT actually sends back to your site relative to how much content ICC Crawler takes.

  • Connect ICC Crawler crawls in your logs with referral sessions in analytics
  • Calculate your CRR — the metric prompt testing tools can't provide
  • Make data-driven block/allow decisions for every AI bot
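The exact CRR formula is not spelled out here, so the sketch below assumes CRR is the number of crawl requests divided by the number of referral visits over the same period: higher values mean more content taken per visit returned. The function name and the sample counts are hypothetical, not measured data.

```python
import math


def crawl_to_referral_ratio(crawl_requests: int, referral_visits: int) -> float:
    """Crawls per referral visit over one period (assumed definition).

    Returns math.inf when there were crawls but no referrals, and math.nan
    when there were neither, since the ratio is undefined in that case.
    """
    if crawl_requests < 0 or referral_visits < 0:
        raise ValueError("counts must be non-negative")
    if referral_visits == 0:
        return math.inf if crawl_requests > 0 else math.nan
    return crawl_requests / referral_visits


# Hypothetical month: 1,200 ICC Crawler requests and 3 visits referred from NICT services.
print(crawl_to_referral_ratio(1200, 3))  # 400.0
```

For a training crawler with no referral pipeline, an infinite ratio is the expected outcome; the number becomes informative when you compare it across bots or across periods.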

Measure business impact from ICC Crawler

The question isn't just whether to block ICC Crawler — it's what you lose or gain from its crawling activity.

  • Crawl volume: how many pages ICC Crawler collects from your site
  • Content value: which content categories are targeted most
  • Cross-platform CRR: does NICT send traffic through other products?
  • Referral tracking: ICC Crawler takes your content; measure what NICT gives back by tracking actual visits that arrive at your site from NICT's services.
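For the content-value question, here is a minimal sketch that groups the paths ICC Crawler requested by top-level site section. It assumes you have already extracted the requested paths from your logs (for example with a user-agent filter), and it treats the first path segment as a stand-in for "content category", which is a simplification. section_of and sections_crawled are hypothetical helpers.

```python
from collections import Counter
from typing import Iterable
from urllib.parse import urlsplit


def section_of(path: str) -> str:
    """Map a request path to its first segment, e.g. '/blog/post-1?x=1' -> '/blog'."""
    segments = [s for s in urlsplit(path).path.split("/") if s]
    return "/" + segments[0] if segments else "/"


def sections_crawled(paths: Iterable[str]) -> Counter:
    """Count crawler requests per top-level section."""
    return Counter(section_of(p) for p in paths)


# Hypothetical paths pulled from ICC Crawler log lines.
paths = ["/blog/a", "/blog/b?page=2", "/research/report-2026", "/"]
print(sections_crawled(paths).most_common(1))  # [('/blog', 2)]
```

If sections such as premium articles or proprietary research dominate the counts, that is the signal to consider the robots.txt or server-side restrictions described above.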

Based on your live traffic data and analytics — not synthetic prompt tests.

Official Documentation

NICT's ICC Crawler information page (the URL embedded in the crawler's user-agent string): https://ucri.nict.go.jp/en/icccrawler.html

Information sourced from official documentation. Content generated with AI assistance.