Can AI see it

What is Arquivo Web Crawler?

Direct Answer: Arquivo Web Crawler is a web archiving bot operated by Arquivo, capturing the Portuguese web.

Operator: Arquivo
Type: Web Archiver
Purpose: Web archiving and preservation

The Arquivo Web Crawler is designed to capture and preserve the Portuguese web. It is operated by Arquivo and, according to its user-agent string, runs on Heritrix 3.4.0-20200304, an open-source web archiving crawler. Its primary function is to systematically crawl and archive web content.

User-Agent Identification

The following user-agent string identifies Arquivo Web Crawler in your live traffic data:

  • Arquivo-web-crawler (compatible; heritrix/3.4.0-20200304 +https://arquivo.pt/faq-crawling)
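As a minimal sketch of how the string above could be matched in application code, the Python helper below checks a User-Agent header for the crawler's product token. The function name `is_arquivo_crawler` is hypothetical, and matching on the token alone (rather than the full string) is an assumption made so that a different Heritrix version number would still match.

```python
import re
from typing import Optional

# Assumption: match only the product token, not the Heritrix version,
# so the check keeps working if the version in the string changes.
ARQUIVO_TOKEN = re.compile(r"\bArquivo-web-crawler\b", re.IGNORECASE)


def is_arquivo_crawler(user_agent: Optional[str]) -> bool:
    """Return True if the User-Agent header contains the Arquivo product token."""
    return bool(ARQUIVO_TOKEN.search(user_agent or ""))
```

A user-agent match only shows what the client claims to be. The header can be spoofed, so treat a match as a first filter rather than proof of origin.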

robots.txt Rules for Arquivo Web Crawler

Respects robots.txt: Yes

Use one of the following robots.txt rule sets to block or allow Arquivo Web Crawler. The two groups are alternatives, so include only one of them in your file:

# Block Arquivo Web Crawler
User-agent: Arquivo-web-crawler
Disallow: /

# Allow Arquivo Web Crawler
User-agent: Arquivo-web-crawler
Allow: /
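One way to confirm how a given robots.txt would be read for this crawler is to parse it locally before deploying it. The sketch below uses Python's standard `urllib.robotparser`. The helper name `arquivo_allowed` and the example URL are illustrative assumptions, and the standard-library parser may not interpret every edge case the same way Heritrix does.

```python
from urllib.robotparser import RobotFileParser

# The two rule sets from this page, as they would appear in robots.txt.
BLOCK_RULES = """\
User-agent: Arquivo-web-crawler
Disallow: /
"""

ALLOW_RULES = """\
User-agent: Arquivo-web-crawler
Allow: /
"""


def arquivo_allowed(robots_txt: str, url: str) -> bool:
    """Return True if the given robots.txt text lets Arquivo-web-crawler fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("Arquivo-web-crawler", url)


if __name__ == "__main__":
    # example.com is a placeholder; substitute a URL from your own site.
    print(arquivo_allowed(BLOCK_RULES, "https://example.com/pricing"))
    print(arquivo_allowed(ALLOW_RULES, "https://example.com/pricing"))
```

This checks only what the file says, not what the crawler does. Whether the rules are honored in practice still has to be confirmed from live traffic.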

Robots.txt is a directive, not a barrier

Arquivo states that Arquivo Web Crawler respects robots.txt. However, configuration mistakes, caching delays, and edge cases mean your directives may not always be followed as expected. Live traffic verification confirms whether Arquivo Web Crawler actually obeys your rules in practice.

Crawl Behavior

Frequency: Not Documented

Request Pattern: Not Documented

Official Documentation Quotes

"Arquivo-web-crawler (compatible; heritrix/3.4.0-20200304 +https://arquivo.pt/faq-crawling)"

Crawl Activity Index

Relative crawl activity for Arquivo Web Crawler over the past 28 days. Higher values indicate increased crawling intensity compared to the period baseline.

Recent activity data (last 7 days):

Date            Activity Index
Mar 26, 2026    88.0
Mar 27, 2026    82.7
Mar 28, 2026    83.1
Mar 29, 2026    81.8
Mar 30, 2026    87.3
Mar 31, 2026    90.2
Apr 1, 2026     88.9

Source: Cloudflare Radar

Why track Arquivo Web Crawler traffic?

Track what's being preserved. Arquivo Web Crawler archives your content for long-term preservation. Monitoring shows which pages are captured and how frequently.

Control what gets archived. If certain pages contain outdated pricing or content you'd prefer not permanently accessible, tracking Arquivo Web Crawler helps you apply controls.

Log Verification

To verify Arquivo Web Crawler traffic in your live traffic data:

  1. Search access logs for the user-agent strings listed above
  2. Check if the IP addresses match documented ranges (if provided by Arquivo)
  3. Verify the crawl pattern matches documented behavior
  4. Use reverse DNS lookup for additional verification if available
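The steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not a definitive implementation: it assumes access logs in the Apache/Nginx Combined Log Format, and the names `arquivo_hits`, `hits_per_ip`, and `reverse_dns` are hypothetical. Because no IP ranges or hostname patterns are documented for this crawler, the reverse DNS result is informational only and cannot be compared against an official value.

```python
import re
import socket
from collections import Counter
from typing import Iterable, Iterator, Optional, Tuple

# Assumed Combined Log Format:
# ip ident user [time] "request" status size "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) \S+ "(?P<referer>[^"]*)" "(?P<ua>[^"]*)"'
)


def arquivo_hits(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (ip, request) for each log line whose user-agent names the crawler."""
    for line in lines:
        match = LOG_LINE.match(line)
        if match and "arquivo-web-crawler" in match.group("ua").lower():
            yield match.group("ip"), match.group("request")


def hits_per_ip(lines: Iterable[str]) -> Counter:
    """Count matching requests per client IP (step 1, grouped for step 2)."""
    return Counter(ip for ip, _ in arquivo_hits(lines))


def reverse_dns(ip: str) -> Optional[str]:
    """Return the PTR hostname for ip, or None if the lookup fails (step 4)."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:
        return None


if __name__ == "__main__":
    # "access.log" is a placeholder path; point it at your own log file.
    with open("access.log", encoding="utf-8", errors="replace") as handle:
        for ip, count in hits_per_ip(handle).most_common():
            print(ip, count, reverse_dns(ip))
```

Grouping by IP first keeps the number of DNS lookups small and makes it easier to spot addresses that send the crawler's user-agent only once or twice, which is a common sign of spoofing.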

Note: Observed behavior in production environments may differ from official documentation. Live traffic monitoring provides the only reliable verification of actual bot behavior.

Undocumented Information

The following information is not officially documented for Arquivo Web Crawler:

  • crawl frequency
  • request pattern
  • IP verification method
  • JavaScript rendering details

Official Documentation

The crawler's user-agent string points to Arquivo's crawling FAQ: https://arquivo.pt/faq-crawling

Information sourced from official documentation. Content generated with AI assistance.