
What is Arquivo Web Crawler?

Direct Answer: Arquivo Web Crawler is a web archiving bot operated by Arquivo, capturing the Portuguese web.

Operator: Arquivo
Type: Web Archiver
Purpose: Web archiving and preservation

The Arquivo Web Crawler is a web archiving bot that captures and preserves the Portuguese web. It is operated by Arquivo and, according to its user-agent string, runs on Heritrix (version 3.4.0-20200304), an open-source web archiving crawler. Its primary function is to crawl and archive web content systematically.

User-Agent Identification

The following user-agent string identifies Arquivo Web Crawler in your live traffic data:

  • Arquivo-web-crawler (compatible; heritrix/3.4.0-20200304 +https://arquivo.pt/faq-crawling)
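A minimal sketch of matching this user-agent in application code, assuming a case-insensitive substring match on the `Arquivo-web-crawler` token is sufficient. The function name `is_arquivo_user_agent` is hypothetical, and a user-agent match alone does not prove the request came from Arquivo, since the header can be spoofed.

```python
import re

# Token taken from the user-agent string listed above. Matching is
# case-insensitive because casing may differ between log sources.
ARQUIVO_UA_PATTERN = re.compile(r"arquivo-web-crawler", re.IGNORECASE)


def is_arquivo_user_agent(user_agent):
    """Return True if the User-Agent value contains the Arquivo crawler token."""
    # Treat a missing header (None) as an empty string.
    return bool(ARQUIVO_UA_PATTERN.search(user_agent or ""))
```

Matching on the token rather than the full string keeps the check working if the Heritrix version in the user-agent changes.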

robots.txt Rules for Arquivo Web Crawler

Respects robots.txt: Yes

Use one of the following robots.txt rule groups to control Arquivo Web Crawler access. Do not place both groups in the same file, as they contradict each other:

# Block Arquivo Web Crawler
User-agent: Arquivo-web-crawler
Disallow: /

# Allow Arquivo Web Crawler
User-agent: Arquivo-web-crawler
Allow: /
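Before deploying a rule group, it can help to check how a standards-following parser reads it. The sketch below uses Python's standard `urllib.robotparser` for that. It shows how Python's parser interprets the rules, which is an assumption about, not a guarantee of, how Heritrix interprets them. The helper name `is_allowed` and the `example.com` URL are illustrative only.

```python
from urllib.robotparser import RobotFileParser

BLOCK_RULES = """\
# Block Arquivo Web Crawler
User-agent: Arquivo-web-crawler
Disallow: /
"""

ALLOW_RULES = """\
# Allow Arquivo Web Crawler
User-agent: Arquivo-web-crawler
Allow: /
"""


def is_allowed(robots_txt, user_agent, url):
    """Parse robots.txt text and report whether user_agent may fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/some/page"  # illustrative URL
    print(is_allowed(BLOCK_RULES, "Arquivo-web-crawler", url))
    print(is_allowed(ALLOW_RULES, "Arquivo-web-crawler", url))
```

Because the block group names only `Arquivo-web-crawler`, other crawlers are unaffected by it unless the file also contains a matching group or a `*` group for them.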

Robots.txt is a directive, not a barrier

Arquivo states that Arquivo Web Crawler respects robots.txt. However, configuration mistakes, caching delays, and edge cases mean your directives may not always be followed as expected. Live traffic verification confirms whether Arquivo Web Crawler actually obeys your rules in practice.


Crawl Behavior

Frequency: Not Documented

Request Pattern: Not Documented

Official Documentation Quotes

"Arquivo-web-crawler (compatible; heritrix/3.4.0-20200304 +https://arquivo.pt/faq-crawling)"

Source: Official Documentation

Crawl Activity Index

Relative crawl activity for Arquivo Web Crawler over the past 28 days. Higher values indicate increased crawling intensity compared to the period baseline.

Recent activity data (last 7 days):

Date            Activity Index
May 25, 2026    49.2
May 26, 2026    68.8
May 27, 2026    90.4
May 28, 2026    87.5
May 29, 2026    67.5
May 30, 2026    57.1
May 31, 2026    59.7

Source: Cloudflare Radar

Why track Arquivo Web Crawler traffic?

Track what's being preserved. Arquivo Web Crawler archives your content for long-term preservation. Monitoring shows which pages are captured and how frequently.

Control what gets archived. If certain pages contain outdated pricing or content you'd prefer not permanently accessible, tracking Arquivo Web Crawler helps you apply controls.

Log Verification

To verify Arquivo Web Crawler traffic in your live traffic data:

  1. Search access logs for the user-agent strings listed above
  2. Check if the IP addresses match documented ranges (if provided by Arquivo)
  3. Verify the crawl pattern matches documented behavior
  4. Use reverse DNS lookup for additional verification if available
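The steps above can be sketched roughly as follows, assuming access logs in the common "combined" format used by Apache and Nginx. The function names are hypothetical. Because Arquivo does not document an IP verification method, the reverse DNS helper only returns whatever hostname resolves; it is a hint for manual review, not proof of origin.

```python
import re
import socket
from collections import Counter

# Combined log format: ip ident user [time] "request" status size "referer" "ua"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<ua>[^"]*)"'
)


def arquivo_hits(log_lines):
    """Yield (ip, path, status) for each request whose user-agent names the crawler."""
    for line in log_lines:
        match = LOG_LINE.match(line)
        if match and "arquivo-web-crawler" in match.group("ua").lower():
            yield match.group("ip"), match.group("path"), int(match.group("status"))


def hits_per_ip(log_lines):
    """Count matching requests per client IP, to compare against any known ranges."""
    return Counter(ip for ip, _path, _status in arquivo_hits(log_lines))


def reverse_dns(ip):
    """Return the PTR hostname for ip, or None if the lookup fails."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:
        return None
```

A typical use is to pass an open log file to `hits_per_ip`, then call `reverse_dns` on the most frequent addresses and review the results by hand.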

Note: Observed behavior in production environments may differ from official documentation. Live traffic monitoring provides the only reliable verification of actual bot behavior.

Undocumented Information

The following information is not officially documented for Arquivo Web Crawler:

  • crawl frequency
  • request pattern
  • IP verification method
  • JavaScript rendering details

Official Documentation

See the crawling FAQ referenced in the user-agent string: https://arquivo.pt/faq-crawling

Information sourced from official documentation. Content generated with AI assistance.