Which user-agent strings identify Arquivo Web Crawler?

The following user-agent string identifies Arquivo Web Crawler: Arquivo-web-crawler (compatible; heritrix/3.4.0-20200304 +https://arquivo.pt/faq-crawling).

Does Arquivo Web Crawler respect robots.txt rules?

Yes, Arquivo Web Crawler respects robots.txt directives according to official documentation.

How can I verify Arquivo Web Crawler traffic with live traffic data?

You can verify Arquivo Web Crawler requests by checking your server access logs for the documented user-agent strings. For accurate verification, correlate user-agent patterns with IP ranges or verification methods provided by Arquivo.

What is Arquivo Web Crawler?

Direct Answer: Arquivo Web Crawler is a web archiving bot operated by Arquivo, capturing the Portuguese web.

Operator: Arquivo
Type: Web Archiver
Purpose: Web archiving and preservation

The Arquivo Web Crawler is a web archiving bot designed to capture and preserve the Portuguese web. It is operated by Arquivo and utilizes Heritrix, a web archiving software, version 3.4.0-20200304. The bot's primary function is to systematically crawl and archive web content.

User-Agent Identification

The following user-agent string identifies Arquivo Web Crawler in your live traffic data:

Arquivo-web-crawler (compatible; heritrix/3.4.0-20200304 +https://arquivo.pt/faq-crawling)
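As a rough illustration of matching this string in code, the sketch below checks a request's User-Agent header for the product token. Matching on the token `Arquivo-web-crawler` rather than the full string is an assumption made here so that a changed Heritrix version would still match; the function name `is_arquivo_user_agent` is hypothetical.

```python
import re

# Product token taken from the documented user-agent string.
# Assumption: the Heritrix version suffix may change, so only the token is matched.
ARQUIVO_TOKEN = re.compile(r"\barquivo-web-crawler\b", re.IGNORECASE)


def is_arquivo_user_agent(user_agent: str) -> bool:
    """Return True if the User-Agent header claims to be Arquivo Web Crawler."""
    return bool(ARQUIVO_TOKEN.search(user_agent))


documented = (
    "Arquivo-web-crawler (compatible; heritrix/3.4.0-20200304 "
    "+https://arquivo.pt/faq-crawling)"
)
print(is_arquivo_user_agent(documented))
```

A user-agent header can be set by any client, so a match only shows what the request claims to be.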

robots.txt Rules for Arquivo Web Crawler

Respects robots.txt: Yes

Use one of the following robots.txt rule sets to control Arquivo Web Crawler access. The two are alternatives; include only one of them for this user-agent:

# Block Arquivo Web Crawler
User-agent: Arquivo-web-crawler
Disallow: /

# Allow Arquivo Web Crawler
User-agent: Arquivo-web-crawler
Allow: /

Robots.txt is a directive, not a barrier

Arquivo states that Arquivo Web Crawler respects robots.txt. However, configuration mistakes, caching delays, and edge cases mean your directives may not always be followed as expected. Live traffic verification confirms whether Arquivo Web Crawler actually obeys your rules in practice.
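One way to approach this kind of check is to compare logged requests against your own robots.txt. The sketch below uses Python's standard `urllib.robotparser` for that; the helper name `disallowed_requests`, the `/private/` rule, and the sample paths are hypothetical, and Python's rule matching may not be identical to the crawler's own parser.

```python
from urllib.robotparser import RobotFileParser


def disallowed_requests(robots_txt, requests):
    """Return the logged (user_agent, path) pairs that robots_txt disallows."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [(ua, path) for ua, path in requests if not parser.can_fetch(ua, path)]


# Hypothetical rule set: block only one directory for Arquivo Web Crawler.
ROBOTS_TXT = """\
User-agent: Arquivo-web-crawler
Disallow: /private/
"""

UA = (
    "Arquivo-web-crawler (compatible; heritrix/3.4.0-20200304 "
    "+https://arquivo.pt/faq-crawling)"
)

# Hypothetical requests as they might be extracted from an access log.
logged = [(UA, "/index.html"), (UA, "/private/report.html")]

for ua, path in disallowed_requests(ROBOTS_TXT, logged):
    print("Requested despite Disallow:", path)
```

Any request this reports is worth a closer look, since it may reflect a cached robots.txt, a rule that does not mean what was intended, or a client imitating the user-agent.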


Crawl Behavior

Frequency: Not Documented

Request Pattern: Not Documented

Official Documentation Quotes

"Arquivo-web-crawler (compatible; heritrix/3.4.0-20200304 +https://arquivo.pt/faq-crawling)"
Source: Official Documentation

Crawl Activity Index

Relative crawl activity for Arquivo Web Crawler over the past 28 days. Higher values indicate increased crawling intensity compared to the period baseline.

Recent activity data (last 7 days):

Date	Activity Index
Mar 26, 2026	88.0
Mar 27, 2026	82.7
Mar 28, 2026	83.1
Mar 29, 2026	81.8
Mar 30, 2026	87.3
Mar 31, 2026	90.2
Apr 1, 2026	88.9

Source: Cloudflare Radar

Why track Arquivo Web Crawler traffic?

Track what's being preserved. Arquivo Web Crawler archives your content for long-term preservation. Monitoring shows which pages are captured and how frequently.

Control what gets archived. If certain pages contain outdated pricing or content you'd prefer not permanently accessible, tracking Arquivo Web Crawler helps you apply controls.

Log Verification

To verify Arquivo Web Crawler traffic in your live traffic data:

Search access logs for the user-agent strings listed above
Check if the IP addresses match documented ranges (if provided by Arquivo)
Verify the crawl pattern matches documented behavior
Use reverse DNS lookup for additional verification if available
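The steps above can be sketched roughly as follows, assuming access logs in the common "combined" format used by Apache and nginx. The names `LOG_LINE`, `arquivo_hits`, and `reverse_dns` are hypothetical. Because no IP ranges or verification hostnames are documented for this crawler, the reverse DNS result is only informational, not proof of identity.

```python
import re
import socket
from collections import Counter

# Assumption: logs use the combined log format (IP, timestamp, request,
# status, size, referer, user-agent).
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def arquivo_hits(lines):
    """Count requests per client IP whose user-agent names Arquivo Web Crawler."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match and "arquivo-web-crawler" in match.group("agent").lower():
            hits[match.group("ip")] += 1
    return hits


def reverse_dns(ip):
    """Return the PTR hostname for ip, or None if the lookup fails."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:
        return None
```

A typical use would be to pass an open log file to `arquivo_hits` and then call `reverse_dns` on each reported IP, keeping in mind that the lookup needs network access and can be slow on large result sets.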

Note: Observed behavior in production environments may differ from official documentation. Live traffic monitoring provides the only reliable verification of actual bot behavior.

Undocumented Information

The following information is not officially documented for Arquivo Web Crawler:

crawl frequency
request pattern
IP verification method
JavaScript rendering details

Official Documentation

Official Arquivo Web Crawler documentation: https://arquivo.pt/faq-crawling (the URL given in the user-agent string)

Information sourced from official documentation. Content generated with AI assistance.