What is Arquivo Web Crawler?
Direct Answer: Arquivo Web Crawler is a web archiving bot operated by Arquivo, capturing the Portuguese web.
The Arquivo Web Crawler is a web archiving bot that captures and preserves the Portuguese web. It is operated by Arquivo and, according to its user-agent string, runs on Heritrix version 3.4.0-20200304, an open-source web archiving crawler. Its primary function is to systematically crawl web content and store it for long-term preservation.
User-Agent Identification
The following user-agent string identifies Arquivo Web Crawler in your live traffic data:
Arquivo-web-crawler (compatible; heritrix/3.4.0-20200304 +https://arquivo.pt/faq-crawling)
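As a minimal sketch of how this string could be matched in practice, the Python snippet below checks a user-agent value against the documented token and extracts the Heritrix version. The helper name `heritrix_version` is hypothetical, and capturing the version loosely (so that a later Heritrix release would still match) is an assumption, not something Arquivo documents.

```python
import re
from typing import Optional

# Matches the documented token; the version is captured rather than hard-coded
# (assumption: a future Heritrix upgrade would keep the same overall format).
ARQUIVO_UA = re.compile(
    r"Arquivo-web-crawler \(compatible; heritrix/(?P<version>[^\s)]+)",
    re.IGNORECASE,
)

def heritrix_version(user_agent: str) -> Optional[str]:
    """Return the Heritrix version if the UA claims to be Arquivo Web Crawler."""
    match = ARQUIVO_UA.search(user_agent)
    return match.group("version") if match else None

ua = "Arquivo-web-crawler (compatible; heritrix/3.4.0-20200304 +https://arquivo.pt/faq-crawling)"
print(heritrix_version(ua))  # 3.4.0-20200304
```

A user-agent header can be spoofed, so a match only shows that a request claims to come from Arquivo Web Crawler.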
robots.txt Rules for Arquivo Web Crawler
Respects robots.txt: Yes
Use one of the following robots.txt rule sets to control Arquivo Web Crawler access:
# Block Arquivo Web Crawler
User-agent: Arquivo-web-crawler
Disallow: /
# Allow Arquivo Web Crawler
User-agent: Arquivo-web-crawler
Allow: /
Robots.txt is a directive, not a barrier
Arquivo states that Arquivo Web Crawler respects robots.txt. However, configuration mistakes, caching delays, and edge cases mean your directives may not always be followed as expected. Live traffic verification confirms whether Arquivo Web Crawler actually obeys your rules in practice.
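One way to catch configuration mistakes before they reach production is to test the rules locally. The sketch below uses Python's standard `urllib.robotparser` to evaluate the block and allow rules for the `Arquivo-web-crawler` token. The helper `is_allowed` and the `example.com` URL are hypothetical, and the result reflects how Python's parser reads the file, which may differ in edge cases from how Heritrix interprets it.

```python
from urllib.robotparser import RobotFileParser

def is_allowed(robots_txt: str, url: str, agent: str = "Arquivo-web-crawler") -> bool:
    """Evaluate robots.txt text for one agent and URL using Python's parser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)

BLOCK_RULES = """\
User-agent: Arquivo-web-crawler
Disallow: /
"""

ALLOW_RULES = """\
User-agent: Arquivo-web-crawler
Allow: /
"""

# example.com is a placeholder; substitute a URL from your own site.
print(is_allowed(BLOCK_RULES, "https://example.com/page"))  # False
print(is_allowed(ALLOW_RULES, "https://example.com/page"))  # True
```

A passing local check only confirms that the file says what you intend; whether the crawler follows it still has to be confirmed from live traffic.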
Crawl Behavior
Frequency: Not Documented
Request Pattern: Not Documented
Official Documentation Quotes
"Arquivo-web-crawler (compatible; heritrix/3.4.0-20200304 +https://arquivo.pt/faq-crawling)"
Crawl Activity Index
Relative crawl activity for Arquivo Web Crawler over the past 28 days. Higher values indicate increased crawling intensity compared to the period baseline.
Recent activity data (last 7 days):
| Date | Activity Index |
|---|---|
| Mar 26, 2026 | 88.0 |
| Mar 27, 2026 | 82.7 |
| Mar 28, 2026 | 83.1 |
| Mar 29, 2026 | 81.8 |
| Mar 30, 2026 | 87.3 |
| Mar 31, 2026 | 90.2 |
| Apr 1, 2026 | 88.9 |
Source: Cloudflare Radar
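To compare individual days against each other, the seven values in the table can be summarized with a few lines of Python. This is only an illustrative sketch: the numbers are copied from the table, and the average is taken over these seven days only, not over the 28-day baseline that the index itself is measured against.

```python
from statistics import mean

# Activity index values copied from the table above.
activity = {
    "Mar 26, 2026": 88.0,
    "Mar 27, 2026": 82.7,
    "Mar 28, 2026": 83.1,
    "Mar 29, 2026": 81.8,
    "Mar 30, 2026": 87.3,
    "Mar 31, 2026": 90.2,
    "Apr 1, 2026": 88.9,
}

average = round(mean(activity.values()), 1)   # 7-day mean, one decimal place
peak_day = max(activity, key=activity.get)    # day with the highest index
low_day = min(activity, key=activity.get)     # day with the lowest index

print(average, peak_day, low_day, sep=" | ")  # 86.0 | Mar 31, 2026 | Mar 29, 2026
```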
Why track Arquivo Web Crawler traffic?
Track what's being preserved. Arquivo Web Crawler archives your content for long-term preservation. Monitoring shows which pages are captured and how frequently.
Control what gets archived. If certain pages contain outdated pricing or other content you'd prefer not to be permanently accessible, tracking Arquivo Web Crawler shows which URLs it requests, so you know where to apply robots.txt rules.
Log Verification
To verify Arquivo Web Crawler traffic in your live traffic data:
- Search access logs for the user-agent string listed above
- Check if the IP addresses match documented ranges (if provided by Arquivo)
- Verify the crawl pattern matches documented behavior
- Use reverse DNS lookup for additional verification if available
Note: Observed behavior in production environments may differ from official documentation. Live traffic monitoring provides the only reliable verification of actual bot behavior.
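The first and last steps of the checklist above can be sketched in Python as follows. The sketch assumes access logs in the common "combined" format used by Apache and nginx; the sample lines, the IP addresses (taken from reserved documentation ranges), and the helper names `arquivo_hits` and `reverse_dns` are all hypothetical. Because Arquivo does not document an IP verification method, the reverse DNS result is a hint to inspect manually, not proof of origin.

```python
import re
import socket
from collections import Counter

UA_TOKEN = "arquivo-web-crawler"

# Combined Log Format: ip ident user [time] "request" status size "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<ua>[^"]*)"'
)

def arquivo_hits(lines):
    """Count requests per client IP whose user-agent claims to be Arquivo."""
    hits = Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if m and UA_TOKEN in m.group("ua").lower():
            hits[m.group("ip")] += 1
    return hits

def reverse_dns(ip):
    """Best-effort PTR lookup; returns None when no record can be resolved."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:
        return None

# Made-up sample lines for illustration only.
sample = [
    '192.0.2.10 - - [01/Apr/2026:10:00:00 +0000] "GET / HTTP/1.1" 200 512 "-" '
    '"Arquivo-web-crawler (compatible; heritrix/3.4.0-20200304 +https://arquivo.pt/faq-crawling)"',
    '192.0.2.10 - - [01/Apr/2026:10:00:05 +0000] "GET /robots.txt HTTP/1.1" 200 64 "-" '
    '"Arquivo-web-crawler (compatible; heritrix/3.4.0-20200304 +https://arquivo.pt/faq-crawling)"',
    '198.51.100.7 - - [01/Apr/2026:10:01:00 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
]

hits = arquivo_hits(sample)
print(hits)
```

On real logs, pass an open file object to `arquivo_hits` and call `reverse_dns` on each reported IP to see which hostnames, if any, the addresses resolve to.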
Undocumented Information
The following information is not officially documented for Arquivo Web Crawler:
- crawl frequency
- request pattern
- IP verification method
- JavaScript rendering details
Official Documentation
The user-agent string points to Arquivo's crawling FAQ: https://arquivo.pt/faq-crawling
Information sourced from official documentation. Content generated with AI assistance.