What is New York Times Newsgathering?
Direct Answer: The New York Times Newsgathering bot collects public, non-copyright data for newsroom use.
The New York Times Newsgathering bot is used by coders within the NYT newsroom to collect public, non-copyright data from government and commercial websites. It is used for tasks such as archival projects and public-service data collection, including U.S. Elections pages and Covid-19 trackers. The bot follows industry best practices, including controlling request volume, throttling, and identifying itself with custom UserAgents and X-headers.
User-Agent Identification
The following user-agent string identifies New York Times Newsgathering in your live traffic data:
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/110.0.0.0 Safari/537.36 nyt_scraping/scraping@nytimes.com
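A minimal sketch of how this string could be recognized in code. It assumes the `nyt_scraping/<contact>` suffix is the stable part of the string and that the browser portion may change; the function name `match_nyt_newsgathering` is hypothetical and not part of any New York Times tooling.

```python
import re

# Token and contact address come from the user-agent string listed above.
# Assumption: only the "nyt_scraping/<contact>" suffix is stable; the
# browser portion (Chrome version, platform) may differ between scripts.
NYT_TOKEN = re.compile(r"\bnyt_scraping/(?P<contact>\S+)")


def match_nyt_newsgathering(user_agent):
    """Return the contact suffix if the UA carries the nyt_scraping token, else None."""
    found = NYT_TOKEN.search(user_agent or "")
    return found.group("contact") if found else None


ua = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/110.0.0.0 Safari/537.36 "
    "nyt_scraping/scraping@nytimes.com"
)
print(match_nyt_newsgathering(ua))
```

Because a user-agent header can be set by anyone, a match here only shows that a request claims to be New York Times Newsgathering.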
robots.txt Rules for New York Times Newsgathering
Respects robots.txt: No
Robots.txt has limited effect on user-initiated bots
New York Times Newsgathering requests are triggered on demand by scripts that newsroom staff write for specific projects, not by an autonomous crawl schedule. While The New York Times states it respects robots.txt, the bot operates differently from autonomous crawlers: it fetches specific URLs on demand rather than systematically spidering your site. Server-log monitoring is the only reliable way to verify what actually happens.
Crawl Behavior
Frequency: Not documented
Request Pattern: Not documented
Official Documentation Quotes
"Coders within The New York Times newsroom write scripts and scrapers that collect public, non-copyright data from government and commercial websites, ranging from archival tasks to public-service data like our U.S. Elections pages and Covid-19 trackers."
"We bake-in industry best practices like controlling the volume of requests, throttling/concurrency and identifying our work with custom UserAgents and X-headers."
Crawl Activity Index
Relative crawl activity for New York Times Newsgathering over the past 28 days. Higher values indicate increased crawling intensity compared to the period baseline.
Recent activity data (last 7 days):
| Date | Activity Index |
|---|---|
| Mar 26, 2026 | 88.0 |
| Mar 27, 2026 | 82.7 |
| Mar 28, 2026 | 83.1 |
| Mar 29, 2026 | 81.8 |
| Mar 30, 2026 | 87.3 |
| Mar 31, 2026 | 90.2 |
| Apr 1, 2026 | 88.9 |
Source: Cloudflare Radar
Why track New York Times Newsgathering traffic?
Identify and classify unknown crawler activity. New York Times Newsgathering may appear in your live traffic data with varying frequency. Tracking its behavior helps you decide whether to allow, throttle, or block it based on actual data.
Protect your crawl budget. Every bot request consumes server resources. Understanding what New York Times Newsgathering crawls helps you prioritize the crawlers that matter.
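To base an allow, throttle, or block decision on actual data, the bot's requests can be tallied from access logs. The sketch below assumes logs in the common "combined" format and matches on the `nyt_scraping` token from the user-agent string; the function name, the sample log lines, and their IP addresses (taken from documentation-only ranges) are made up for illustration.

```python
import re
from collections import Counter

# Combined Log Format:
# ip ident user [time] "METHOD path proto" status size "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<ua>[^"]*)"'
)


def summarize_bot_requests(lines, token="nyt_scraping"):
    """Return (total parsed requests, Counter of bot requests per path)."""
    total = 0
    by_path = Counter()
    for line in lines:
        parsed = LOG_LINE.match(line)
        if not parsed:
            continue  # skip lines that are not in combined format
        total += 1
        if token in parsed.group("ua"):
            by_path[parsed.group("path")] += 1
    return total, by_path


# Illustrative log lines only; not real traffic.
NYT_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/110.0.0.0 Safari/537.36 "
    "nyt_scraping/scraping@nytimes.com"
)
sample = [
    f'203.0.113.7 - - [01/Apr/2026:10:00:00 +0000] "GET /data/results.csv HTTP/1.1" 200 512 "-" "{NYT_UA}"',
    f'203.0.113.7 - - [01/Apr/2026:10:05:00 +0000] "GET /data/results.csv HTTP/1.1" 200 512 "-" "{NYT_UA}"',
    '198.51.100.9 - - [01/Apr/2026:10:06:00 +0000] "GET / HTTP/1.1" 200 1024 "-" "Mozilla/5.0 (X11; Linux x86_64)"',
]

total, by_path = summarize_bot_requests(sample)
bot_requests = sum(by_path.values())
print(f"{bot_requests} of {total} requests carried the nyt_scraping token")
for path, count in by_path.most_common():
    print(count, path)
```

The per-path counts show which resources the bot actually requests, which is the information needed to judge its share of server load.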
Log Verification
To confirm that requests in your live traffic data come from New York Times Newsgathering:
- Search access logs for the user-agent string listed above
- Check if the IP addresses match documented ranges (if provided by The New York Times)
- Verify the crawl pattern matches documented behavior
- Use reverse DNS lookup for additional verification if available
Note: Observed behavior in production environments may differ from official documentation. Live traffic monitoring provides the only reliable verification of actual bot behavior.
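The reverse DNS step can be sketched as a forward-confirmed lookup: resolve the IP to a hostname, then check that the hostname resolves back to the same IP. This is a generic technique, not a procedure published by The New York Times, and this page lists no documented hostnames or IP ranges, so a passing check only shows that the DNS records are consistent. The function name and the fake resolvers used in the demonstration are hypothetical; `socket.gethostbyname_ex` covers IPv4 only.

```python
import socket


def forward_confirmed_rdns(ip, reverse=socket.gethostbyaddr,
                           forward=socket.gethostbyname_ex):
    """Return the PTR hostname if it resolves back to `ip`, else None."""
    try:
        hostname = reverse(ip)[0]          # (hostname, aliases, addresses)
        addresses = forward(hostname)[2]   # (hostname, aliases, addresses)
    except OSError:
        # socket.herror and socket.gaierror are both subclasses of OSError
        return None
    return hostname if ip in addresses else None


# Demonstration with fake resolvers so the example runs without network access.
# The hostname and addresses below are placeholders, not real NYT infrastructure.
def fake_reverse(ip):
    return ("crawler.example.net", [], [ip])


def fake_forward(hostname):
    return (hostname, [], ["203.0.113.7"])


print(forward_confirmed_rdns("203.0.113.7", fake_reverse, fake_forward))
print(forward_confirmed_rdns("198.51.100.9", fake_reverse, fake_forward))
```

Calling `forward_confirmed_rdns(ip)` with the default arguments performs real DNS queries, so in batch log analysis it is worth caching results per IP.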
Undocumented Information
The following information is not officially documented for New York Times Newsgathering:
- crawl frequency
- request pattern
- JavaScript rendering details
Official Documentation
Information sourced from official documentation. Content generated with AI assistance.