Can AI see it

What is New York Times Newsgathering?

Direct Answer: The New York Times Newsgathering bot collects public, non-copyright data for newsroom use.

Operator: The New York Times
Type: Other Bot
Purpose: Collecting public, non-copyright data for newsroom use

The New York Times Newsgathering bot is used by coders within the NYT newsroom to collect public, non-copyright data from government and commercial websites. It is used for tasks such as archival projects and public-service data collection, including U.S. Elections pages and Covid-19 trackers. The bot follows industry best practices, including controlling request volume, throttling, and identifying itself with custom UserAgents and X-headers.

User-Agent Identification

The following user-agent strings identify New York Times Newsgathering in your live traffic data:

  • Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/110.0.0.0 Safari/537.36 nyt_scraping/scraping@nytimes.com
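As a minimal sketch of how this string could be recognized, the Python below matches on the trailing nyt_scraping/ token rather than on the full string. That choice is an assumption: the browser-like prefix (Chrome version, platform) may change over time, while the token and contact address are the only parts specific to this bot. The function name match_nyt_newsgathering is hypothetical.

```python
import re
from typing import Optional

# Assumption: the "nyt_scraping/<contact>" suffix is the stable, identifying
# part of the user-agent; the Mozilla/Chrome prefix is treated as variable.
NYT_TOKEN = re.compile(r"\bnyt_scraping/(\S+)", re.IGNORECASE)


def match_nyt_newsgathering(user_agent: str) -> Optional[str]:
    """Return the contact string after 'nyt_scraping/' if present, else None."""
    m = NYT_TOKEN.search(user_agent or "")
    return m.group(1) if m else None


ua = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/110.0.0.0 Safari/537.36 "
    "nyt_scraping/scraping@nytimes.com"
)
print(match_nyt_newsgathering(ua))             # scraping@nytimes.com
print(match_nyt_newsgathering("Mozilla/5.0"))  # None
```

A user-agent header is self-reported and easy to spoof, so a match only indicates that a request claims to be this bot.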

robots.txt Rules for New York Times Newsgathering

Respects robots.txt: No

Robots.txt has limited effect on user-initiated bots

New York Times Newsgathering is triggered on demand by scripts that coders in The New York Times newsroom write for specific data-collection tasks. While The New York Times states it respects robots.txt, the bot operates differently from autonomous crawlers: it fetches specific URLs on demand rather than systematically spidering your site. Server-log monitoring is the only reliable way to verify what actually happens.

Crawl Behavior

Frequency: Not documented

Request Pattern: Not documented

Official Documentation Quotes

"Coders within The New York Times newsroom write scripts and scrapers that collect public, non-copyright data from government and commercial websites, ranging from archival tasks to public-service data like our U.S. Elections pages and Covid-19 trackers."

"We bake-in industry best practices like controlling the volume of requests, throttling/concurrency and identifying our work with custom UserAgents and X-headers."

Crawl Activity Index

Relative crawl activity for New York Times Newsgathering over the past 28 days. Higher values indicate increased crawling intensity compared to the period baseline.

Recent activity data (last 7 days):

Date          Activity Index
Mar 26, 2026  88.0
Mar 27, 2026  82.7
Mar 28, 2026  83.1
Mar 29, 2026  81.8
Mar 30, 2026  87.3
Mar 31, 2026  90.2
Apr 1, 2026   88.9

Source: Cloudflare Radar
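
To make the seven listed values easier to compare, the sketch below computes their mean and each day's deviation from it. The 7-day mean is used here only as a stand-in reference point; it is an assumption and not the 28-day period baseline that the index itself is measured against, which this page does not state.

```python
from statistics import mean

# Values copied from the 7-day activity table on this page.
activity = [
    ("Mar 26, 2026", 88.0),
    ("Mar 27, 2026", 82.7),
    ("Mar 28, 2026", 83.1),
    ("Mar 29, 2026", 81.8),
    ("Mar 30, 2026", 87.3),
    ("Mar 31, 2026", 90.2),
    ("Apr 1, 2026", 88.9),
]

avg = mean(value for _, value in activity)
peak = max(activity, key=lambda row: row[1])
low = min(activity, key=lambda row: row[1])

for date, value in activity:
    # Percentage deviation from the 7-day mean (an illustrative reference only).
    delta_pct = (value - avg) / avg * 100
    print(f"{date:<13} {value:5.1f}  {delta_pct:+.1f}% vs 7-day mean")

print(f"7-day mean: {avg:.1f}")
print(f"Highest: {peak[0]} ({peak[1]}), lowest: {low[0]} ({low[1]})")
```

Over this window the index stays within a few points of its mean of 86.0, with the high on Mar 31 and the low on Mar 29.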

Why track New York Times Newsgathering traffic?

Identify and classify unknown crawler activity. New York Times Newsgathering may appear in your live traffic data with varying frequency. Tracking its behavior helps you decide whether to allow, throttle, or block it based on actual data.

Protect your crawl budget. Every bot request consumes server resources. Understanding what New York Times Newsgathering crawls helps you prioritize the crawlers that matter.

Log Verification

To verify New York Times Newsgathering traffic in your live traffic data:

  1. Search access logs for the user-agent strings listed above
  2. Check if the IP addresses match documented ranges (if provided by The New York Times)
  3. Compare the crawl pattern with documented behavior (crawl frequency and request pattern are not currently documented for this bot)
  4. Use reverse DNS lookup for additional verification if available
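
The steps above can be sketched in Python as follows. This is an illustrative outline under several assumptions: the access log uses the common "combined" format, the nyt_scraping/ token is what identifies the bot, and the sample log lines (with documentation-range IP addresses) are made up for the example. No IP ranges are listed on this page, so step 2 is omitted. The helper names find_hits and reverse_dns are hypothetical.

```python
import re
import socket
from collections import Counter

# Combined Log Format:
# ip ident user [time] "METHOD path proto" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<ua>[^"]*)"'
)
UA_TOKEN = "nyt_scraping/"  # assumption: token taken from the listed user-agent


def find_hits(lines):
    """Step 1: keep parsed log entries whose user-agent contains the token."""
    hits = []
    for line in lines:
        m = LOG_LINE.match(line)
        if m and UA_TOKEN in m.group("ua").lower():
            hits.append(m.groupdict())
    return hits


def reverse_dns(ip):
    """Step 4: return the PTR hostname for an IP, or None if lookup fails."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:
        return None


# Made-up sample lines; in practice, iterate over your real access log file.
NYT_UA = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
          "(KHTML, like Gecko) Chrome/110.0.0.0 Safari/537.36 "
          "nyt_scraping/scraping@nytimes.com")
sample_log = [
    f'203.0.113.7 - - [01/Apr/2026:10:00:00 +0000] "GET /data/results.csv HTTP/1.1" 200 5120 "-" "{NYT_UA}"',
    f'203.0.113.7 - - [01/Apr/2026:10:00:05 +0000] "GET /data/counties.json HTTP/1.1" 200 2048 "-" "{NYT_UA}"',
    '198.51.100.9 - - [01/Apr/2026:10:00:06 +0000] "GET / HTTP/1.1" 200 900 "-" "Mozilla/5.0"',
]

hits = find_hits(sample_log)
by_ip = Counter(h["ip"] for h in hits)      # which addresses send the traffic
by_path = Counter(h["path"] for h in hits)  # step 3: which URLs are fetched

print(by_ip.most_common())
print(by_path.most_common())
# reverse_dns(ip) performs a live DNS lookup, so it is not called in this example.
```

Grouping hits by IP and path gives a rough picture of the request pattern, which is useful here because the pattern itself is undocumented.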

Note: Observed behavior in production environments may differ from official documentation. Live traffic monitoring provides the only reliable verification of actual bot behavior.

Undocumented Information

The following information is not officially documented for New York Times Newsgathering:

  • crawl frequency
  • request pattern
  • JavaScript rendering details

Official Documentation

View Official New York Times Newsgathering Documentation →

Information sourced from official documentation. Content generated with AI assistance.