NEWS.md
incomparable_get_episodes() now fetches each per-episode page for episodes that appear in stats.txt but aren’t listed on the show’s archive page yet (the archive renders on a slower cadence than stats.txt updates, so the newest episode is typically missing from the archive for hours to weeks). This recovers summary for those episodes from the per-episode og:description meta tag. topic remains NA for newest-episode gaps unless the individual page happens to populate .episode-subtitle.
New exported helper incomparable_parse_episode(episode_url, cache) returns a one-row tibble (summary, topic) for a given episode URL — exposed for direct use; called automatically by the orchestrator’s gap-fill.
The gap-fill is lazy: zero extra HTTP requests when the archive is current. Worst case scales with the gap size (typically 0–1 episodes per show per scheduled run).
incomparable_get_episodes() no longer returns NA for year, month, weekday, and network on episodes that appear in stats.txt but haven’t been added to the archive page yet (the Incomparable site renders the two surfaces independently and stats.txt typically leads by hours-to-weeks for new episodes). These four columns are now derived from the canonical stats.txt date / a constant after the join, so any row that has a date also has year, month, weekday, and network. category, topic, and summary remain NA for episodes the archive hasn’t listed yet — those fields genuinely have no source to recover them from. Reported by the podcasts.jemu.name consumer (2026-05-25).
Same fix also protects against the historical join-key mismatch case where stats.txt and the archive page disagree on an episode number (e.g. legacy sub-indexed entries like 123a / 123b): derived columns are populated from the surviving date regardless of whether the archive row matched.
HTTP layer migrated from polite to httr2. The cache argument on atp_get_episodes(), relay_get_episodes(), relay_get_shows(), incomparable_get_episodes(), incomparable_get_shows(), incomparable_parse_archive(), incomparable_parse_stats(), incomparable_get_subcategories(), and relay_parse_feed() now controls the httr2 HTTP cache only. None of these functions write RDS/CSV files as a side effect any more.
Callers that need disk artefacts must invoke cache_podcast_data() explicitly, or use update_cached_data() which bundles fetch + write.
cache_podcast_data(dir = …) defaults to here::here("data_cache") instead of the literal relative path "data_cache".
New internal request helper centralises user-agent, per-host throttling (default 1 req / 2 s), transient retries (429/5xx), and cross-session HTTP caching via tools::R_user_dir("poddr", "cache").
New package options for tuning the request layer: poddr_user_agent, poddr_throttle_rate, poddr_cache_dir, poddr_cache_max_age, poddr_cache_max_size.
robotstxt::paths_allowed() is now checked once per host at orchestrator entry (i.e. inside *_get_shows() and effectively by the per-show parsers fired from *_get_episodes()).
Added vcr cassettes for the network-touching functions, covering atp_get_episodes(), relay_get_shows(), relay_parse_feed(), relay_get_episodes(), incomparable_get_shows(), incomparable_parse_archive(), incomparable_parse_stats(), and incomparable_get_episodes(). Cassettes re-record after 30 days.
Parser logic for relay and Incomparable was factored into inner private functions taking already-parsed XML / HTML / text, so the parsing layer is tested offline against synthetic fixtures.
incomparable_parse_stats() now fetches stats.txt through polite::politely() so that requests respect robots.txt and a per-host delay, matching the rest of the scrapers.progress package to cli, dropping one direct dependency.purrr::map_dfr()/pmap_dfr() with purrr::map()/pmap() + purrr::list_rbind().relay_get_shows() now caches to relay_shows.rds instead of overwriting the episode cache.tidyselect in gather_people().testthat tests for parse_duration(), label_n(), gather_people(), and atp_parse_page().R (>= 4.1.0) in DESCRIPTION to match use of |> and \(x).Show and Network column equal "ATP" for consistency with other podcasts.500 and is not parseable.