New features

  • incomparable_get_episodes() now fetches the individual episode page for any episode that appears in stats.txt but isn’t listed on the show’s archive page yet. The archive is rendered less often than stats.txt is updated, so the newest episode is typically missing from the archive for hours to weeks. This recovers the summary column for those episodes from each page’s og:description meta tag. topic remains NA for these episodes unless the individual page happens to populate .episode-subtitle.

  • New exported helper incomparable_parse_episode(episode_url, cache) returns a one-row tibble with columns summary and topic for a single episode URL. It is exported for direct use and is also called automatically by the orchestrator’s gap-fill.

  • The gap-fill is lazy: it makes no extra HTTP requests when the archive is current. Otherwise the number of extra requests scales with the gap size (typically 0–1 episodes per show per scheduled run).
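A minimal base-R sketch of the lazy selection step, assuming the episode numbers from stats.txt and from the archive page are available as character vectors. gap_episodes(), fill_gaps() and the fetch_page argument are hypothetical names for illustration, not poddr’s internals; fetch_page stands in for a per-episode fetcher such as incomparable_parse_episode().

```r
# Hypothetical sketch: only episodes present in stats.txt but absent from
# the archive page are fetched individually.
gap_episodes <- function(stats_numbers, archive_numbers) {
  setdiff(stats_numbers, archive_numbers)
}

fill_gaps <- function(stats_numbers, archive_numbers, fetch_page) {
  gaps <- gap_episodes(stats_numbers, archive_numbers)
  # lapply() over an empty vector never calls fetch_page(), so a current
  # archive costs zero extra requests; otherwise it is one call per gap.
  results <- lapply(gaps, fetch_page)
  names(results) <- gaps
  results
}
```

Because the fetcher is only invoked for the set difference, the cost is bounded by the gap size rather than by the size of the back catalogue.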

Bug fixes

  • incomparable_get_episodes() no longer returns NA for year, month, weekday, and network on episodes that appear in stats.txt but haven’t been added to the archive page yet. The Incomparable site renders the two sources independently, and stats.txt typically leads by hours to weeks for new episodes. The year, month, and weekday columns are now derived from the canonical stats.txt date after the join, and network is set to a constant, so any row that has a date also has all four. category, topic, and summary remain NA for episodes the archive hasn’t listed yet, because no other source provides them. Reported by the podcasts.jemu.name consumer (2026-05-25).

  • The same fix also covers the historical join-key mismatch where stats.txt and the archive page disagree on an episode number (e.g. legacy sub-indexed entries such as 123a / 123b): the derived columns are populated from the stats.txt date whether or not the archive row matched.
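A minimal base-R sketch of deriving the date columns after the join, assuming the joined data frame has a Date column named date. derive_date_columns() and its network_name argument are hypothetical; the package’s own implementation, column types, and network constant may differ.

```r
# Hypothetical sketch: populate derived columns from the stats.txt date,
# independent of whether the archive row matched in the join.
derive_date_columns <- function(episodes, network_name) {
  episodes$year <- as.integer(format(episodes$date, "%Y"))
  episodes$month <- as.integer(format(episodes$date, "%m"))
  # weekdays() returns day names in the current locale.
  episodes$weekday <- weekdays(episodes$date)
  # rep() keeps this valid for zero-row input as well.
  episodes$network <- rep(network_name, nrow(episodes))
  episodes
}
```

Archive-only fields such as category are left untouched, so they stay NA when the archive row is missing.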

Breaking changes

New features

  • A new internal request helper centralises the user agent, per-host throttling (default: 1 request every 2 s), retries on transient errors (HTTP 429 and 5xx), and cross-session HTTP caching in tools::R_user_dir("poddr", "cache").

  • New package options for tuning the request layer: poddr_user_agent, poddr_throttle_rate, poddr_cache_dir, poddr_cache_max_age, poddr_cache_max_size.

  • robotstxt::paths_allowed() is now checked once per host at orchestrator entry, i.e. inside *_get_shows() and, in effect, by the per-show parsers called from *_get_episodes().

Testing

Dependencies

  • Removed: polite, memoise.
  • Added: httr2, xml2, robotstxt, here.
  • Suggests: vcr, withr.
  • [ATP] Add (redundant) Show and Network columns, both equal to “ATP”, for consistency with other podcasts.
  • [ATP] Fix error in page enumeration caused by unnumbered member special episodes.
  • [Incomparable] Fix date-parsing regex failing when a topic included a 4-digit number.
  • [ATP] Ignore members-only posts rather than including them with missing data.
  • [Relay FM] Fix incorrect host parsing that caused all hosts to be displayed as “Relay FM”.
  • [Incomparable] Add a safety check in case an archive page returns HTTP 500 and cannot be parsed.
  • [Incomparable] Slightly improve date parsing from archive pages.
  • [Incomparable] Fix missing subcategory handling for the mothership, game show, and a few other shows.
  • [Incomparable] Update for the new Incomparable website in June 2022.
    • Some information, such as sub-categories for the mothership (Book Club, Old Movie Club, etc.), is not yet recovered.
  • [Incomparable] Fix bug where empty show archive pages broke the whole episode gathering.
    • Emits a message such as “Empty archive page for Doctor Who Flashcast at https://www.theincomparable.com/dwf/archive/”.
  • Add pkgdown site.
  • Pass R CMD check.
  • Add functions to get episodes from The Incomparable, Relay FM and ATP.
  • Added a NEWS.md file to track changes to the package.