Bug Fixes & Improvements
Structured robots.txt parsing, real-time pre-crawl logs, crawl phase indicators, and critical fixes for module state isolation and redirect tracking.
Improvements
- robots.txt Parser: The robots.txt module now parses the file into structured User-Agent groups with their Allow/Disallow rules, plus a list of Sitemap URLs
- robots.txt Summary Card: A new info card on the Reports page shows at a glance whether robots.txt exists, along with its size, rule group count, and sitemap status
- robots.txt Rules Table: A full-width table below the info cards lists every User-Agent group with its Allow and Disallow rules in a clean, readable format
- PDF Export: The robots.txt section is now included in PDF reports, with summary info, sitemap URLs, and the full rules table
- Skipped URL Tracking: URLs blocked by robots.txt or exceeding the maximum crawl depth are now tracked as "skipped" in the progress bar, crawl history, and status bar, so crawl statistics stay accurate
- Real-time Pre-crawl Logs: The crawl log now streams live progress during pre-crawl checks (SSL, sitemap, robots.txt, domain, compression, favicon, etc.), so users can see what is happening before URL crawling begins
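The structured parsing described in the first item can be illustrated with a minimal sketch. All names here (`RuleGroup`, `parse_robots`) are hypothetical, not the app's actual code; it only shows the general shape of grouping User-Agent lines with their Allow/Disallow rules and collecting Sitemap URLs.

```python
# Minimal sketch of structured robots.txt parsing (hypothetical names,
# not the application's real implementation).
from dataclasses import dataclass, field

@dataclass
class RuleGroup:
    user_agents: list = field(default_factory=list)  # agents this group applies to
    allow: list = field(default_factory=list)
    disallow: list = field(default_factory=list)

def parse_robots(text):
    groups, sitemaps, current = [], [], None
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        key = key.lower()
        if key == "user-agent":
            # consecutive User-agent lines share one rule group
            if current is None or current.allow or current.disallow:
                current = RuleGroup()
                groups.append(current)
            current.user_agents.append(value)
        elif key == "allow" and current:
            current.allow.append(value)
        elif key == "disallow" and current:
            current.disallow.append(value)
        elif key == "sitemap":
            sitemaps.append(value)  # Sitemap lines are global, not per-group
    return groups, sitemaps
```

A structure like this is what makes the summary card (rule group count, sitemap status) and the rules table cheap to render: the UI reads counts and lists instead of re-scanning raw text.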
Bug Fixes
- robots.txt Info Card Sizing: Moved the rules table out of the card grid into its own full-width section so neighboring cards no longer stretch vertically
- Missing URL Count in Crawl: Fixed an issue where URLs blocked by robots.txt or exceeding the maximum depth were counted as discovered but never reported as crawled, errored, or skipped, causing a mismatch in crawl totals
- Module State Leaking Between Crawls: Fixed a bug where in-memory state from previous crawls leaked into new ones, causing modules such as redirections, caching, sitemap, and favicon to show data from other projects
- Redirect Distribution Chart Empty: Fixed the redirect distribution chart always rendering empty; the query was reading final status codes (200) instead of the redirections table
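The state-leak fix above follows a common pattern: give each crawl its own freshly constructed module instances instead of reusing module-level singletons. The sketch below is a hypothetical illustration of that pattern, not the app's actual code.

```python
# Hypothetical sketch of per-crawl state isolation: a session owns fresh
# module instances, so no in-memory data survives into the next crawl.
class RedirectsModule:
    def __init__(self):
        self.chains = []  # redirect pairs seen during THIS crawl only

    def record(self, source_url, target_url):
        self.chains.append((source_url, target_url))

class CrawlSession:
    """Constructed once per crawl; owns fresh module state."""
    def __init__(self):
        self.redirects = RedirectsModule()
        # ...caching, sitemap, favicon modules would be created here too

# Each crawl builds a new session, so the second crawl starts clean
# even if the first one recorded data.
first = CrawlSession()
first.redirects.record("http://example.com", "https://example.com")
second = CrawlSession()
```

The key design choice is ownership: because `CrawlSession` constructs its modules rather than importing shared instances, discarding the session discards all of its state at once.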