Why Pages Get Deindexed (and How to Diagnose It with a Crawl)

Last week’s traffic chart looks like a cliff. Pages that ranked yesterday are gone. A site:yourdomain.com search in Google returns a fraction of what it used to. You did not change anything, or so it feels. Sound familiar? You are not alone. In any given week the technical SEO forums are full of the same post: “1800 pages deindexed overnight, what happened?”
The honest news is that a deindexing drop is almost never a mystery and is almost always diagnosable. The work is not panic, it is triage. This guide walks through what the word actually means, the eight causes that explain the overwhelming majority of cases, a 15-minute crawl-based check that finds yours, what to do about each cause, and how to get the pages back in the index.
“Deindexed” Actually Means Three Different Things
Before you fix anything, get the terminology right. The three situations get conflated constantly and they call for different responses.
Was indexed, now dropped. Google had your URL in its index, served it in results, then removed it. This is the cliff-edge case. It is also the case where a technical or quality change usually explains the drop, and where you can often reverse it.
Crawled, currently not indexed. Google fetched the page but chose not to index it. In Search Console this appears as “Crawled - currently not indexed” (its sibling, “Discovered - currently not indexed”, means Google knows the URL but has not even fetched it yet). The page exists, the crawler found it, but Google judged it not worth indexing. The fix here is usually content quality, internal linking, or a duplicate-content signal, not a technical block.
Never indexed. The URL was never in the index at all. Could be a new page Google has not gotten to yet, or a page blocked by robots.txt, or one with noindex from day one, or simply orphaned and undiscoverable.
The Search Console Page Indexing report is the cleanest place to tell these apart. Open it before you do anything else and look at which bucket your missing URLs sit in. The fix path from “Submitted and indexed” → “Crawled - currently not indexed” is very different from the fix path for “Excluded by ‘noindex’ tag”.
This guide focuses mostly on the first case, “was indexed, now dropped”, which is what most “1800 pages disappeared overnight” reports turn out to be.
The Eight Causes That Explain Almost Every Case
Mass deindexing almost always has one of these eight causes behind it. The list is in roughly the order I check them, because some are far more common than the rest.
1. A noindex meta tag or X-Robots-Tag header was deployed accidentally. This is the single most common cause of “pages got deindexed overnight”. A staging environment shipped to production with <meta name="robots" content="noindex"> still in the template. A CMS plugin update flipped a default. A header on the CDN started serving X-Robots-Tag: noindex. The pages still resolve, the content still looks fine, but Google sees the directive and drops them on the next crawl. Check both the HTML head and the HTTP response headers, because either can carry noindex.
2. robots.txt started blocking the URLs. A new Disallow: / line, an overly broad path, or a CDN starting to serve a different robots.txt. Note an important nuance: a robots.txt block usually does not remove an already-indexed page by itself, it just stops future crawling. But it also hides every other signal on the page from Google, and blocked URLs tend to fade from results over time, so it can still end with pages dropping. Always check the live robots.txt Google actually sees, not the file in your repo. See the complete guide to robots.txt and AI crawlers for the syntax mistakes that quietly break the file.
3. Canonical tags pointing somewhere else. A rel="canonical" that points to the homepage, to a single hub page, or to a wrong language version will collapse the index entries down to the canonical target. The pages remain technically reachable, they just are not considered the canonical document anymore. This is especially nasty because everything else about the page looks fine. The guide to canonical tags covers the patterns that go wrong.
4. Status codes that drove the URLs out. Pages that started returning 404 or 410 will be removed. Pages returning 5xx for long enough will be too. Soft 404s, where the URL returns 200 but the page is empty or says “not found”, get the same treatment. A surprising number of deindexing cliffs trace back to a deployment that broke a route handler and started returning 404 across a whole section.
5. An accidental staging or dev environment shipped to production. The deploy went out with the staging template, the staging robots.txt, the staging meta tags, or some combination of the three. The site looks alive but its indexing signals are inverted.
6. Hreflang conflicts. A miswired hreflang setup can cause Google to consolidate language versions or drop the wrong ones. The hreflang implementation guide is the place to verify yours.
7. A manual action or core update reassessment. Less common than the technical causes but very real. A manual action shows up in Search Console under Security and manual actions. A core update is invisible but its fingerprint is recognizable: a broad, gradual ranking and indexing decline that lines up with a Google-announced update window, often hitting thin or low-trust content first. We cover the calm-down version of this in the year-in-title and core update guide once you have ruled out the technical causes.
8. Thin, duplicate, or auto-generated content that Google decided was not worth indexing. Mostly shows up as “Crawled - currently not indexed”. The fix is editorial: prune, merge, or substantially improve the affected pages.
Eighty percent of “pages got deindexed overnight” stories trace back to causes one through five. Always work that list first.
Diagnose It with a Crawl: A 15-Minute Triage
Search Console alone is slow for big sites. The fastest way to find your cause is to crawl the affected section yourself and filter aggressively. Here is the path:
Step 1: Confirm the symptom in Search Console (2 minutes).
- Open Page Indexing. Note the buckets your URLs are sitting in. “Excluded by ‘noindex’”, “Blocked by robots.txt”, “Soft 404”, “Not found (404)”, “Alternate page with proper canonical tag”, “Duplicate, Google chose different canonical than user”, and “Crawled - currently not indexed” each point at a different cause from the list above.
- Open URL Inspection for one specific lost URL. The verdict line and the indexing details panel are the single most useful diagnostic in Search Console.
Step 2: Crawl the site and filter (8 minutes). Run a crawl of your domain, then filter the results in this order:
- Pages serving noindex in either the HTML or the X-Robots-Tag header. Even one row here on a template-level URL pattern is enough to explain a mass drop.
- Non-200 status codes. Surface every 3xx, 4xx, 5xx. Look for patterns: an entire section returning 404, a category page in a redirect loop, a 5xx spike.
- Canonical pointing elsewhere. Filter for pages where the canonical URL does not equal the page URL itself. Then check whether the canonical is intentional. Whole sites get deindexed by a global canonical pointing to the homepage.
- Blocked by robots.txt. Surface every URL the crawler skipped because of robots.txt. Cross-check against your sitemap.
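If you do not have a crawler open yet, the same filters can be roughed out from the command line over a list of suspect URLs. A sketch only, not a substitute for a full crawl: urls.txt is assumed to contain one affected URL per line, and the grep patterns assume double-quoted attributes.

```
# Per-URL triage: final status code, X-Robots-Tag header, robots meta tag, canonical
while read -r url; do
  status=$(curl -s -o /dev/null -w "%{http_code}" -L "$url")
  header=$(curl -sIL "$url" | grep -i "^x-robots-tag" | tr -d '\r')
  meta=$(curl -sL "$url" | grep -io '<meta[^>]*name="robots"[^>]*>')
  canon=$(curl -sL "$url" | grep -io '<link[^>]*rel="canonical"[^>]*>')
  printf '%s\n  status: %s\n  header: %s\n  meta:   %s\n  canon:  %s\n' \
    "$url" "$status" "${header:-none}" "${meta:-none}" "${canon:-none}"
done < urls.txt
```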
Step 3: Compare sitemap vs crawl vs index (3 minutes). A clean sitemap, the set of URLs the crawler actually reached and got a 200 for, and the set of URLs Search Console reports as indexed should overlap heavily. The delta tells the story:
- URLs in sitemap but not crawled → discoverability or robots.txt problem
- URLs crawled but not indexed → quality, canonical, or noindex problem
- URLs indexed but not in sitemap → sitemap freshness problem (less urgent)
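One way to compute those deltas, assuming a flat sitemap at the usual location and a crawled-200.txt exported from your crawler (the filenames and domain are placeholders; a sitemap index file needs one extra level of fetching first):

```
# URLs the live sitemap claims, sorted for comparison
curl -s "https://www.example.com/sitemap.xml" \
  | grep -o "<loc>[^<]*</loc>" \
  | sed -e "s/<loc>//" -e "s|</loc>||" \
  | sort > sitemap-urls.txt

sort crawled-200.txt -o crawled-200.txt

# In the sitemap but never reached by the crawl: discoverability or robots.txt problem
comm -23 sitemap-urls.txt crawled-200.txt

# Crawled fine but missing from the sitemap: sitemap freshness problem
comm -13 sitemap-urls.txt crawled-200.txt
```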
Step 4: Look at server logs for bot traffic (2 minutes). If you have access to the access logs, grep for Googlebot in the last 72 hours and see whether it is hitting the affected URLs at all. A sudden drop in crawl frequency on a section is itself a signal. This is also the only place you can confirm whether the AI crawlers and search bots are actually reaching what you think they are. The same routine is part of any healthy crawl budget audit.
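A minimal version of that check, assuming a combined-format access log at the usual nginx path (adjust the path and the section to your setup; the user-agent string alone can be spoofed, so treat this as a signal rather than proof of Googlebot):

```
# Recent Googlebot requests against the affected section
grep "Googlebot" /var/log/nginx/access.log | grep "/affected-section/" | tail -20

# Hits per day: a sudden drop in crawl frequency stands out immediately
grep "Googlebot" /var/log/nginx/access.log | grep "/affected-section/" \
  | awk '{print $4}' | cut -d: -f1 | tr -d '[' | sort | uniq -c
```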
Most causes reveal themselves in step 2. The big-site cases where it takes longer almost always come down to a canonical or hreflang issue spread thin across many pages.
Fixing Each Cause
Once you have identified the cause, the fix is usually short.
Noindex left in production: remove the tag or header. Confirm with curl -I for the header and view-source: for the HTML. Request indexing for a sample URL in Search Console. Most pages return within days, large sections take longer because Google re-crawls on its own schedule.
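Both confirmations from the command line, for one affected URL (the URL is a placeholder; a meta tag injected by JavaScript will not show up in the raw HTML, so a rendered check may still be needed):

```
# The header: any X-Robots-Tag line here is worth reading closely
curl -sI "https://www.example.com/affected-page/" | grep -i "x-robots-tag"

# The HTML: look for a robots meta tag carrying noindex
curl -sL "https://www.example.com/affected-page/" | grep -io '<meta[^>]*name="robots"[^>]*>'
```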
robots.txt block: edit the file, deploy, verify with the robots.txt report in Search Console or by simply fetching yourdomain.com/robots.txt from a browser. Then resubmit the sitemap to nudge a re-crawl.
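To verify from the command line, fetch the public URL rather than the file in your repo, since a CDN can serve something different (the domain is a placeholder):

```
# The robots.txt Google actually sees
curl -s "https://www.example.com/robots.txt"

# Every Disallow rule on one screen, so an overly broad path stands out
curl -s "https://www.example.com/robots.txt" | grep -i "^disallow"
```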
Wrong canonical: correct the rel="canonical" value to the page itself, or to the actual canonical target. Watch out for canonicals injected by SEO plugins or CMS settings, not just the template.
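A quick way to see what a live page actually declares (placeholder URL; the pattern assumes double-quoted attributes):

```
# Pull the canonical link element out of the served HTML
curl -sL "https://www.example.com/affected-page/" \
  | grep -io '<link[^>]*rel="canonical"[^>]*>'
```

If the href does not match the page itself or the intended target, the plugin or CMS setting injecting it is the next place to look.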
4xx / 5xx errors: fix the underlying routing or server problem. For permanent removals you intend, 410 is more decisive than 404 but both work. For soft 404s, either restore meaningful content or actually return 404.
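Telling a hard 404 from a soft one for a single URL is quick (a rough heuristic; the URL and the phrases are placeholders to adapt to whatever your empty template actually says):

```
# A real 404 or 410 shows up in the status code
curl -s -o /dev/null -w "%{http_code}\n" "https://www.example.com/suspect-page/"

# A soft 404 returns 200 but the body gives it away
curl -sL "https://www.example.com/suspect-page/" \
  | grep -Eci "not found|no results|no longer available"
```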
Staging deployed to production: revert. Then audit how it happened: a staging noindex or a staging robots.txt should never be able to ship.
Hreflang conflicts: the hreflang guide lists the five usual breakages. Most come down to non-reciprocal annotations or wrong region codes.
Manual action: open the manual actions report, read the specific finding, fix it, and submit a reconsideration request. Be honest in the request, vague apologies do not pass.
Quality reassessment / core update: the fix is editorial and slower. Identify the affected pages, decide which deserve substantial rewrites, which should be merged or pruned, and which should be left alone. The year-in-title and core update guide covers the longer playbook.
Getting Pages Re-Indexed
Once the cause is fixed, the page does not automatically jump back. You can speed the re-index along:
- Request Indexing in the URL Inspection tool for high-value pages. It is rate-limited, so use it for templates and hubs, not every URL.
- Resubmit the sitemap. This nudges Google to re-crawl the URLs in it. If the sitemap itself is stale, fix it first. The XML sitemap guide covers the validation step.
- Strengthen internal links to the affected URLs. Sometimes the deindexed pages were also orphaned or weakly linked. A few well-placed internal links from healthy hub pages help re-discovery.
- Be patient. Small sites often re-index within a few days. Large sites with thousands of dropped URLs can take weeks for the index to catch up, even with everything fixed.
Technical Problem or Penalty? How to Tell
This question gets asked every time, and the answer is almost always “technical, not penalty”. Here is the quick check:
- Manual action: there will be a notice in the manual actions report. If the report is empty, it is not a manual action.
- Algorithmic / core update: the decline is gradual, broad, and aligns with a known Google update window. Other sites in your space see similar movement on the same dates.
- Technical: the decline is sudden, the timing usually aligns with a deployment or platform change, and Search Console’s Page Indexing report points at a specific cause (noindex, robots block, 404, canonical).
The technical case is by far the most common. Most “Google penalized me” posts turn out to be a noindex that shipped two days before traffic fell.
Prevention: A Pre-Deploy Indexing Checklist
Most deindexing incidents are preventable. A short pre-deploy checklist catches the worst ones:
- Production HTML must not contain noindex on any page that should be in the index. CI can grep for this.
- Production robots.txt must not contain Disallow: /. CI can fetch and check this.
- Production must respond with 200 (or an intentional 301) on a list of critical URLs. CI can curl and assert.
- X-Robots-Tag headers on key URLs must not include noindex.
- Canonical tags on a sample of URLs must point to themselves (or to an intended canonical).
- The deployed sitemap matches the live URLs.
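As a sketch of what that automation can look like, a post-deploy script along these lines covers the first four checks (the domain, the critical paths, and the grep patterns are assumptions; a meta tag injected client-side would need a rendering step to catch):

```
#!/usr/bin/env bash
# Minimal post-deploy indexing smoke test: fail the build if a signal regresses.
# The domain and the critical paths are placeholders -- adapt to your site.
set -euo pipefail

BASE="https://www.example.com"
CRITICAL_PATHS=("/" "/products/" "/blog/")

# robots.txt must not block the whole site
if curl -s "$BASE/robots.txt" | tr -d '\r' | grep -qiE "^disallow:[[:space:]]*/[[:space:]]*$"; then
  echo "FAIL: robots.txt contains Disallow: /"
  exit 1
fi

for path in "${CRITICAL_PATHS[@]}"; do
  # critical URLs must return 200
  status=$(curl -s -o /dev/null -w "%{http_code}" "$BASE$path")
  if [ "$status" != "200" ]; then
    echo "FAIL: $path returned $status"
    exit 1
  fi

  # no noindex in the X-Robots-Tag header
  if curl -sI "$BASE$path" | grep -i "x-robots-tag" | grep -qi "noindex"; then
    echo "FAIL: $path serves X-Robots-Tag: noindex"
    exit 1
  fi

  # no noindex robots meta tag in the raw HTML
  if curl -s "$BASE$path" | grep -io '<meta[^>]*name="robots"[^>]*>' | grep -qi "noindex"; then
    echo "FAIL: $path has a noindex meta tag"
    exit 1
  fi
done

echo "All indexing checks passed"
```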
A nightly external crawl that fails the build if any of these checks regresses is the closest thing to a real safety net. The technical SEO audit checklist covers the broader recurring audit.
Conclusion
Mass deindexing feels like a catastrophe and usually is not one. In the overwhelming majority of cases the cause is technical, the cause is identifiable in under 15 minutes with a crawl plus the Search Console Page Indexing report, and the fix is short. Panic is the enemy of triage. Start with the eight-cause list, run the four-step diagnosis, fix what you find, and request re-indexing on a sample of high-value URLs.
If you want to run that 15-minute crawl right now, download Seodisias and point it at your site. It crawls locally on your machine with no URL limit, surfaces every noindex tag, every non-200 status, every canonical pointing elsewhere, and every URL blocked by robots.txt in one pass. No upload, no sign-up, all data stays with you. Compare what it finds against your Search Console Page Indexing report and the cause is almost always in the gap between the two.