Paste your XML sitemap to instantly count all URLs, detect sitemap indexes, find duplicate entries, visualize path depth, and validate against Google's 50,000 URL limit.
Paste the XML from your sitemap file (right-click → View Source on yoursite.com/sitemap.xml, then copy all). Handles both <urlset> and <sitemapindex> formats.
How XML sitemaps work, what formats exist, and how to optimize them for search engine crawling.
The standard XML sitemap uses <urlset> as the root element and <url><loc> for each URL. Optional tags: <lastmod> (last modified date), <changefreq> (how often the page changes), and <priority> (0.0–1.0, relative importance). Google only uses <loc> and <lastmod> reliably — the other tags are largely ignored.
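As a rough illustration of the structure described above, the following Python sketch reads a pasted sitemap with the standard library, reports whether the root is `<urlset>` or `<sitemapindex>`, and lists any repeated `<loc>` values. The function name `summarize_sitemap` and the sample URLs are assumptions made for this example; this is not how the counter itself is implemented.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Namespace defined by the sitemaps.org protocol, in ElementTree's {uri} form.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

def summarize_sitemap(xml_text):
    """Return the root kind, the <loc> values, and any duplicated URLs."""
    root = ET.fromstring(xml_text)
    # The root tag is namespaced, e.g. "{...}urlset" or "{...}sitemapindex".
    kind = root.tag.replace(SITEMAP_NS, "")
    entry_tag = "url" if kind == "urlset" else "sitemap"
    locs = [
        (entry.findtext(f"{SITEMAP_NS}loc") or "").strip()
        for entry in root.findall(f"{SITEMAP_NS}{entry_tag}")
    ]
    duplicates = sorted(url for url, n in Counter(locs).items() if n > 1)
    return {"kind": kind, "count": len(locs), "locs": locs, "duplicates": duplicates}

# Hypothetical sample input with one repeated URL.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc><lastmod>2024-01-15</lastmod></url>
  <url><loc>https://example.com/blog/</loc></url>
  <url><loc>https://example.com/</loc></url>
</urlset>"""

summary = summarize_sitemap(SAMPLE)
print(summary["kind"], summary["count"], summary["duplicates"])
# prints: urlset 3 ['https://example.com/']
```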
When your site exceeds 50,000 URLs, or a single sitemap file would exceed 50MB uncompressed, you split it into multiple sitemap files and reference them from a sitemap index. The index uses <sitemapindex> as root and <sitemap><loc> to point to each child sitemap. Google Search Console lets you submit the index URL and discovers all child sitemaps automatically. This counter detects and labels both formats.
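One way to picture the split is the sketch below: it chunks a URL list into groups of at most 50,000 and builds a sitemap index that points at one child file per chunk. The helper names and the `https://example.com/sitemap-N.xml` naming scheme are assumptions for illustration; writing the child files to disk is left out.

```python
import xml.etree.ElementTree as ET

SITEMAP_XMLNS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_FILE = 50_000

def chunk_urls(urls, size=MAX_URLS_PER_FILE):
    """Split a list of URLs into consecutive chunks of at most `size` items."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]

def build_index(child_sitemap_urls):
    """Build a <sitemapindex> document listing each child sitemap URL."""
    root = ET.Element("sitemapindex", xmlns=SITEMAP_XMLNS)
    for child_url in child_sitemap_urls:
        sitemap = ET.SubElement(root, "sitemap")
        ET.SubElement(sitemap, "loc").text = child_url
    return ET.tostring(root, encoding="unicode")

# Hypothetical site with 120,001 pages: needs three child sitemaps.
all_urls = [f"https://example.com/p/{i}" for i in range(120_001)]
chunks = chunk_urls(all_urls)

# Assumed naming scheme for the child files.
child_urls = [
    f"https://example.com/sitemap-{n}.xml" for n in range(1, len(chunks) + 1)
]
index_xml = build_index(child_urls)
print(len(chunks), [len(c) for c in chunks])
# prints: 3 [50000, 50000, 20001]
```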
Include canonical URLs only — the final destination URL without redirect chains. Include: key product pages, blog posts, category pages, and location pages. Exclude: URLs with noindex, paginated pages (except page 1), URLs with parameters if you have canonical versions, admin pages, and URLs returning 4xx or 5xx status codes. A clean, focused sitemap helps Googlebot prioritize crawling your important content.
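The inclusion rules above can be approximated as a filter over crawl data. In this sketch the `Page` record and its fields (`status`, `canonical`, `noindex`) are hypothetical; they stand in for whatever your crawler or CMS exports. Pagination is not handled here and would need a site-specific rule.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit

@dataclass
class Page:
    """Hypothetical crawl record; field names are assumptions for this example."""
    url: str
    status: int
    canonical: str
    noindex: bool = False

def belongs_in_sitemap(page):
    """Apply the include/exclude rules to one crawled page."""
    if page.status != 200:
        return False  # drops redirects as well as 4xx and 5xx responses
    if page.noindex:
        return False
    if page.canonical != page.url:
        return False  # e.g. a parameterized URL pointing at a clean canonical
    if urlsplit(page.url).path.startswith("/admin"):
        return False  # assumed admin path prefix
    return True

pages = [
    Page("https://example.com/shoes", 200, "https://example.com/shoes"),
    Page("https://example.com/shoes?sort=price", 200, "https://example.com/shoes"),
    Page("https://example.com/old-page", 404, "https://example.com/old-page"),
    Page("https://example.com/thanks", 200, "https://example.com/thanks", noindex=True),
    Page("https://example.com/admin/login", 200, "https://example.com/admin/login"),
]

sitemap_urls = [p.url for p in pages if belongs_in_sitemap(p)]
print(sitemap_urls)
# prints: ['https://example.com/shoes']
```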
Submit your sitemap in Google Search Console at Search Console → Sitemaps → Add sitemap. Use absolute URLs including protocol. Keep <lastmod> accurate: setting it to today's date on every page trains Google to ignore it. Split large sites into topic-based sitemaps (products-sitemap.xml, blog-sitemap.xml) for better crawl prioritization. Compress large sitemaps with gzip to reduce transfer size; the 50MB limit still applies to the uncompressed file.
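For the gzip step, a minimal sketch using Python's standard `gzip` module might look like this. The function name and the generated sample sitemap are assumptions for the example; in practice you would write the compressed bytes to a file such as sitemap.xml.gz.

```python
import gzip

def gzip_sitemap(xml_text):
    """Return the UTF-8 encoded sitemap compressed with gzip."""
    return gzip.compress(xml_text.encode("utf-8"))

# Hypothetical sitemap with 1,000 similar entries.
entries = "".join(
    f"<url><loc>https://example.com/products/{i}</loc></url>" for i in range(1000)
)
sitemap_xml = (
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    + entries
    + "</urlset>"
)

raw_size = len(sitemap_xml.encode("utf-8"))
compressed = gzip_sitemap(sitemap_xml)

# Size limits are checked against raw_size, not the compressed size.
print(f"uncompressed: {raw_size} bytes, gzipped: {len(compressed)} bytes")
```

Sitemap XML is highly repetitive, so it usually compresses well, but any size check should still be made against the uncompressed byte count.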
Official Google sitemap requirements — what counts, what's mandatory, and what the limits are.
| Rule | Limit / Requirement | What happens if exceeded |
|---|---|---|
| URLs per sitemap file | 50,000 maximum | Google may stop reading at 50,000 — remaining URLs ignored |
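| Sitemap file size | 50MB uncompressed | Google rejects files over 50MB uncompressed; split into multiple sitemaps (gzip reduces transfer size but does not raise the limit) |
| Sitemaps per sitemap index | 50,000 maximum | Sitemaps listed beyond 50,000 may be ignored; create additional index files (an index cannot reference another index) |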
| URL format | Absolute URLs only | Relative URLs are invalid — Google may not process them |
| URL encoding | Must use entity escaping | Unescaped &, ', ", <, > cause XML parse errors |
| Character encoding | UTF-8 only | Other encodings cause parse failures |
| Sitemap location scope | Listed URLs must be at or below the sitemap's directory | Sitemap at /blog/sitemap.xml can only list /blog/* URLs |
| lastmod format | W3C Datetime (YYYY-MM-DD, or a full timestamp such as 2024-01-15T09:30:00+00:00) | Non-standard dates may be ignored by Google |
| priority / changefreq | Optional, largely ignored | No negative effect — Google mostly ignores these |
| Submission | Google Search Console or robots.txt | Unsubmitted sitemaps may still be discovered, but more slowly |
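
Several rows of the table can be checked mechanically. The sketch below is one possible checker, not an official validator: it tests the uncompressed size, XML well-formedness (which catches unescaped `&`), the 50,000-entry limit, absolute URLs, and the `<lastmod>` format. The function name `check_sitemap` is an assumption, and the date pattern is deliberately simplified: it accepts only full dates and timestamps with a timezone, not the shorter year-only forms that W3C Datetime also permits.

```python
import re
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"
MAX_ENTRIES = 50_000
MAX_BYTES = 50 * 1024 * 1024  # 50MB, measured on the uncompressed file

# Simplified W3C Datetime: YYYY-MM-DD, optionally with a time and timezone.
LASTMOD_RE = re.compile(
    r"^\d{4}-\d{2}-\d{2}"
    r"(T\d{2}:\d{2}(:\d{2}(\.\d+)?)?(Z|[+-]\d{2}:\d{2}))?$"
)

def check_sitemap(xml_bytes):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    if len(xml_bytes) > MAX_BYTES:
        problems.append("file exceeds 50MB uncompressed")
    try:
        root = ET.fromstring(xml_bytes)
    except ET.ParseError as exc:
        # Unescaped &, < or > in a URL ends up here.
        return problems + [f"XML parse error: {exc}"]

    entries = [el for el in root if el.tag in (NS + "url", NS + "sitemap")]
    if len(entries) > MAX_ENTRIES:
        problems.append(f"{len(entries)} entries exceeds the 50,000 limit")

    for entry in entries:
        loc = (entry.findtext(NS + "loc") or "").strip()
        parts = urlsplit(loc)
        if parts.scheme not in ("http", "https") or not parts.netloc:
            problems.append(f"not an absolute URL: {loc!r}")
        lastmod = entry.findtext(NS + "lastmod")
        if lastmod is not None and not LASTMOD_RE.match(lastmod.strip()):
            problems.append(f"non-W3C lastmod: {lastmod!r}")
    return problems

# Hypothetical input with a relative URL and a non-standard date.
BAD = b"""<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>/blog/post</loc></url>
  <url><loc>https://example.com/a</loc><lastmod>15/01/2024</lastmod></url>
</urlset>"""

for problem in check_sitemap(BAD):
    print(problem)
```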