What are XML sitemaps?
XML sitemaps are a way to tell search engines directly about the pages on your website. They are created using “extensible markup language”, or XML for short, which is “a markup language that defines a set of rules for encoding documents in a format that is both human-readable and machine-readable.” (source)
XML Sitemap Structure
An XML sitemap starts from a base set of tags. Depending on the type of content being submitted (for example video), each listing then contains extra tags.
Base XML structure
The base XML sitemap structure is this, with just one URL:
<?xml version="1.0" encoding="UTF-8"?>
You will notice four things:
- The XML namespace (xmlns) is declared at the top, on the <urlset> tag, and references the most recent sitemap schema
- The only required tag is <loc> inside of <urlset><url></url></urlset>
- lastmod, changefreq, and priority are optional, but strongly recommended
- Every opening tag must be accompanied by a closing tag, same as HTML
You can find the full XML sitemap protocol on sitemaps.org.
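To make the four points above concrete, here is a minimal sketch that builds a one-URL sitemap with Python's standard library. The function name build_sitemap, the example.com URL, and the date are placeholders I have assumed for illustration; only the tag names and the namespace come from the sitemap protocol.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Return sitemap XML for a list of dicts with 'loc' and optional tags."""
    # The namespace is declared once, on the root <urlset> tag.
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for entry in entries:
        url = ET.SubElement(urlset, "url")
        # <loc> is the only required child; the others are written only if given.
        ET.SubElement(url, "loc").text = entry["loc"]
        for tag in ("lastmod", "changefreq", "priority"):
            if tag in entry:
                ET.SubElement(url, tag).text = str(entry[tag])
    if hasattr(ET, "indent"):  # pretty-printing helper, Python 3.9+
        ET.indent(urlset)
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


sitemap_xml = build_sitemap([
    {
        "loc": "https://www.example.com/",  # placeholder URL
        "lastmod": "2024-01-15",            # placeholder date
        "changefreq": "monthly",
        "priority": 0.8,
    }
])
print(sitemap_xml)
```

Because the serializer writes every closing tag for you, building the file this way avoids the unclosed-tag mistakes that hand-edited sitemaps tend to pick up.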
XML Sitemaps Best Practices
XML sitemaps are an often overlooked part of SEO that can either help or cripple a website.
To get them right, XML sitemaps should:
- Only contain URLs that return a 200 status code. URLs that return a 301 or 404 status code should not be included; when they are, they are called “dirt”, and the search engines have said that XML sitemaps with more than 2% “dirt” are trusted less over time;
- Only contain canonical URLs (if a page’s rel=canonical points to another URL, that page should not be included in XML sitemaps);
- Be updated whenever something changes, so that they stay current;
- Be segmented by type of page. This is not a ranking factor, but does help with diagnosing indexation issues;
- Contain no more than 50,000 URLs and be no larger than 50MB uncompressed;
- Contain the tags required for your content type (e.g. video).
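A few of these rules are easy to check in code before a sitemap is written. The sketch below is one possible approach, not a standard tool: it assumes you already have crawl results as a list of dicts with url, status, and an optional canonical key (a made-up shape for illustration), and the function names are my own. It does not check the 50MB file size limit.

```python
MAX_URLS_PER_SITEMAP = 50_000  # per-file URL limit from the list above


def clean_urls(crawl_results):
    """Keep URLs that return 200 and are their own canonical."""
    return [
        page["url"]
        for page in crawl_results
        if page["status"] == 200
        and page.get("canonical", page["url"]) == page["url"]
    ]


def dirt_ratio(crawl_results):
    """Share of listed URLs that do not return a 200 status code."""
    if not crawl_results:
        return 0.0
    dirty = sum(1 for page in crawl_results if page["status"] != 200)
    return dirty / len(crawl_results)


def chunk(urls, size=MAX_URLS_PER_SITEMAP):
    """Split a URL list into sitemap-sized pieces."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


crawl_results = [  # hypothetical crawl output
    {"url": "https://www.example.com/", "status": 200,
     "canonical": "https://www.example.com/"},
    {"url": "https://www.example.com/old", "status": 301},
    {"url": "https://www.example.com/gone", "status": 404},
    {"url": "https://www.example.com/a?ref=x", "status": 200,
     "canonical": "https://www.example.com/a"},
]

print(clean_urls(crawl_results))             # ['https://www.example.com/']
print(f"{dirt_ratio(crawl_results):.0%} dirt")  # 50% dirt
print(len(chunk(list(range(120_000)))))      # 3
```

Running the dirt check against an existing sitemap on a schedule is a cheap way to notice when redirected or deleted URLs start creeping in.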
Generating XML Sitemaps
One question that many junior and even accomplished SEOs run into is “how do I generate XML sitemaps for my (or my client’s) website?”
The answer is that there are many ways, and the correct way depends on your specific situation.
The most common ways are:
- Using your CMS’s built-in XML sitemap functionality. On WordPress this is most easily done with Yoast SEO; on Squarespace it is built in. See our best CMSs for SEO post.
- Using custom code and cron jobs, which are scheduled pieces of code that run at specific intervals to accomplish an automated task.
- If the site is static, using a tool like Screaming Frog’s SEO Spider to crawl the site and create an XML sitemap from the results.
If you have more than one sitemap, generate all of them and then list them in a single sitemap index file.
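For the custom code route, a sitemap index is a small file to generate. Here is a rough sketch using Python's standard library; the function name, the example.com file names (one per page type), and the date are assumptions for illustration, while the sitemapindex, sitemap, loc, and lastmod tags come from the sitemap protocol. A script like this is the kind of thing you would run from a cron job.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap_index(sitemap_urls, lastmod=None):
    """Return sitemap index XML listing each child sitemap URL."""
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for sitemap_url in sitemap_urls:
        sitemap = ET.SubElement(index, "sitemap")
        ET.SubElement(sitemap, "loc").text = sitemap_url
        if lastmod:  # optional: when the child sitemap last changed
            ET.SubElement(sitemap, "lastmod").text = lastmod
    if hasattr(ET, "indent"):  # pretty-printing helper, Python 3.9+
        ET.indent(index)
    body = ET.tostring(index, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Hypothetical sitemaps, segmented by page type.
sitemap_urls = [
    "https://www.example.com/sitemap-pages.xml",
    "https://www.example.com/sitemap-posts.xml",
    "https://www.example.com/sitemap-videos.xml",
]
index_xml = build_sitemap_index(sitemap_urls, lastmod="2024-01-15")
print(index_xml)
```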
Submitting XML Sitemaps
Once your XML sitemaps have been generated, you should add them to the root of your website (e.g. getcredo.com/sitemap_index.xml) and reference them in your robots.txt file.
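Referencing a sitemap in robots.txt means adding a Sitemap: line with the full URL. As a small sketch, the helper below appends that line to existing robots.txt content if it is not already there. The function name, the sample robots.txt rules, and the https:// scheme on the getcredo.com URL are assumptions for illustration.

```python
def add_sitemap_directive(robots_txt, sitemap_url):
    """Return robots.txt content with a Sitemap line for sitemap_url, added once."""
    directive = f"Sitemap: {sitemap_url}"
    lines = robots_txt.splitlines()
    if any(line.strip() == directive for line in lines):
        return robots_txt  # already referenced, nothing to do
    lines.append(directive)
    return "\n".join(lines) + "\n"


robots_txt = "User-agent: *\nDisallow: /admin/\n"  # hypothetical existing file
updated = add_sitemap_directive(
    robots_txt, "https://getcredo.com/sitemap_index.xml"
)
print(updated)
```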
You should then submit these sitemaps to the search engines’ webmaster tools. Test each sitemap first, then submit it in each of the tools.
Common XML Sitemap Issues
These are the common XML sitemap issues I see, all of which you should avoid:
- Submitting old URLs that return a 301 or 4xx status code. These should not be included;
- Not removing URLs that start to return a 404 status code;
- Not segmenting by page type. This makes indexation investigations very tough;
- Empty required tags, especially with video sitemaps;
- Not including new pages as soon as they are published on your website, especially in News sitemaps.
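Two of these issues, empty required tags and URLs that no longer return a 200, can be caught with a short audit script. The sketch below is one way to do it under some assumptions: the function name is my own, the sample sitemap is made up, and the status codes are passed in as a dict rather than fetched live, so you would fill that dict from your own crawl.

```python
import xml.etree.ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def audit_sitemap(sitemap_xml, status_by_url):
    """Return a list of (url, problem) tuples found in a sitemap."""
    problems = []
    root = ET.fromstring(sitemap_xml)
    for url in root.findall("sm:url", NS):
        loc = url.find("sm:loc", NS)
        # Compare against None explicitly: childless elements are falsy.
        loc_text = (loc.text or "").strip() if loc is not None else ""
        if not loc_text:
            problems.append(("(no loc)", "empty or missing <loc>"))
            continue
        status = status_by_url.get(loc_text)
        if status is not None and status != 200:
            problems.append((loc_text, f"returns {status}"))
    return problems


# Hypothetical sitemap file contents, read as bytes.
sample = b"""<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.example.com/</loc></url>
  <url><loc>https://www.example.com/old</loc></url>
  <url><loc></loc></url>
</urlset>"""

# Hypothetical status codes from a crawl.
statuses = {
    "https://www.example.com/": 200,
    "https://www.example.com/old": 301,
}

for url, problem in audit_sitemap(sample, statuses):
    print(url, "->", problem)
```

The same loop could be extended to the extra tags that video or News sitemaps require, using whichever tag names apply to your content type.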