How to Set Up an XML Sitemap

When a site grows, launches a new section, or adds a batch of pages, the goal is the same: get search engines to find and index those pages quickly. Large sites are where this breaks down, because new pages can sit for weeks before they appear in search results. An XML sitemap helps speed up that discovery. Below is how a sitemap works, the rules for building one, and three ways to create and configure it: with online generators, with CMS plugins, and by hand.

Contents

  1. Why create an XML sitemap
  2. Requirements and limits
  3. How to create an XML sitemap
    • Online generators
    • CMS plugins
    • The manual method
  4. Sitemap index files (for large sites)
  5. How to tell search engines about your sitemap
  6. Common mistakes
  7. Quick recap

Why create an XML sitemap

An XML sitemap gives a search engine structured data about a site: which URLs exist, when they were last changed, and how the site is organized. Crawlers read the file to find URLs they might otherwise miss, which can speed up indexing. A typical sitemap.xml can carry:

  • The URLs you want crawled.
  • The date each page was last modified.
  • Information about images and video.
  • Alternate language versions of a page (hreflang).
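The image and hreflang entries in the list above are carried by extension namespaces that sit alongside the core <url> tags. A minimal sketch of one such entry, built with Python's standard-library xml.etree.ElementTree; url_entry is a hypothetical helper, and the page, image, and language URLs are placeholders. The two extension namespace URIs are the ones Google documents for image sitemaps and for the XHTML link element used by hreflang.

```python
import xml.etree.ElementTree as ET

SM = "http://www.sitemaps.org/schemas/sitemap/0.9"
IMG = "http://www.google.com/schemas/sitemap-image/1.1"  # image extension
XHTML = "http://www.w3.org/1999/xhtml"  # used for hreflang <xhtml:link>

# Map namespaces to prefixes so the output reads <image:image>, <xhtml:link>
ET.register_namespace("", SM)
ET.register_namespace("image", IMG)
ET.register_namespace("xhtml", XHTML)


def url_entry(loc, images=(), alternates=None):
    """Hypothetical helper: one <url> with optional images and hreflang links."""
    url = ET.Element(f"{{{SM}}}url")
    ET.SubElement(url, f"{{{SM}}}loc").text = loc
    for src in images:
        image = ET.SubElement(url, f"{{{IMG}}}image")
        ET.SubElement(image, f"{{{IMG}}}loc").text = src
    # List every language version, including the page itself
    for lang, href in (alternates or {}).items():
        ET.SubElement(url, f"{{{XHTML}}}link",
                      rel="alternate", hreflang=lang, href=href)
    return url


urlset = ET.Element(f"{{{SM}}}urlset")
urlset.append(url_entry(
    "https://example.com/en/",
    images=["https://example.com/img/hero.jpg"],
    alternates={"en": "https://example.com/en/",
                "de": "https://example.com/de/"},
))
xml_text = ET.tostring(urlset, encoding="unicode")
print(xml_text)
```

Each language version would get its own <url> block carrying the same set of xhtml:link elements.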

A sitemap is not mandatory. Without one, crawlers will still discover and analyze a site through its internal links. But if a site is new, has many sections that are hard to reach through links, or changes its content often, indexing can slow down without that help. Crawlers work within a crawl budget: they don't crawl every page in one pass, and important pages can be missed. A sitemap acts as a map of the structure, pointing crawlers to the pages you want indexed.

There is also the HTML sitemap, which is built for human visitors rather than bots. It is a single page that gathers the priority URLs of a site so people can navigate it more easily, and it can indirectly help indexing too. Having both an XML and an HTML sitemap is ideal: one serves the crawlers, the other serves the audience.

Requirements and limits

Sitemaps follow a published standard. The full specification lives at sitemaps.org. The essentials:

  1. A single sitemap file holds up to 50,000 URLs. If you need more, split the URLs across several files and tie them together with a sitemap index (covered below).
  2. A single file must not exceed 50 MB uncompressed. Larger than that and it has to be split.
  3. Do not list junk, broken, or duplicate URLs, and do not include pages with private data, temporary content, or test pages.
  4. The sitemap and the site must sit on the same domain.
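  5. The file is usually placed in the root folder (for example https://example.com/sitemap.xml). It can live elsewhere, but a sitemap may only list URLs in its own directory and below: a file at https://example.com/catalog/sitemap.xml can cover /catalog/ pages but not /blog/ pages. The root is therefore the safest location.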
  6. The sitemap URL must return a 200 OK status. You can confirm this in Google Search Console, Bing Webmaster Tools, or a sitemap checker.
  7. The file must be UTF-8 encoded, declared at the top of the document.
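The count, size, encoding, host, and location rules above can be checked mechanically before upload. A minimal sketch using only Python's standard library; check_requirements is a hypothetical name, and the sketch assumes an uncompressed <urlset> file (not a sitemap index). It also applies the sitemaps.org location rule: a sitemap may only list URLs in its own directory or below, so a file at /catalog/sitemap.xml cannot cover /blog/ pages.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
MAX_URLS = 50_000
MAX_BYTES = 52_428_800  # 50 MB uncompressed (the sitemaps.org figure)


def check_requirements(raw: bytes, sitemap_url: str) -> list[str]:
    """Hypothetical pre-upload check of a <urlset> file against the limits.

    Returns a list of problems; an empty list means every check passed.
    """
    problems = []
    if len(raw) > MAX_BYTES:
        problems.append("file exceeds 50 MB uncompressed")
    try:
        raw.decode("utf-8")
        root = ET.fromstring(raw)
    except UnicodeDecodeError:
        return problems + ["file is not valid UTF-8"]
    except ET.ParseError as exc:
        return problems + [f"not well-formed XML: {exc}"]

    locs = [(el.text or "").strip() for el in root.findall("sm:url/sm:loc", NS)]
    if len(locs) > MAX_URLS:
        problems.append(f"{len(locs)} URLs; split the file (limit {MAX_URLS})")

    base = urlparse(sitemap_url)
    # A sitemap may only list URLs in its own directory or below it
    scope = base.path.rsplit("/", 1)[0] + "/"
    for loc in locs:
        u = urlparse(loc)
        if not (u.scheme and u.netloc):
            problems.append(f"relative or empty URL: {loc!r}")
        elif (u.scheme, u.netloc) != (base.scheme, base.netloc):
            problems.append(f"different protocol or host: {loc}")
        elif not (u.path or "/").startswith(scope):
            problems.append(f"outside {scope}: {loc}")
    return problems
```

Usage: pass the file's bytes (for example, open("sitemap.xml", "rb").read()) and the URL the file will be served from, such as "https://example.com/sitemap.xml".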

There are three ways to build the file: generate it in an online service, use a CMS plugin, or write it by hand. Here is what each is good for.

How to create an XML sitemap

Online generators

Generator services build a sitemap.xml in a minute or two. The appeal is speed — no special knowledge required. You paste in the site URL, set the number of links, and run it.

This route suits small sites, landing pages, and one-pagers. It is a poor fit for sites where pages are added or updated constantly, because the file becomes stale the moment the generator finishes.
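Under the hood, a generator of this kind crawls the site from the start URL and follows internal links until it runs out of pages or hits the URL cap you set. A simplified sketch of that idea in Python; crawl and LinkParser are hypothetical names, fetch stands in for a real HTTP client (returning a page's HTML, or None for anything other than a 200 response), and the SITE dictionary is made-up test data. Real generators also honor robots.txt, canonical tags, and noindex, which this sketch skips.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkParser(HTMLParser):
    """Collect href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, max_urls=500):
    """Breadth-first crawl of one host, capped at max_urls pages."""
    host = urlparse(start_url).netloc
    seen, found = {start_url}, []
    queue = deque([start_url])
    while queue and len(found) < max_urls:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue  # broken pages are skipped, not listed
        found.append(url)
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            link, _ = urldefrag(urljoin(url, href))  # absolute, no #fragment
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return found


# Hypothetical three-page site; /missing returns None (not a 200 page)
SITE = {
    "https://example.com/": '<a href="/blog/">Blog</a> <a href="https://other.org/">x</a>',
    "https://example.com/blog/": '<a href="/blog/post-1#top">Post</a> <a href="/">Home</a>',
    "https://example.com/blog/post-1": '<a href="/missing">gone</a>',
}
print(crawl("https://example.com/", SITE.get))
# ['https://example.com/', 'https://example.com/blog/', 'https://example.com/blog/post-1']
```

The max_urls cap plays the same role as a free tier's URL limit: once it is reached, the remaining pages simply never make it into the file.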

Commonly used generators include:

  • XML-Sitemaps (xml-sitemaps.com). The free tier crawls up to 500 URLs with no registration. Paid plans extend the limit substantially.
  • Free Sitemap Generator (freesitemapgenerator.com). Free for a few hundred URLs after email verification; paid tiers raise the limit.
  • Screaming Frog SEO Spider. A desktop crawler that exports an XML sitemap; free up to 500 URLs, paid above that. Popular with webmasters because it doubles as an audit tool.

A typical run looks like this:

  1. Open the generator and paste the site URL into the field.
  2. Start the crawl.
  3. After a minute or two, download the generated file.

Before you deploy it, open the file and check it: are the URLs correct, are the required tags present, are there any pages that shouldn't be there? If it looks right, move it onto the server.

Placing the file on the server. Use an FTP/SFTP client (free options include FileZilla, WinSCP, and Cyberduck), then:

  1. Log in with your hosting credentials.
  2. Upload sitemap.xml into the site's root folder.
  3. In the same folder, open robots.txt and add a line pointing to it: Sitemap: https://example.com/sitemap.xml

If there is no robots.txt in the root, create one and add that same line.
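If robots.txt is managed from a deploy script rather than by hand, the edit can be made idempotent so repeated deploys don't pile up duplicate lines. A minimal sketch in Python; add_sitemap_line is a hypothetical helper that works on the file's text, and reading and writing the file on the server is left out.

```python
def add_sitemap_line(robots_txt: str, sitemap_url: str) -> str:
    """Hypothetical helper: return robots.txt text referencing sitemap_url once."""
    lines = robots_txt.splitlines()
    for existing in lines:
        # Split at the first colon only, so the URL's own colon stays intact;
        # the field name is compared case-insensitively
        name, _, value = existing.partition(":")
        if name.strip().lower() == "sitemap" and value.strip() == sitemap_url:
            return robots_txt  # already referenced; leave the file untouched
    lines.append(f"Sitemap: {sitemap_url}")
    return "\n".join(lines) + "\n"


print(add_sitemap_line("User-agent: *\nDisallow: /admin/\n",
                       "https://example.com/sitemap.xml"))
```

Passing an empty string covers the case where no robots.txt exists yet: the result is a one-line file containing only the Sitemap line.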

CMS plugins

Most popular content systems have sitemap plugins or built-in support, and these are usually better than one-off generators because they regenerate the file automatically whenever pages are added or removed — no manual re-upload.

WordPress. WordPress 5.5+ ships a basic sitemap out of the box at /wp-sitemap.xml. For more control, dedicated SEO plugins handle it: Yoast SEO and Rank Math both generate and auto-update a sitemap, let you exclude post types or specific URLs, and include images (video sitemaps are generally part of their paid tiers or add-ons). The older Google XML Sitemaps plugin still works for a sitemap-only setup.

Shopify. Every Shopify store auto-generates a sitemap at /sitemap.xml with no setup. Apps exist if you need finer control over what is included.

Wix. Generates a sitemap automatically; it is available at /sitemap.xml.

Joomla. Extensions such as OSMap and JL Sitemap generate and maintain the file.

With a plugin you generally don't touch FTP at all — the file is served dynamically at a stable URL, and you submit that URL to the search engines.
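Serving the sitemap dynamically means the XML is typically built when the URL is requested, so it reflects the current set of pages without a re-upload. A minimal sketch of that idea as a standard-library WSGI app; current_pages is a hypothetical stand-in for the CMS query, and real plugins add caching, splitting into index files, and exclusion settings on top of this.

```python
from datetime import date
from xml.sax.saxutils import escape


def current_pages():
    """Hypothetical stand-in for a CMS query returning published pages."""
    return [("https://example.com/", date(2026, 6, 22)),
            ("https://example.com/blog/", date(2026, 6, 18))]


def sitemap_app(environ, start_response):
    """WSGI app that rebuilds sitemap.xml on every request."""
    if environ.get("PATH_INFO") != "/sitemap.xml":
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"not found"]
    # escape() turns &, < and > into entities, as the protocol requires
    entries = "".join(
        f"<url><loc>{escape(loc)}</loc><lastmod>{d.isoformat()}</lastmod></url>"
        for loc, d in current_pages()
    )
    body = ('<?xml version="1.0" encoding="UTF-8"?>'
            '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
            f"{entries}</urlset>").encode("utf-8")
    start_response("200 OK", [("Content-Type", "application/xml; charset=utf-8"),
                              ("Content-Length", str(len(body)))])
    return [body]
```

To try it locally, wsgiref.simple_server.make_server("localhost", 8000, sitemap_app).serve_forever() serves it at http://localhost:8000/sitemap.xml, a stable URL you could then submit.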

The manual method

Writing the file by hand suits very small sites (roughly under 30 pages). It works for larger sites too, but it is slow and error-prone, so it is rarely worth it past a few dozen URLs.

Open any plain-text editor (Notepad++, Sublime Text, VS Code) and follow this template:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/</loc>
    <lastmod>2026-06-22</lastmod>
  </url>
  <url>
    <loc>https://example.com/blog/</loc>
    <lastmod>2026-06-18</lastmod>
  </url>
</urlset>

The tags:

  • <urlset> (required): wraps the whole file and declares the namespace.
  • <url> (required): one block per page.
  • <loc> (required): the page URL. Use full, absolute URLs (https://example.com/page.html), never relative ones like /page.html. The value must be less than 2,048 characters, and special characters must be entity-escaped (for example, & becomes &amp;).
  • <lastmod> (optional but recommended): the date the page was last meaningfully changed, in W3C Datetime / ISO 8601 format (YYYY-MM-DD, or with time YYYY-MM-DDTHH:MM:SS+00:00).
  • <changefreq> (optional): a hint about how often the page changes (always, hourly, daily, weekly, monthly, yearly, never).
  • <priority> (optional): a value from 0.0 to 1.0 indicating relative importance; the default is 0.5.
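Past a handful of pages, it is easier to generate the template than to type it. A minimal sketch with Python's standard library that writes only <loc> and <lastmod>; build_sitemap is a hypothetical helper, and the page list would in practice come from your database or file system rather than being hard-coded.

```python
import xml.etree.ElementTree as ET
from datetime import date

NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Hypothetical helper: pages is an iterable of (absolute_url, date) pairs."""
    # Writing xmlns as a plain attribute yields <urlset xmlns="...">
    urlset = ET.Element("urlset", xmlns=NS)
    for loc, modified in pages:
        url = ET.SubElement(urlset, "url")
        # ElementTree escapes &, < and > in text, so & is written as &amp;
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "lastmod").text = modified.isoformat()  # YYYY-MM-DD
    ET.indent(urlset)  # Python 3.9+: pretty-print for human review
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body + "\n"


xml_text = build_sitemap([
    ("https://example.com/", date(2026, 6, 22)),
    ("https://example.com/blog/?tag=news&page=2", date(2026, 6, 18)),
])
print(xml_text)
```

Save the result as a UTF-8 file (open("sitemap.xml", "w", encoding="utf-8")) before uploading it.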

What actually matters in 2026. Google's documentation states that it ignores <priority> and <changefreq>; both values are self-reported and easy to game. What Google does use is <loc> (required) and <lastmod>, provided the lastmod dates are consistently and verifiably accurate. So the practical advice is: get every URL right, keep lastmod accurate, and don't spend effort fine-tuning priority and changefreq. Faking lastmod (for example, stamping every page with today's date on every rebuild) backfires: search engines learn to distrust the dates and may ignore them site-wide. Other engines may still read priority and changefreq, so there's no harm in leaving the values your CMS generates, but they are not worth manual tuning.
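One way to keep <lastmod> honest on automated rebuilds is to change a page's date only when its content changes. A minimal sketch, assuming your build can store a content hash per URL between runs; update_lastmod and the in-memory state dict are hypothetical stand-ins for whatever storage you use. Hash only the main content, not sidebars, footers, or view counters, or every cosmetic change will still bump the date.

```python
import hashlib
from datetime import date


def update_lastmod(pages, state, today):
    """Hypothetical helper: move lastmod only when page content changes.

    pages: {url: main_content_text} for this build.
    state: {url: (content_hash, lastmod_date)}, kept between builds.
    Returns {url: lastmod_date} to write into the sitemap.
    """
    result = {}
    for url, text in pages.items():
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        previous = state.get(url)
        if previous is None or previous[0] != digest:
            state[url] = (digest, today)  # new page, or content changed
        result[url] = state[url][1]  # unchanged pages keep their old date
    return result


state = {}
update_lastmod({"https://example.com/": "Hello"}, state, date(2026, 6, 1))
# A rebuild with identical content keeps the original date
print(update_lastmod({"https://example.com/": "Hello"}, state, date(2026, 6, 22)))
# {'https://example.com/': datetime.date(2026, 6, 1)}
```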

Once the file is written, validate it (Google Search Console, Bing Webmaster Tools, or a third-party sitemap validator will flag missing tags and bad URLs), upload it, and reference it in robots.txt: Sitemap: https://example.com/sitemap.xml
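Before uploading, a quick local check can catch the most common structural problems. A minimal sketch in Python; validate is a hypothetical helper that checks for a <urlset> root in the sitemap namespace, a non-empty <loc> in every <url>, and a <lastmod> in one of the two forms described above (YYYY-MM-DD, or a full timestamp with a time zone). The W3C Datetime profile also allows coarser forms such as YYYY-MM, which this sketch rejects, and the webmaster consoles remain the authoritative check.

```python
import re
import xml.etree.ElementTree as ET

SM = "{http://www.sitemaps.org/schemas/sitemap/0.9}"
# YYYY-MM-DD, optionally followed by THH:MM[:SS[.frac]] and a time zone
W3C_DATE = re.compile(
    r"^\d{4}-\d{2}-\d{2}(T\d{2}:\d{2}(:\d{2}(\.\d+)?)?(Z|[+-]\d{2}:\d{2}))?$"
)


def validate(raw: bytes) -> list[str]:
    """Hypothetical check of required tags and lastmod format."""
    try:
        root = ET.fromstring(raw)
    except ET.ParseError as exc:
        return [f"not well-formed XML: {exc}"]
    if root.tag != SM + "urlset":
        return [f"root element is {root.tag}, expected urlset in the sitemap namespace"]
    problems = []
    for i, url in enumerate(root.findall(SM + "url"), start=1):
        loc = url.find(SM + "loc")
        if loc is None or not (loc.text or "").strip():
            problems.append(f"<url> #{i} has no <loc>")
        lastmod = url.find(SM + "lastmod")
        if lastmod is not None and not W3C_DATE.match((lastmod.text or "").strip()):
            problems.append(f"<url> #{i} has a malformed <lastmod>: {lastmod.text!r}")
    return problems
```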

Sitemap index files (for large sites)

When a site exceeds 50,000 URLs (or a file would exceed 50 MB), split the URLs across several sitemap files and list those files in a single sitemap index. The index is its own document that points to each child sitemap. The same limits apply at the index level: an index can list up to 50,000 sitemaps, and the index file itself must stay under 50 MB uncompressed.

<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://example.com/sitemap-posts.xml</loc>
    <lastmod>2026-06-22</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://example.com/sitemap-products.xml</loc>
    <lastmod>2026-06-21</lastmod>
  </sitemap>
</sitemapindex>

The file opens with <sitemapindex> and closes with </sitemapindex>; each child sitemap goes inside its own <sitemap> block. You then submit the single index URL to the search engines — they follow it to every child file. Reference the index in robots.txt the same way: Sitemap: https://example.com/sitemap_index.xml
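When a script produces the files, the split and the index can be generated together. A minimal sketch with Python's standard library; split_with_index and the child file names (sitemap-1.xml, sitemap-2.xml, and so on) are hypothetical. It splits by URL count only, so a set of unusually long URLs could still push a child file past 50 MB and would need a size check as well.

```python
import xml.etree.ElementTree as ET
from datetime import date

NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50_000


def _serialize(root):
    """Pretty-print an element tree with the UTF-8 XML declaration."""
    ET.indent(root)  # Python 3.9+
    body = ET.tostring(root, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body + "\n"


def split_with_index(urls, base, lastmod, chunk=MAX_URLS):
    """Hypothetical helper: child sitemaps of at most `chunk` URLs plus an index.

    Returns {file_name: xml_text}. `lastmod` should really be the date each
    child file last changed; one shared date is a simplification here.
    """
    files = {}
    index = ET.Element("sitemapindex", xmlns=NS)
    for n, start in enumerate(range(0, len(urls), chunk), start=1):
        name = f"sitemap-{n}.xml"
        urlset = ET.Element("urlset", xmlns=NS)
        for loc in urls[start:start + chunk]:
            url = ET.SubElement(urlset, "url")
            ET.SubElement(url, "loc").text = loc
        files[name] = _serialize(urlset)
        # Each child gets one <sitemap> entry in the index
        entry = ET.SubElement(index, "sitemap")
        ET.SubElement(entry, "loc").text = f"{base}/{name}"
        ET.SubElement(entry, "lastmod").text = lastmod.isoformat()
    files["sitemap_index.xml"] = _serialize(index)
    return files


files = split_with_index(
    [f"https://example.com/p/{i}" for i in range(120_000)],
    "https://example.com", date(2026, 6, 22))
print(sorted(files))
# ['sitemap-1.xml', 'sitemap-2.xml', 'sitemap-3.xml', 'sitemap_index.xml']
```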

How to tell search engines about your sitemap

Referencing the sitemap in robots.txt lets any crawler find it, but submitting it directly in each engine's webmaster console gives you status reporting and error diagnostics. Submit to every engine you care about.
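You can check the robots.txt side the way a generic crawler would, with Python's built-in urllib.robotparser: site_maps() (Python 3.8+) returns the declared Sitemap URLs, and can_fetch() confirms the sitemap itself isn't disallowed. The robots.txt content and check_robots are hypothetical examples; to test a live file you could call set_url() and read() instead of parse(). Note that this parser's rule matching is simpler than Google's (it does not implement the * and $ wildcards).

```python
from urllib.robotparser import RobotFileParser

ROBOTS = """\
User-agent: *
Disallow: /admin/

Sitemap: https://example.com/sitemap_index.xml
"""


def check_robots(robots_txt, sitemap_url):
    """Hypothetical helper: (declared, crawlable) for sitemap_url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when no Sitemap: line is present
    declared = sitemap_url in (parser.site_maps() or [])
    crawlable = parser.can_fetch("*", sitemap_url)
    return declared, crawlable


print(check_robots(ROBOTS, "https://example.com/sitemap_index.xml"))
# (True, True)
```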

Google Search Console:

  1. Open the Sitemaps report in the property.
  2. Enter the sitemap URL.
  3. Click Submit.

Indexing can take anywhere from a few days to about two weeks; for sites the crawler already knows, it can be faster. Once processed, the report shows a success status and lists any errors to fix.

Bing Webmaster Tools:

  1. Open Sitemaps.
  2. Submit the sitemap URL.

Bing reads <lastmod> to decide what to re-crawl, so an accurate sitemap pays off here too. Bing Webmaster Tools also surfaces parsing errors.

Yandex Webmaster (relevant for Russian-language and CIS traffic):

  1. Go to the Indexing section.
  2. Choose Sitemap files.
  3. Paste the sitemap URL and add it.

Yandex processing can also take up to two weeks, and longer when an index references many child files. Problems show up in the diagnostics section.

Common mistakes

  • Not referencing the sitemap in robots.txt. Crawlers can still find a submitted sitemap, but the robots.txt line is the cheapest way to make it discoverable to every bot.
  • Exceeding the 50,000-URL or 50 MB limit in one file. Split into multiple files and join them with a sitemap index.
  • Listing junk, redirected, blocked, or deleted URLs. A sitemap should contain only canonical, indexable, 200-status URLs. Listing a URL you also block in robots.txt or mark noindex sends a contradictory signal.
  • Missing or malformed required tags (<urlset>, <url>, <loc>). A crawler that can't parse the file won't index from it.
  • Dishonest <lastmod> dates. Stamping today's date on everything erodes trust in the signal.
  • Forgetting to keep the file fresh. A static file generated once goes stale; for active sites, automate generation through the CMS.
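The URL-level mistakes above can be filtered out before the file is built. A minimal sketch in Python, assuming you already have a crawl record for each page; the Page fields and sitemap_candidates are hypothetical. Canonical URLs are compared as exact strings (no trailing-slash or case normalization), and urllib.robotparser does not implement Google's * and $ wildcards, so treat the result as an approximation.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.robotparser import RobotFileParser


@dataclass
class Page:
    """Hypothetical crawl record; fill it from your own crawler or CMS."""
    url: str
    status: int               # HTTP status without following redirects
    canonical: Optional[str]  # URL from <link rel="canonical">, if any
    noindex: bool             # meta robots or X-Robots-Tag says noindex


def sitemap_candidates(pages, robots_txt):
    """Keep only canonical, indexable, 200-status URLs not blocked by robots.txt."""
    robots = RobotFileParser()
    robots.parse(robots_txt.splitlines())
    keep = []
    for p in pages:
        if p.status != 200:
            continue  # redirects (3xx) and errors (4xx/5xx) don't belong
        if p.noindex:
            continue  # listing a noindex page is a contradictory signal
        if p.canonical and p.canonical != p.url:
            continue  # list the canonical version instead
        if not robots.can_fetch("*", p.url):
            continue  # so is listing a URL that robots.txt blocks
        keep.append(p.url)
    return keep
```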

Validate the file before and after deploying. Google Search Console, Bing Webmaster Tools, Yandex Webmaster, and standalone sitemap validators all check for the issues above for free.

Quick recap

A sitemap.xml can be built three ways:

  • Online generators — fastest, good for small or static sites, no plugin dependency, but the file goes stale and must be regenerated by hand.
  • CMS plugins — best for sites that change often; the file updates automatically and you control inclusions and exclusions.
  • By hand — only practical for sites under ~30 pages; double-check the required tags <urlset>, <url>, and <loc>.

A correctly built sitemap:

  • Holds no more than 50,000 URLs per file (use a sitemap index beyond that).
  • Stays under 50 MB uncompressed per file.
  • Lives on the same domain as the site.
  • Is UTF-8 encoded.
  • Is referenced in robots.txt and reachable with a 200 status.
  • Contains only canonical, indexable URLs, with accurate <lastmod> values.

Submit the finished file (or index) to Google Search Console, Bing Webmaster Tools, and — for Russian and CIS audiences — Yandex Webmaster, so the engines discover and index your pages faster.
