Sitemaps

The latest version of the starter pack generates a sitemap for your documentation using the sphinx-sitemap extension.

This page goes over the nuances of configuring sitemaps, as well as how the extension must be configured in your starter pack project.

Read the Docs-generated sitemaps

Read the Docs (RTD) generates a basic sitemap that points only to the index page and relies on crawlers to discover the rest of the site. This is sufficient for some projects, but RTD does not generate sitemaps for subprojects.

This means any project under the Ubuntu documentation library project must generate its own sitemap.

sphinx-sitemap-generated sitemaps

The standard Starter Pack uses the dirhtml builder for Sphinx recipes in the project’s Makefile.

If your project uses an older version of the Starter Pack or changes the builder, the links in the generated sitemap will be malformed. Either update to the latest version of the Starter Pack or ensure your project’s recipes use the dirhtml builder, not html.

Ensure sphinx-sitemap has been added to your docs/requirements.txt file.

Add sphinx_sitemap to extensions in your configuration file (docs/conf.py):

extensions = ['sphinx_sitemap']

Sitemap configuration

The Sphinx starter pack’s configuration file (docs/conf.py) includes default sitemap configuration.

The sphinx-sitemap extension requires a html_baseurl variable to be configured.

This is set by default as follows:

import os
html_baseurl = os.environ.get("READTHEDOCS_CANONICAL_URL", "/")

When building on Read the Docs, this sets html_baseurl dynamically to the value of the READTHEDOCS_CANONICAL_URL environment variable, which resolves to the full URL of the documentation including the version and language (if applicable).

In local builds and builds on other hosts, html_baseurl defaults to /.

The sitemap_url_scheme variable is set to '{link}' by default. This uses the value of html_baseurl to generate the full URL for each page for the sitemap.
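To make the interaction between html_baseurl and sitemap_url_scheme concrete, here is a minimal sketch of how a page URL could be assembled. The helper sitemap_loc and the example link are hypothetical and only illustrate the behaviour described above; this is not the extension's own code.

```python
import os


def sitemap_loc(base_url: str, link: str, scheme: str = "{link}") -> str:
    """Illustrative helper: prefix the formatted scheme with the base URL."""
    return base_url + scheme.format(link=link)


# Same lookup as the default conf.py line: the RTD value if set, otherwise "/".
base = os.environ.get("READTHEDOCS_CANONICAL_URL", "/")

# "how-to/sitemaps/" is a made-up page path in dirhtml style.
print(sitemap_loc(base, "how-to/sitemaps/"))
```

With the fallback value of /, the result is a root-relative path rather than a full URL, which is why sitemaps from local builds are not suitable for publishing.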

Note

If you are implementing a sitemap on an RTD project that is not a subproject, and it uses {link} for the sitemap_url_scheme, RTD will replace your sitemap with its own.

This is a known bug. The only current workaround is to use a different sitemap name and a custom robots.txt pointing to it.
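A sketch of what that workaround might look like in docs/conf.py, assuming the sitemap_filename option of sphinx-sitemap is used to rename the file. The file name sitemap-custom.xml is a hypothetical choice, and the robots.txt file is one you write yourself with a Sitemap: line pointing to the renamed file.

```python
# docs/conf.py (fragment)

# Hypothetical file name; any name other than sitemap.xml avoids the clash.
sitemap_filename = "sitemap-custom.xml"

# Copy a hand-written robots.txt (next to conf.py) to the root of the build.
html_extra_path = ["robots.txt"]
```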

lastmod configuration

As of version 2.7.0, the sitemap extension supports adding a lastmod date. Make sure that your configuration file has:

sitemap_show_lastmod = True

Exclude pages

Pages can be excluded from the sitemap by adding them to sitemap_excludes in docs/conf.py:

sitemap_excludes = [
    '404/',
    'genindex/',
    'search/',
]

Wildcards are supported. For example, _modules/* excludes the path _modules/ and all paths such as _modules/foo/bar/. For details, see Excluding Pages.
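To preview which paths a pattern would catch, the following sketch assumes shell-style matching as provided by Python's fnmatch module. The helper is_excluded and the sample links are hypothetical; the extension's own matching may differ in edge cases.

```python
from fnmatch import fnmatchcase

sitemap_excludes = ["404/", "genindex/", "search/", "_modules/*"]


def is_excluded(link: str, patterns: list[str]) -> bool:
    """Return True if the link matches any shell-style exclusion pattern."""
    return any(fnmatchcase(link, pattern) for pattern in patterns)


# Print each sample link with whether the patterns would exclude it.
for link in ["index/", "search/", "_modules/", "_modules/foo/bar/"]:
    print(link, is_excluded(link, sitemap_excludes))
```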

Validate your sitemap

A sitemap will be available at different locations, depending on how it is generated.

Read the Docs-generated sitemaps are available at the base domain of a project, while sitemaps generated with this extension are placed at the base of the URL scheme used.
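Whichever location applies, a quick way to check a sitemap is to parse it and confirm that every entry is a full URL. The sketch below uses only the Python standard library; the function names and the sample XML are hypothetical, and in practice you would read the text from your built or published sitemap.xml.

```python
import xml.etree.ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_locations(xml_text: str) -> list[str]:
    """Return the text of every <loc> element inside <url> entries."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.findall("sm:url/sm:loc", NS) if loc.text]


def relative_locations(locations: list[str]) -> list[str]:
    """Return entries that are not absolute http(s) URLs."""
    return [loc for loc in locations if not loc.startswith(("http://", "https://"))]


# Made-up sample; the second entry mimics a build where html_baseurl was "/".
sample = (
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    "<url><loc>https://example.com/en/latest/</loc></url>"
    "<url><loc>/how-to/</loc></url>"
    "</urlset>"
)
print(relative_locations(sitemap_locations(sample)))
```

Any entries reported by relative_locations suggest the sitemap was built without a full html_baseurl, as happens in local builds.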

For example, two sitemaps are generated for the Sphinx sitemap’s documentation as it is hosted on RTD:

How to specify a sitemap

A robots.txt file tells crawlers which sitemap to use when indexing a website. You can use a custom robots.txt file by creating your own and adding it to html_extra_path in your configuration file, which copies it to the root of the built site. An example can be found in the Ubuntu documentation library project.

Support multiple versions

The sphinx-sitemap extension doesn’t support multiple versions by default. Configuring your versioned documentation to use an appropriate version may be sufficient, as search engines and other web systems crawl websites for the purposes of indexing.

If you want sitemaps for all your documentation’s versions, you need to deploy your own robots.txt file and sitemap index. Supporting multiple versions is recommended for documentation with LTS releases, as it makes past versions more prominent to search engines.

For this task, we’ll use the Starter Pack as an example. Let’s assume it has three versions, 1.0, 2.0, and 3.0, and uses the URL schema of <version>/<filename>.

First, ensure each version of your documentation has a sitemap generated by this extension with the appropriate version.

Next, create a sitemapindex.xml file in the same directory as the configuration file, and point to the sitemap files of each of your documentation sets:

sitemapindex.xml
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://canonical-starter-pack.readthedocs-hosted.com/stable/sitemap.xml</loc>
  </sitemap>
  <sitemap>
    <loc>https://canonical-starter-pack.readthedocs-hosted.com/3.0/sitemap.xml</loc>
  </sitemap>
  <sitemap>
    <loc>https://canonical-starter-pack.readthedocs-hosted.com/2.0/sitemap.xml</loc>
  </sitemap>
  <sitemap>
    <loc>https://canonical-starter-pack.readthedocs-hosted.com/1.0/sitemap.xml</loc>
  </sitemap>
</sitemapindex>

Create a robots.txt file in the same directory as the configuration file.

If necessary, block any paths you don’t want crawled. Google describes how to do this in How to write and submit a robots.txt file.

At the end of robots.txt, point to the future path of sitemapindex.xml:

robots.txt
Sitemap: https://canonical-starter-pack.readthedocs-hosted.com/stable/sitemapindex.xml
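Before deploying, you can confirm that the file declares the sitemap index you expect. This sketch relies on urllib.robotparser from the Python standard library (site_maps() requires Python 3.8 or later); the helper name declared_sitemaps is hypothetical.

```python
from urllib.robotparser import RobotFileParser


def declared_sitemaps(robots_txt: str) -> list[str]:
    """Return the URLs listed on Sitemap: lines, or an empty list if none."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.site_maps() or []


robots_txt = (
    "User-agent: *\n"
    "Allow: /\n"
    "Sitemap: https://canonical-starter-pack.readthedocs-hosted.com/stable/sitemapindex.xml\n"
)
print(declared_sitemaps(robots_txt))
```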

Lastly, add both new files to the configuration file:

conf.py
html_extra_path = [
    "sitemapindex.xml",
    "robots.txt",
]

With this configuration, each build publishes a sitemapindex.xml file that points to the sphinx-sitemap-generated sitemap for each version.

You may want to automate the generation of the sitemapindex.xml file. To see how this is done for the Ubuntu documentation library project, which generates a sitemap containing subproject sitemaps, see the script here.
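As a starting point for such automation, here is a minimal sketch that builds a sitemap index from a list of versions. It is not the Ubuntu documentation library script; the function name build_sitemap_index is hypothetical, and the code assumes the <version>/sitemap.xml layout used in the example above. ET.indent requires Python 3.9 or later.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap_index(base_url: str, versions: list[str]) -> str:
    """Return sitemap index XML with one <sitemap> entry per version."""
    # Serialize the sitemap namespace as the default (unprefixed) namespace.
    ET.register_namespace("", SITEMAP_NS)
    root = ET.Element(f"{{{SITEMAP_NS}}}sitemapindex")
    for version in versions:
        sitemap = ET.SubElement(root, f"{{{SITEMAP_NS}}}sitemap")
        loc = ET.SubElement(sitemap, f"{{{SITEMAP_NS}}}loc")
        loc.text = f"{base_url.rstrip('/')}/{version}/sitemap.xml"
    ET.indent(root)
    body = ET.tostring(root, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body + "\n"


xml_text = build_sitemap_index(
    "https://canonical-starter-pack.readthedocs-hosted.com",
    ["stable", "3.0", "2.0", "1.0"],
)
print(xml_text)
```

Writing the returned string to sitemapindex.xml next to conf.py before the build would let the html_extra_path entry pick it up.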