Set up Sphinx sitemaps
It is recommended to generate a sitemap for your documentation using the sphinx-sitemap extension.
Read the Docs generated sitemaps
Read the Docs (RTD) generates a basic sitemap that points to the index page and relies on crawlers to index the rest of the site. This is sufficient for some projects, but RTD does not generate sitemaps for subprojects.
This means any project hosted as a subproject of the Ubuntu documentation library must generate its own sitemap.
Sitemap prerequisites
Ensure sphinx-sitemap has been added to your requirements.txt file.
Add sphinx_sitemap to extensions in your configuration file (docs/conf.py):
extensions = ['sphinx_sitemap']
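Assigning extensions = ['sphinx_sitemap'] replaces any extensions list already defined in conf.py. If your configuration already enables other extensions, the minimal sketch below adds sphinx_sitemap without discarding them. The sphinx.ext.intersphinx entry is only a placeholder for whatever your project already uses.

```python
# docs/conf.py (sketch)
# Placeholder for the extensions your project already enables
extensions = [
    "sphinx.ext.intersphinx",
]

# Add sphinx_sitemap once, keeping the extensions already configured
if "sphinx_sitemap" not in extensions:
    extensions.append("sphinx_sitemap")
```

The membership check keeps the line safe to leave in place even if a shared configuration file has already added the extension.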
Required sitemap configuration
sphinx-sitemap requires html_baseurl to be configured for the project in your configuration file. For example, in docs/conf.py:
html_baseurl = 'https://canonical-starter-pack.readthedocs-hosted.com/'
Note
Sitemap configuration is included in the Starter pack’s default configuration file.
Optional sitemap configuration
sphinx-sitemap uses a configurable URL scheme, sitemap_url_scheme, to add language and version components to the URLs in your sitemap. The starter pack's default configuration uses:
sitemap_url_scheme = "{link}"
You can add a version to the URL scheme manually, or read it from the RTD instance. To set the version manually:
sitemap_url_scheme = "<version>/{link}"
Or, if the version is set with the version key in your configuration file:
sitemap_url_scheme = "{version}{link}"
To read the version from the READTHEDOCS_VERSION environment variable that RTD provides during builds:
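import os

# On RTD, use the version being built (for example, "latest")
if 'READTHEDOCS_VERSION' in os.environ:
    version = os.environ["READTHEDOCS_VERSION"]
    sitemap_url_scheme = '{version}{link}'
else:
    # Local builds: replace MANUAL with your version path
    sitemap_url_scheme = 'MANUAL/{link}'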
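To see how these schemes turn into sitemap URLs, the function below is a simplified model, not the extension's own code. It assumes that sphinx-sitemap appends a trailing slash to the configured language and version before formatting sitemap_url_scheme, then prefixes the result with html_baseurl; this is why the schemes above put no slash between {version} and {link}. The function name sitemap_loc and the page name index.html are illustrative only.

```python
def sitemap_loc(baseurl: str, scheme: str, link: str,
                version: str = "", language: str = "") -> str:
    """Approximate the <loc> URL sphinx-sitemap writes for one page.

    Simplified model (assumption): a trailing slash is appended to the
    language and version values, sitemap_url_scheme is formatted with
    them, and html_baseurl is prefixed to the result.
    """
    lang_part = f"{language}/" if language else ""
    version_part = f"{version}/" if version else ""
    return baseurl + scheme.format(lang=lang_part, version=version_part, link=link)


BASE = "https://canonical-starter-pack.readthedocs-hosted.com/"

# Starter pack default: no language or version component
print(sitemap_loc(BASE, "{link}", "index.html"))
# Version taken from the version key, or from READTHEDOCS_VERSION on RTD
print(sitemap_loc(BASE, "{version}{link}", "index.html", version="3.0"))
# Language and version, as in RTD's usual /en/latest/ layout
print(sitemap_loc(BASE, "{lang}{version}{link}", "index.html",
                  version="latest", language="en"))
```

Comparing these URLs with the pages actually served is a quick way to catch a scheme that adds or drops a path segment.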
Note
If you are implementing a sitemap on an RTD instance that is not a subproject, and your configuration uses {link} as the sitemap_url_scheme, RTD will replace your sitemap with its own. This is a known bug. The only current workaround is to use a different sitemap name and a custom robots.txt pointing to it.
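A sketch of that workaround in docs/conf.py, assuming your sphinx-sitemap release supports the sitemap_filename option (check the extension's documentation for the version you install). The name sitemap-docs.xml is hypothetical; any name other than the default sitemap.xml avoids the file RTD replaces.

```python
# docs/conf.py (sketch)
# Assumption: your sphinx-sitemap release supports sitemap_filename.
# "sitemap-docs.xml" is a hypothetical name; anything other than the
# default sitemap.xml avoids the file that RTD replaces.
sitemap_filename = "sitemap-docs.xml"

# Copy a hand-written robots.txt, kept next to conf.py, to the output root
html_extra_path = ["robots.txt"]
```

The robots.txt would then name the renamed file with a full URL, for example Sitemap: https://canonical-starter-pack.readthedocs-hosted.com/sitemap-docs.xml, adjusted to wherever your sitemap is actually served.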
Validating your sitemap
The location of a sitemap depends on how it is generated.
Sitemaps generated by Read the Docs are available at the root of the project's domain, while sitemaps generated by sphinx-sitemap are placed at the base of the URL scheme in use.
For example, the sphinx-sitemap documentation, which is hosted on RTD, has two sitemaps:
The first is generated by RTD and is available at the root of the domain: https://sphinx-sitemap.readthedocs.io/sitemap.xml
The second is generated by the sphinx-sitemap extension and is available at the base of the URL scheme used by the RTD instance: https://sphinx-sitemap.readthedocs.io/en/latest/sitemap.xml
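Before publishing, you can check a locally built sitemap with Python's standard library. The sketch below lists every <loc> entry and reports any URL that does not start with the base you expect. The sample XML and its page names are hypothetical; in practice, read sitemap.xml from your HTML build output directory instead.

```python
import xml.etree.ElementTree as ET

# Namespace used by sitemaps that follow the sitemaps.org protocol
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_locs(xml_text: str) -> list[str]:
    """Return every <loc> URL listed in a sitemap (urlset) document."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.findall("sm:url/sm:loc", NS)]


def locs_outside_base(locs: list[str], baseurl: str) -> list[str]:
    """Return URLs that do not start with the expected base URL."""
    return [url for url in locs if not url.startswith(baseurl)]


# Hypothetical excerpt of a generated sitemap
sample = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://sphinx-sitemap.readthedocs.io/en/latest/index.html</loc></url>
  <url><loc>https://sphinx-sitemap.readthedocs.io/en/latest/getting-started.html</loc></url>
</urlset>"""

locs = sitemap_locs(sample)
print(len(locs), "URLs")
print(locs_outside_base(locs, "https://sphinx-sitemap.readthedocs.io/en/latest/"))
```

An empty list from locs_outside_base suggests html_baseurl and sitemap_url_scheme agree with where the pages are served.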
How to specify a sitemap
A robots.txt file tells crawlers which sitemap to use when indexing a website. To use a custom robots.txt file, create your own and add it to html_extra_path in your configuration file; Sphinx copies files listed there to the root of the HTML output. An example can be found in the Ubuntu documentation library project.
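If you prefer to generate robots.txt rather than write it by hand, a minimal sketch follows. The helper name robots_txt is hypothetical. It rejects relative sitemap paths because the Sitemap directive expects a full URL.

```python
from urllib.parse import urlparse


def robots_txt(sitemap_url: str) -> str:
    """Return robots.txt text that allows all crawlers and names one sitemap."""
    parsed = urlparse(sitemap_url)
    # The Sitemap directive expects a full URL, not a relative path
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"sitemap URL must be absolute: {sitemap_url!r}")
    return "\n".join([
        "User-agent: *",
        "Disallow:",  # an empty Disallow rule allows everything
        "",
        f"Sitemap: {sitemap_url}",
    ]) + "\n"


print(robots_txt("https://canonical-starter-pack.readthedocs-hosted.com/latest/sitemap.xml"), end="")
```

Save the output as robots.txt next to conf.py so that html_extra_path can pick it up.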
Supporting multiple versions
sphinx-sitemap does not support multiple versions by default. Configuring each version of your documentation with the appropriate version may be sufficient, as Google and other search engines crawl websites to index them. However, if you want comprehensive sitemaps for your documentation and all its versions, you will need to deploy your own robots.txt file and sitemap index.
For instance, taking the starter pack with three versions (1.0, 2.0 and 3.0) plus latest, and a sitemap_url_scheme of {version}{link}:
Ensure each version of your documentation has a sitemap generated by this extension with the appropriate version.
Create a robots.txt file, in the same directory as your configuration file, pointing to a custom sitemapindex.xml file:
User-agent: *
Disallow: # Allow everything
Sitemap: https://canonical-starter-pack.readthedocs-hosted.com/latest/sitemapindex.xml
Create a sitemapindex.xml file, in the same directory as your configuration file, that points to the sitemap of each of your documentation sets. A sitemap index uses a sitemapindex root element with one sitemap entry per sitemap file:
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://canonical-starter-pack.readthedocs-hosted.com/latest/sitemap.xml</loc>
    <lastmod>2025-04-30</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://canonical-starter-pack.readthedocs-hosted.com/3.0/sitemap.xml</loc>
    <lastmod>2025-04-30</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://canonical-starter-pack.readthedocs-hosted.com/2.0/sitemap.xml</loc>
    <lastmod>2025-04-30</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://canonical-starter-pack.readthedocs-hosted.com/1.0/sitemap.xml</loc>
    <lastmod>2025-04-30</lastmod>
  </sitemap>
</sitemapindex>
Add robots.txt and sitemapindex.xml to html_extra_path in your configuration file, so that Sphinx copies them to the root of the HTML output:
html_extra_path = ["sitemapindex.xml", "robots.txt"]
Note
You may want to automate the generation of the sitemapindex.xml file. For an example of how the Ubuntu documentation library project does this, generating a sitemap index that contains its subprojects' sitemaps, see the script here.
This will provide a sitemapindex.xml file which points to the sphinx-sitemap generated sitemap for each version.
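As a starting point for automation, the sketch below generates a sitemap index in the sitemaps.org format (a sitemapindex root with one sitemap entry per file) for the versions in the example above. It is not the Ubuntu documentation library's script; the function name build_sitemap_index is hypothetical, and stamping every entry with today's date is a simplifying assumption (ideally each lastmod would reflect that version's last build).

```python
import datetime
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
BASE_URL = "https://canonical-starter-pack.readthedocs-hosted.com/"


def build_sitemap_index(baseurl: str, versions: list[str], lastmod: str) -> str:
    """Return a sitemap index listing one sphinx-sitemap sitemap per version."""
    root = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for version in versions:
        entry = ET.SubElement(root, "sitemap")
        # Assumes each version's sitemap sits at <baseurl><version>/sitemap.xml
        ET.SubElement(entry, "loc").text = f"{baseurl}{version}/sitemap.xml"
        ET.SubElement(entry, "lastmod").text = lastmod
    declaration = '<?xml version="1.0" encoding="UTF-8"?>\n'
    return declaration + ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Versions from the example above; "latest" is RTD's default version slug
    versions = ["latest", "3.0", "2.0", "1.0"]
    today = datetime.date.today().isoformat()
    print(build_sitemap_index(BASE_URL, versions, today))
```

Redirecting the script's output to sitemapindex.xml next to conf.py, for example as a pre-build step, keeps the index in step with the versions you publish.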