- How to open the tool and start working with it.
- Tool features.
- Issues detected by the ‘XML sitemap validator’.
The XML sitemap validator is a built-in tool that helps you:
- Find issues in a sitemap.
- Extract a list of links from an XML sitemap and move them to the main table for further crawling.
- Ping search engines to notify them of changes in XML sitemaps.
A sitemap can be checked without crawling the website itself.
1. How to open the tool and start working with it
The tool can be opened in three ways:
- Via ‘Tools/Run → XML sitemap validator’ in the control panel
- By using the ‘Alt+X’ hotkey
- Via ‘List of URLs → Download from the sitemap’ in the main menu
To start searching for issues:
1. Enter the sitemap URL in the corresponding field and click on the ‘Start’ button. When crawling is complete, the main table will display a list of pages contained in the sitemap. The tool has two viewing modes:
- URL (sitemap content) → displays all pages contained in the sitemap;
- Sitemaps → displays all sitemaps contained in the index sitemap.
2. The table will help you examine the following attributes in the sitemap:
- Loc → URL of the page
- Lastmod → date the page was last modified
- Changefreq → how frequently the page is likely to change
- Priority → priority of the page relative to other pages on the site
3. You can find sitemap issue reports on the corresponding tab on the right side of the tool. The issues presented in this tool are based on the official Standard Sitemap Protocol documentation, which is supported by Google, Yandex, and Bing.
4. Click on the issue title to filter the results and see the list of pages containing this issue. Also, on the ‘Information’ panel, you can see a description of each issue and its target parameter.
5. To set custom filter settings, reset the applied filter and click on the ‘Set filter’ button. You will see a window where you can set the filtering conditions.
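To make the attributes from step 2 concrete, here is a minimal Python sketch that reads the same four fields from a sitemap using only the standard library. It is an illustration of the Sitemap Protocol structure, not the tool's own implementation; the function name `read_sitemap` and the sample document are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the Sitemap Protocol 0.9 schema.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
FIELDS = ("loc", "lastmod", "changefreq", "priority")


def read_sitemap(xml_text):
    """Return (kind, entries); kind is "urlset" or "sitemapindex" (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    kind = root.tag.rsplit("}", 1)[-1]  # strip the "{namespace}" prefix
    if kind not in ("urlset", "sitemapindex"):
        raise ValueError(f"unexpected parent tag: {kind}")
    child = "url" if kind == "urlset" else "sitemap"
    entries = []
    for node in root.findall(f"sm:{child}", NS):
        entry = {}
        for field in FIELDS:
            text = node.findtext(f"sm:{field}", default="", namespaces=NS)
            entry[field] = (text or "").strip() or None  # None when the tag is absent or empty
        entries.append(entry)
    return kind, entries


SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/</loc>
    <lastmod>2024-01-15</lastmod>
    <changefreq>weekly</changefreq>
    <priority>0.8</priority>
  </url>
  <url>
    <loc>https://example.com/about</loc>
  </url>
</urlset>"""

kind, entries = read_sitemap(SAMPLE)
for entry in entries:
    print(kind, entry)
```

Only Loc is required by the protocol, so the sketch returns `None` for any optional tag that is missing, as with the second URL in the sample.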
2. Tool features
The following features are available in the tool:
- Apply → applies the current filter and updates the data in the table.
- Extended copy → copies data from the sidebar to the clipboard so you can paste it into an external spreadsheet.
To ping Google and Bing about your sitemap, use the corresponding button.
The results can be exported by several methods:
- By using the ‘Export’ button → exports the current table with all results.
- By using the ‘Save URLs to File’ button → saves the list of sitemap URLs to a text document.
You can also move the results of the sitemap analysis to the main table using the ‘Transfer’ and ‘Transfer URLs and Close’ buttons.
When the work is finished, you can delete the results using one of two methods:
- Click on the ‘New sitemap’ button.
- Use the ‘Clear’ button in the main menu of the tool.
3. Issues detected by the ‘XML sitemap validator’
Issues | Description |
---|---|
Errors | |
Broken Sitemap | Indicates unavailable sitemaps or the ones with a 4xx or higher status code: unable to get data. Target parameter: Status code |
Invalid Sitemap Parent Tag | Indicates sitemaps with an invalid parent tag: according to the protocol, it must be the <urlset> or <sitemapindex> tag. Target parameter: URL |
XML Document Parsing Error | Indicates XML documents the program was unable to parse. Target parameter: URL |
Content-Type is Invalid | Indicates sitemaps in a sitemap index file whose Content-Type field in the HTTP response header does not contain 'text/xml', 'application/xml' or 'text/plain'. If files are compressed using gzip, the 'Content-Type' field should contain 'application/gzip'. Target parameter: Content-Type |
Compression Error | Indicates sitemaps that were corrupted during compression or compressed in a format other than gzip. Target parameter: Status Code |
Charset Is Not UTF-8 | Indicates sitemaps with encoding different from UTF-8. Target parameter: Encoding |
Sitemap Blocked by Robots.txt | Indicates sitemaps disallowed in the robots.txt file. Target parameter: Disallowed |
Max Sitemap File Size | Indicates sitemaps larger than 49.9 MB. Target parameter: File Size |
Max URLs In Sitemap Index File | Indicates sitemap index files that contain more than 49,999 links to sitemaps. Target parameter: Number of URLs |
Max URLs in Sitemap | Indicates sitemaps that contain more than 49,999 URLs. Target parameter: Number of URLs |
Missing Links in Sitemap | Indicates sitemaps with no links found. It happens if a sitemap is empty or its content is excluded on the 'Rules' tab of crawling settings. Target parameter: Number of URLs |
Bad Sitemap URL Format | Indicates sitemap addresses in a sitemap index file that do not comply with the standard URL syntax: scheme:[//[user:password@]host[:port]][/]path[?query][#fragment]. Target parameter: Loc |
Bad URL Format | Indicates page addresses that do not comply with the standard URL syntax: scheme:[//[user:password@]host[:port]][/]path[?query][#fragment]. Target parameter: Loc |
Max Sitemap URL Length | Indicates sitemaps with more than 2000 characters in URL (by default). Note that you can change the default value on the 'Restrictions' tab of crawling settings. Target parameter: URL |
Max URL Length | Indicates all pages with more than 2000 characters in URL (by default). Note that you can change the default value on the 'Restrictions' tab of crawling settings. Target parameter: URL |
Percent-Encoded Sitemap URLs | Indicates sitemaps that contain percent-encoded (non-ASCII) characters in URL. For instance, the URL https://example.com/例.xml is encoded as https://example.com/%E4%BE%8B.xml. Target parameter: URL |
Non-Percent-Encoded URLs in Sitemap | Indicates URLs that contain non-ASCII characters that are not percent-encoded. For instance, the URL https://example.com/例 has to be encoded as https://example.com/%E4%BE%8B. Target parameter: Loc |
Special Characters in URL | Indicates URLs that contain '*', '{', '}' characters. Target parameter: URL |
Duplicate Sitemap | Indicates addresses of the sitemaps that were repeatedly found in a single or several sitemap index files. Target parameter: URL |
Link to Sitemap Index File | Indicates sitemaps that contain a link to a sitemap index file. Target parameter: Link Source |
Warnings | |
Redirected Sitemap | Indicates sitemaps redirected with a 3xx status code. Note that in contrast to the main table, here you can see the final URLs. Target parameter: Status Code |
Invalid Sitemap Location | Indicates sitemaps that break the location rules of the Standard Sitemap Protocol. A sitemap must be placed on the same host and protocol as its content. Target parameter: URL |
Invalid URL Location | Indicates URLs that break the location rules of the Standard Sitemap Protocol. URLs in a sitemap must be placed on the same host and protocol as the sitemap. Target parameter: URL |
URL Priority Is Invalid | Indicates URLs that have incoming links from the sitemap with a badly formatted <priority> tag. Target parameter: Priority |
Priority out of Range | Indicates URLs that have incoming links from the sitemap with a <priority> tag value that is out of range (0.0 to 1.0). Target parameter: Priority |
URL Changefreq Is Invalid | Indicates URLs that have incoming links from the sitemap with a badly formatted <changefreq> tag. Target parameter: Changefreq |
URL Lastmod Is Invalid | Indicates URLs that have incoming links from the sitemap with a badly formatted date in the <lastmod> tag. Target parameter: Lastmod |
Sitemap Lastmod Is Invalid | Indicates sitemaps that have incoming links from the sitemap index file with a badly formatted date in the <lastmod> tag. Target parameter: Lastmod |
Long Server Response Time | Indicates addresses of the pages with TTFB (time to first byte) exceeding 500 ms (by default). Note that you can change the default value on the 'Restrictions' tab of crawling settings. Target parameter: Response Time |
Robots.txt Does Not Contain a Sitemap Index | Indicates sitemap index files that are not listed in the corresponding robots.txt file. Target parameter: Specified in robots.txt |
Duplicated URLs | Indicates URLs that were found more than once across all sitemaps. Data in this report is grouped by the 'URL' parameter. Target parameter: URL |
Contains Byte-Order Mark | Indicates sitemaps that contain a Byte-Order Mark (BOM) – a Unicode character used for text stream byte order indication. It causes problems with sitemap crawling, so it's highly recommended to avoid the BOM. Target parameter: Encoding |
Notices | |
Percent-Encoded URLs | Indicates URLs that contain percent-encoded (non-ASCII) characters and spaces. For instance, the URL https://example.com/例 is encoded as https://example.com/%E4%BE%8B. Target parameter: URL |
Robots.txt Does Not Contain a Sitemap | Indicates sitemap files that are not listed in the corresponding robots.txt file. Target parameter: Specified in robots.txt |
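
Several of the per-URL checks in the table follow directly from the Sitemap Protocol rules and can be approximated in a few lines of Python. The sketch below is a rough illustration under stated assumptions, not the validator's actual logic: the function `check_entry`, the shape of the `entry` dictionary (string or `None` values keyed by tag name), and the date pattern are all hypothetical, and the returned strings simply reuse the issue names from the table.

```python
import re
from urllib.parse import urlsplit

# Values allowed for <changefreq> by the Sitemap Protocol.
VALID_CHANGEFREQ = {"always", "hourly", "daily", "weekly", "monthly", "yearly", "never"}

# W3C Datetime shapes (YYYY, YYYY-MM, YYYY-MM-DD, optional time with zone).
# This checks the shape only, not whether the date exists on the calendar.
W3C_DATETIME = re.compile(
    r"^\d{4}(-\d{2}(-\d{2}"
    r"(T\d{2}:\d{2}(:\d{2}(\.\d+)?)?(Z|[+-]\d{2}:\d{2}))?"
    r")?)?$"
)


def check_entry(entry, sitemap_url):
    """Return a list of issue names for one sitemap entry (hypothetical helper)."""
    issues = []
    loc = entry.get("loc") or ""

    # Non-ASCII characters must be percent-encoded in <loc>.
    if not loc.isascii():
        issues.append("Non-Percent-Encoded URLs in Sitemap")

    # URLs must share the protocol and host of the sitemap that lists them.
    site, page = urlsplit(sitemap_url), urlsplit(loc)
    if (page.scheme, page.netloc) != (site.scheme, site.netloc):
        issues.append("Invalid URL Location")

    priority = entry.get("priority")
    if priority is not None:
        try:
            value = float(priority)
        except ValueError:
            issues.append("URL Priority Is Invalid")
        else:
            if not 0.0 <= value <= 1.0:
                issues.append("Priority out of Range")

    changefreq = entry.get("changefreq")
    if changefreq is not None and changefreq not in VALID_CHANGEFREQ:
        issues.append("URL Changefreq Is Invalid")

    lastmod = entry.get("lastmod")
    if lastmod is not None and not W3C_DATETIME.match(lastmod):
        issues.append("URL Lastmod Is Invalid")

    return issues


good = check_entry(
    {"loc": "https://example.com/a", "priority": "0.5",
     "changefreq": "weekly", "lastmod": "2024-01-15"},
    "https://example.com/sitemap.xml",
)
bad = check_entry(
    {"loc": "http://other.example/例", "priority": "1.5",
     "changefreq": "sometimes", "lastmod": "15/01/2024"},
    "https://example.com/sitemap.xml",
)
print(good)
print(bad)
```

Optional tags that are absent are skipped rather than flagged, since the protocol requires only the location of each URL.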