Overview in Netpeak Spider is a summary report of a crawled website. It groups pages by various criteria and shows the absolute and relative number of pages in each group. You can find it in the sidebar under ‘Reports → Overview’.
All groups and subgroups of pages in the ‘Overview’ tab are listed below.
Group | Description |
--- | --- |
Page Status | |
Internal | All internal pages of the crawled website. |
External | Pages from external websites linked from the crawled website. |
Refresh Redirected | Pages that contain a Refresh directive (in the HTTP response headers or in a Meta Refresh tag in the `<head>` block) pointing to another URL. If the Refresh directive points to the page it is located on, that page is not included in this group. |
Compliant | HTML pages returning a 2xx status code and not hidden from search engine robots by indexing instructions (robots.txt, Canonical, Meta Robots, etc.). |
Non-compliant | HTML pages returning a status code different from 2xx or pages that are hidden from search engine robots by indexing instructions. |
Noindex | Pages that contain the ‘noindex’ directive in the ‘content’ attribute of the Meta Robots tag. |
Nofollow | Pages that contain directives prohibiting search engine robots from following links (these may be located in the HTTP response headers or in the `<head>` block). |
Disallowed | Pages that are hidden from search engine robots by directives in the robots.txt file. |
Canonicalized | Pages containing a Canonical tag that points to another URL. A page whose Canonical points to itself does not appear in this group. |
2xx HTML | All HTML pages with a 2xx status code. |
Broken | Pages with 4xx or 5xx status codes. |
Page Type | |
HTML, JavaScript, Redirect, CSS, Images, PDF, XML, PlainText, GZIP | A separate subgroup for each of these types, containing the pages of that type. |
JSON | Pages in JSON (JavaScript Object Notation), an open-standard text format for data interchange based on JavaScript syntax. |
Other | Pages of a type that Netpeak Spider cannot process (e.g., .pptx, .dmg). |
Unknown | Pages whose type could not be determined because they did not return a valid status code (e.g., Timeout). |
Protocol | |
HTTP | Pages served over the HTTP protocol. |
HTTPS | Pages served over the HTTPS protocol. |
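The Compliant and Non-compliant rules in the table can be summarized as a small decision function. This is a minimal Python sketch under stated assumptions, not Netpeak Spider's actual logic: the `Page` fields and the `classify` function are hypothetical, a robots.txt disallow, a `noindex` directive, and a Canonical pointing to another URL are treated as the only "hiding" instructions (the table's "etc." may cover more), and non-HTML resources fall into neither group.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class Page:
    # Hypothetical page record; field names are illustrative only.
    url: str
    status_code: Optional[int]      # None stands for a timeout / no response
    content_type: str               # e.g. "text/html; charset=utf-8"
    disallowed: bool = False        # blocked by robots.txt
    directives: Set[str] = field(default_factory=set)  # lowercase Meta Robots / X-Robots-Tag values
    canonical: Optional[str] = None


def classify(page: Page) -> Optional[str]:
    """Return 'compliant', 'non-compliant', or None for non-HTML resources."""
    mime = page.content_type.split(";")[0].strip().lower()
    if mime != "text/html":
        return None  # assumption: only HTML pages are classified
    ok_status = page.status_code is not None and 200 <= page.status_code < 300
    # Assumed set of "hiding" instructions; canonical compared as a plain string.
    hidden = (
        page.disallowed
        or "noindex" in page.directives
        or (page.canonical is not None and page.canonical != page.url)
    )
    return "compliant" if ok_status and not hidden else "non-compliant"
```

A page with a self-referencing Canonical stays compliant here, matching the rule that such pages are not treated as canonicalized.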
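The Refresh Redirected and Canonicalized groups share one rule: a page is included only if its Refresh or Canonical target points to a different URL. The Python sketch below shows one way to apply that rule. The helper names are hypothetical, and it assumes two URLs count as "the same" once a relative target is resolved against the page URL and fragments are dropped; Netpeak Spider's own URL normalization is not described in this article. The Refresh parser only handles the common `delay; url=target` form (with or without the `url=` prefix).

```python
from typing import Optional
from urllib.parse import urldefrag, urljoin


def parse_refresh(value: str) -> Optional[str]:
    """Extract the target from a Refresh value such as '0; url=https://example.com/'."""
    _, sep, rest = value.partition(";")
    if not sep:
        return None  # only a delay, no target URL
    rest = rest.strip()
    if rest.lower().startswith("url="):
        rest = rest[4:]
    rest = rest.strip().strip("'\"")  # drop optional surrounding quotes
    return rest or None


def points_elsewhere(page_url: str, target: Optional[str]) -> bool:
    """True if target resolves to a URL other than the page itself (assumed normalization)."""
    if not target:
        return False
    resolved, _ = urldefrag(urljoin(page_url, target))
    current, _ = urldefrag(page_url)
    return resolved != current
```

For example, a page at `https://example.com/a` with `<link rel="canonical" href="/a">` resolves to itself, so under these assumptions it would not be counted as canonicalized.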
The Overview also groups pages by the following parameters:
- Host → groups pages by host (domain or subdomain) and shows the number of pages for each host. Hosts are sorted by the number of dots in the host name, from fewest to most, which is why domain.com appears above blog.domain.com.
- Status Code → groups pages by status code (2xx Success, 3xx Redirection, 4xx Client Error, 5xx Server Error, Timeout, etc.).
- Content-Type → groups pages by content type.
- Robots.txt, Meta Robots, and X-Robots-Tag → groups pages by the directives in robots.txt, Meta Robots, and X-Robots-Tag.
- AMP HTML → groups pages based on their use of AMP technology.
- Click Depth → groups pages by click depth: the number of clicks from the initial URL.
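The Host grouping and its sort order can be reproduced with the Python standard library. In this sketch (the function `host_overview` is hypothetical), relative shares are computed against all URLs passed in, and hosts with the same number of dots are ordered alphabetically; both are assumptions, since the article only specifies sorting by dot count.

```python
from collections import Counter
from urllib.parse import urlsplit


def host_overview(urls):
    """Return (host, absolute count, percentage) tuples, fewest dots first."""
    counts = Counter(urlsplit(u).hostname for u in urls)
    total = len(urls)  # assumption: share is relative to all URLs given
    # Primary key: dots in the host name; alphabetical tie-breaker is an assumption.
    hosts = sorted(counts, key=lambda h: (h.count("."), h))
    return [(h, counts[h], 100 * counts[h] / total) for h in hosts]
```

With three pages on `domain.com` and one on `blog.domain.com`, the result is `[("domain.com", 3, 75.0), ("blog.domain.com", 1, 25.0)]`.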
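Click depth can be computed with a breadth-first search over the site's link graph, starting from the initial URL. This sketch assumes click depth means the fewest clicks needed to reach a page and represents the link graph as a plain dictionary of page → linked pages; `click_depths` and `group_by_depth` are hypothetical names, not part of Netpeak Spider.

```python
from collections import deque


def click_depths(links, start):
    """BFS from start: each page's depth is the fewest clicks needed to reach it."""
    depth = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, ()):
            if target not in depth:  # first visit in BFS order is the shortest path
                depth[target] = depth[page] + 1
                queue.append(target)
    return depth


def group_by_depth(depth):
    """Group pages by depth, ordered from the shallowest level."""
    groups = {}
    for page, d in depth.items():
        groups.setdefault(d, []).append(page)
    return dict(sorted(groups.items()))
```

Pages that cannot be reached from the initial URL never enter the queue, so they do not appear in the result.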
All categories presented in the summary are interactive, so if you click on any group, Netpeak Spider will show you the report containing all the pages that belong to it.
The panel also provides the following functions:
- Apply → updates the data during crawling.
- Collapse All → collapses all groups.
- Extended Copy (Ctrl+Shift+C) → copies the sidebar data in an extended view that you can paste into any spreadsheet editor.