Source code and HTTP headers analysis

Modified on Sun, 28 Jul, 2024 at 12:43 AM

Starting the tool.
Working with results.
Results export.

The ‘Source code and HTTP headers analysis’ tool shows exactly how Netpeak Spider analyzes the text of a page when counting words or characters. It helps you understand why the data in Netpeak Spider sometimes differs from what you see when you open the site in your browser. Learn more here → ‘Why do data in Netpeak Spider and browser differ?’.

1. Starting the tool

You can open the tool in several ways:

1.1. Using the context menu (or a hotkey). Select the URL you need in the results table and press ‘Ctrl+U’, or choose the ‘Source code and HTTP headers analysis’ option in the context menu.

The tool window opens with detailed data on the HTTP headers of the request and the server response, information about the page, its source code, and the raw text of the page with HTML elements removed.
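
To give a rough idea of what ‘raw text with no HTML elements’ means, here is a minimal Python sketch that strips tags from a page and counts words and characters. It is only an illustration built on the standard library: the `TextExtractor` class, the `raw_text` function, and the sample markup are hypothetical, and Netpeak Spider's own text analysis rules may differ (for example, in how it treats scripts, styles, or hidden elements).

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside skipped elements.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def raw_text(html):
    """Return the text of an HTML document with all tags removed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Hypothetical page used only for this illustration.
sample = (
    "<html><head><style>p{color:red}</style></head>"
    "<body><h1>Hello</h1><p>Netpeak Spider</p>"
    "<script>var x = 1;</script></body></html>"
)

text = raw_text(sample)
words = len(text.split())
chars = len(text)
print(text)          # Hello Netpeak Spider
print(words, chars)  # 3 20
```

Comparing a count like this with the text you see in the browser can show where a difference comes from, for instance content that a script adds only after the page has loaded.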

1.2. Starting via the control panel. Go to ‘Run(Tools) → Source code and HTTP headers analysis’ on the control panel to open the tool.

After starting the tool, enter the URL you need into the corresponding field and click the ‘Start’ button.

Netpeak Spider also remembers the URLs you entered previously and shows them as hints.

2. Working with results

The table below lists examples of the field names you can see on the left side of the window, along with their descriptions. The type and number of these fields may vary depending on the page being checked, so only the most common ones are explained here.

Field name	Description
General
Page Type	Type of the requested page (HTML, JSON, Image, etc.).
Request URL	URL of the requested page.
Request Method	Request method used to access the selected page (e.g. GET).
Status Code	Status code returned by the requested page.
Response Time	Time (in milliseconds) before receiving the first byte from the server.
Content Download Time	Time (in milliseconds) the server takes to return the HTML code of the page.
Proxy Server	IP address and port of the proxy from which the request was sent, if a proxy is set in the program settings. Otherwise, this field will contain the ‘(Not Set)’ value.
Remote Address	IP address and port of the server on which the requested page is located.
HTTP response headers
Date	Response generation date.
Content-Type	Type of page content.
Content-Encoding	Content encoding method used on requested page.
Connection	Management options for the current connection.
Vary	Tells caches which request headers to compare when deciding whether a cached response can be used instead of requesting a new one from the origin server.
Set-Cookie	Cookie data. Used to send cookies from the server to the User Agent. Value format: <cookie-name>=<cookie-value>.
HTTP request headers
User-Agent	The current User Agent that was used when requesting the specified page. You can change the User Agent in the program settings.
Accept	List of resource formats (media types) the client accepts.
Accept-Encoding	List of content encodings (compression methods) the client accepts.
Accept-Charset	List of character encodings the client accepts.
Host	The domain name (and, optionally, the port) of the server on which the requested page is located.
Cache-Control	Directives for managing caching.
Pragma	A field that is implementation dependent and may have different values throughout the request-response chain. Used for backward compatibility with HTTP/1.0 caches, where the HTTP/1.1 Cache-Control header is not yet present.
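
As an illustration of how the response header fields in the table can be read programmatically, here is a small Python sketch that parses a block of response headers. The header values are invented for the example and the variable names are hypothetical. It does not reflect how Netpeak Spider processes headers internally.

```python
from email.parser import Parser
from http.cookies import SimpleCookie

# Invented response headers for illustration; real values depend on the server.
RAW_HEADERS = (
    "Date: Sun, 28 Jul 2024 00:43:00 GMT\r\n"
    "Content-Type: text/html; charset=utf-8\r\n"
    "Content-Encoding: gzip\r\n"
    "Connection: keep-alive\r\n"
    "Vary: Accept-Encoding, User-Agent\r\n"
    "Set-Cookie: session_id=abc123; Path=/; HttpOnly\r\n"
)

# HTTP headers use the same "Name: value" layout as e-mail headers,
# so the standard e-mail parser can read them.
headers = Parser().parsestr(RAW_HEADERS, headersonly=True)

content_type = headers.get_content_type()   # media type without parameters
charset = headers.get_content_charset()     # value of the charset parameter

# Vary lists the request headers a cache must compare before reusing a response.
vary_fields = [field.strip() for field in headers["Vary"].split(",")]

# Set-Cookie starts with a name=value pair, followed by optional attributes.
cookie = SimpleCookie()
cookie.load(headers["Set-Cookie"])

print(content_type, charset)
print(vary_fields)
for name, morsel in cookie.items():
    print(name, "=", morsel.value)
```

The cookie line shows the name=value structure described for Set-Cookie; anything after the first semicolon is an attribute of the cookie, not part of its value.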

In the left part of the window you can also see the list of GET parameters if they are present in the page URL. For example, if the page URL is https://www.example.com/products?sort=popularity&os=windows, you will see the following information:

Query string parameters
sort	popularity
os	windows
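
The same split of a URL into parameter names and values can be reproduced with Python's standard library. This is only a sketch of the general technique, using the example URL above, and not a description of Netpeak Spider's own parser.

```python
from urllib.parse import parse_qsl, urlsplit

url = "https://www.example.com/products?sort=popularity&os=windows"

# urlsplit() isolates the query string; parse_qsl() turns it into
# (name, value) pairs in the order they appear in the URL.
params = parse_qsl(urlsplit(url).query)

for name, value in params:
    print(f"{name}\t{value}")
# sort	popularity
# os	windows
```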

Please note that this information is displayed only for pages returning a 2xx status code. The source code can be displayed only for the following page types:

HTML;
PlainText (e.g. TXT files);
JavaScript;
CSS;
XML;
GZIP → Netpeak Spider can unpack the archive and show its contents.
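
To illustrate what unpacking a GZIP response involves, here is a minimal Python sketch. The `unpack_if_gzip` function and the sample sitemap content are hypothetical, and the check on the first two bytes is just one common way to recognize gzip data; how Netpeak Spider detects and unpacks archives is not covered here.

```python
import gzip


def unpack_if_gzip(body: bytes) -> bytes:
    """Return decompressed bytes if body is gzip data, else body unchanged."""
    # Every gzip stream starts with the magic bytes 0x1f 0x8b.
    if body[:2] == b"\x1f\x8b":
        return gzip.decompress(body)
    return body


# Hypothetical content, e.g. what a compressed sitemap file might hold.
original = b"<urlset><url><loc>https://www.example.com/</loc></url></urlset>"
packed = gzip.compress(original)

unpacked = unpack_if_gzip(packed)
print(unpacked.decode("utf-8"))
```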

3. Results export

Use the ‘Export’ button to export the HTTP header data (left panel) and the ‘Save source code’ button to save the page source code (right panel).