Source code and HTTP headers analysis

Modified on Sun, 28 Jul, 2024 at 12:43 AM

  1. Starting the tool.
  2. Working with results.
  3. Results export.


The ‘Source code and HTTP headers analysis’ tool shows exactly how Netpeak Spider analyzes the text when it counts the words or characters on a page. It also helps explain why the data in Netpeak Spider sometimes differs from what you see when you visit the site in your browser. Learn more here → ‘Why do data in Netpeak Spider and browser differ?’.
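
To illustrate the idea of counting words and characters in text with the HTML removed, here is a minimal Python sketch. It is not Netpeak Spider's actual algorithm: the `TextExtractor` class, the `count_text` function, the decision to skip `<script>` and `<style>` contents, and the sample markup are all assumptions made for this example.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    SKIPPED_TAGS = {"script", "style"}  # assumption: treated as non-visible text

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self):
        # Collapse whitespace runs so markup indentation does not inflate the counts.
        return " ".join(" ".join(self._chunks).split())


def count_text(html):
    """Return the extracted text with its word and character counts."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    text = parser.text()
    return {"text": text, "words": len(text.split()), "characters": len(text)}


# Hypothetical page source, made up for illustration.
sample = (
    "<html><head><style>p {color: red}</style></head>"
    "<body><h1>Hello</h1><p>Netpeak   Spider</p>"
    "<script>var x = 1;</script></body></html>"
)
result = count_text(sample)
print(result)
```

A browser may render text that is absent from the raw source (for example, text added by JavaScript), which is one reason why counts based on the source code can differ from what you see on the screen.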


1. Starting the tool

You can open the tool in several ways:


1.1. Using the context menu (or a hotkey). Select the required URL in the results table and press the ‘Ctrl+U’ hotkey or choose the ‘Source code and HTTP headers analysis’ option in the context menu.



The tool window opens with detailed data on the HTTP headers of the request and the server response, information about the page, its source code, and the raw text of the page without HTML elements.


1.2. Starting via the control panel. Go to ‘Run(Tools) → Source code and HTTP headers analysis’ on the control panel to open the tool.



After starting the tool, enter the required URL into the corresponding field and click the ‘Start’ button.



Netpeak Spider also remembers all the URLs you entered previously and shows them as hints.


2. Working with results

The table below lists examples of the field names you can see on the left side of the window, with their descriptions. The type and number of fields may vary depending on the checked page, so only the most common ones are explained here.


General

| Field name | Description |
| --- | --- |
| Page Type | Type of the requested page (HTML, JSON, Image, etc.). |
| Request URL | URL of the requested page. |
| Request Method | Request method used to access the selected page (e.g. GET). |
| Status Code | Status code returned by the requested page. |
| Response Time | Time (in milliseconds) until the first byte is received from the server. |
| Content Download Time | Time (in milliseconds) the server takes to return the HTML code of the page. |
| Proxy Server | IP address and port of the proxy through which the request was sent, if a proxy is set in the program settings. Otherwise, this field contains the ‘(Not Set)’ value. |
| Remote Address | IP address and port of the domain on which the requested page is located. |

HTTP response headers

| Field name | Description |
| --- | --- |
| Date | Date and time when the response was generated. |
| Content-Type | Type of the page content. |
| Content-Encoding | Content encoding method used on the requested page. |
| Connection | Control options for the current connection. |
| Vary | Tells caches which request headers to compare when deciding whether a cached response can be used instead of requesting a new response from the origin server. |
| Set-Cookie | Cookie data. Used to send cookies from the server to the User Agent. Value format: <cookie-name>=<cookie-value>. |

HTTP request headers

| Field name | Description |
| --- | --- |
| User-Agent | The User Agent that was used when requesting the specified page. You can change the User Agent in the program settings. |
| Accept | List of resource formats the client accepts. |
| Accept-Encoding | List of content encodings the client accepts. |
| Accept-Charset | List of character encodings the client accepts. |
| Host | Domain name of the server on which the requested page is located. |
| Cache-Control | Directives for managing caching. |
| Pragma | An implementation-dependent field that may have different effects along the request-response chain. Used for backward compatibility with HTTP/1.0 caches, where the HTTP/1.1 Cache-Control header is not yet present. |
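
To make the response header fields more concrete, the sketch below parses a block of raw response headers with Python's standard library and reads a few of the fields described above. The header values (including the `sessionid` cookie) are made up for illustration, and the code only shows how such headers can be interpreted in general, not how Netpeak Spider processes them.

```python
import io
from http.client import parse_headers
from http.cookies import SimpleCookie

# Hypothetical raw response headers, made up for illustration.
RAW_HEADERS = (
    b"Date: Sun, 28 Jul 2024 00:43:00 GMT\r\n"
    b"Content-Type: text/html; charset=utf-8\r\n"
    b"Content-Encoding: gzip\r\n"
    b"Connection: keep-alive\r\n"
    b"Vary: Accept-Encoding\r\n"
    b"Set-Cookie: sessionid=abc123; Path=/\r\n"
    b"\r\n"
)

# parse_headers reads header lines up to the first blank line.
headers = parse_headers(io.BytesIO(RAW_HEADERS))

# Content-Type carries the media type and, optionally, the character set.
page_content_type = headers.get_content_type()
charset = headers.get_content_charset()

# Vary lists the request headers a cache must compare before reusing a response.
vary_fields = [name.strip() for name in headers.get("Vary", "").split(",")]

# Set-Cookie may appear several times, so collect every occurrence.
cookies = SimpleCookie()
for raw_cookie in headers.get_all("Set-Cookie", []):
    cookies.load(raw_cookie)

for name, value in headers.items():
    print(f"{name}: {value}")
```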


In the left part of the window you can also see the list of GET parameters if they are present in the URL of the page. For example, if the URL of the page is https://www.example.com/products?sort=popularity&os=windows, you will see the following information:


Query string parameters

| sort | popularity |
| os | windows |
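
The same name and value pairs can be reproduced with Python's standard library. This is only a sketch of how a query string is split into parameters, using the example URL from this article; it does not claim to be how Netpeak Spider does it.

```python
from urllib.parse import urlsplit, parse_qsl

url = "https://www.example.com/products?sort=popularity&os=windows"

# Take the part after "?" and split it into (name, value) pairs.
params = parse_qsl(urlsplit(url).query, keep_blank_values=True)

for name, value in params:
    print(f"{name}\t{value}")
```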


Please note that this information is displayed only for pages returning a 2xx status code. The source code can be displayed only for the following types of pages:

  • HTML;
  • PlainText (e.g. TXT files);
  • JavaScript;
  • CSS;
  • XML;
  • GZIP → Netpeak Spider can unpack the archive and show its contents.
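
One way to read the rules above is sketched below in Python. The function names `can_show_source` and `unpack_if_gzip` are hypothetical, the page type names are taken from the list above, and detecting GZIP data by its leading magic bytes is an assumption of this example rather than a description of Netpeak Spider's internals.

```python
import gzip

# Page types for which the source code can be displayed (from the list above).
DISPLAYABLE_TYPES = {"HTML", "PlainText", "JavaScript", "CSS", "XML", "GZIP"}


def can_show_source(status_code, page_type):
    """True for a 2xx response whose page type is in the displayable list."""
    return 200 <= status_code <= 299 and page_type in DISPLAYABLE_TYPES


def unpack_if_gzip(body):
    """Decompress the body if it starts with the gzip magic bytes 0x1f 0x8b."""
    if body[:2] == b"\x1f\x8b":
        return gzip.decompress(body)
    return body


# Hypothetical compressed file content, made up for illustration.
packed = gzip.compress(b"User-agent: *\nDisallow:")
print(can_show_source(200, "GZIP"), unpack_if_gzip(packed).decode("utf-8"))
```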


3. Results export

Use the ‘Export’ button to export the HTTP header data (left panel) and the ‘Save source code’ button to save the source code of the page (right panel).


