General Settings in Netpeak Spider

Modified on Mon, 09 Oct 2023 at 07:41 PM

  1. Language.
  2. Crawling speed.
  3. Basic crawling settings.
  4. Multi-domain crawling.
  5. Data backup.

On the ‘General’ settings tab, you can change the interface language, crawling speed, and basic crawling settings.

1. Language

You can choose either English or Russian as the interface language in Netpeak Spider. Click the button with the current language name and select the desired option from the drop-down list.

Please note that the program has to be restarted for this setting to take full effect.

2. Crawling speed

2.1. Number of threads

Each thread creates a separate connection to a website, so please be careful: load-sensitive websites may struggle to serve pages. You can adjust the number of threads during crawling to find the optimal value for the analyzed website. By default, the number of threads is 10.
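The idea of one connection per thread can be sketched as follows. This is a hypothetical illustration, not Netpeak Spider's actual implementation; the `fetch` function is a stand-in for a real HTTP request.

```python
from concurrent.futures import ThreadPoolExecutor

def fetch(url):
    # Stand-in for a real HTTP request; in a real crawler each
    # thread would open its own connection to the target website.
    return f"fetched {url}"

def crawl(urls, threads=10):
    # Each worker handles one URL at a time, so 'threads' is the
    # maximum number of simultaneous connections to the site.
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return list(pool.map(fetch, urls))

results = crawl([f"https://example.com/page{i}" for i in range(5)], threads=2)
print(results)
```

Raising `threads` speeds up crawling but multiplies the load on the server, which is why load-sensitive sites warrant a lower value.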

2.2. Delay between requests

This is the amount of time between successive requests to a web server. For load-sensitive or protected websites, it is recommended to set this parameter to avoid overloading the server or triggering its protection.

The delay is applied separately to each thread, which is why it is recommended to use one thread and a 1500-3000 ms delay between requests to imitate user behavior.
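Because the delay applies per thread, the overall request rate grows with the thread count. A small sketch (an assumption about the arithmetic, not taken from the program) makes the trade-off concrete:

```python
def requests_per_second(threads, delay_ms):
    # The delay applies to each thread independently, so the total
    # request rate scales linearly with the number of threads.
    return threads * 1000.0 / delay_ms

# One thread with a 2000 ms delay approximates a human visitor:
print(requests_per_second(1, 2000))   # 0.5 requests per second
# Ten threads with the same delay hit the server 10x harder:
print(requests_per_second(10, 2000))  # 5.0 requests per second
```

This is why the single-thread, 1500-3000 ms configuration is the gentlest option for protected sites.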

2.3. Response timeout

This is the maximum time, in milliseconds, to wait for a server response before the crawler marks a page as broken with the ‘Timeout’ response code and moves on to the next URL. This setting also affects detection of ‘Connection Error’.

  • Minimum possible value – 50 ms.
  • Maximum possible value – 90,000 ms.
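The distinction between the two failure modes can be sketched with a hypothetical helper (the names and logic are illustrative, not the program's internals):

```python
def classify_response(elapsed_ms, timeout_ms, connected=True):
    # Illustrative classification: a page that does not respond
    # within the timeout is reported as 'Timeout'; a connection
    # that could not be established is a 'Connection Error'.
    if not connected:
        return "Connection Error"
    if elapsed_ms > timeout_ms:
        return "Timeout"
    return "OK"

print(classify_response(95000, 90000))               # Timeout
print(classify_response(1200, 90000))                # OK
print(classify_response(0, 90000, connected=False))  # Connection Error
```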

2.4. JavaScript rendering

Tick to enable JavaScript rendering. This might be useful when part of the content is generated with JavaScript or the whole site is built on a JS framework.

JavaScript rendering in Netpeak Spider is implemented via the built-in Chromium browser. JS is executed only on compliant HTML pages (those returning the 200 OK status code); analytics scripts are blocked, and images and iframes are not loaded.

‘AJAX timeout’ is the main setting: it defines how long JavaScript is allowed to execute after the entire page and all its resource files (JS/CSS) have loaded. Note that the higher the AJAX timeout, the longer crawling takes. In most cases, the default of 2 seconds is enough for JavaScript execution. However, if the crawled site makes AJAX requests that need more time, you can set a custom value. We do not recommend setting the value too low, as the code may not have time to execute fully.

3. Basic crawling settings 

3.1. Crawl only in directory

The program will crawl the site inside a particular directory (category) without leaving it.

Please take into account that Netpeak Spider determines the directory from a segment of the page URL, so the website must have an appropriate URL structure for this mode to work. For example, when crawling inside a product category directory, pages whose URLs begin with that directory path will be included in reports, while product pages located under a different URL section will not, even if the crawled category links to them.
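The URL-segment check described above can be sketched like this. It is an assumption about how such a scope check typically works, not Netpeak Spider's actual code; `example.com` and the paths are made up.

```python
from urllib.parse import urlparse

def in_directory(url, base_url):
    # A URL is in scope only if it is on the same host and its path
    # starts with the base directory's path segment.
    base = urlparse(base_url)
    u = urlparse(url)
    return u.netloc == base.netloc and u.path.startswith(base.path)

base = "https://example.com/shop/phones/"
print(in_directory("https://example.com/shop/phones/model-x", base))  # True
print(in_directory("https://example.com/blog/review", base))          # False
```

A link from the category to a page under a different path segment fails this check, which is why such pages are excluded even when they are linked.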

3.2. Crawl all subdomains

If it is checked, subdomains will be considered part of the analyzed website and links to them will be considered internal. Otherwise, results received from subdomains will not be considered part of the crawled website, and links to them will be considered external.
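The internal/external distinction can be sketched with a hypothetical classifier (illustrative names and logic, not the program's internals):

```python
from urllib.parse import urlparse

def is_internal(url, root_domain, crawl_subdomains):
    # With subdomain crawling on, any host ending in the root domain
    # counts as internal; with it off, only an exact host match does.
    host = urlparse(url).netloc
    if crawl_subdomains:
        return host == root_domain or host.endswith("." + root_domain)
    return host == root_domain

print(is_internal("https://blog.example.com/post", "example.com", True))   # True
print(is_internal("https://blog.example.com/post", "example.com", False))  # False
```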

3.3. Crawl external links

Choose this parameter to add all external links to the main table. Note that the same parameters and issues are checked for external links as for internal ones, so the ‘Issues’ panel will show the total number of issues for internal and external links combined. However, you can create a report for external links only by using the segmentation feature.

3.4. Check JavaScript, CSS, and PDF

The program gathers information (response code, size, etc.) about JavaScript, CSS, and PDF files found on the website. Take into account that Netpeak Spider doesn’t analyze their content.

3.5. Check images

We recommend enabling this configuration because:

  • It allows the program to collect common SEO parameters for images.
  • It affects detection of the ‘Broken images’ and ‘Max image size’ issues.

3.6. Check other MIME types

This setting enables collection of information about documents, video and audio files, etc. As with the file types above, Netpeak Spider doesn’t scan their content but collects their common SEO parameters.

You can use the built-in templates for a specific crawling method: from the default template, suitable for most standard SEO tasks, to a method that crawls websites in a way similar to search engine robots.

4. Multi-domain crawling

Tick to enable the feature that allows crawling multiple domains simultaneously.

The program starts crawling each domain from the URLs with a click depth of 0. To do so, add the list of needed URLs to the main table.

5. Data backup

Tick to let the program back up the collected data automatically. This is useful when there is a risk of a sudden computer shutdown and data loss.

The data will be saved at the intervals you specify, as well as when you stop (or pause) the crawling and when it is complete. Please note: the shorter the interval, the more often a copy is made and the longer the analysis will take.

If the program closes unexpectedly, the next time you run Netpeak Spider, it will open a temporary project saved during the last backup. To save the temporary project permanently, go to the ‘Project’ → ‘Save’ menu and specify the file path where it will be stored.
