Netpeak Spider Settings Overview

Modified on Mon, 8 Jul at 7:13 PM

One of Netpeak Spider’s key advantages is its flexible configuration, which lets you tailor the program to your requirements. To simplify working with crawling settings, we have implemented the following features:

  1. Options – for quick access to the last opened tab of the program settings.
  2. Built-in settings templates.
  3. The ability to save your own settings templates.
  4. Quick reset to default settings.
  5. Changing the crawling speed without pausing the process.

1. Options

You can find ‘Options’ in the ‘Settings’ menu. It takes you straight to the last opened settings tab and can also be opened with the Ctrl+G hotkey.

  • General settings → contain language configuration, crawling speed, and other basic settings.
  • Advanced settings → here you can choose which crawling and indexing instructions Netpeak Spider should follow, set the conditions under which crawling is automatically paused, and configure crawling of certain page types.
  • Virtual robots.txt → a configuration that overrides the real robots.txt, so you can test new or updated crawling directives without changing the real file.
  • Scraping settings → allow you to extract any data from HTML pages, check the implementation of analytics systems, structured data, meta tags and much more.
  • User Agent settings → let you set any User Agent header in requests to the server. You can pick one from the list or enter a custom one.
  • HTTP Headers settings → let you configure custom HTTP request headers.
  • Restrictions → allow you to set your own crawl restrictions, for example, to automatically stop crawling when a certain number of pages or a certain depth is reached. You can also set custom issue restrictions, for example, the maximum number of characters in an H1 tag or the maximum bounce rate; if a page exceeds the limit, the program marks it as an issue.
  • Spell check → use this feature to spot words with spelling errors on the website.
  • Google Analytics and Search Console settings → allow you to get data from Google Analytics and Search Console. To do this, add a Google account (you can add multiple accounts and work with them simultaneously).
  • Export settings → you can select an export file format, regional settings, and other report parameters here.
  • Authentication settings → allow you to perform an SEO audit of a site that is protected by basic authentication. With this type of authentication, the user name and password are sent with each web request (HTTP POST or HTTP GET) in the Authorization header.
  • Proxy settings → used to set up a list of proxies.
  • White Label settings → using this feature, you can remove Netpeak Software brand elements from the 'Technical SEO audit' PDF report and add your own logo, required contacts, and recommendations for your client.

You can combine all of the above options in any way you want.
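The idea behind the virtual robots.txt (checking draft directives against specific URLs and User Agents before touching the real file) can be sketched outside the program as well. The example below is only an illustration that uses Python's standard `urllib.robotparser` module; the draft directives, the `ExampleBot` and `OtherBot` names, the `example.com` URLs, and the `is_allowed` helper are all assumptions made for the example, and the module's matching rules may differ from those of Netpeak Spider or of a particular search engine.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical draft directives that have not been published yet.
DRAFT_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

User-agent: ExampleBot
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Check a URL against robots.txt text held in memory (no network request)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    checks = [
        ("OtherBot", "https://example.com/blog/"),
        ("OtherBot", "https://example.com/private/page"),
        ("ExampleBot", "https://example.com/blog/"),
    ]
    for agent, url in checks:
        verdict = "allowed" if is_allowed(DRAFT_ROBOTS_TXT, agent, url) else "blocked"
        print(f"{agent:10} {url:40} {verdict}")
```

With this draft, `ExampleBot` is blocked from the whole site, while any other User Agent is blocked only from `/private/`.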

2. Settings templates

Templates are combinations of settings specified for certain tasks, for example, a general site check or crawling ‘through the eyes of search robots’.

The following default templates are available in Netpeak Spider:

  • Current → current project settings. 
  • Custom: last used → last used settings.
  • Default: bot → crawling settings that make Netpeak Spider crawl a website the way search engine robots do. With this template, the program doesn’t crawl external links and follows every indexing and crawling instruction.
  • Default: only in directory → can be used to crawl a certain subfolder of a site without leaving it. Images, PDFs, and other files are not crawled.
  • Current (from opened project) → loads the saved settings from the opened project.

You can also save your own settings templates for quick access in the future. 

To save your own templates:

  1. Configure the necessary settings.
  2. Select ‘Save’ or ‘Save as...’ in the drop-down menu.
  3. Enter the name of your template and click the ‘OK’ button.

Please note that templates apply to all project settings, but they don’t affect application settings (‘Export’, ‘Proxy’, ‘Google Analytics’, etc.) and don’t save credentials for basic authentication (as they are personal data).
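As background on why basic authentication credentials deserve this care: in standard HTTP basic authentication, the user name and password are only Base64-encoded in the `Authorization` request header, not encrypted, so anyone who sees the header value can read them. The sketch below illustrates the scheme itself, not how Netpeak Spider stores or sends credentials; the helper names and the `user` / `pass` values are hypothetical.

```python
import base64
from typing import Tuple


def basic_auth_header(username: str, password: str) -> str:
    """Build the value of the Authorization header for HTTP basic authentication."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8"))
    return "Basic " + token.decode("ascii")


def decode_basic_auth_header(value: str) -> Tuple[str, str]:
    """Recover the credentials from a header value (Base64 is not encryption)."""
    scheme, _, token = value.partition(" ")
    if scheme.lower() != "basic":
        raise ValueError("not a basic authentication header")
    decoded = base64.b64decode(token).decode("utf-8")
    # The user name ends at the first colon; the password may contain colons.
    username, _, password = decoded.partition(":")
    return username, password


if __name__ == "__main__":
    header = basic_auth_header("user", "pass")  # hypothetical credentials
    print("Authorization:", header)
    print("Decoded:", decode_basic_auth_header(header))
```

Because the encoding is trivially reversible, it is reasonable to keep such credentials out of any template you share with others.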

3. Control buttons

  • OK → saves updated settings.
  • Reset settings to default → returns the settings to their default values. The button is located at the bottom of the window and affects only the current tab.
  • Cancel and Close → close the window without saving updated settings.

4. Changing settings during crawling

To change settings or parameters, pause crawling (using the ‘Pause’ button), update the settings, and then continue crawling using the ‘Start’ button. Updated settings apply only to new results and don’t affect already crawled pages.

However, the number of threads can be changed during crawling without pausing it.
