Netpeak Spider Crawling the List of URLs

Modified on Tue, 5 Dec, 2023 at 11:28 AM

As promised, we are presenting one of the most long-awaited features of Netpeak Spider: a new mode for crawling a list of URLs. We thought long and hard about how to implement it, and we’d love you to try it and tell us whether we made it as convenient as possible. But before you answer, let’s take a closer look at the feature.

1. Tips on usage

Because this is a new crawling mode, you first need to select it among the other modes. It takes just a few clicks in the ‘Quick Settings’ panel (have you already tried this feature?).

You can add your custom list of URLs in three different ways:

  • manually → just add the necessary number of lines and type in the URLs
  • from the clipboard → simply press Ctrl+V in the list editing window and feel the power of hotkeys
  • from a TXT file → in both a file and the clipboard, each URL must be on a separate line and start with the http or https protocol. By the way, for your convenience we’ve implemented the option to choose several files at once (more on that below)

Good news if you keep working with the same list of URLs: the last list you made is saved and, don’t worry, it won’t disappear after you close the tool.

To understand the list even better, note that the new mode uses a separate interface to build a list of URLs for crawling: a kind of queue whose crawling starts only when you press the ‘Start’ button. This is a major difference from the ‘Entire website’ mode, where Netpeak Spider goes deep into the website structure and can’t say exactly how many URLs are left to crawl. When crawling a list of URLs, the tool scans only the specified URLs and goes no deeper.

We also prepared some surprises in the new crawling mode, which we’ll talk about next.

2. Special Features of the New Crawling Mode

2.1. Validation

As you’ve probably noticed, we love checking everything for mistakes. That is why we implement validation wherever possible.

In this case, when you work with a custom list of URLs, Netpeak Spider automatically validates them. Validation includes:

  • checking the list for:
    • duplicates (duplicated lines are deleted and the main URL is kept)
    • compliance with the URL format → scheme:[//[login:password@]host[:port]][/]path[?parameters][#anchor]
  • converting the protocol and host to lowercase
  • removing whitespace before and after the URL

This allows you to keep your list in order.
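
To make the validation rules above more concrete, here is a minimal Python sketch of the same kind of cleanup. It is only an illustration under our own assumptions, not Netpeak Spider's actual implementation: the function names `normalize_url` and `validate_url_list` are hypothetical, and "valid" here simply means an http or https URL with a host.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_url(raw):
    """Trim whitespace and lowercase the protocol and host.

    Returns None when the line is not an http/https URL with a host.
    (Hypothetical helper, written for illustration only.)
    """
    url = raw.strip()
    try:
        parts = urlsplit(url)  # urlsplit already lowercases the scheme
        host = parts.hostname
    except ValueError:  # e.g. a malformed IPv6 host
        return None
    if parts.scheme not in ("http", "https") or not host:
        return None
    # Lowercase only host[:port]; keep login:password exactly as typed.
    userinfo, at, hostport = parts.netloc.rpartition("@")
    netloc = userinfo + at + hostport.lower()
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


def validate_url_list(lines):
    """Split raw lines into (unique URLs, duplicates, invalid lines)."""
    urls, duplicates, invalid = [], [], []
    seen = set()
    for line in lines:
        if not line.strip():
            continue  # skip empty lines
        normalized = normalize_url(line)
        if normalized is None:
            invalid.append(line.strip())
        elif normalized in seen:
            duplicates.append(normalized)  # the first occurrence is kept
        else:
            seen.add(normalized)
            urls.append(normalized)
    return urls, duplicates, invalid
```

Calling `validate_url_list` on the lines of a pasted list returns three lists, which roughly correspond to the added URLs, the duplicates, and the lines with an invalid format.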

2.2. Status Bar and Four Types of Notifications

At some point we realized that even list editing needs a progress bar to notify you about the main actions, such as:

  • added URLs → it may surprise you, but the list you add won’t always be perfect, so we just had to include this counter
  • duplicates found → if we find duplicated URLs in a list, we keep the main URL and show the rest in a special table, which you can open by clicking the number of duplicates in the progress bar
  • URL format mismatches → if some URLs are in an invalid format, this is where you’ll find the report on those lines
  • deleted URLs → confirms that the URLs were deleted successfully. By the way, you can delete URLs either through the context menu (right-click to open it) or with the Delete hotkey (after selecting one or several URLs)

We strongly recommend keeping an eye on this progress bar so you always know whether everything is going fine.

2.3. Saving the List to a TXT File

We’re confident that a single list of URLs won’t be enough for you to use the tool fully and quickly. That’s why we’ve implemented the option to save your current list of URLs to a file.

Try saving the main lists of URLs you work with to different files in one folder. This lets you switch between them quickly using the ‘Load from a TXT File’ feature. When we say quickly, we mean:

  • you can do it with the Ctrl+O combination (in short, there are more than enough hotkeys in the new mode :)
  • you can select the necessary files and simply drag them into the list editing window, the so-called Drag’n’Drop
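
If you prepare or combine such TXT files outside the tool, a small script can help. The Python sketch below is only an assumption-based illustration, not part of Netpeak Spider: the helper names `read_url_file`, `merge_url_files` and `save_url_file` are hypothetical, and the files are assumed to be UTF-8 with one URL per line.

```python
from pathlib import Path


def read_url_file(path):
    """Return the lines of a TXT file that start with http:// or https://."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [
        line.strip()
        for line in lines
        if line.strip().lower().startswith(("http://", "https://"))
    ]


def merge_url_files(paths):
    """Combine several TXT files into one list, keeping the first copy of each URL."""
    merged, seen = [], set()
    for path in paths:
        for url in read_url_file(path):
            if url not in seen:
                seen.add(url)
                merged.append(url)
    return merged


def save_url_file(path, urls):
    """Write one URL per line, the layout described in the article."""
    Path(path).write_text("\n".join(urls) + "\n", encoding="utf-8")
```

For example, `save_url_file("all.txt", merge_url_files(["top-pages.txt", "landing.txt"]))` would produce a single combined file; the file names here are made up.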

3. Examples of Tasks to Perform

We’ve prepared some examples to make it clearer where the new crawling mode can be extremely useful. So, let’s go!

✔ Moving a Website

You can move a website:

  • to another domain
  • from www to non-www or vice versa
  • from http to https (rarely the opposite way)

There can also be moves within a website (e.g. changing the category structure).

Whatever the reason for the move, you certainly know the current and the required structure of the project. So the task is just to check whether everything was implemented according to the requirements specification.

Netpeak Spider will be your personal assistant for this kind of task :)
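
One way to prepare for such a check is to generate, for every old URL, the address you expect it to end up at, and then crawl the old URLs as a list. The Python sketch below is a hypothetical helper under simple assumptions (a move from http to https and from www to non-www, with no credentials in the URLs); the names `expected_target` and `migration_pairs` are ours, not a Netpeak Spider feature.

```python
from urllib.parse import urlsplit, urlunsplit


def expected_target(old_url, new_scheme="https", strip_www=True):
    """Return the URL an old address should lead to after the move.

    Assumes the path and query stay the same and only the
    protocol and the www prefix change.
    """
    parts = urlsplit(old_url)
    host = parts.netloc.lower()
    if strip_www and host.startswith("www."):
        host = host[len("www."):]
    return urlunsplit((new_scheme, host, parts.path, parts.query, parts.fragment))


def migration_pairs(old_urls, **rules):
    """Pair every old URL with its expected new address."""
    return [(url, expected_target(url, **rules)) for url in old_urls]
```

The old URLs go into the crawling list, and the second element of each pair is what you would compare against the redirect target reported after the crawl.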

✔ Recrawling URLs

We quite often get requests to add the ability to recrawl URLs in order to recheck changes that were made. Frankly, we already plan to implement this feature in the near future. For now, you can simply group the necessary URLs into a new list and check them → this way the task is completely covered.

✔ Competitors Comparison

The current version of Netpeak Spider allows you to check 40+ parameters and detect 50+ issues. Try making a custom list of your website’s top pages and the corresponding pages of your competitors to compare optimization levels → it’s an extra push to fix key SEO issues.

You can find the most popular pages of your website and your competitors’ websites using Serpstat:

  • Step 1: enter your website address and press ‘Search’
    [Screenshot: address bar in Serpstat]
  • Step 2: choose ‘SEO Research’ → ‘Top Pages’: here you can find all website pages; those that drive the largest amount of traffic are at the top of the list
    [Screenshot: top pages in Serpstat]
  • Step 3: click the necessary page to open the ‘URL analysis’ menu
    [Screenshot: URL analysis menu in Serpstat]
  • Step 4: go to the ‘Competitors’ tab to get your main competing pages (pages, not domains)
    [Screenshot: Competitors tab in Serpstat]
  • Step 5: export this table in any convenient way and load the necessary URLs into Netpeak Spider
    [Screenshot: exporting data in Serpstat]

✔ Checking the Backlinks

Later on, we’ll surely make the process of checking backlinks more convenient (probably in Netpeak Checker), but in the meantime you can do the following:

  • add to the list the URL that the referring pages must link to
  • add to the list all the URLs that must contain a link to your page
  • crawl the list with the ‘Incoming Links’ parameter enabled
  • open the context menu by right-clicking your URL and choose ‘Incoming Links’ → Netpeak Spider checks the linking between all the pages in the list, so you’ll get all the links to your target URL from those pages
  • exclude the pages found from the general list of referring pages, and you’ll get the list of URLs that don’t link to the target page
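
The last step is a plain list difference. If you export both lists, it can be sketched in a few lines of Python; the function name `pages_missing_link` and the sample URLs are hypothetical, used here only to illustrate the idea.

```python
def pages_missing_link(referring_pages, pages_with_link):
    """Return referring pages that were not found among the incoming links."""
    linked = set(pages_with_link)
    return [url for url in referring_pages if url not in linked]


# Hypothetical data: three pages should link to the target, two actually do.
referring = [
    "https://partner-a.example/post",
    "https://partner-b.example/news",
    "https://partner-c.example/review",
]
found = [
    "https://partner-a.example/post",
    "https://partner-c.example/review",
]
missing = pages_missing_link(referring, found)
```

Here `missing` contains only the page that has no link to the target URL, while the original order of the referring list is preserved.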

This example isn’t the main purpose of the new crawling mode, just a pleasant bonus :)

✔ Regular Checks for Issues

You could draw up a list of your most important URLs and scan it from time to time → the new mode isn’t limited to one website, so the list can contain all the important pages on which critical issues must be fixed immediately.

Redirects and 404 error pages, long server response time, a new title, missing image alt attributes, pages disallowed in robots.txt, Meta Robots, X-Robots-Tag, or Canonical: these are some of the issues that often occur even on top websites. Make your own list of issues to check regularly; it’ll save you time and energy.

Let’s sum it up

We’ve added a new crawling mode that lets you crawl a list of URLs and thus carry out many different tasks. The interface is designed to keep you informed at every step of working with the list. Don’t waste time: give the new mode a try, and we’d much appreciate it if you shared your feedback with us!

We’re on our way to achieving the goal (code-named ‘Netpeak Spider 2.1.3’) and in a little while, we’re going to please you with new useful features. Stay with us to be the first to learn about the updates!