Spell checking was implemented in Netpeak Spider 3.9. The feature includes:
- ‘Spelling Mistakes’ parameter (in the ‘Content’ group)
- ‘Spelling Mistakes’ warning
- ‘Spelling Mistakes’ table in the ‘Database’ menu
- the ‘Spelling mistakes summary’ special issues report in the ‘Export’ menu
- the ‘Spelling mistakes (XL)’ XL report in the ‘Export’ menu
1. What is the feature for and how is it helpful?
The feature helps you find spelling mistakes on all crawled web pages. You can flexibly choose which text blocks the program checks.
Texts are checked with the Windows Spell Checking API, which is used in many programs (Microsoft Edge, Google Chrome, Telegram, etc.). Spell checking is available in up to 70 languages, and several dictionaries can be used simultaneously.
You can use an ignore list to exclude from the spell check words that are not in the Windows dictionary but should not be counted as mistakes.
You can also add words to the Windows user dictionary directly from the program.
Note that spell checking is supported only on Windows 8 or later.
2. How to run the spell check
2.1. Activate the feature
To activate the feature, open the ‘Spell Check’ tab in the crawling settings and check the ‘Enable spelling check’ box.
2.2. Add languages
Next, add the languages the program will use to check page texts. To do this, select the necessary languages one by one in the dropdown list and click the ‘Add language’ button. Added languages are displayed in the ‘Added languages’ text area.
The spell check runs simultaneously for all added languages. This is convenient because a single page can contain text in several languages. For example, a Russian-language page may contain product names with English words and comments in Ukrainian.
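The article does not spell out how several dictionaries are combined. A plausible reading, and an assumption here, is that a word counts as a mistake only when none of the added languages accepts it. The Python sketch that follows illustrates that rule with small hypothetical word sets standing in for the Windows dictionaries; `find_mistakes`, the language tags, and the lower-casing before lookup are all illustrative choices, not the program's API.

```python
def find_mistakes(words, dictionaries):
    """Return the words that none of the added dictionaries accepts."""
    mistakes = []
    for word in words:
        # Lower-casing before lookup is an assumption of this sketch.
        if not any(word.lower() in known for known in dictionaries.values()):
            mistakes.append(word)
    return mistakes


# Hypothetical toy dictionaries keyed by language tag.
dictionaries = {
    "ru-RU": {"купить", "телефон"},
    "en-US": {"phone", "pro"},
    "uk-UA": {"гарний", "дякую"},
}
words = ["Купить", "телефон", "Phone", "Pro", "гарний", "телефонн"]
print(find_mistakes(words, dictionaries))  # → ['телефонн']
```

With only one language added, the English product name and the Ukrainian comment would both be reported; adding all three languages leaves only the real typo.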
The dropdown list contains the languages and dialects whose language packs are installed in the Windows language settings.
If the language you need is not in the list, install the corresponding language pack. The language will become available in the program after installation.
You can find the full list of languages and dialects in the table. The list may vary depending on the operating system (this is specific to Windows).
2.3. Add words to the ignore list
Add the words that should be skipped during the spell check.
Usually, these are slang words and technical terms that are acceptable in a specific project but should be counted as mistakes in others. For example, the netpeaksoftware.com/blog/ blog allows the slang word ‘PROfessionals’, but other websites may want to avoid slang, so it is better not to skip such words in their projects.
We recommend expanding the ignore list regularly so that these words do not distract you in further checks.
Note that the ignore list is applied starting from the next crawl. To select words for the ignore list quickly, we recommend using the ‘Spelling mistakes summary’ special issues report.
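Conceptually, the ignore list is a filter over the words the spell checker reports. The Python sketch below shows that step, assuming the list is applied to the checker's findings during the crawl; `apply_ignore_list` is a hypothetical name, and the case-insensitive comparison is an assumption, since the article does not say whether matching is case-sensitive.

```python
def apply_ignore_list(mistakes, ignore_list):
    """Remove words on the project's ignore list from the checker's findings."""
    # Case-insensitive matching via casefold() is an assumption of this sketch.
    ignored = {word.casefold() for word in ignore_list}
    return [word for word in mistakes if word.casefold() not in ignored]


found = ["PROfessionals", "Netpeak", "recieve"]
print(apply_ignore_list(found, ["PROfessionals", "netpeak"]))  # → ['recieve']
```

Because the list belongs to the project rather than to Windows, ‘PROfessionals’ is skipped only where it was added, unlike a word placed in the user dictionary.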
2.4. Enable the parameters in the sidebar
To check spelling, be sure to enable the ‘Spelling Mistakes’ parameter in the sidebar. If it is not enabled, the spell check will not run. Next, enable the parameters for the text blocks you want to check for spelling mistakes:
- Title: if the parameter is enabled, the text from the <title> tag is checked
- Description: if the parameter is enabled, the text from the <meta name="description"> tag is checked
- H1 Headings: if the parameter is enabled, the text from the <h1> tags is checked
- H2 Headings: if the parameter is enabled, the text from the <h2> tags is checked
- H3 Headings: if the parameter is enabled, the text from the <h3> tags is checked
- H4 Headings: if the parameter is enabled, the text from the <h4> tags is checked
- H5 Headings: if the parameter is enabled, the text from the <h5> tags is checked
- H6 Headings: if the parameter is enabled, the text from the <h6> tags is checked
- Images: if the parameter is enabled, the text from the alt attributes of images is checked
- Words: if the parameter is enabled, the text from the <body> section is checked
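To make the list of text blocks concrete, the Python sketch that follows pulls them out of a page with the standard-library `html.parser`. It is not Netpeak Spider's implementation: the class `TextBlockExtractor`, the block names, and skipping `<script>` and `<style>` text are assumptions. It does follow this article's note that the <body> text also includes headings and image alt attributes.

```python
from html.parser import HTMLParser

HEADINGS = ("h1", "h2", "h3", "h4", "h5", "h6")
# Elements whose text we collect, plus script/style, whose text we skip.
TRACKED = set(HEADINGS) | {"title", "body", "script", "style"}


class TextBlockExtractor(HTMLParser):
    """Collect title, description, headings, alt texts and body text (illustrative only)."""

    def __init__(self):
        super().__init__()
        names = ("title", "description", *HEADINGS, "alt", "body")
        self.blocks = {name: [] for name in names}
        self._open = []  # currently open tracked elements, outermost first

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.blocks["description"].append(attrs.get("content") or "")
        elif tag == "img" and attrs.get("alt"):
            self.blocks["alt"].append(attrs["alt"])
            if "body" in self._open:
                # Alt text also counts as part of the <body> text.
                self.blocks["body"].append(attrs["alt"])
        elif tag in TRACKED:
            self._open.append(tag)

    def handle_endtag(self, tag):
        if tag in self._open:
            # Pop back to the matching element, tolerating unclosed children.
            while self._open.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if not text or "script" in self._open or "style" in self._open:
            return
        # Heading text goes both to its heading block and to "body".
        for name in dict.fromkeys(self._open):
            if name in self.blocks:
                self.blocks[name].append(text)


page = """<html><head><title>Netpeak Spidr</title>
<meta name="description" content="Crawl your site">
</head><body><h1>Spell chek</h1><p>Some text</p>
<img src="logo.png" alt="Logo image"><script>var x = 1;</script></body></html>"""
extractor = TextBlockExtractor()
extractor.feed(page)
extractor.close()
print(extractor.blocks["h1"])    # → ['Spell chek']
print(extractor.blocks["body"])  # → ['Spell chek', 'Some text', 'Logo image']
```

Since heading and alt text end up in "body" as well, enabling Words together with the heading and Images parameters means the same text is checked more than once, which is consistent with the crawl slowing down as more parameters are selected.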
These settings make the check flexible: you can check spelling only in the text blocks you need. For example, on pages with user-generated content (UGC) you cannot correct users' typos, so it makes sense to check only the Title, Description, and headings.
The ‘Spelling Mistakes’ parameter appears in the sidebar only if spell check is enabled in the crawling settings.
The more parameters you select for the spell check, the slower the crawl will be.
The text in the <body> section includes text from headings and image alt attributes.
3. Issues
If spelling mistakes are found during the crawl, the program will show an issue:
• Spelling mistakes
Shows URLs with misspelled words in one or more text blocks (title, description, headings, image alt attributes, all text in the <body> section).
4. Table in the inner database
The ‘Spelling Mistakes’ parameter in the main table shows the number of mistakes found on a page, with a link to the new ‘Spelling Mistakes’ table in the inner database.
This table lists the misspelled words, suggested corrections, and the text block where each mistake was found.
For convenience, we recommend grouping the table by the ‘Word’ parameter. To do this, drag the parameter's column header to the area just above the table header.
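As a rough model, and an assumption rather than the program's internal schema, the table can be read as one row per misspelled word found, with the ‘Spelling Mistakes’ value in the main table being the number of rows for that URL. The Python sketch below uses a hypothetical `SpellingMistake` record whose field names are illustrative.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class SpellingMistake:
    """One row of the 'Spelling Mistakes' table (hypothetical field names)."""
    url: str
    word: str
    correction: str
    block: str  # text block where the word was found, e.g. "Title" or "H1"


rows = [
    SpellingMistake("https://example.com/", "recieve", "receive", "Title"),
    SpellingMistake("https://example.com/", "adress", "address", "Words"),
    SpellingMistake("https://example.com/blog", "teh", "the", "H1"),
]

# 'Spelling Mistakes' value per page, assuming one table row per mistake found.
mistakes_per_page = Counter(row.url for row in rows)

# Text blocks in which each page has at least one mistake.
blocks_per_page = {}
for row in rows:
    blocks_per_page.setdefault(row.url, set()).add(row.block)

print(mistakes_per_page["https://example.com/"])  # → 2
```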
Why does the program consider some correct words mistakes?
The program may flag some correct words as mistakes because they are not in the Windows language dictionaries, for example surnames or specialized terms. This is normal for any spell checker: languages change, and it is impossible to create complete dictionaries.
Fortunately, you can add words to the Windows user dictionary. A word added to it will no longer be considered a mistake in any program on the computer that uses the Windows Spell Checking API. Therefore, add there only words that you never want flagged in any program.
Windows user dictionaries are stored as ‘default.dic’ files in the folder ‘C:\Users\{username}\AppData\Roaming\Microsoft\Spelling\{language-code}’. You can edit these files in any text editor.
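If you prefer to script this instead of using a text editor, the Python sketch that follows appends a word to the ‘default.dic’ file for one language. The file format is an assumption: it treats the file as a word list with one word per line, encoded as UTF-16 LE with a byte-order mark and CRLF line endings, so check your own file before relying on it. The function names are hypothetical, and `%APPDATA%` is used because it normally resolves to ‘C:\Users\{username}\AppData\Roaming’.

```python
import os
from pathlib import Path


def user_dictionary_path(language_code, appdata=None):
    # %APPDATA% normally points to C:\Users\{username}\AppData\Roaming
    base = Path(appdata or os.environ["APPDATA"])
    return base / "Microsoft" / "Spelling" / language_code / "default.dic"


def add_to_user_dictionary(word, language_code, appdata=None):
    """Append `word` to default.dic unless it is already listed; return True if added."""
    path = user_dictionary_path(language_code, appdata)
    words = []
    if path.exists():
        # The "utf-16" codec reads the byte-order mark to pick the byte order.
        text = path.read_text(encoding="utf-16")
        words = [line.strip() for line in text.splitlines() if line.strip()]
    if word in words:
        return False
    words.append(word)
    path.parent.mkdir(parents=True, exist_ok=True)
    # Assumed format: UTF-16 LE with a BOM, one word per line, CRLF line endings.
    with open(path, "w", encoding="utf-16-le", newline="\r\n") as f:
        f.write("\ufeff" + "\n".join(words) + "\n")
    return True


# Example on Windows (modifies your real user dictionary):
# add_to_user_dictionary("Netpeak", "en-US")
```

The function rewrites the whole file rather than appending, so the existing words and the byte-order mark stay at the start of the file.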
General recommendation: add words to the user dictionary if you never want them counted as mistakes in any program on your computer.
Add words to the ignore list if they should not be counted as mistakes only in a specific project.
5. Reports
You can export two new spelling reports from the ‘Export’ menu above the sidebar.
• ‘Export’ → ‘Special issues reports’ → ‘Spelling mistakes summary’
This summary lists the misspelled words together with example URLs where they occur. It is useful for quickly analyzing mistakes and selecting words for the ignore list.
• ‘Export’ → ‘XL (extra large) reports from database’ → ‘Spelling mistakes (XL)’
This is the export of the 'Spelling Mistakes' table from the inner database.
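The exact layout of the summary report is not described here. As an illustration of the kind of aggregation it provides, the hypothetical Python function `spelling_summary` below groups (URL, word) pairs by word and keeps a count plus a few example URLs. The cap of three examples and the most-frequent-first ordering are choices of this sketch, not documented behavior.

```python
from collections import defaultdict


def spelling_summary(rows, max_examples=3):
    """Group (url, word) pairs by word: occurrence count plus up to max_examples URLs."""
    summary = defaultdict(lambda: {"count": 0, "examples": []})
    for url, word in rows:
        entry = summary[word]
        entry["count"] += 1
        if url not in entry["examples"] and len(entry["examples"]) < max_examples:
            entry["examples"].append(url)
    # Most frequent words first (ties alphabetically): worth reviewing first.
    return sorted(summary.items(), key=lambda item: (-item[1]["count"], item[0]))


rows = [
    ("https://example.com/", "PROfessionals"),
    ("https://example.com/blog", "PROfessionals"),
    ("https://example.com/blog", "recieve"),
]
for word, info in spelling_summary(rows):
    print(word, info["count"], info["examples"])
```

A word that appears on many pages, like ‘PROfessionals’ here, is usually either a deliberate term for the ignore list or a systematic mistake in a template, so sorting by frequency puts those decisions first.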