Text Retrieval Engine

dtSearch™ Spider

• dtSearch™ Desktop and Network contain a built-in Web Spider for indexing and searching publicly-accessible Web sites.

• The dtSearch™ Spider automatically recognizes and supports HTML, PDF, and XML, as well as other online text documents such as word processor files and spreadsheets.

• dtSearch™ Desktop and Network display the Web pages and documents that the Spider finds with hits highlighted and, for HTML and PDF, with links and images intact.

How the dtSearch™ Web Spider Works

Indexing a Web Site

To index a Web site, select "Add web" in the dialog box below.

Enter the name of the Web site, for example, www.ccra-adrc.gc.ca. Then select the crawl depth. The crawl depth is the number of link levels into the Web site that dtSearch™ will follow when looking for pages. For example, you could spider www.ccra-adrc.gc.ca to a crawl depth of 1 to reach only the pages on the site that are linked directly from the home page, or enter a crawl depth of 4 to reach four levels deep into the site.
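
The effect of the crawl depth setting can be illustrated with a minimal sketch. This is not the dtSearch™ Spider's own code: the function pages_within_depth, the SITE link map, and its page paths are all hypothetical, and the sketch walks an in-memory table of links rather than fetching real pages. It assumes that depth 0 means the home page only and that each followed link adds one level.

```python
from collections import deque


def pages_within_depth(links, start, max_depth):
    """Return {page: depth} for every page reachable within max_depth link hops."""
    depth_of = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        if depth_of[page] == max_depth:
            continue  # crawl depth reached: do not follow links from this page
        for target in links.get(page, []):
            if target not in depth_of:  # skip pages that were already reached
                depth_of[target] = depth_of[page] + 1
                queue.append(target)
    return depth_of


# Hypothetical site structure (illustration only): page -> pages it links to.
SITE = {
    "/": ["/forms", "/news"],
    "/forms": ["/forms/t1"],
    "/forms/t1": ["/forms/t1/guide"],
    "/news": ["/"],
}

print(sorted(pages_within_depth(SITE, "/", 1)))  # ['/', '/forms', '/news']
```

With a depth of 1, only the home page and the two pages it links to are reached. Raising the depth to 4 would also reach /forms/t1 (level 2) and /forms/t1/guide (level 3).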


After a search, dtSearch™ Spider will display retrieved HTML or PDF files with hit highlighting and with all links and images intact. The result looks and acts just like the original Web page, but with highlighted hits and additional navigation options ("next hit," "previous document," "next document," etc.).

HTML file retrieved by dtSearch™ Spider

dtSearch™ uses built-in HTML file converters to convert other text formats, such as word processor and spreadsheet files, to HTML for display with highlighted hits. See Fields for special XML search options.

Technical Note

The dtSearch™ Spider does not "capture" indexed Web sites. To display a file indexed with the dtSearch™ Spider, dtSearch™ will return to the Web site to access the document.
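
One way to picture this behavior is an index that records which URLs contain each word but keeps no copy of the page itself. The sketch below is an assumption-laden illustration, not dtSearch™'s index format or API: the UrlIndex class, the display function, the live_site dictionary, and the example URL are all hypothetical, and a dictionary lookup stands in for the network fetch.

```python
class UrlIndex:
    """Toy index that maps each word to the URLs containing it (no page copies)."""

    def __init__(self):
        self.postings = {}  # word -> set of URLs

    def add(self, url, text):
        for word in text.lower().split():
            self.postings.setdefault(word, set()).add(url)
        # The page text is not stored; only the word -> URL mapping is kept.

    def search(self, word):
        return sorted(self.postings.get(word.lower(), ()))


def display(url, fetch):
    """Return the document by going back to its source, as the note describes."""
    return fetch(url)


# Hypothetical "live" site: a dict lookup stands in for a network request.
live_site = {"http://example.test/forms": "Tax forms for 2003"}

index = UrlIndex()
for url, text in live_site.items():
    index.add(url, text)

# The site changes after indexing.
live_site["http://example.test/forms"] = "Tax forms for 2004"

hit = index.search("forms")[0]
print(display(hit, live_site.__getitem__))  # Tax forms for 2004
```

Because the document is fetched again at display time, the viewer sees the page as it currently exists on the site, which may differ from the version that was indexed.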

