The Social Network Visualizer (SocNetV) project has just released version 1.6. This release brings back the web crawler feature, which had been disabled throughout the 1.x series so far, in a much improved form.
To start the web crawler, go to menu Network > Web Crawler or press Shift+C…
A dialog will appear, where you must enter the initial web page (the seed). You may also set the maximum number of nodes/pages (default 600) and what kind of links to crawl: internal, external or both. By default the spider will crawl both internal and external links.
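To illustrate the internal/external distinction, here is a minimal Python sketch. It is not SocNetV's code: the function name classify_link is hypothetical, and it assumes that "internal" means "same host as the seed", which may differ from the rule the crawler actually applies.

```python
from urllib.parse import urljoin, urlparse


def classify_link(seed, href):
    """Resolve href against the seed URL and label it internal or external.

    Assumption: a link is 'internal' when its host matches the seed's host.
    """
    absolute = urljoin(seed, href)  # turns relative links into absolute URLs
    same_host = urlparse(absolute).netloc == urlparse(seed).netloc
    return absolute, "internal" if same_host else "external"
```

For example, with the seed http://example.org/index.html, the relative link /about resolves to http://example.org/about and is labelled internal, while http://other.test/page is labelled external.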
The new web crawler is vastly improved over the 0.x releases and consists of two parts: a ‘spider’ and a ‘parser’, each one running on its own thread.
The spider visits a given initial URL (i.e. a website or a single webpage) and downloads its HTML code. The parser scans the downloaded code for ‘href’ links to other pages (internal or external) and adds them to a queue of URLs (called the frontier).
As URLs are added to the queue, the spider visits them and downloads their HTML, which the parser in turn scans for more links, and so on…
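The spider/parser loop described above can be sketched in Python as follows. This is an illustration under stated assumptions, not SocNetV's implementation: the names crawl, FAKE_WEB and HREF_RE are hypothetical, the "web" is an in-memory dictionary so the sketch runs offline, and links are found with a simple regular expression over the whole page.

```python
import queue
import re
import threading

# Hypothetical in-memory "web" (URL -> HTML) so the sketch needs no network.
FAKE_WEB = {
    "http://a.test/": '<body><a href="http://a.test/x">x</a> '
                      '<a href="http://b.test/">b</a></body>',
    "http://a.test/x": '<body><a href="http://a.test/">home</a></body>',
    "http://b.test/": "<body>no links here</body>",
}

HREF_RE = re.compile(r'href="([^"]+)"')  # simplification: absolute links only


def crawl(seed, max_nodes=600, fetch=FAKE_WEB.get):
    frontier = queue.Queue()    # URLs waiting to be downloaded
    downloaded = queue.Queue()  # (url, html) pairs waiting to be parsed
    seen = {seed}               # nodes of the resulting network
    edges = []                  # (source, target) links

    def spider():
        while True:
            url = frontier.get()
            if url is None:            # stop signal: pass it on to the parser
                downloaded.put(None)
                return
            downloaded.put((url, fetch(url) or ""))

    def parser():
        while True:
            item = downloaded.get()
            if item is None:
                return
            url, html = item
            for target in HREF_RE.findall(html):
                if target not in seen:
                    if len(seen) >= max_nodes:
                        continue       # node limit reached: ignore new pages
                    seen.add(target)
                    frontier.put(target)
                edges.append((url, target))
            # A URL counts as finished only once its page has been parsed,
            # so frontier.join() cannot return while links are still pending.
            frontier.task_done()

    threads = [threading.Thread(target=spider), threading.Thread(target=parser)]
    for t in threads:
        t.start()
    frontier.put(seed)
    frontier.join()      # wait until every queued URL is downloaded and parsed
    frontier.put(None)   # then shut both threads down
    for t in threads:
        t.join()
    return seen, edges
```

Calling crawl("http://a.test/") on the toy data returns the three pages as nodes and the three links between them as edges. The max_nodes parameter mirrors the maximum nodes/pages setting of the dialog.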
The process is multithreaded and completes in a matter of seconds, even for 1000 URLs.
The end result is the ‘network’ of all visited webpages as nodes and the actual links between them as edges. To help you find some patterns right away, the nodes are sized by default according to their outDegree.
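As a rough illustration of sizing nodes by outDegree, the Python sketch below counts outgoing edges per node and maps the counts linearly onto a size range. The function names and the 8 to 24 size range are assumptions made for the example; SocNetV's own scaling may be different.

```python
from collections import Counter


def out_degrees(nodes, edges):
    """Count the outgoing edges of every node; nodes without any get 0."""
    counts = Counter(source for source, _ in edges)
    return {node: counts[node] for node in nodes}


def node_sizes(degrees, min_size=8, max_size=24):
    """Scale sizes linearly so the highest outDegree gets max_size."""
    top = max(degrees.values(), default=0)
    if top == 0:  # no edges at all: every node gets the minimum size
        return {node: min_size for node in degrees}
    span = max_size - min_size
    return {node: min_size + span * d / top for node, d in degrees.items()}
```

With this mapping, pages that link out to many others stand out immediately, while pages with no outgoing links stay at the minimum size.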
From there, you can analyze the network using the SNA tools provided by SocNetV.
Please note that the parser searches for ‘href’ links only in the body section of the HTML code.
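To show what body-only link extraction means in practice, here is a minimal Python sketch built on the standard html.parser module. The class BodyLinkParser and the helper body_links are hypothetical names for this example and do not reflect how SocNetV's parser is written.

```python
from html.parser import HTMLParser


class BodyLinkParser(HTMLParser):
    """Collect href values, but only from tags inside <body>...</body>."""

    def __init__(self):
        super().__init__()
        self.in_body = False
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "body":
            self.in_body = True
        elif self.in_body:
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "body":
            self.in_body = False


def body_links(html):
    parser = BodyLinkParser()
    parser.feed(html)
    parser.close()
    return parser.links
```

Under this rule, an href in the head section (for example a stylesheet link) is ignored, while anchors in the body are collected in document order.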
Binaries for Windows, Mac OS X and Linux are available from SocNetV’s Downloads area.