VOSON crawls a selection of "seed sites", chosen by the user for their particular project. VOSON follows both inbound and outbound hyperlinks and forms a network from those sites which are connected with each other. VOSON is careful to respect websites which request that automated 'robots' do not crawl them. In addition, VOSON decides how many links to follow, and how deep to follow them, before stopping the crawl. Crawls are accounted for using VOSON Activity Units (VAU), which are available in different quantities from the pricing page.
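As an illustration of what respecting a 'robots' request can look like, here is a minimal sketch using Python's standard urllib.robotparser module. It is not VOSON's own code: the function name allowed_to_crawl, the user agent "example-crawler" and the robots.txt content are all assumptions made for the example.

```python
from urllib.robotparser import RobotFileParser


def allowed_to_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the robots.txt content as an iterable of lines,
    # so no network access is needed for this example.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt: every robot is asked to stay out of /private/.
ROBOTS = """\
User-agent: *
Disallow: /private/
"""

public_ok = allowed_to_crawl(ROBOTS, "example-crawler", "http://example.org/index.html")
private_ok = allowed_to_crawl(ROBOTS, "example-crawler", "http://example.org/private/page.html")
```

A crawler following this convention would fetch the first URL and skip the second.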
Based upon the webmining parameters set by the user, the crawl uses the seed sites as points of departure and traverses both the links from each site (outbound links) and the links to each site (inbound links). Following these links discovers other sites that were not among the seed sites, and these are added to the network. By repeating this process the network grows until the resources devoted to the crawl are exhausted or the webmining parameter criteria for stopping the crawl are fulfilled, whichever comes first. The resulting network of sites is saved into a VOSON database for further data manipulation and visualisation.
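The growth process described above can be sketched as a breadth-first expansion over a toy link table. This is only an illustration under stated assumptions, not VOSON's implementation: the function crawl, the parameters max_depth and max_pages (stand-ins for the stopping criteria and the crawl resources) and the example sites are all hypothetical.

```python
from collections import deque


def crawl(seeds, outbound, max_depth=1, max_pages=100):
    """Grow a network from seeds; return (sites, directed_edges).

    outbound maps each site to the set of sites it links to. Inbound links
    are derived here by inverting that table; a real crawl would need an
    external source for them.
    """
    inbound = {}
    for src, targets in outbound.items():
        for dst in targets:
            inbound.setdefault(dst, set()).add(src)

    depth = {site: 0 for site in seeds}  # every site discovered so far
    queue = deque(seeds)
    edges = set()
    expanded = 0

    while queue:
        site = queue.popleft()
        if depth[site] >= max_depth:
            continue  # stopping criterion: do not expand beyond max_depth
        if expanded >= max_pages:
            break  # crawl resources exhausted
        expanded += 1
        links = [(site, dst) for dst in outbound.get(site, ())]   # outbound
        links += [(src, site) for src in inbound.get(site, ())]   # inbound
        for src, dst in links:
            edges.add((src, dst))
            other = dst if src == site else src
            if other not in depth:  # a newly discovered site joins the network
                depth[other] = depth[site] + 1
                queue.append(other)
    return set(depth), edges


LINKS = {
    "a.org": {"hub.org"},
    "b.org": {"hub.org"},
    "hub.org": {"far.org"},
    "c.org": {"a.org"},
}
sites, edges = crawl(["a.org", "b.org"], LINKS, max_depth=1)
```

With max_depth=1 only the seeds are expanded, so hub.org and c.org are discovered but far.org, which is two steps from any seed, is not.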
Not every crawl generates a network: it may be, for example, that the websites provided as 'seed sites' are not connected to each other via hyperlinks. Of course, since it is the user who selects the seed sites, there is usually an a priori expectation of some connection between seeds. On the other hand, sometimes a network is generated which is not particularly useful. For example, two seed sites may be connected only via a site such as Facebook or Twitter: each has a (usually outbound) link to that site and hence the two are 'connected'. This is generally viewed as a spurious connection, and the user can remove such connections by 'pruning' the network results.
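To make the idea of a spurious connection concrete, here is a small sketch of pruning on an edge list. The functions prune and connected and the example sites are hypothetical and do not reflect how VOSON's own pruning tool is operated.

```python
def prune(edges, unwanted):
    """Drop every link that starts or ends at an unwanted site."""
    return {(src, dst) for src, dst in edges
            if src not in unwanted and dst not in unwanted}


def connected(edges, a, b):
    """True if a and b are joined by a path, ignoring link direction."""
    neighbours = {}
    for src, dst in edges:
        neighbours.setdefault(src, set()).add(dst)
        neighbours.setdefault(dst, set()).add(src)
    seen, stack = {a}, [a]
    while stack:
        site = stack.pop()
        if site == b:
            return True
        for nxt in neighbours.get(site, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False


# Two seeds whose only common neighbour is a social media site.
EDGES = {("seed1.org", "facebook.com"), ("seed2.org", "facebook.com")}
before = connected(EDGES, "seed1.org", "seed2.org")
after = connected(prune(EDGES, {"facebook.com", "twitter.com"}),
                  "seed1.org", "seed2.org")
```

Before pruning the two seeds appear connected; after pruning the link through facebook.com disappears and they are not.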
Please see VOSON's Services page for access to the VOSON User's Guide, which describes in detail what happens when a crawl is run (note that you must be a registered user to access the User's Guide).
VOSON doesn't automatically perform snowball crawls, because the network would get too big too quickly and a lot of the new sites would be rubbish (e.g. from crawls of adobe.com). But users can take a manual snowball approach. Input some seed sites and have them crawled, then use the composition crosstab function to create a subnetwork containing only the seeds plus "important" sites. Important sites are those with a degree of two or more (or a higher threshold if you like), i.e. they connect to two or more seeds, or else have a reciprocal relationship with one seed. These important new sites are likely to be good candidates for new seed sites. Open the original VOSON database, add them as new seeds, and let the crawl run again. You can keep adding new seeds in this way, thus doing a (manual) snowball data collection.
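The selection step of this manual snowball can be sketched as follows. This is an assumption-laden illustration rather than the composition crosstab function itself: candidate_seeds and min_degree are hypothetical names, and degree is taken here to mean the number of directed links between a site and the seeds, so a reciprocal link with a single seed counts as two.

```python
from collections import Counter


def candidate_seeds(edges, seeds, min_degree=2):
    """Return non-seed sites with at least min_degree links to or from seeds."""
    seeds = set(seeds)
    degree = Counter()
    for src, dst in edges:
        if src in seeds and dst not in seeds:
            degree[dst] += 1   # a seed links out to this site
        elif dst in seeds and src not in seeds:
            degree[src] += 1   # this site links in to a seed
    return sorted(site for site, d in degree.items() if d >= min_degree)


EDGES = {
    ("a.org", "x.org"), ("b.org", "x.org"),  # x.org is linked from two seeds
    ("a.org", "y.org"),                      # y.org touches one seed only
    ("a.org", "z.org"), ("z.org", "a.org"),  # z.org is reciprocal with a.org
}
new_seeds = candidate_seeds(EDGES, ["a.org", "b.org"])
```

Here x.org and z.org qualify and y.org does not; the qualifying sites would then be added to the seed list before the next crawl. Raising min_degree gives a stricter selection.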
From January 1, 2017, the ability to collect inbound links was removed for Free Tier users, because that action consumes data from an external provider. Please check our plans to find one that suits your crawling needs.