After a crawl has been completed and a VOSON database created, sites and their links are grouped into page groups (i.e. pages which share the same domain). Treating all URLs from the same domain as one network “node” gives the most accurate representation of the resulting network. The VOSON Analysis database is the network that results when sites are aggregated by page group. Because it is the most accurate network representation, the crosstabulation, text analysis and network visualisation tools apply only to VOSON Analysis databases; the original VOSON database remains as the underlying 'raw' data but is not analysed directly with these tools. VOSON Analysis databases are created automatically after a crawl, with the suffix “AN” appended to the name chosen for the VOSON database. Both types of database can be selected from the “Show Databases” item of the “Data” menu.
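The aggregation from pages to page groups can be sketched in Python. This is a minimal illustration under stated assumptions, not VOSON's implementation: it takes a page group to be the URL's host name with a leading “www.” removed, and it ignores links between pages of the same group. The functions page_group and aggregate_links are hypothetical.

```python
from collections import Counter
from urllib.parse import urlparse


def page_group(url: str) -> str:
    """Map a URL to its page group (hypothetical rule: host name minus a leading "www.")."""
    host = urlparse(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def aggregate_links(page_links):
    """Collapse page-to-page links into weighted group-to-group edges.

    Assumption: links between two pages of the same group are ignored.
    """
    edges = Counter()
    for src, dst in page_links:
        a, b = page_group(src), page_group(dst)
        if a != b:
            edges[(a, b)] += 1
    return edges


links = [
    ("http://www.example.org/about", "http://news.example.com/story1"),
    ("http://example.org/contact", "http://news.example.com/story2"),
    ("http://news.example.com/story1", "http://www.example.org/"),
    ("http://example.org/a", "http://example.org/b"),
]
print(aggregate_links(links))
```

Here the two links from example.org pages to news.example.com pages become a single edge of weight 2, which is what treating a domain as one node means in practice.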
Categorical variables are a way of classifying the websites found by a crawl. A simple example is the generic top-level domain, e.g. “.com”, “.org” or “.edu”. Websites can be classified by this and many other automatically created categories. But where VOSON really shines is in letting the user create entirely new categories to classify websites, via the “Coding” tab (available under the “Preferences” menu). These categories usually reflect what the user's expertise says matters for the purpose of building the network in the first place, and may not correspond to any of the categories VOSON generates automatically. By adding categorical variables, the user can sort and manipulate the network found by the crawl in new and informative ways. Changes show up immediately in the DataBrowser.
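As an illustration of how an automatic category and a user-defined one sit side by side, the Python sketch below derives the generic top-level domain from each site's URL and attaches a hand-coded variable. The variable name sector, its values and the site URLs are hypothetical; in VOSON itself, user-defined coding is done through the “Coding” tab rather than in code.

```python
from urllib.parse import urlparse


def generic_tld(site: str) -> str:
    """Automatically derived category: the last label of the host name, e.g. ".org"."""
    host = urlparse(site).hostname or site
    return "." + host.rsplit(".", 1)[-1]


sites = ["http://www.example.org", "http://shop.example.com", "http://uni.example.edu"]

# Hypothetical user-defined variable "sector", coded by hand from the user's own knowledge.
sector = {
    "http://www.example.org": "advocacy",
    "http://uni.example.edu": "research",
}

# One row per site: an automatic category plus the user's coding (default "uncoded").
table = [
    {"site": s, "tld": generic_tld(s), "sector": sector.get(s, "uncoded")}
    for s in sites
]
for row in table:
    print(row)
```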
Crosstabulation, or “crosstabs” for short, is a way of tabulating how the websites in the network are distributed across combinations of categories, and hence how different categorisations of websites relate to each other. It is a very useful analysis technique for finding information about categorical variables (such as the number of occurrences of each value) and about how these variables are related to other variables within the network.
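A crosstab of two categorical variables is simply a count of sites for every pair of values. The following Python sketch shows the idea on a small, made-up set of sites coded by generic top-level domain and a hypothetical sector variable; it is not VOSON's own code.

```python
from collections import Counter


def crosstab(records, row_var, col_var):
    """Count how many records fall into each (row value, column value) combination."""
    return Counter((r[row_var], r[col_var]) for r in records)


def print_table(counts, rows, cols):
    """Print the counts as a grid; empty combinations show as 0."""
    print("".ljust(6) + "".join(c.rjust(10) for c in cols))
    for row in rows:
        print(row.ljust(6) + "".join(str(counts[(row, c)]).rjust(10) for c in cols))


sites = [
    {"tld": ".org", "sector": "advocacy"},
    {"tld": ".org", "sector": "advocacy"},
    {"tld": ".com", "sector": "business"},
    {"tld": ".org", "sector": "business"},
    {"tld": ".edu", "sector": "research"},
]
counts = crosstab(sites, "tld", "sector")
print_table(counts,
            sorted({s["tld"] for s in sites}),
            sorted({s["sector"] for s in sites}))
```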
Composition crosstabulation is performed by selecting “Composition” from the “Crosstabs” submenu of the “Analysis” menu. Composition lets the user find occurrence information for any categorical variable. For example, Composition can show the total number of inbound links across top-level domains, or a breakdown of degree (inbound, outbound and total) across network components. These relationships can be computed for the network as a whole, or only for a subnetwork selected by the user. Composition is therefore a powerful tool for understanding how different categorical variables relate to each other.
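The kind of figure Composition reports can be approximated in a few lines of Python: compute each node's inbound and outbound degree, then sum them by category. This is a sketch under assumptions, not VOSON's algorithm: nodes are page groups, category is a hypothetical mapping from node to category value, and a subnetwork is taken to mean only the links whose two ends both lie in the selected set of nodes.

```python
from collections import defaultdict


def degrees(edges):
    """Return {node: (in_degree, out_degree)} for directed (src, dst) edges."""
    in_deg, out_deg = defaultdict(int), defaultdict(int)
    for src, dst in edges:
        out_deg[src] += 1
        in_deg[dst] += 1
    return {n: (in_deg[n], out_deg[n]) for n in set(in_deg) | set(out_deg)}


def composition(edges, category, subnetwork=None):
    """Sum (inbound, outbound, total) degree over the nodes in each category.

    If `subnetwork` (a set of nodes) is given, only edges with both ends in it
    are counted; nodes left without any edges do not appear in the result.
    """
    if subnetwork is not None:
        edges = [(s, d) for s, d in edges if s in subnetwork and d in subnetwork]
    totals = defaultdict(lambda: [0, 0, 0])
    for node, (i, o) in degrees(edges).items():
        t = totals[category[node]]
        t[0] += i
        t[1] += o
        t[2] += i + o
    return {cat: tuple(v) for cat, v in totals.items()}


edges = [("a.org", "b.com"), ("c.org", "b.com"), ("b.com", "a.org"), ("d.edu", "a.org")]
category = {"a.org": ".org", "b.com": ".com", "c.org": ".org", "d.edu": ".edu"}
print(composition(edges, category))
print(composition(edges, category, subnetwork={"a.org", "b.com", "c.org"}))
```

Restricting to a subnetwork changes the totals because links to or from excluded nodes (here d.edu) no longer count.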
VOSON collects and analyses the text appearing on sites in the network and creates lists of text terms according to their frequency and co-occurrence. As with Composition crosstabulation, these terms can be compared across different sites and categories.
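Term frequency and co-occurrence can be illustrated with the Python sketch below. It is not VOSON's text-processing pipeline: the tokeniser (lower-cased runs of letters), the stop-word list and the rule that two terms co-occur when both appear on the same site are all assumptions made for the example, as are the site names and category mapping.

```python
import re
from collections import Counter, defaultdict
from itertools import combinations


def tokenize(text):
    """Lower-case the text and keep runs of ASCII letters as terms."""
    return re.findall(r"[a-z]+", text.lower())


def term_stats(pages, stopwords=frozenset()):
    """Return (term frequencies, co-occurrence counts) over all pages.

    Each unordered pair of distinct terms counts once per page on which both appear.
    """
    freq, cooc = Counter(), Counter()
    for text in pages.values():
        terms = [t for t in tokenize(text) if t not in stopwords]
        freq.update(terms)
        for pair in combinations(sorted(set(terms)), 2):
            cooc[pair] += 1
    return freq, cooc


def terms_by_category(pages, category, stopwords=frozenset()):
    """Term frequencies per category value, so categories can be compared."""
    out = defaultdict(Counter)
    for site, text in pages.items():
        out[category[site]].update(t for t in tokenize(text) if t not in stopwords)
    return out


pages = {
    "example.org": "Climate policy and climate science",
    "example.com": "Climate science news",
}
freq, cooc = term_stats(pages, stopwords={"and"})
print(freq.most_common(2))
print(cooc[("climate", "science")])
by_cat = terms_by_category(pages, {"example.org": ".org", "example.com": ".com"}, {"and"})
print(by_cat[".org"]["climate"], by_cat[".com"]["climate"])
```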