Knowledge discovery from data is the process of uncovering useful knowledge from a data collection. Whereas public awareness of research in this space has focused more on machine learning theories and related statistical methods, qualitative approaches to data-driven knowledge exploration and discovery are founded on the study of algebraic structures and mathematical logic. More specifically, algorithms defined on the basis of certain lattice structures and fragments of first-order logic provide the foundations for basic and applied research in the field of semantic knowledge exploration and discovery.
The application of these algorithms is also the focus of a collaborative research project within the departments of Computer Science and Information Science, overseen by the SU/CSIR Research Chair in Artificial Intelligence, led by Prof Arina Britz.
When users have limited knowledge of a domain or data set, it can be difficult to extract relevant information, as they may not know enough to formulate a directed query. Furthermore, semi-structured data cannot be queried in the way that data in formal databases can, making it difficult to aggregate and interpret directly.
In this project, researchers are developing algorithms and software tool support for the qualitative exploration and visualisation of large, semi-structured data sets, such as social media data, online reviews, specialist reviews, crime report data and medical health records.
The collaborative research project focuses on this exploratory search phase of knowledge extraction, which allows users to become familiar with the structure of a data set and to make useful observations based on the data.
The project combines techniques used in formal concept analysis and semantic technologies to enable reasoning over the data set as well as over the meaning of concepts and relationships in the domain. A semantic domain representation serves to improve the quality of the data, and the improved data can, in turn, improve the quality of the semantic domain representation. Furthermore, tool support for data exploration and visualisation, including tag cloud visualisation and map- and graph-based navigation, enables the discovery of knowledge from large semi-structured data sets.
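To make the idea of formal concept analysis more concrete, the following is a minimal sketch in Python, not the project's or ConceptCloud's actual implementation. It assumes a tiny, made-up data set of reviews with tags (the names `context`, `common_attributes`, `objects_having` and `formal_concepts` are all hypothetical) and enumerates its formal concepts by brute force: each concept pairs a set of objects with exactly the set of attributes they all share. Ordered by inclusion, these concepts form the kind of lattice structure referred to above.

```python
from itertools import combinations

# Hypothetical toy context: each object (a review) is mapped to its tags.
context = {
    "review1": {"hotel", "clean"},
    "review2": {"hotel", "noisy"},
    "review3": {"restaurant", "clean"},
}


def common_attributes(objects, context):
    """Return the attributes shared by every object in `objects`.

    For an empty set of objects this is the full attribute set.
    """
    shared = set().union(*context.values())
    for obj in objects:
        shared &= context[obj]
    return shared


def objects_having(attributes, context):
    """Return the objects that carry all of the given attributes."""
    return {obj for obj, attrs in context.items() if attributes <= attrs}


def formal_concepts(context):
    """Enumerate all formal concepts (extent, intent) by brute force.

    Exponential in the number of objects, so only suitable for toy data.
    """
    concepts = set()
    objs = sorted(context)
    for size in range(len(objs) + 1):
        for subset in combinations(objs, size):
            intent = common_attributes(subset, context)
            extent = objects_having(intent, context)
            concepts.add((frozenset(extent), frozenset(intent)))
    return concepts


if __name__ == "__main__":
    # Print concepts from the most specific (fewest objects) to the most general.
    for extent, intent in sorted(formal_concepts(context), key=lambda c: len(c[0])):
        print(sorted(extent), sorted(intent))
```

In this toy data, choosing the tag "clean" corresponds to moving to the concept whose objects are review1 and review3, which gives a rough sense of how selecting tags can narrow down a data set during exploratory search. Tools that work on large data sets would need far more efficient algorithms than this exhaustive enumeration.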
The ConceptCloud tool was initially developed as part of the doctoral project of Gillian Greene, a former Centre for Artificial Intelligence Research (CAIR) student in the Department of Computer Science at SU. Since then, two CAIR master's students have extended the underlying theory and provided corresponding tool support. Their work includes scaling ConceptCloud by converting it to a server-based architecture, integrating semantic technologies to improve data quality, and adding a mobile front end with map-based navigational support.