Knowledge Discovery for Classification with Mixed Spatial Data Types - A Rough Set Approach
Principal Investigator:
Leung Yee
Co-investigator(s):
Manfred M. Fischer, Zhang Wenxiu
Summary:
Classification has long been a cornerstone of spatial analysis.
Its significance becomes even more prominent with the availability
of large volumes of geo-referenced data captured in geographic
information systems (GIS) and remotely sensed images. Being
able to discover non-trivial, previously unknown and potentially
useful knowledge from a data set for a specific classification
task, such as land-cover classification in hyperspectral images,
is thus of great importance for real-life applications. Methods
based on statistics or fuzzy sets rely on external parameters
and prior model assumptions, e.g. probability distributions
in statistical methods and membership functions in fuzzy sets.
Rough set theory, by contrast, uses only the knowledge embedded
in the raw information system itself to discover classification
rules.
Out of all the features (attributes) employed in a classification,
rough set models can automatically select a minimal subset
of features (a reduct) that is necessary and sufficient for the
classification task. This is especially valuable in hyperspectral
analysis, where a very large number of spectral bands is used
for image analysis. Through knowledge reduction, the rough set
approach can also discover an optimal (minimal) set of decision
rules. This sharpens the discovered knowledge and reduces the
dimensionality and complexity of a classification task.
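The reduction idea can be illustrated with a small sketch. The decision table below is hypothetical toy data (discretized values of three bands b1, b2, b3 and a class label), and the exhaustive subset search is chosen only for clarity: it is exponential in the number of attributes, so hyperspectral data with many bands would need a heuristic search instead. The sketch computes the positive region and the dependency degree, finds every minimal attribute subset (reduct) that preserves the positive region of the full attribute set, and then shortens each decision rule by keeping the fewest condition values that still yield a consistent rule.

```python
from itertools import combinations

# Hypothetical toy decision table (illustrative data only): discretized
# values of three spectral bands plus a land-cover class label.
CONDITIONS = ("b1", "b2", "b3")
TABLE = [
    {"b1": "low", "b2": "high", "b3": "low", "class": "water"},
    {"b1": "low", "b2": "high", "b3": "high", "class": "water"},
    {"b1": "high", "b2": "low", "b3": "low", "class": "soil"},
    {"b1": "high", "b2": "high", "b3": "low", "class": "vegetation"},
    {"b1": "high", "b2": "high", "b3": "high", "class": "vegetation"},
]


def partition(rows, attrs):
    """Indiscernibility classes: objects with identical values on `attrs`."""
    blocks = {}
    for i, row in enumerate(rows):
        blocks.setdefault(tuple(row[a] for a in attrs), []).append(i)
    return list(blocks.values())


def positive_region(rows, attrs, decision="class"):
    """Objects whose indiscernibility class has a single decision value."""
    pos = set()
    for block in partition(rows, attrs):
        if len({rows[i][decision] for i in block}) == 1:
            pos.update(block)
    return pos


def dependency(rows, attrs, decision="class"):
    """Degree of dependency gamma = |POS(attrs)| / |U|."""
    return len(positive_region(rows, attrs, decision)) / len(rows)


def reducts(rows, conditions, decision="class"):
    """All minimal subsets whose positive region equals that of `conditions`.

    Exhaustive search (exponential in the number of attributes): fine for a
    toy table, not for hundreds of hyperspectral bands.
    """
    target = len(positive_region(rows, conditions, decision))
    found = []
    for k in range(len(conditions) + 1):
        for subset in combinations(conditions, k):
            if any(set(r) <= set(subset) for r in found):
                continue  # contains a smaller reduct, so it is not minimal
            if len(positive_region(rows, subset, decision)) == target:
                found.append(subset)
    return found


def shortest_rule(rows, attrs, obj, decision="class"):
    """Fewest attribute values of `obj` that still imply a single decision."""
    for k in range(1, len(attrs) + 1):
        for subset in combinations(attrs, k):
            cond = {a: obj[a] for a in subset}
            labels = {r[decision] for r in rows
                      if all(r[a] == v for a, v in cond.items())}
            if len(labels) == 1:
                return cond, obj[decision]
    return None


def rules(rows, attrs, decision="class"):
    """Distinct shortened rules for the objects in POS(attrs)."""
    out = []
    for i in sorted(positive_region(rows, attrs, decision)):
        rule = shortest_rule(rows, attrs, rows[i], decision)
        if rule not in out:
            out.append(rule)
    return out


if __name__ == "__main__":
    red = reducts(TABLE, CONDITIONS)
    print(red)  # prints: [('b1', 'b2')]
    for cond, label in rules(TABLE, red[0]):
        print(cond, "->", label)
```

On this toy table b3 turns out to be redundant: {b1, b2} is the only reduct, and the shortened rules are b1=low → water, b2=low → soil, and (b1=high, b2=high) → vegetation.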
In this research, we will develop novel rough set models
capable of discovering knowledge in (1) purely qualitative,
(2) purely quantitative, and (3) mixed spatial databases.
The approach will generalize existing rough set models and
advance the research frontier of rough set theory in general,
and of spatial data mining and knowledge discovery in particular.
To validate and evaluate the proposed rough set models, we will
develop efficient algorithms to implement them. The models will
then be substantiated and assessed through a real-life application:
hyperspectral classification with mixed data types.
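Quantitative and mixed attributes cannot be handled by strict value equality, so the indiscernibility relation has to be relaxed. As a point of reference only (a known neighborhood/tolerance-style relation from the rough set literature, not the model to be developed in this project), the sketch below treats two objects as neighbors when every numeric attribute differs by at most a threshold delta and every categorical attribute matches, and then computes the lower and upper approximations of a class. The attribute names, values and delta are hypothetical.

```python
# Hypothetical mixed-type objects (illustrative values only): a normalized
# reflectance "refl" (numeric) and a soil-type code "soil" (categorical).
OBJECTS = [
    {"refl": 0.10, "soil": "clay", "class": "water"},
    {"refl": 0.13, "soil": "clay", "class": "water"},
    {"refl": 0.17, "soil": "clay", "class": "wetland"},
    {"refl": 0.60, "soil": "sand", "class": "bare"},
    {"refl": 0.62, "soil": "sand", "class": "bare"},
]
NUMERIC = ("refl",)
CATEGORICAL = ("soil",)


def neighbors(objs, i, delta):
    """Objects within `delta` of object i on every numeric attribute and equal
    on every categorical one. The relation is reflexive and symmetric but,
    unlike classical indiscernibility, not necessarily transitive."""
    x = objs[i]
    return {
        j
        for j, y in enumerate(objs)
        if all(abs(x[a] - y[a]) <= delta for a in NUMERIC)
        and all(x[a] == y[a] for a in CATEGORICAL)
    }


def approximations(objs, label, delta, decision="class"):
    """Lower and upper approximations of the objects labelled `label`."""
    target = {j for j, o in enumerate(objs) if o[decision] == label}
    lower, upper = set(), set()
    for i in range(len(objs)):
        nb = neighbors(objs, i, delta)
        if nb <= target:  # neighborhood lies entirely inside the class
            lower.add(i)
        if nb & target:  # neighborhood overlaps the class
            upper.add(i)
    return lower, upper


if __name__ == "__main__":
    low, up = approximations(OBJECTS, "water", delta=0.05)
    print("lower:", sorted(low), "upper:", sorted(up),
          "boundary:", sorted(up - low))
```

With delta = 0.05, the second water object has the wetland object in its neighborhood, so the water class has lower approximation {0}, upper approximation {0, 1, 2} and boundary region {1, 2}. Setting delta to 0 reduces the numeric test to equality, which recovers the classical indiscernibility relation.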