
Computer Science Department Seminars
2003-2004 Academic Year
Text Mining: Exploring Ideas using Text Collections
Padmini Srinivasan
School of Library and Information Science,
Department of Management Sciences
The University of Iowa
Iowa City, IA, 52242
padmini-srinivasan@uiowa.edu
Hypothesis generation, a crucial initial step for making scientific discoveries,
relies on prior knowledge, experience and intuition. Connections made
serendipitously between seemingly distinct subareas sometimes turn out to
be fruitful. The goal in text mining is to assist in this process by
automatically discovering a small set of interesting hypotheses from a
suitable text collection. In this talk we present our research on text
mining algorithms and highlight some of the challenges that we face. Our aim is
to explore functions and capabilities that support text-based knowledge discovery. We
seek to design domain-independent methods that may be applied to a variety
of problem contexts. Our overall goal is to build a working text mining system
while also investigating research questions related to such efforts. We have used our
system to explore, for example, the global distribution of disease research and
its correlation with the prevalence of these diseases. Interesting trends in
disease research were identified. The application area that will be
emphasized in this talk is the mining of relationships between concepts
such as genes and diseases in the bioscience domain - a specialized text
mining problem that has recently been termed `conceptual biology'. We
will also present experiments designed around this theme.