SDM 2016 Tutorial
Large Scale Hierarchical Classification: Foundations, Algorithms and Applications
Abstract
The massive amount of data available in various forms, such as text, images, and videos, has created the need to
provide a structured and organized view of the data to make it usable for exploration and analysis. Hierarchical structures (taxonomies) provide a natural and convenient way to organize information. Data organization
using hierarchies has been applied extensively in several domains: gene taxonomies for organizing gene sequences,
the DMOZ taxonomy for webpages, the International Patent Classification hierarchy for browsing patent documents, and
ImageNet for indexing millions of images. Given a hierarchy containing thousands of classes (or categories) and
millions of instances (or examples), there is an essential need to develop efficient and automated approaches
to categorize unknown instances. This problem is referred to as the Hierarchical Classification (HC) task. HC is an
important machine learning problem that has been researched and explored extensively in the past few years.
In this tutorial, we will cover technical material related to large scale hierarchical classification. The tutorial is intended for an
audience with intermediate expertise in data mining and a background in classification (supervised learning).
Formal definitions of hierarchical classification and its variants will be presented, along with a brief discussion on
structured learning.
Presenters
Huzefa Rangwala is an Associate Professor in the Department of Computer Science
& Engineering, George Mason University. He received his Ph.D. in Computer Science from the University of Minnesota in 2008. His research interests include machine learning, learning analytics, bioinformatics and
high performance computing. He is the recipient of the NSF Early Faculty Career Award in 2013, the 2014 GMU
Teaching Excellence Award, the 2014 Mason Emerging Researcher Creator and Scholar Award, the 2013 Volgenau Outstanding Teaching Faculty Award, the 2012 Computer Science Department Outstanding Teaching Faculty
Award, and the 2011 Computer Science Department Outstanding Junior Researcher Award. His research is funded
by NSF, NIH, NRL, DARPA, USDA and nVidia Corporation. The tutorial will present material drawn
from his own research as it relates to structured learning and multi-task learning. Recently, he developed a large
scale hierarchical classifier using cost-sensitive learning (HierCost), and the tutorial will discuss this publicly available
package.
Azad Naik is currently a Ph.D. student in the Department of Computer Science & Engineering at
George Mason University (GMU). Prior to joining GMU, he received his Bachelor of Technology degree in Computer Science & Engineering from the Indian School of Mines, Dhanbad, India in 2009 and his M.S. degree in Computer
Science from GMU in 2013. His research interests include hierarchical classification, multi-task learning and statistical pattern recognition. Currently, he is working on designing effective methods for dealing with
inconsistencies within the hierarchy.