SDM 2016 Tutorial

Large Scale Hierarchical Classification: Foundations, Algorithms and Applications

The massive amount of data available in various forms, such as text, images, and videos, has created a need for a structured and organized view of the data that makes it usable for exploration and analysis. Hierarchical structures (taxonomies) provide a natural and convenient way to organize information. Hierarchies have been used extensively to organize data in several domains: gene taxonomies for organizing gene sequences, the DMOZ taxonomy for web pages, the International Patent Classification hierarchy for browsing patent documents, and ImageNet for indexing millions of images. Given a hierarchy containing thousands of classes (or categories) and millions of instances (or examples), there is an essential need for efficient, automated approaches to categorize unknown instances. This problem is referred to as the Hierarchical Classification (HC) task. HC is an important machine learning problem that has been researched extensively in the past few years.

In this tutorial, we will cover technical material related to large-scale hierarchical classification. It is intended for an audience with intermediate expertise in data mining and a background in classification (supervised learning). Formal definitions of hierarchical classification and its variants will be presented, along with a brief discussion of structured learning.

Tutorial slides


Huzefa Rangwala is an Associate Professor in the Department of Computer Science & Engineering, George Mason University. He received his Ph.D. in Computer Science from the University of Minnesota in 2008. His research interests include machine learning, learning analytics, bioinformatics, and high performance computing. He is the recipient of the NSF Early Faculty Career Award in 2013, the 2014 GMU Teaching Excellence Award, the 2014 Mason Emerging Researcher Creator and Scholar Award, the 2013 Volgenau Outstanding Teaching Faculty Award, the 2012 Computer Science Department Outstanding Teaching Faculty Award, and the 2011 Computer Science Department Outstanding Junior Researcher Award. His research is funded by NSF, NIH, NRL, DARPA, USDA and nVidia Corporation. The tutorial will draw on his own research as it relates to structured learning and multi-task learning. Recently, he developed a large-scale hierarchical classifier using cost-sensitive learning (HierCost), and the tutorial will discuss this publicly available package.

Azad Naik is currently a Ph.D. student in the Department of Computer Science & Engineering at George Mason University (GMU). He received his Bachelor of Technology degree in Computer Science & Engineering from the Indian School of Mines, Dhanbad, India, in 2009, before joining GMU, and his M.S. degree in Computer Science from GMU in 2013. His research interests include hierarchical classification, multi-task learning, and statistical pattern recognition. Currently, he is working on designing effective methods for dealing with inconsistencies within the hierarchy.