Usage |
python train.py [-h] -d DATASET -f FEATURES -t HIERARCHY -m MODEL_DIR [-c COST_TYPE] [-r RHO] [-u] [-i] [-n NODES] |
Input/Options |
-h, --help
print help and exit.
-d DATASET, --dataset DATASET
Location of the training dataset file
in LibSVM format (see file formats).
-f FEATURES, --features FEATURES
Integer value representing the number of
training features.
-t HIERARCHY, --hierarchy HIERARCHY
Hierarchy in edge-list format (see file formats).
-m MODEL_DIR, --model_dir MODEL_DIR
Directory/Folder where the model output
files are saved. Any existing files will be overwritten.
-c COST_TYPE, --cost_type COST_TYPE
Cost type refers to the different
strategies for deriving costs based on the hierarchy.
Valid values for COST_TYPE are:
lr -- Standard Logistic Regression (Default)
trd -- Tree Distance
nca -- Number of Common Ancestors
etrd -- Exponentiated Tree Distance
See Reference for further explanation.
-r RHO, --rho RHO
Value of regularization parameter, which
should be a positive floating point value. Default = 1.
-u, --multi
Train models for multi-label
classification. Default is single-label classification.
-i, --imbalance
Include imbalance costs. See Reference for further explanation.
-n NODES, --nodes NODES
Comma separated list of training nodes
(No space around commas). By default models are trained
for all the leaf nodes.
E.g. -n 2,1,33 |
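The trd and nca cost types are named after two quantities that can be read off the hierarchy. The sketch below only illustrates those quantities on a toy tree. How hierCost converts them into costs is described in the reference and is not reproduced here. The helper names are hypothetical, and counting only proper ancestors (excluding the nodes themselves) is an assumption.

```python
def parent_map(edges):
    """Map each child node to its parent, given (parent, child) pairs."""
    return {child: parent for parent, child in edges}


def path_to_root(node, parents):
    """Return the list [node, parent, grandparent, ..., root]."""
    path = [node]
    while path[-1] in parents:
        path.append(parents[path[-1]])
    return path


def tree_distance(a, b, parents):
    """Number of edges on the path between nodes a and b."""
    path_a = path_to_root(a, parents)
    path_b = path_to_root(b, parents)
    steps_from_a = {node: i for i, node in enumerate(path_a)}
    for steps_from_b, node in enumerate(path_b):
        if node in steps_from_a:
            # First shared node on the way up is the lowest common ancestor.
            return steps_from_a[node] + steps_from_b
    raise ValueError("nodes are not in the same tree")


def common_ancestors(a, b, parents):
    """Count shared proper ancestors (assumption: a and b are not counted)."""
    ancestors_a = set(path_to_root(a, parents)[1:])
    ancestors_b = set(path_to_root(b, parents)[1:])
    return len(ancestors_a & ancestors_b)


# Toy hierarchy: 0 is the root, 1 and 2 are its children, 3 and 4 are under 1.
edges = [(0, 1), (0, 2), (1, 3), (1, 4)]
parents = parent_map(edges)
```

With this toy tree, nodes 3 and 4 are siblings, so they are closer to each other and share more ancestors than nodes 3 and 2.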
Output |
For each node in the hierarchy for
which a model is trained, the program outputs a model to a
file with the name <node_id>.p in the
directory provided by the file system path MODEL_DIR. |
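Since each trained node produces a file named <node_id>.p in MODEL_DIR, a quick check for missing models can be sketched as follows. The function names are hypothetical, and the directory name and node ids in the usage line are placeholders.

```python
import os


def expected_model_paths(model_dir, node_ids):
    """Build the <node_id>.p path for every node that should have a model."""
    return [os.path.join(model_dir, f"{node_id}.p") for node_id in node_ids]


def missing_models(model_dir, node_ids):
    """Return the expected model paths that do not exist on disk."""
    return [
        path
        for path in expected_model_paths(model_dir, node_ids)
        if not os.path.isfile(path)
    ]


# Placeholder values: a "models" directory and the node ids from "-n 2,1,33".
paths = expected_model_paths("models", [2, 1, 33])
```

An empty list from missing_models suggests that training produced a file for every requested node. It does not verify the contents of those files.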
Usage |
python predict.py [-h] -d DATASET -f FEATURES -t HIERARCHY -m MODEL_DIR [-u] -p PRED_PATH |
Input/Options |
-h, --help
print help and exit.
-d DATASET, --dataset DATASET
Location of the test dataset file
in LibSVM format (see file formats).
-f FEATURES, --features FEATURES
Integer value representing the number of
training features.
-t HIERARCHY, --hierarchy HIERARCHY
Hierarchy in edge-list format (see file formats).
-m MODEL_DIR, --model_dir MODEL_DIR
Directory/Folder where the training script
saved the model output files.
-u, --multi
Set this flag if the models were trained for
multi-label classification. Must match the setting used in training.
-p PRED_PATH, --pred_path PRED_PATH
File location for the predicted output
(see file formats). |
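Because -f, -t, -m and -u are shared by both scripts and -u must match between training and prediction, one way to keep them consistent is to build both command lines from the same values. This is a minimal sketch using only the options documented above. The function names and all file names are hypothetical.

```python
def shared_args(features, hierarchy, model_dir, multi_label):
    """Options that train.py and predict.py must agree on."""
    args = ["-f", str(features), "-t", hierarchy, "-m", model_dir]
    if multi_label:
        args.append("-u")
    return args


def train_command(dataset, shared, cost_type=None, rho=None):
    """Argument list for train.py; optional flags are added only if given."""
    cmd = ["python", "train.py", "-d", dataset] + shared
    if cost_type is not None:
        cmd += ["-c", cost_type]
    if rho is not None:
        cmd += ["-r", str(rho)]
    return cmd


def predict_command(dataset, shared, pred_path):
    """Argument list for predict.py, reusing the shared options."""
    return ["python", "predict.py", "-d", dataset] + shared + ["-p", pred_path]


# Placeholder file names; the feature count of 3 is only an example.
shared = shared_args(3, "hier.txt", "models", multi_label=True)
train_cmd = train_command("train.txt", shared, cost_type="trd", rho=1.0)
predict_cmd = predict_command("test.txt", shared, "pred.txt")
```

Each list can be passed to subprocess.run(cmd, check=True) from the directory that contains the scripts.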
DATASET |
The dataset for training/testing should
be provided in LibSVM
format. With a multi-label dataset, option -u must
be set.
Format for input file:
<label1,label2,...> <index1>:<value1> <index2>:<value2> ...
Example for input file (Single Label):
1 1:0.01 2:1.5 3:1.25
Example for input file (Multi Label):
1,2 1:0.01 2:1.5 3:1.25 |
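To make the line format concrete, here is a minimal sketch of a parser for one dataset line. It assumes labels and feature indices are integers, as in the examples above. The function name is hypothetical and this is not the loader used by hierCost itself.

```python
def parse_dataset_line(line):
    """Split one line into (labels, features).

    labels   -- list of integer labels (one entry for single-label data)
    features -- dict mapping feature index to its floating point value
    """
    label_part, *feature_parts = line.split()
    labels = [int(label) for label in label_part.split(",")]
    features = {}
    for item in feature_parts:
        index, value = item.split(":")
        features[int(index)] = float(value)
    return labels, features


single = parse_dataset_line("1 1:0.01 2:1.5 3:1.25")
multi = parse_dataset_line("1,2 1:0.01 2:1.5 3:1.25")
```

The same function handles both cases because a single label is just a comma separated list with one entry.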
HIERARCHY |
Hierarchy is a text file representing the
hierarchy in edge-list format. Each line of the file
represents an edge between a parent and child node.
Format for hierarchy:
parent_node_id child_node_id
Example for hierarchy:
0 1 |
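The edge-list format can be read with a few lines of code. The sketch below also finds the leaf nodes, which are the nodes train.py trains models for by default. It assumes integer node ids separated by whitespace; the function names are hypothetical.

```python
def read_edges(lines):
    """Parse 'parent_node_id child_node_id' lines into (parent, child) pairs."""
    edges = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        parent, child = line.split()
        edges.append((int(parent), int(child)))
    return edges


def leaf_nodes(edges):
    """Nodes that appear as a child but never as a parent."""
    parents = {parent for parent, _ in edges}
    children = {child for _, child in edges}
    return sorted(children - parents)


# Toy hierarchy: 0 -> 1, 0 -> 2, 1 -> 3
edges = read_edges(["0 1", "0 2", "1 3"])
leaves = leaf_nodes(edges)
```

For a file on disk, pass the open file object to read_edges, since iterating over it yields one line at a time.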
PREDICTIONS |
Predictions are saved in a text file.
Each line contains the predicted labels for the
corresponding instance from the test data set.
Example for single-label prediction:
1
Example for multi-label prediction:
1 |
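Since prediction lines correspond one-to-one with test instances, predictions can be compared against the labels in the test dataset line by line. The sketch below computes exact-match accuracy. It assumes that multiple predicted labels on a line are comma separated, as in the dataset format; this is not confirmed by the example above. The function names are hypothetical.

```python
def label_set(line):
    """Labels in the first whitespace-separated field of a line."""
    fields = line.split()
    if not fields:
        return frozenset()  # empty line: no labels
    return frozenset(fields[0].split(","))


def exact_match_accuracy(prediction_lines, dataset_lines):
    """Fraction of instances whose predicted label set equals the true one."""
    if len(prediction_lines) != len(dataset_lines):
        raise ValueError("prediction and dataset files differ in length")
    if not prediction_lines:
        return 0.0
    matches = sum(
        label_set(pred) == label_set(gold)
        for pred, gold in zip(prediction_lines, dataset_lines)
    )
    return matches / len(prediction_lines)


# Toy data: the first prediction is correct, the second is not.
accuracy = exact_match_accuracy(["1", "2"], ["1 1:0.5", "3 1:0.2"])
```

Exact match is a strict measure for multi-label data, so a partially correct label set counts as an error here.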
In citing hierCost in your papers, please use the following reference:
The software may not be sold or redistributed without prior approval. One may make copies of the software for their own use, provided that the copies are not sold or distributed and are used under the same terms and conditions.
As unestablished research software, this code is provided on an "as is" basis without warranty of any kind, either expressed or implied. Downloading or executing any part of this software constitutes an implicit agreement to these terms. These terms and conditions are subject to change at any time without prior notice.
Funding Provided by NSF Grants IIS 1252318 and 0905117