13. Tree-based models

Tree-based models include Decision Trees, Random Forest, ExtraTrees, and similar estimators. They are simple but very effective models that can handle high-dimensional, non-linear problems. As such, they are among the most popular models in emlearn.

For the general usage of tree ensembles, please refer to the ensemble documentation in scikit-learn. The documentation here covers the topics in using tree-based models that are specific to compute-constrained environments (microcontrollers and embedded devices), and the techniques that emlearn implements to optimize for such usage.

13.1. Basic usage

A trained tree-based model can be converted using cmodel = emlearn.convert(estimator), and the C code can be written to a file using cmodel.save(file='mymodel.h').

One can then use the model in C code to perform inference, using the predict() function.

// header generated with emlearn
#include "mymodel.h"

// To get just the predicted class
const int cls = mymodel_predict(features, features_length);

For a complete example, see XOR classification.

13.2. Probability output

emlearn supports probabilities as outputs of tree-based models. This functionality is enabled by default, and generates a predict_proba() function.

// header generated with emlearn
#include "mymodel.h"

float probabilities[N_CLASSES];
mymodel_predict_proba(mymodel, features, features_length, probabilities, N_CLASSES);

The generated predict_proba() functions can be disabled by passing include_proba=False to save(). This can reduce the size of the code.

Note: probabilities will have higher resolution and be more accurate when soft voting is enabled (see below).

13.3. Regression

emlearn also supports tree-based models for regression. Usage is the same as for a classifier; that the estimator is a regressor is detected automatically.

In this case the generated predict() function returns a float instead of an integer.

// use "loadable" strategy for regression
const float out = mymodel_predict(mymodel, features, features_length);

13.4. Inference method: Inline vs loadable

emlearn supports two different strategies for inference of tree-based models. These are called inline and loadable.

The inference method is specified by passing the method argument to emlearn.convert(). For example, emlearn.convert(estimator, method='inline').

The loadable option uses an EmlTrees data structure to store the decision tree ensemble. In addition to using a model from the C code generated by emlearn, this also supports loading the model from a file or building the decision trees in memory on-device.

The inline option generates C code statements directly. Each tree is a series of if-else statements, and the merging of the results from multiple trees is also generated code. This code has no dependencies on emlearn headers.
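To make the inline strategy concrete, here is a minimal Python sketch of how a single decision tree can be turned into nested if-else C statements. It is an illustration only, not emlearn's actual code generator. The dictionary tree format, the tree_to_c helper, the `<` comparison and the `features` array name are all assumptions made for this example.

```python
def tree_to_c(node, indent=1):
    """Emit nested if-else C statements for a tree given as nested dicts.

    Hypothetical node format (an assumption of this sketch):
      leaf:     {"value": class_index}
      decision: {"feature": i, "threshold": t, "left": node, "right": node}
    """
    pad = "    " * indent
    if "value" in node:
        # Leaf node: return the stored class index
        return f"{pad}return {node['value']};\n"
    # Decision node: the left branch is taken when the feature is below the threshold
    return (
        f"{pad}if (features[{node['feature']}] < {node['threshold']}) {{\n"
        + tree_to_c(node["left"], indent + 1)
        + f"{pad}}} else {{\n"
        + tree_to_c(node["right"], indent + 1)
        + f"{pad}}}\n"
    )


# A one-split tree: class 0 when feature 0 is below 100, else class 1
example_tree = {
    "feature": 0,
    "threshold": 100,
    "left": {"value": 0},
    "right": {"value": 1},
}
c_code = tree_to_c(example_tree)
print(c_code)
```

Because every threshold and leaf value ends up as a constant in the generated statements, code of this kind needs no runtime data structure, which is consistent with the inline output having no dependency on emlearn headers.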

In general, the inline strategy is expected to have the fastest execution time. However, the exact impact on code space and execution time depends on the particular model, the target architecture, and the compiler options, so it should be measured for your particular application.

13.5. Feature representation

The default feature representation in emlearn trees is int16_t (16-bit integers). This is a medium precision, which has been found to give practically identical predictive performance to 32-bit floats for a wide range of datasets, while having several key benefits:

Reduces the program space and RAM space needed for the model
Avoids using floating-point code, which is a big benefit when there is no hardware FPU
Much faster on 8-bit and 16-bit microprocessor architectures

So by default, you should scale your data to fit the range of int16 (-32,768 to +32,767).
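One possible way to do this scaling is sketched below in plain Python. The helper name scale_to_int16 and the choice of a single symmetric max_abs bound are assumptions made for this example. In practice the scale factor depends on your data, and the same scaling must be applied both to the training data and to the features computed on the device.

```python
INT16_MIN = -32768
INT16_MAX = 32767


def scale_to_int16(values, max_abs):
    """Map floats in [-max_abs, +max_abs] onto the int16 range.

    Values outside the expected range are saturated instead of wrapping around.
    """
    if max_abs <= 0:
        raise ValueError("max_abs must be positive")
    scale = INT16_MAX / max_abs
    out = []
    for v in values:
        q = int(round(v * scale))
        # Clamp to the representable int16 range
        out.append(max(INT16_MIN, min(INT16_MAX, q)))
    return out


# Features expected to lie in [-1.0, 1.0]; 2.0 is an outlier and gets saturated
scaled = scale_to_int16([0.0, 1.0, -1.0, 2.0], max_abs=1.0)
print(scaled)
```

Saturating rather than wrapping keeps an unexpected outlier on the correct side of every tree threshold.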

The inline inference strategy also supports using floats, 32-bit integers or 8-bit integers. To use one of these, set the dtype argument of emlearn.convert() to the appropriate C datatype. For example, emlearn.convert(model, dtype='int8').

A complete example can be found in Feature data-types for trees.

13.6. Hard voting (majority) vs soft voting (proportions)

The results of individual decision trees for classification can be stored and combined in two different ways.

With hard (majority) voting, each tree stores in the leaf nodes a single class index, and the ensemble returns the majority vote (the most common class).

With soft (proportion) voting, each tree stores in the leaf nodes the class proportions (one value per class), and the ensemble returns the average over all the class proportions.

Storing only the class index (majority voting) gives the smallest model, but it may lead to lower predictive performance.
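The difference between the two voting schemes can be illustrated with a small Python sketch. The function names, and breaking ties toward the lowest class index, are assumptions of this example and do not describe emlearn's internals.

```python
from collections import Counter


def hard_vote(leaf_classes):
    """Majority vote over one class index per tree (ties go to the lowest index)."""
    counts = Counter(leaf_classes)
    best = max(counts.values())
    return min(c for c, n in counts.items() if n == best)


def soft_vote(leaf_proportions):
    """Average the per-tree class proportions; return (class, averaged proportions)."""
    n_trees = len(leaf_proportions)
    n_classes = len(leaf_proportions[0])
    avg = [sum(p[c] for p in leaf_proportions) / n_trees for c in range(n_classes)]
    return max(range(n_classes), key=lambda c: avg[c]), avg


# Three trees: two lean weakly towards class 0, one leans strongly towards class 1
proportions = [[0.55, 0.45], [0.55, 0.45], [0.10, 0.90]]
# For hard voting, each tree stores only its most likely class
classes = [max(range(len(p)), key=lambda c: p[c]) for p in proportions]

print(hard_vote(classes))      # majority vote over the class indices [0, 0, 1]
print(soft_vote(proportions))  # average of the class proportions
```

In this example the two schemes disagree: hard voting picks class 0, while soft voting picks class 1 because the third tree is much more confident than the other two. This is the kind of information that is lost when only the class index is stored.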

To use majority voting, pass leaf_bits=0 to emlearn.convert(). To enable soft voting, set the leaf_bits option to a value between 3 and 8. This determines the quantization applied to the class proportions.

13.7. Target quantization and leaf-deduplication

emlearn implements leaf de-duplication, such that identical leaves are only stored once across all trees. This can considerably reduce the storage needed for the model.

For majority-based voting in classifiers, the benefit of leaf de-duplication is automatic. This is because the leaf values are class indices, which are naturally a limited set.

For soft voting, the amount of de-duplication is affected by the leaf quantization. A lower value for leaf_bits gives fewer unique leaf values, and may reduce the size of the model.
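The effect of leaf quantization on de-duplication can be sketched as follows. The uniform quantization used here (scaling each proportion to 2^leaf_bits - 1 levels) and the helper names are assumptions made for illustration. The exact scheme used by emlearn may differ.

```python
def quantize_proportions(proportions, leaf_bits):
    """Quantize class proportions in [0, 1] to integers using leaf_bits bits."""
    levels = (1 << leaf_bits) - 1
    # Adding 0.5 before truncation rounds to the nearest level
    return tuple(int(p * levels + 0.5) for p in proportions)


def count_unique_leaves(leaves, leaf_bits):
    """Number of distinct leaves left after quantization (only these need storing)."""
    return len({quantize_proportions(leaf, leaf_bits) for leaf in leaves})


# Four leaves that are pairwise similar but not identical
leaves = [(0.60, 0.40), (0.62, 0.38), (0.90, 0.10), (0.88, 0.12)]

for bits in (8, 3):
    print(bits, count_unique_leaves(leaves, bits))
```

With 8 bits all four leaves stay distinct, while with 3 bits the similar leaves collapse into two shared entries.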

For regression, one needs to ensure that the targets are quantized to a small set of values. The best way to do this will be application specific.
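As one possible approach, the regression targets can be rounded to a fixed step size before training, so that the trees can only produce a limited set of leaf values. The helper name quantize_targets and the step size of 0.25 are assumptions for this sketch. A suitable resolution depends on the application.

```python
def quantize_targets(targets, step):
    """Round each target to the nearest multiple of step."""
    if step <= 0:
        raise ValueError("step must be positive")
    return [round(t / step) * step for t in targets]


# Hypothetical regression targets, quantized to a resolution of 0.25
y = [0.12, 0.26, 0.49, 1.02]
y_quantized = quantize_targets(y, step=0.25)
print(y_quantized)
```

The quantized values would then be used as the training targets in place of the original ones. Note that leaves averaging several different targets can still take intermediate values, so the tree depth and leaf size settings also influence how many unique leaf values remain.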

13.8. Optimizing model complexity

The complexity of a tree-based ensemble is a function of its width (the number of trees) and depth of the trees. This influences both the predictive performance and computational costs of the model.

A larger model will generally have higher predictive performance, but will need more CPU, RAM, and storage. This leads to a trade-off, and different applications may choose different operating points.

Hyper-parameter tuning can be used to find a set of Pareto optimal model alternatives. This is illustrated in the example Optimizing model size for trees. A basic starting point is to use n_estimators=10, and tune the depth (using for example max_depth or min_samples_leaf).
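Once a set of candidate models has been evaluated, the Pareto-optimal ones can be selected with a few lines of Python. The sketch below assumes each candidate is described by a cost (for example model size) and an error, both to be minimized. The candidate names and numbers are made up for illustration and are not measured results.

```python
def pareto_front(candidates):
    """Return names of candidates that are not dominated by any other candidate.

    candidates: list of (name, cost, error) tuples, lower is better for both.
    A candidate is dominated if another one is at least as good on both
    criteria and strictly better on at least one.
    """
    front = []
    for name, cost, err in candidates:
        dominated = any(
            (c2 <= cost and e2 <= err) and (c2 < cost or e2 < err)
            for _, c2, e2 in candidates
        )
        if not dominated:
            front.append(name)
    return front


# Made-up (name, size in kB, error rate) values for four hypothetical models
models = [
    ("depth3", 10, 0.20),
    ("depth5", 20, 0.10),
    ("depth6", 25, 0.15),  # larger and worse than depth5, so dominated
    ("depth8", 40, 0.05),
]
print(pareto_front(models))
```

Any model on the front is a reasonable choice. Which one to pick depends on the resource budget of the target device.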

13.9. Optimization of features

A high-performing and computationally efficient model is dependent on good input features.

The predictive performance of tree-based models is relatively robust against less-useful features. However, such features do tend to get used occasionally, and may cause higher-than-necessary computational costs (especially model size). Therefore it is good practice to remove features that are completely useless or redundant. This can be achieved with standard feature selection methods.

Creating new features using feature engineering can have a very large impact, and should always be considered in addition to optimizing the classifier. This tends to be very problem/task dependent and specific recommendations are outside the scope of this documentation. But for inspiration, see for example Energy-efficient activity recognition framework using wearable accelerometers where tree-based models outperform Convolutional Neural Networks.