2. Pareto-optimal evaluation

emlearn.evaluate.pareto.find_pareto_front(df, cost_metric: str = 'mean_test_compute', performance_metric: str = 'mean_test_accuracy', higher_is_better: bool = True, min_performance=None)[source]

Find the Pareto front

Parameters:
  • cost_metric – Column with model compute cost. Lower cost always better

  • performance_metric – Column with model predictive performance.

  • higher_is_better – Whether higher or lower values are better for performance_metric

  • min_performance – Discard data points whose performance is worse than this value

Returns:

The rows that make up the Pareto front
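As a rough illustration of what selecting a Pareto front over one cost column and one performance column involves, here is a minimal pure-Python sketch. It is not emlearn's implementation: the function name `pareto_front_rows` and the list-of-dicts input are hypothetical stand-ins for the `df` argument, and only the keyword names mirror the signature above.

```python
def pareto_front_rows(rows, cost_metric="mean_test_compute",
                      performance_metric="mean_test_accuracy",
                      higher_is_better=True, min_performance=None):
    """Return the rows not dominated on (cost, performance). Hypothetical helper."""
    # Flip the sign when lower is better, so a larger score is always better
    sign = 1.0 if higher_is_better else -1.0

    def score(row):
        return sign * row[performance_metric]

    candidates = list(rows)
    if min_performance is not None:
        # Drop rows that perform worse than the threshold
        threshold = sign * min_performance
        candidates = [r for r in candidates if score(r) >= threshold]

    # Ascending cost; among equal costs, the best score comes first
    candidates.sort(key=lambda r: (r[cost_metric], -score(r)))

    front = []
    best = float("-inf")
    for row in candidates:
        # Keep a row only if it beats every row that costs the same or less
        if score(row) > best:
            front.append(row)
            best = score(row)
    return front


# Made-up results, for illustration only
models = [
    {"name": "a", "mean_test_compute": 10, "mean_test_accuracy": 0.80},
    {"name": "b", "mean_test_compute": 20, "mean_test_accuracy": 0.78},
    {"name": "c", "mean_test_compute": 30, "mean_test_accuracy": 0.90},
    {"name": "d", "mean_test_compute": 5, "mean_test_accuracy": 0.60},
]
front_names = [r["name"] for r in pareto_front_rows(models)]
```

In this made-up data, model "b" is left out because "a" is both cheaper and more accurate. Exact ties are collapsed to a single row in this sketch, which may differ from how the library treats them.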

emlearn.evaluate.pareto.is_pareto_efficient_simple(costs)[source]

Find the Pareto-efficient points (smaller is better)

Parameters:

costs – An (n_points, n_costs) array

Returns:

A (n_points, ) boolean array, indicating whether each point is Pareto efficient

From https://stackoverflow.com/a/40239615/1967571. Fairly fast for many datapoints, less fast for many costs, somewhat readable.
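To make the dominance rule concrete, the sketch below computes the same kind of boolean mask with plain Python lists. It is a straightforward quadratic check written for readability, not the Stack Overflow code and not emlearn's implementation. The name `pareto_efficient_mask` is hypothetical, and its handling of exactly duplicated points may differ from the library's.

```python
def pareto_efficient_mask(costs):
    """costs: sequence of equal-length cost tuples, smaller is better.

    Returns one boolean per point. Hypothetical helper for illustration.
    """
    def dominates(a, b):
        # a dominates b if it is no worse on every cost
        # and strictly better on at least one
        no_worse = all(x <= y for x, y in zip(a, b))
        strictly_better = any(x < y for x, y in zip(a, b))
        return no_worse and strictly_better

    # A point is efficient when no other point dominates it
    return [
        not any(dominates(other, point) for other in costs)
        for point in costs
    ]


costs = [(1, 5), (2, 2), (3, 3), (5, 1)]
mask = pareto_efficient_mask(costs)
```

Here (3, 3) is the only inefficient point, since (2, 2) is better on both costs.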

emlearn.evaluate.pareto.plot_pareto_front(results, pareto_cut=None, plot_other=True, plot_pareto=True, hue=None, pareto_alpha=0.8, other_alpha=0.3, pareto_global=False, s=100, pareto_s=5, height=8, aspect=1, cost_metric='mean_test_compute', performance_metric='mean_test_accuracy', size_metric='mean_test_size')[source]

Utility for plotting performance vs compute cost and size of a model.

Can also compute and plot the Pareto front.

3. Tree evaluation metrics

Convert a Python model into C code

emlearn.evaluate.trees.compute_cost_estimate(model, X, b=None)[source]

Make an estimate of the compute cost, using the following assumptions:

  • The dataset X is representative of the typical dataset

  • Cost is proportional to the number of decision node evaluations in a tree

  • The cost is added across all trees in the ensemble

Under this model, the actual compute time is the estimate multiplied by a constant C, the time taken by a single decision node evaluation.
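The cost model above can be sketched with a toy tree representation. Everything here is an assumption made for illustration: trees are nested tuples `(feature, threshold, left, right)` with plain labels as leaves, the helper names are hypothetical, and averaging per sample is a choice of this sketch rather than something the documentation states.

```python
def count_node_evaluations(tree, sample):
    """Count the decision nodes visited when classifying one sample."""
    evaluations = 0
    node = tree
    # Internal nodes are tuples, leaves are plain labels
    while isinstance(node, tuple):
        feature, threshold, left, right = node
        evaluations += 1
        node = left if sample[feature] <= threshold else right
    return evaluations


def estimate_compute_cost(trees, X):
    """Mean decision node evaluations per sample, summed over all trees."""
    total = sum(count_node_evaluations(t, x) for x in X for t in trees)
    return total / len(X)


# Two toy trees and two toy samples, made up for illustration
tree_a = (0, 0.5, "neg", (1, 2.0, "neg", "pos"))
tree_b = (1, 1.0, "neg", "pos")
X = [[0.2, 3.0], [0.9, 3.0]]

estimate = estimate_compute_cost([tree_a, tree_b], X)

# Hypothetical constant C: microseconds per decision node evaluation
node_time_us = 0.25
time_us = estimate * node_time_us
```

The first sample visits one node in each tree, while the second visits two nodes in `tree_a` and one in `tree_b`, so the estimate is (2 + 3) / 2 = 2.5 evaluations per sample.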

emlearn.evaluate.trees.count_trees(model, a=None, b=None)[source]

Number of trees in model

emlearn.evaluate.trees.get_tree_estimators(estimator)[source]

Get the DecisionTree instances from ensembles or single-tree models

emlearn.evaluate.trees.model_size_bytes(model, a=None, b=None, node_size=None)[source]

Size of model, in bytes

emlearn.evaluate.trees.model_size_nodes(model, a=None, b=None)[source]

Size of model, in number of decision nodes

emlearn.evaluate.trees.tree_depth_average(model, a=None, b=None)[source]

Average depth of model

emlearn.evaluate.trees.tree_depth_difference(model, a=None, b=None)[source]

Measures how much variation there is in tree depths

Tools for getting the usage of program space (FLASH) and memory (SRAM).

emlearn.evaluate.size.check_build_tools(platform: str)[source]

Check whether the build tools for the specified platform are available

Returns the set of tools that are missing (if any)
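A check of this kind can be sketched with the standard library's `shutil.which`. The platform-to-tools mapping and the name `find_missing_tools` below are hypothetical and only serve the example. The tools emlearn actually looks for are not listed in this documentation.

```python
import shutil

# Hypothetical mapping from platform name to required executables
REQUIRED_TOOLS = {
    "avr": ["avr-gcc", "avr-size"],
}


def find_missing_tools(platform, required=REQUIRED_TOOLS):
    """Return the set of required executables that are not found on PATH."""
    tools = required[platform]
    return {tool for tool in tools if shutil.which(tool) is None}


missing = find_missing_tools("avr")
if missing:
    print("Missing build tools:", ", ".join(sorted(missing)))
```

An empty set means every listed tool was found, which makes the result convenient to use directly in a condition.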

emlearn.evaluate.size.get_program_size(code: str, platform: str, mcu: str, include_dirs=None) -> (int, int)[source]

Determine program size when program is compiled for a particular platform

Returns the FLASH and RAM sizes

emlearn.evaluate.size.parse_binutils_size_a_output(stdout: str) -> Dict[str, int][source]

Parse the output of the GNU binutils size program, run with the -A option

Outputs are in bytes
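For orientation, `size -A` prints one row per section with its name, size, and address. The sketch below parses that layout into a dictionary. It is a minimal illustration under the assumption of default decimal output, not emlearn's parser. The function name `parse_size_a` and the sample output are made up.

```python
def parse_size_a(stdout):
    """Map section name to size in bytes, from `size -A` style output."""
    sizes = {}
    for line in stdout.splitlines():
        fields = line.split()
        # Section rows have three fields: <name> <size> <addr>.
        # The file name line, the column header, and the Total line are skipped.
        if len(fields) == 3 and fields[1].isdigit():
            sizes[fields[0]] = int(fields[1])
    return sizes


# Made-up sample in the layout printed by `size -A`
example_output = """\
firmware.elf  :
section     size      addr
.text       1234         0
.data         56   8388864
.bss          12   8388920
Total       1302
"""

sections = parse_size_a(example_output)
```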

emlearn.evaluate.size.run_binutils_size(elf_file: Path, binary: str) -> Dict[str, int][source]

Get the size of the “program” and “data” sections, using the “size” command-line tool from GNU binutils
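A sketch of how such a helper might be put together follows. It assumes the common microcontroller convention that FLASH holds `.text` plus the initial values of `.data`, and that RAM holds `.data` plus `.bss`. Whether emlearn groups sections this way is not stated here, and the names `run_size_a` and `summarize_sections` are hypothetical.

```python
import subprocess
from pathlib import Path


def run_size_a(elf_file: Path, binary: str = "size") -> str:
    """Run `<binary> -A <elf_file>` and return its standard output."""
    result = subprocess.run(
        [binary, "-A", str(elf_file)],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def summarize_sections(sections):
    """Fold per-section byte counts into 'program' and 'data' totals."""
    text = sections.get(".text", 0)
    data = sections.get(".data", 0)
    bss = sections.get(".bss", 0)
    return {
        "program": text + data,  # stored in FLASH: code plus initial values
        "data": data + bss,      # occupies RAM at runtime
    }


# Made-up section sizes, for illustration only
summary = summarize_sections({".text": 1234, ".data": 56, ".bss": 12})
```

Passing `check=True` makes a failing `size` invocation raise `subprocess.CalledProcessError` instead of returning empty output. `run_size_a` is only defined here, not called, since it needs a real ELF file and toolchain.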