lleaves 🍃

class lleaves.Model(model_file)

The base class of lleaves.

__init__(model_file)

Initialize the uncompiled model.

Parameters: model_file – Path to the model.txt. Hint: If you have the string representation of the model, you can use tempfile from the standard library to write the string to a file first.
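
The hint above can be sketched roughly as follows. This is only an illustration: the helper names write_model_string and load_model_from_string are hypothetical and not part of lleaves, and the sketch assumes the string holds the same text that would otherwise be in model.txt. The temporary file is deliberately not deleted, since the documentation does not say at which point the file is read.

```python
import tempfile


def write_model_string(model_str: str) -> str:
    """Write a model string to a temporary .txt file and return its path.

    Hypothetical helper, not part of lleaves.
    """
    # delete=False keeps the file on disk after the handle is closed,
    # so it can be reopened by path later on.
    with tempfile.NamedTemporaryFile(
        mode="w", suffix=".txt", delete=False, encoding="utf-8"
    ) as f:
        f.write(model_str)
        return f.name


def load_model_from_string(model_str: str):
    """Create an uncompiled lleaves.Model from a model string.

    Returns (model, path) so the caller can remove the file once the
    model is no longer needed.
    """
    import lleaves  # imported lazily; requires lleaves to be installed

    path = write_model_string(model_str)
    return lleaves.Model(model_file=path), path
```

The caller is responsible for cleaning up the returned path, for example with os.remove(), once the model has been compiled and is no longer needed.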

compile(cache=None, *, raw_score=False, fblocksize=34, fcodemodel='large', finline=True, froot_func_name='forest_root', use_fp64=True, target_cpu=None, target_cpu_features=None)

Generate the LLVM IR for this model and compile it to ASM.

For most users, tweaking the compilation flags (fcodemodel, fblocksize, finline) will be unnecessary, as the default configuration is already very fast. Modifying the flags is useful only if you’re trying to squeeze out the last few percent of performance.

Parameters:

cache – Path to a cache file. If this path doesn’t exist, the binary will be written to it after compilation. If the path exists, the binary will be loaded from it and compilation skipped. No effort is made to check staleness / consistency.
raw_score – If true, compile the tree to always return raw predictions, without applying the objective function. Equivalent to the raw_score parameter of LightGBM’s Booster.predict().
fblocksize – Trees are cache-blocked into blocks of this size, reducing the icache miss-rate. For deep trees or small caches a lower blocksize is better. For single-row predictions cache-blocking adds overhead; set fblocksize=Model.num_trees() to disable it.
fcodemodel – The LLVM codemodel. Relates to the maximum offsets that may appear in an ASM instruction. One of {“small”, “large”}. The small codemodel will give speedups for most forests, but will segfault when used for compiling very large forests.
finline – Whether or not to inline functions. Setting this to False will reduce compilation time significantly but will slow down prediction.
froot_func_name – Name of entry point function in the compiled binary. This is the function to link when writing a C function wrapper. Defaults to “forest_root”.
use_fp64 – If true, compile the model to use fp64 (double) precision, else use fp32 (float).
target_cpu – An optional string specifying the target CPU name to specialize for (defaults to the host’s cpu name).
target_cpu_features – An optional string specifying the target CPU features to enable (defaults to the host’s CPU features).
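
Because the cache is never checked for staleness, one possible workaround is to derive the cache filename from the contents of the model file, so that a changed model ends up with a fresh cache file. The sketch below is an assumption-laden illustration, not part of lleaves: cache_path_for and compile_cached are hypothetical helpers, and the digest covers only the model file, not the compilation flags or the lleaves version. It also shows the fblocksize=Model.num_trees() setting described above for single-row prediction.

```python
import hashlib
from pathlib import Path


def cache_path_for(model_file: str, cache_dir: str) -> Path:
    """Derive a cache path from the SHA-256 digest of the model file.

    Hypothetical helper: a different model file yields a different path.
    """
    digest = hashlib.sha256(Path(model_file).read_bytes()).hexdigest()
    return Path(cache_dir) / f"lleaves_{digest[:16]}.bin"


def compile_cached(model, model_file: str, cache_dir: str,
                   single_row: bool = False) -> Path:
    """Compile `model` with a content-derived cache path.

    `model` is expected to be an lleaves.Model created from `model_file`.
    """
    cache = cache_path_for(model_file, cache_dir)
    kwargs = {"cache": str(cache)}
    if single_row:
        # One block spanning all trees disables cache-blocking.
        kwargs["fblocksize"] = model.num_trees()
    model.compile(**kwargs)
    return cache
```

If the compilation flags change between runs, the cache file would still need to be removed by hand, since the flags are not part of the digest in this sketch.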

num_feature()

Returns the number of features used by this model.

num_model_per_iteration()

Returns the number of models per iteration.

This is equal to the number of classes for multiclass models, else will be 1.
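
As a small worked example, the two counts can be combined to estimate the number of boosting iterations. This sketch assumes that every iteration contributes exactly num_model_per_iteration() trees, as in LightGBM's usual model layout; the documentation above does not state this, and boosting_iterations is a hypothetical helper.

```python
def boosting_iterations(num_trees: int, num_model_per_iteration: int) -> int:
    """Estimate boosting iterations from the two model counts.

    Assumes each iteration adds exactly `num_model_per_iteration` trees.
    """
    if num_model_per_iteration < 1:
        raise ValueError("num_model_per_iteration must be at least 1")
    iterations, remainder = divmod(num_trees, num_model_per_iteration)
    if remainder:
        raise ValueError("num_trees is not a multiple of num_model_per_iteration")
    return iterations
```

With a model object this would be called as boosting_iterations(model.num_trees(), model.num_model_per_iteration()); under the stated assumption, a 3-class model with 300 trees gives 100 iterations.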

num_trees()

Returns the number of trees in this model.

predict(data, n_jobs=None)

Return predictions for the given data.

The model needs to be compiled before prediction.

Parameters:

data – pandas DataFrame, 2D numpy array or Python list. Shape should be (n_rows, model.num_feature()). If the datatype is not equal to the model’s dtype, the data will be copied. In any case, access is read-only.
n_jobs – Number of threads to use for prediction. Defaults to number of CPUs. For single-row prediction this should be set to 1.

Returns:

1D numpy array. Datatype is fp64/fp32, depending on the use_fp64 flag passed to .compile()
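
A single-row call following the notes above might look like the sketch below. predict_single is a hypothetical wrapper, not part of lleaves; it relies only on the documented num_feature() and predict(data, n_jobs) methods and assumes the model has already been compiled.

```python
def predict_single(model, row):
    """Predict one row with a compiled model, using a single thread.

    Hypothetical wrapper around the documented predict() method.
    """
    n_features = model.num_feature()
    if len(row) != n_features:
        raise ValueError(f"expected {n_features} features, got {len(row)}")
    # predict() expects 2D input of shape (n_rows, n_features),
    # so the single row is wrapped in an outer list.
    result = model.predict([list(row)], n_jobs=1)
    return result[0]
```

For repeated single-row calls, compiling with fblocksize=model.num_trees() as described under compile() may reduce overhead further.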