This file is part of Logtalk <https://logtalk.org/>
SPDX-FileCopyrightText: 1998-2026 Paulo Moura <pmoura@logtalk.org>
SPDX-License-Identifier: Apache-2.0
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
# isolation_forest

This library implements the Extended Isolation Forest (EIF) algorithm for anomaly detection as described by Hariri et al. (2019). The Extended Isolation Forest improves upon the original Isolation Forest algorithm (Liu et al., 2008) by using random hyperplane cuts instead of axis-aligned cuts, eliminating bias artifacts in anomaly scores along coordinate axes.
The algorithm builds an ensemble of isolation trees (iTrees) by recursively partitioning data using random hyperplanes. Anomalous points, being few and different from normal points, require fewer partitions (shorter path lengths) to be isolated. The anomaly score for an instance is computed based on the average path length across all trees in the forest.
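The isolation idea above can be illustrated with a minimal, self-contained sketch. This is not the library's code: the helper names (`grow`, `path_length`) and the 2-D toy data are illustrative assumptions; it only demonstrates that a far-away point is isolated by random hyperplane cuts in fewer partitions than a clustered point.

```python
# Toy sketch of extended isolation trees: random hyperplane cuts
# isolate anomalous points at shallower depths than normal points.
import random

def grow(points, depth, max_depth):
    """Recursively partition points with random hyperplane cuts."""
    if len(points) <= 1 or depth >= max_depth:
        return {"size": len(points)}
    d = len(points[0])
    # random normal vector and a random intercept inside the data range
    n = [random.gauss(0.0, 1.0) for _ in range(d)]
    p = [random.uniform(min(x[i] for x in points), max(x[i] for x in points))
         for i in range(d)]
    left = [x for x in points
            if sum((x[i] - p[i]) * n[i] for i in range(d)) <= 0]
    right = [x for x in points
             if sum((x[i] - p[i]) * n[i] for i in range(d)) > 0]
    if not left or not right:  # degenerate cut; stop growing
        return {"size": len(points)}
    return {"n": n, "p": p,
            "left": grow(left, depth + 1, max_depth),
            "right": grow(right, depth + 1, max_depth)}

def path_length(tree, x, depth=0):
    """Depth at which x reaches a leaf (c(size) adjustment omitted)."""
    if "size" in tree:
        return depth
    d = len(x)
    side = sum((x[i] - tree["p"][i]) * tree["n"][i] for i in range(d))
    child = tree["left"] if side <= 0 else tree["right"]
    return path_length(child, x, depth + 1)

random.seed(0)
# a tight cluster of normal points plus one distant anomaly
normal = [[random.gauss(0, 0.1), random.gauss(0, 0.1)] for _ in range(64)]
data = normal + [[5.0, 5.0]]
trees = [grow(data, 0, 8) for _ in range(50)]
avg = lambda x: sum(path_length(t, x) for t in trees) / len(trees)
# the anomaly is isolated with a shorter average path
print(avg([5.0, 5.0]) < avg(normal[0]))
```

With the fixed seed, the anomaly's average path length across the 50 toy trees is shorter than that of a clustered point, which is precisely the signal the forest's anomaly score is built on.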
Datasets are represented as objects implementing the `dataset_protocol`
protocol from the `classifier_protocols` library. See the `test_datasets`
directory for examples.
## API documentation

Open the [../../apis/library_index.html#isolation-forest](../../apis/library_index.html#isolation-forest) link in a web browser.
## Loading

To load all entities in this library, load the `loader.lgt` file:
| ?- logtalk_load(isolation_forest(loader)).
## Testing

To test this library's predicates, load the `tester.lgt` file:
| ?- logtalk_load(isolation_forest(tester)).
## Algorithm details

Each tree node splits the data with a random hyperplane defined by a
normal vector `n` and an intercept point `p`: an instance `x` follows
the left branch when `(x - p) * n =< 0`. These cuts yield partitions
that generalize to arbitrary orientations. An extension level of
`d - 1` (the default) uses fully extended random hyperplanes, where
`d` is the number of dimensions.

The anomaly score of an instance is `s(x) = 2^(-E(h(x)) / c(psi))`,
where `E(h(x))` is the average path length across all trees, `c(psi)`
is the average path length of unsuccessful searches in a BST, and
`psi` is the subsample size.

The following options are supported:

- `number_of_trees(Trees)` (default `100`): number of isolation trees
- `subsample_size(Size)` (default `256` or the number of instances if
  smaller): subsample size for each tree
- `extension_level(Level)` (default `d - 1`): controls the dimensionality
  of the random hyperplane normal vectors
- `anomaly_threshold(Threshold)` (default `0.5`): threshold for anomaly
  prediction

## Usage

To learn an isolation forest model from a dataset with default options:
| ?- isolation_forest::learn(gaussian_anomalies, Model).
To learn with custom options:

| ?- isolation_forest::learn(gaussian_anomalies, Model, [
         number_of_trees(200),
         subsample_size(128),
         extension_level(1),
         anomaly_threshold(0.6)
     ]).
To compute the anomaly score for a new instance:
| ?- isolation_forest::learn(gaussian_anomalies, Model),
isolation_forest::score(Model, [x-0.12, y-0.34], Score).
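The score returned by the query follows the normalization `s(x) = 2^(-E(h(x)) / c(psi))` described earlier, where `c(psi) = 2*H(psi - 1) - 2*(psi - 1)/psi` is the standard average path length of an unsuccessful BST search (with `H` the harmonic number). A quick numeric check, as a sketch rather than the library's code:

```python
# Sketch of the score normalization s(x) = 2^(-E(h(x)) / c(psi)).

def c(psi):
    """Average unsuccessful-search path length in a BST of psi nodes."""
    if psi <= 1:
        return 0.0
    harmonic = sum(1.0 / i for i in range(1, psi))  # H(psi - 1)
    return 2.0 * harmonic - 2.0 * (psi - 1) / psi

def score(average_path_length, psi):
    return 2.0 ** (-average_path_length / c(psi))

print(c(2))              # c(2) = 2*H(1) - 2*(1/2) = 1.0
print(score(c(256), 256))  # E(h(x)) == c(psi)  ->  score of 0.5
```

Scores approach 1 for instances isolated much faster than average (likely anomalies) and fall at or below 0.5 for instances whose path length matches or exceeds the expected average (likely normal points).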
To predict whether an instance is an anomaly or normal:
| ?- isolation_forest::learn(gaussian_anomalies, Model),
isolation_forest::predict(Model, [x-4.50, y-4.20], Prediction).
To compute and rank anomaly scores for all instances in a dataset:
| ?- isolation_forest::learn(gaussian_anomalies, Model),
isolation_forest::score_all(gaussian_anomalies, Model, Scores).
The Scores list contains Id-Class-Score triples sorted by descending
anomaly score. This makes it easy to inspect top anomalies:
| ?- isolation_forest::learn(gaussian_anomalies, Model),
isolation_forest::score_all(gaussian_anomalies, Model, [Top1, Top2, Top3| _]).
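The shape of the `Scores` result can be mimicked with a short sketch; the identifiers, classes, and score values below are made up for illustration and are not output of the library:

```python
# Illustrative sketch of score_all/3 output: Id-Class-Score triples
# ranked by descending anomaly score (all values are invented).
scores = [("p1", "normal", 0.41), ("a1", "anomaly", 0.73),
          ("p2", "normal", 0.45), ("a2", "anomaly", 0.68)]
ranked = sorted(scores, key=lambda triple: triple[2], reverse=True)
print([identifier for identifier, _, _ in ranked[:2]])  # ['a1', 'a2']
```

Because the list is sorted by descending score, the suspected anomalies surface at the head of the list, as in the `[Top1, Top2, Top3| _]` query above.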
To print a summary of the learned model:
| ?- isolation_forest::learn(gaussian_anomalies, Model),
isolation_forest::print_model(Model).
To use the original (non-extended) Isolation Forest, set the extension level to 0:
| ?- isolation_forest::learn(gaussian_anomalies, Model, [extension_level(0)]).
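One way to picture what the extension level controls, following Hariri et al. (2019) rather than this library's internal code: for extension level `e` in `d` dimensions, `d - e - 1` coordinates of the random normal vector are zeroed. Level `0` leaves a single nonzero coordinate, reproducing the original algorithm's axis-aligned cuts; level `d - 1` keeps the fully random hyperplane. A sketch under that assumption:

```python
# Sketch: the extension level as the number of coordinates of the
# hyperplane normal vector left nonzero (per Hariri et al., 2019).
import random

def random_normal(d, extension_level):
    """Gaussian normal vector with d - extension_level - 1 zeroed coords."""
    n = [random.gauss(0.0, 1.0) for _ in range(d)]
    for i in random.sample(range(d), d - extension_level - 1):
        n[i] = 0.0
    return n

random.seed(1)
axis_aligned = random_normal(3, 0)    # original Isolation Forest cuts
fully_extended = random_normal(3, 2)  # default EIF cuts in 3 dimensions
print(sum(1 for v in axis_aligned if v != 0.0))    # 1
print(sum(1 for v in fully_extended if v != 0.0))  # 3
```

Intermediate levels interpolate between the two, trading the axis-aligned bias of the original algorithm against the fully general hyperplanes of the default.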