Can Deep Learning Models Predict Compositional Outputs Without Log-Ratio Transformations?

Event - ICTAI 2025

Abstract

Compositional data consist of components expressed as proportions of a whole and carry only relative information. In statistical and machine learning contexts, these data require specialized handling due to their constant-sum constraint and non-Euclidean geometry. A common approach is the application of log-ratio transformations—such as the centered log-ratio (CLR)—to project compositional vectors into Euclidean space. While using CLR for inputs is well established, applying this transformation to compositional outputs remains underexplored. This study evaluates the predictive impact of applying CLR to target variables in supervised learning, using a geochemical dataset with lithogeochemical targets and physical-log predictors. Three deep learning architectures are assessed: CNN-BiLSTM, SAIDNN, and MHA-BiRNN, each trained on raw and CLR-transformed outputs. Results consistently show that models trained directly on raw compositions outperform their CLR-transformed counterparts. We identify two causes: (i) the inverse CLR transformation redistributes prediction errors across components, reducing component-wise precision, and (ii) the zero replacement required before log-ratio transformation introduces artifacts into the data. Furthermore, the models generalize well on blind tests, especially when trained on raw data. These findings suggest that modern deep learning models can effectively learn from compositional outputs in their native form, avoiding the distortions introduced by transformation pipelines.
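For readers unfamiliar with the transformation, the sketch below illustrates the two mechanisms the abstract names: the inverse CLR (a softmax-style renormalization) spreads an error in one coordinate across every component, and zeros must be replaced before taking logs. This is a minimal NumPy illustration, not the paper's pipeline; the function names, the additive zero replacement, and the `eps` value are assumptions made for demonstration.

```python
import numpy as np

def clr(x, eps=1e-6):
    """Centered log-ratio transform of a composition (parts sum to 1).

    Zeros are handled here with a simple additive replacement followed by
    re-closure to the simplex; `eps` is an illustrative choice, and this
    replacement step is one source of the artifacts discussed above.
    """
    x = np.asarray(x, dtype=float)
    x = np.where(x <= 0, eps, x)            # replace zeros before log
    x = x / x.sum(axis=-1, keepdims=True)   # re-close to the simplex
    logx = np.log(x)
    return logx - logx.mean(axis=-1, keepdims=True)

def clr_inv(z):
    """Inverse CLR: map predictions back to the simplex via softmax.

    The normalization couples the components, so an error in a single
    CLR coordinate shifts *all* predicted proportions.
    """
    e = np.exp(z - z.max(axis=-1, keepdims=True))  # numerically stable
    return e / e.sum(axis=-1, keepdims=True)

# A perturbation of one CLR coordinate moves every component.
comp = np.array([0.70, 0.20, 0.10])
z = clr(comp)
z_noisy = z + np.array([0.1, 0.0, 0.0])  # error in one coordinate only
print(clr_inv(z_noisy))                   # all three proportions change
```

Running the example shows all three recovered proportions differing from the original composition even though only the first CLR coordinate was perturbed, which is the error-redistribution effect the study measures.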
