Data-Based Modeling and Analysis | HEAL


In all our research projects we use data to parameterize models for optimization or simulation. In many cases no mathematical model is available and building one is infeasible. We then use machine learning or statistical methods to find purely data-based phenomenological models.

For example, the process for the fabrication of high-quality steel strips is complex and has many sub-processes along the production chain. So many parameters along this chain affect the quality of the steel strip that it is impossible to capture them all in a physics-based mathematical model. However, we can use the huge amount of data that is measured in the process to find the main drivers of product quality and build a phenomenological model for predicting the quality. These purely data-based models can subsequently be used for solving the "inverse problem" of optimizing the main parameters of the production process, e.g. to achieve the highest quality.
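
The "inverse problem" can be illustrated with a minimal sketch. It assumes a hypothetical, already fitted quality model `predicted_quality` with two made-up process parameters (temperature and line speed) and searches a grid of admissible settings for the highest predicted quality. The model, the parameter names, and the ranges are illustrative assumptions and are not taken from an actual steel process.

```python
from itertools import product


def predicted_quality(temperature, speed):
    """Hypothetical data-based quality model (illustrative only)."""
    return 10.0 - (temperature - 850.0) ** 2 / 1000.0 - (speed - 3.0) ** 2


# Assumed admissible ranges for the two process parameters.
temperatures = range(800, 901, 10)        # 800, 810, ..., 900
speeds = [s / 2 for s in range(2, 11)]    # 1.0, 1.5, ..., 5.0

# Inverse problem: find the settings with the highest predicted quality.
best_temperature, best_speed = max(
    product(temperatures, speeds),
    key=lambda setting: predicted_quality(*setting),
)
best_quality = predicted_quality(best_temperature, best_speed)
print(best_temperature, best_speed, best_quality)
```

With more than a handful of parameters the grid grows too quickly, and a heuristic or gradient-based optimizer would typically replace the exhaustive search.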

In the research area of data-based modeling and analysis, we mainly use machine learning methods for the identification of predictive models. When physics-based models are available, we may also combine them with ML models. We rely heavily on visualization techniques to quickly explore data sets and find potentially useful patterns. Our main aim, however, is always to capture these patterns explicitly in the form of a mathematical model.

One of our specialities is symbolic regression and we are recognized as one of the leading groups for the development of symbolic regression methods worldwide. Symbolic regression is a supervised learning method which produces models as short mathematical expressions.

Symbolic Regression Example
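
As a minimal sketch of the idea, assume a tiny hand-picked set of candidate expressions instead of a real search over expression structures. For each candidate, an offset `a` and a scale `b` are fitted by least squares, and the candidate with the lowest mean squared error is returned as a short formula. The candidate set and the synthetic data are assumptions made for illustration; practical methods such as genetic programming explore a far larger expression space.

```python
import math

# Hypothetical candidate structures (illustrative only).
CANDIDATES = {
    "x": lambda x: x,
    "x^2": lambda x: x * x,
    "sin(x)": math.sin,
    "exp(x)": math.exp,
}


def fit_offset_and_scale(f_vals, ys):
    """Least-squares fit of y ≈ a + b * f for fixed values f."""
    n = len(ys)
    mean_f = sum(f_vals) / n
    mean_y = sum(ys) / n
    var_f = sum((f - mean_f) ** 2 for f in f_vals)
    cov = sum((f - mean_f) * (y - mean_y) for f, y in zip(f_vals, ys))
    b = cov / var_f if var_f > 0 else 0.0
    a = mean_y - b * mean_f
    return a, b


def symbolic_regression(xs, ys):
    """Return (mse, expression name, a, b) of the best candidate."""
    best = None
    for name, f in CANDIDATES.items():
        f_vals = [f(x) for x in xs]
        a, b = fit_offset_and_scale(f_vals, ys)
        mse = sum((a + b * fv - y) ** 2 for fv, y in zip(f_vals, ys)) / len(ys)
        if best is None or mse < best[0]:
            best = (mse, name, a, b)
    return best


# Synthetic data generated from y = 1 + 2 * x^2.
xs = [i / 10 for i in range(-20, 21)]
ys = [1.0 + 2.0 * x * x for x in xs]

mse, name, a, b = symbolic_regression(xs, ys)
print(f"y = {a:.3f} + {b:.3f} * {name}  (MSE {mse:.2e})")
```

The result is a readable expression rather than a black-box predictor, which is what makes this kind of model easy to inspect.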

We have more than 15 years of experience in developing customized machine learning algorithms for industrial applications. HEAL researchers have backgrounds in computer science, mathematics, software engineering, and data science, which enables us to develop production-ready software for our partners. Some of our software is open source.

Milestones in the area of data-based modeling

The following list shows some important milestones that we have reached in the development of data-based modeling methods and functionality implemented in HeuristicLab.
  • 2021 - Shape-constrained symbolic regression for physics-informed regression implemented, published, and available in HeuristicLab
  • 2018 - Factor variables for symbolic regression implemented, published, and available in HeuristicLab
  • 2018 - Kernel ridge regression and Barnes-Hut t-SNE available in HeuristicLab
  • 2016 - Sensitivity analysis for symbolic regression models. Model response plots / partial dependence plots available in HeuristicLab
  • 2016 - Elastic-net regularized linear regression (glmnet) available in HeuristicLab
  • 2015 - Multi-objective algorithms for symbolic regression to optimize model complexity and error (ParetoGP) available in HeuristicLab
  • 2015 - Gradient boosted trees for regression and classification available in HeuristicLab
  • 2013 - Memetic local optimization of numeric parameters via gradient-based trust-region algorithms and automatic differentiation for symbolic regression models implemented, published, and available in HeuristicLab
  • 2013 - Gaussian process regression available in HeuristicLab
  • 2012 - Grammar-guided genetic programming for symbolic regression available in HeuristicLab
  • 2012 - Support vector regression (libSVM) available in HeuristicLab
  • 2011 - Automatic algebraic simplification of symbolic regression models in HeuristicLab
  • 2011 - Random forest regression available in HeuristicLab
  • 2010 - Complete re-implementation of symbolic regression and genetic programming for HeuristicLab 3.3
  • 2009 - Linear scaling for symbolic regression in HeuristicLab
  • 2008 - Complete re-implementation of symbolic regression and genetic programming for HeuristicLab 2.0
  • 2005 - Variable frequency analysis for estimating variable relevance in HeuristicLab
  • 2004 - First implementation of symbolic regression and genetic programming for HeuristicLab 1.0
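
The 2015 milestone on multi-objective symbolic regression trades off model complexity against error. A minimal sketch of the underlying idea, keeping only the non-dominated models, is given below. The candidate expressions, complexity values, and errors are made up for illustration, and the code is not HeuristicLab's implementation.

```python
def pareto_front(models):
    """Return the models not dominated in (complexity, error).

    Lower is better for both objectives. A model is dominated if another
    model is at least as good in both objectives and strictly better in one.
    """
    front = []
    for name, complexity, error in models:
        dominated = any(
            c <= complexity and e <= error and (c < complexity or e < error)
            for _, c, e in models
        )
        if not dominated:
            front.append((name, complexity, error))
    return sorted(front, key=lambda m: m[1])


# Hypothetical candidates: (expression, complexity, validation error).
models = [
    ("c0", 1, 0.90),
    ("c0 + c1*x", 3, 0.40),
    ("c0 + c1*x^2", 4, 0.45),
    ("c0 + c1*x + c2*x^2", 7, 0.10),
    ("c0 + c1*sin(x) + c2*x^3", 9, 0.10),
]

front = pareto_front(models)
for name, complexity, error in front:
    print(f"{complexity:2d}  {error:.2f}  {name}")
```

The front leaves the final choice between simpler and more accurate models to the user instead of fixing the trade-off in advance.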

Selected projects

Josef Ressel Center for Symbolic Regression

We develop new algorithms for symbolic regression and a methodological and technical framework for incremental model adaptation for handling concept drift with symbolic regression models.

TransMet - Fundamentals and tools for engineering of high quality recycled and CO2 reduced strip steels

The objective is to provide fundamentals and tools for the production of high-quality recycled and CO2-reduced strip steels. Based on two use cases, a demonstrator of a material and process design and optimization software for offline use in steel sheet production will be established. The software builds on hybrid modeling, which combines physics-based models with machine learning models, as the main strategy for developing process planning tools for steel strip production.
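
One common way to combine the two model types is to let a data-driven model learn the residual that a physics-based model leaves unexplained. The sketch below assumes this residual-correction scheme, a made-up physics model, and synthetic measurements. It illustrates the general idea of hybrid modeling and is not the project's actual approach.

```python
def physics_model(x):
    """Hypothetical simplified physics-based model (illustrative only)."""
    return 2.0 * x


# Synthetic measurements: the assumed true process has an additional
# quadratic effect that the physics model does not capture.
xs = [i / 4 for i in range(1, 21)]
ys = [2.0 * x + 0.5 * x * x for x in xs]

# Data-driven part: fit residual ≈ c * x^2 by least squares.
residuals = [y - physics_model(x) for x, y in zip(xs, ys)]
features = [x * x for x in xs]
c = sum(f * r for f, r in zip(features, residuals)) / sum(f * f for f in features)


def hybrid_model(x):
    """Physics-based prediction plus the learned residual correction."""
    return physics_model(x) + c * x * x


print(c, hybrid_model(6.0))
```

The physics model carries the known behavior, so the data-driven part only has to explain what is left over.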

Machine Learning Methods for Identifying Features of Global Optimization Problems in the Non-Stationary Environment and for Automatic Adaptation of Evolutionary and Bio-Inspired Algorithms

K2 Competence Center: Symbiotic Mechatronics

Predictive Maintenance for Industry Radial Ventilators

Research focuses on investigating proactive maintenance strategies for radial fans. For this purpose, the fans will be equipped with additional sensors that record vibration, acceleration, air pressure, temperature, flux, power consumption, rotation, abrasive wear, and deposit build-up, in order to monitor the system at any given moment. This installation enables predictive and condition-based maintenance, which considers the usage history, the current condition, and the prospective operational load.

FlashCheck - Electric Arc Detection in DC Circuits

The project deals with the interdependencies between electrical arcing and the inverter electronics, different battery solutions for buffering electrical energy, as well as other influencing factors such as the cable length. The result is a description of the common characteristics of electrical arcs in different system configurations and a general concept for detecting arcing.

HOPL - Heuristic Optimization in Production and Logistics

In this project we developed novel algorithms for integratively optimizing interrelated logistics and production processes.

AI for lead generation

Out of a database of more than 20 million companies we recommend those that are most similar to existing customers.

Recycling 4.1

We use machine learning methods for model predictive control of plastics recycling plants.

LIPOL - Learning in process optimization in the food industry

We develop machine learning methods for identifying dependencies between process parameters and product quality. The models shall be used for optimization of process parameters.

Selected publications


G. Kronberger, E. Kabliman, J. Kronsteiner, M. Kommenda. "Extending a Physics-Based Constitutive Model using Genetic Programming" (2021). arXiv preprint arXiv:2108.01595.

C. Haider, F. O. de França, B. Burlacu, G. Kronberger. "Using Shape Constraints for Improving Symbolic Regression Models" (2021). arXiv preprint arXiv:2107.09458.

L. Kammerer, G. Kronberger, S. Winkler. "Empirical analysis of variance for genetic programming based symbolic regression". In Proceedings of the Genetic and Evolutionary Computation Conference Companion (GECCO '21). Association for Computing Machinery, New York, NY, USA, pp. 251–252 (2021).

W. La Cava, P. Orzechowski, B. Burlacu, F. Olivetti de França, M. Virgolin, Y. Jin, M. Kommenda, J. H. Moore. "Contemporary symbolic regression methods and their relative performance" (2021). arXiv preprint arXiv:2107.14351.

W. Roland, C. Marschik, M. Kommenda, A. Haghofer, S. Dorl, S. Winkler. "Predicting the Non-Linear Conveying Behavior in Single-Screw Extrusion: A Comparison of Various Data-Based Modeling Approaches used with CFD Simulations". International Polymer Processing. Vol. 36, Issue 5, pp. 529-544 (2021).

L. Millán, G. Bokuchava, J. I. Hidalgo, et al. "Study of Microscopic Residual Stresses in an Extruded Aluminium Alloy Sample after Thermal Treatment". Journal of Surface Investigation: X-ray, Synchrotron and Neutron Techniques. 15, pp. 763–767 (2021).

E. Kabliman, A. H. Kolody, J. Kronsteiner, M. Kommenda, G. Kronberger. "Application of symbolic regression for constitutive modeling of plastic deformation". In Applications in Engineering Science, Volume 6, 100052, Elsevier. (June 2021).

G. Kronberger, F. O. de França, B. Burlacu, C. Haider, M. Kommenda. "Shape-constrained Symbolic Regression – Improving Extrapolation with Prior Knowledge". In Evolutionary Computation (2021).

L. Millán, G. Kronberger, J. I. Hidalgo, R. Fernández, O. Garnica, G. González-Doncel. "Estimation of Grain-Level Residual Stresses in a Quenched Cylindrical Sample of Aluminum Alloy AA5083 Using Genetic Programming". In Applications of Evolutionary Computation (Conference Proceedings EvoApplications 2021), Vol. 12694, pp. 421–436, (2021).

F. Bachinger, G. Kronberger, M. Affenzeller. "Continuous improvement and adaptation of predictive models in smart manufacturing and model management". In IET Collaborative Intelligent Manufacturing, Vol. 3, Iss. 1, Special Issue: Selected Papers from Collaborative and Intelligent Manufacturing in Industry 4.0 (ISM @SMM 2019), pp. 48-63, (March 2021).


G. Kronberger, F. Bachinger, M. Affenzeller. "Smart Manufacturing and Continuous Improvement and Adaptation of Predictive Models". In Procedia Manufacturing, Volume 42. (2020). Part of special issue: International Conference on Industry 4.0 and Smart Manufacturing (ISM 2019).

L. Kammerer, G. Kronberger, B. Burlacu, S. M. Winkler, M. Kommenda, M. Affenzeller. "Symbolic Regression by Exhaustive Search: Reducing the Search Space Using Syntactical Constraints and Efficient Semantic Structure Deduplication". In Genetic Programming Theory and Practice XVII, pp. 79-99. Springer. (2020).

F. Bachinger, G. Kronberger. "Concept for a Technical Infrastructure for Management of Predictive Models in Industrial Applications". In Computer Aided Systems Theory - EUROCAST 2019, pp. 263-270. Springer. (2020).

M. Affenzeller, B. Burlacu, V. Dorfer et al. "White Box vs. Black Box Modeling: On the Performance of Deep Learning, Random Forests, and Symbolic Regression in Solving Regression Problems". In Computer Aided Systems Theory - EUROCAST 2019, pp. 288-295. Springer. (2020).

G. Kronberger, L. Kammerer, M. Kommenda. "Identification of Dynamical Systems Using Symbolic Regression". In Computer Aided Systems Theory - EUROCAST 2019, pp. 370-377. Springer. (2020).

B. Burlacu, L. Kammerer, M. Affenzeller, G. Kronberger. "Hash-Based Tree Similarity and Simplification in Genetic Programming for Symbolic Regression". In Computer Aided Systems Theory - EUROCAST 2019, pp. 361-369. Springer. (2020).

L. Kammerer, G. Kronberger, M. Kommenda. "Data Aggregation for Reducing Training Data in Symbolic Regression". In Computer Aided Systems Theory - EUROCAST 2019, pp. 378-386. Springer. (2020).

J. Zenisek, G. Kronberger, J. Wolfartsberger, N. Wild, M. Affenzeller. "Concept Drift Detection with Variable Interaction Networks". In Computer Aided Systems Theory - EUROCAST 2019, pp. 296-303. Springer. (2020).

G. Kronberger, J. M. Colmenar, S. M. Winkler, J. I. Hidalgo. "Multilayer analysis of population diversity in grammatical evolution for symbolic regression". In Soft Computing. Springer. (2020).

B. Burlacu, G. Kronberger, M. Kommenda. "Operon C++: an efficient genetic programming framework for symbolic regression". In Proceedings of the 2020 Genetic and Evolutionary Computation Conference Companion (GECCO '20), pp. 1562–1570, (July 2020).


W. Roland, M. Kommenda, C. Marschik, J. Miethlinger. "Extended Regression Models for Predicting the Pumping Capability and Viscous Dissipation of Two-Dimensional Flows in Single-Screw Extrusion". In Polymers. (February 2019).

G. Kronberger, L. Kammerer, B. Burlacu, S. M. Winkler, M. Kommenda, M. Affenzeller. "Cluster Analysis of a Symbolic Regression Search Space". In Genetic Programming Theory and Practice XVI, pp. 85-102. Springer. (2019).

E. Kabliman, A. H. Kolody, M. Kommenda, G. Kronberger. "Prediction of Stress-Strain Curves for Aluminium Alloys using Symbolic Regression". In Proceedings of the 22nd International Conference on Material Forming (ESAFORM 2019), Vitoria-Gasteiz, Spain. AIP Conference Proceedings. (2019).

B. Burlacu, M. Affenzeller, G. Kronberger, M. Kommenda. "Online Diversity Control in Symbolic Regression via a Fast Hash-based Tree Similarity Measure". In Conference Proceedings of the IEEE Congress on Evolutionary Computation (CEC), Wellington, New Zealand. (2019).

B. Burlacu, M. Kommenda, G. Kronberger, M. Affenzeller. "Parsimony Measures in Multi-objective Genetic Programming for Symbolic Regression". In Proceedings of the Genetic and Evolutionary Computation Conference 2019 (GECCO '19). ACM, New York, NY, USA. (2019).

G. Kronberger, L. Kammerer, M. Kommenda, B. Burlacu. "Visualisierung eines Suchraums für Symbolische Regression". In Tagungsband Forschungsforum der Österreichischen Fachhochschulen (FFH 2019), Wiener Neustadt. (2019).

S. Prieschl, D. Girardi, G. Kronberger. "Using Ontologies to Express Prior Knowledge for Genetic Programming". In Proceedings of Cross Domain Conference for Machine Learning and Knowledge Extraction (CD-MAKE) co-located with ARES 2019, Canterbury, UK, August 26-29, LNCS. (2019).

M. Kommenda, B. Burlacu, G. Kronberger, et al. "Parameter identification for symbolic regression using nonlinear least squares". In Genetic Programming and Evolvable Machines. Springer. (2019).

G. Kronberger, S. Scheidel, C. Haider, M. Kommenda, M. Kordon. "Integration of Physical Knowledge in Empirical Models - A New Approach to Regression Analysis". In Proceedings of the 8th International Symposium on Development Methodology, Wiesbaden, Germany, pp. 1-9. (2019).

F. Bachinger, G. Kronberger. "Management von lernfähigen Vorhersage-Modellen für Industrie-Anwendungen". In Proceedings of The Future of Work, Education and Living, Linz, pp. 253-59. (2019).


B. Burlacu, M. Affenzeller. "Schema-based diversification in genetic programming". In Proceedings of the Genetic and Evolutionary Computation Conference, pp. 1111-1118. ACM. (July 2018).

G. Kronberger, M. Kommenda, A. Promberger, F. Nickel. "Predicting friction system performance with symbolic regression and genetic programming with factor variables". In Proceedings of the Genetic and Evolutionary Computation Conference, pp. 1278-1285. ACM. (July 2018).

F. Bachinger, J. Zenisek, L. Kammerer, M. Stimpfl, G. Kronberger. "Performance of industrial sensor data persistence in data vault". In Proceedings of the EMSS Conference, pp. 226-233. (September 2018).