08. What Are The Consequences of Not Integrating Domain Knowledge Into The Machine Learning Process?
I think if you don't integrate domain knowledge, it's easier to fall victim to a few different kinds of traps. First of all, spurious correlations are easy to come by, and without domain knowledge you may not recognize that they're spurious. Something like the phase of the moon might show up as a strong feature in a model of well productivity, when obviously we know there's no causal link there.
The other thing is that it's difficult to know what features we could engineer from the data. With domain expertise, we can take the raw data and say, if we look at it like this, that's probably physically relevant, or that's a good way to look at things that would be intuitive and interpretable to someone looking at the results of the model. Without domain expertise, it's really hard to come up with those features. There are automated feature engineering methods and algorithms out there, but they usually just throw everything at the wall and see what sticks. And sometimes what sticks is, again, very unintuitive or not particularly meaningful to someone who's interpreting the results.
And usually we can find better links to causality with domain knowledge. Say we're trying to understand what actually drives performance for a well, or what is actually predictive of, let's say, core porosity. With domain knowledge, we can piece together what we know from the physical relationships and see if that's showing through in the model. Or, if some things are shining through in the model that we strongly suspect are not causal, we can immediately go investigate those or filter them out to make sure we're not falling into a trap.
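To make the two traps above concrete, here is a minimal Python sketch on purely synthetic data. It is an illustration under stated assumptions, not a method from the interview: the well counts, the feature names `proppant_lbs`, `lateral_ft`, and `proppant_per_ft`, and the relationship between them and the output are all hypothetical. The first part shows how a pure-noise feature (a stand-in for something like the phase of the moon) can look strong on a small data set and then collapse on new wells. The second part shows how a ratio that a domain expert might suggest can track the target better than either raw column.

```python
import numpy as np

rng = np.random.default_rng(seed=0)


def abs_corr(x, y):
    """Absolute Pearson correlation between two 1-D arrays."""
    return abs(np.corrcoef(x, y)[0, 1])


# Part 1: a spurious feature that looks strong on a small data set.
# Every candidate feature is pure noise, so none has a causal link to the target.
n_wells, n_candidates = 30, 500
productivity = rng.normal(size=n_wells)                  # synthetic target
candidates = rng.normal(size=(n_wells, n_candidates))    # synthetic noise features

in_sample = np.array(
    [abs_corr(candidates[:, j], productivity) for j in range(n_candidates)]
)
best = int(np.argmax(in_sample))
best_in_sample = in_sample[best]

# Simulate the same meaningless feature on a larger set of new wells.
# It is still unrelated to the target, so the correlation should shrink toward zero.
n_new = 1000
new_productivity = rng.normal(size=n_new)
new_feature = rng.normal(size=n_new)
best_out_of_sample = abs_corr(new_feature, new_productivity)

# Part 2: a domain-informed engineered feature (hypothetical relationship).
# Assumption: output is driven by proppant per foot of lateral, not by either raw column alone.
n = 1000
proppant_lbs = rng.uniform(2e6, 10e6, size=n)
lateral_ft = rng.uniform(4000, 10000, size=n)
proppant_per_ft = proppant_lbs / lateral_ft
output = 2.0 * proppant_per_ft + rng.normal(scale=50.0, size=n)

corr_raw_proppant = abs_corr(proppant_lbs, output)
corr_raw_lateral = abs_corr(lateral_ft, output)
corr_engineered = abs_corr(proppant_per_ft, output)

print(f"best noise feature, in sample:     {best_in_sample:.2f}")
print(f"same kind of feature, new wells:   {best_out_of_sample:.2f}")
print(f"raw proppant vs output:            {corr_raw_proppant:.2f}")
print(f"raw lateral length vs output:      {corr_raw_lateral:.2f}")
print(f"engineered proppant/ft vs output:  {corr_engineered:.2f}")
```

The first comparison is the kind of check someone with domain knowledge would think to run: if a feature has no plausible physical link to the target, test it on data the model hasn't seen before trusting it. The second is only as good as the assumed physics; in a real project the ratio would come from the engineer's understanding of the wells, not from the code.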
And really, at the end of the day, it's also important for being able to convince stakeholders of the value of the analysis or the predictions generated from the models. If our features and the whole process are rooted in domain knowledge, we can communicate the modeling process in those terms and link it back to first principles that are already widely accepted. That makes it much easier to get buy-in from management, let's say, for actually using the model to make decisions.