I am currently conducting a research where I aim to predict the selling price and deal execution time of properties in NY. To do this, I have a dataset of various properties sold in NY, containing several features regarding the properties’ characteristics.
What I am trying to achieve is a multi-output regression problem. I was discussing this with my professor, and in the final protocol, he asked me to explain why predicting the deal execution time could be problematic.
Here’s what I’ve thought about, and I’d like to know if it makes sense to you or if there are other problems I haven’t considered:
The main issue, in my opinion, is assessing the accuracy of the results. The problem lies in the fact that the deal execution time could be influenced by countless unquantifiable external factors not present in the dataset. For example, the proficiency of the real estate agent, their connections in the real estate sector, their determination to sell the property, weather conditions, the buyer’s experience, and more.
Due to this, multiple values of the deal execution time could exhibit the same feature patterns. Hence, we are dealing with a multi-label regression problem. Creating models tasked with predicting the deal execution time as accurately as possible could therefore hide some difficulties.
Are there any other mathematical or ML optimization factors I should consider?