Research Article
Óscar Marbán and
Abstract
Existing Data Mining process models propose one way or another of developing projects in a structured manner, trying to reduce their complexity through effective project management. It is well known in any engineering environment that systematic project documentation is one of the management tasks that helps to reduce project problems, yet few of the existing Data Mining processes address documentation. Furthermore, those few stress the need to produce documentation at each phase as an input for the next, but they do not show how to do it. On the other hand, the literature contains examples of UML extensions for data mining projects, but they always focus on the model implementation side and fail to take into account the remainder of the process. In this paper, we present an extension of the UML modeling language for data mining projects (DM-UML) that covers all the documentation needs of a project conforming to a standard process, namely CRISP-DM, ranging from business understanding to deployment. We also show an example of a real application of the proposed DM-UML modeling. The result of this approach is that, besides the advantages of having a standardized way of producing the documentation, it constitutes a useful and transparent tool for modeling and for connecting the business understanding or modeling phase with the remainder of the project right through to deployment. It also facilitates communication with the nontechnical stakeholders involved in the project. Both of these have long been open problems in data mining.