A Simple Introduction to CRISP-DM

Husni Nur Fadillah

Hello everyone! Today I'll explain the CRISP-DM process model, an essential framework for data science projects.

What is CRISP-DM?

The Cross-Industry Standard Process for Data Mining (CRISP-DM) is the de facto standard, industry-independent process model for applying data mining projects¹. It defines six phases that together describe the data science lifecycle, from initial business understanding through to deployment.

The Six Phases of CRISP-DM

1. Business Understanding

This phase serves as the foundation for all subsequent phases. The primary goal is to establish a clear understanding of the problem and objectives that will guide all future work.

During this phase, we assess the business situation to gain an overview of available resources, constraints, and requirements. This understanding is critical because it informs all decisions made in the phases that follow.

2. Data Understanding

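Before advancing to data preparation, we must thoroughly understand the data we have. In the standard CRISP-DM task list, this involves collecting the initial data, describing it, exploring it, and verifying its quality.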

Without a clear understanding of our data, we cannot effectively move to the next phase and prepare it for modeling.
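
To make this more concrete, here is a minimal sketch of a first look at a dataset, using only the Python standard library. The `profile` function and the `customers` records are made up for illustration and are not part of CRISP-DM itself. The function simply counts rows, missing values, and distinct values per column.

```python
from collections import Counter


def profile(records):
    """Summarize a list of dict records: row count plus per-column stats."""
    # Collect every column name that appears in any record.
    columns = sorted({key for row in records for key in row})
    summary = {"rows": len(records), "columns": {}}
    for col in columns:
        values = [row.get(col) for row in records]
        present = [v for v in values if v is not None]
        summary["columns"][col] = {
            "missing": len(values) - len(present),
            "distinct": len(set(present)),
            "most_common": Counter(present).most_common(1)[0][0] if present else None,
        }
    return summary


# Hypothetical sample data, for illustration only.
customers = [
    {"age": 34, "plan": "basic"},
    {"age": None, "plan": "premium"},
    {"age": 51, "plan": "basic"},
    {"age": 29, "plan": None},
]

report = profile(customers)
print(report)
```

A summary like this is only a starting point, but it already shows which columns have gaps that the next phase will need to deal with.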

3. Data Preparation

Before data can be used for modeling, we must clean and prepare it so that it's ready to be consumed by machine learning algorithms. This is a critical phase because poor data quality directly leads to poor modeling results.

Data preparation involves activities such as handling missing values, removing outliers, feature engineering, and data transformation, all of which help ensure data quality.
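
Here is a small sketch of three of these activities applied to a single numeric column. The choices are assumptions made for illustration, not something CRISP-DM prescribes: median imputation for missing values, the common 1.5 × IQR rule for outliers, and min-max scaling as the transformation. The `raw_ages` values are invented.

```python
import statistics


def impute_median(values):
    """Replace None with the median of the observed values."""
    observed = [v for v in values if v is not None]
    median = statistics.median(observed)
    return [median if v is None else v for v in values]


def remove_outliers(values, k=1.5):
    """Keep only values inside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]


def min_max_scale(values):
    """Rescale values linearly to the range [0, 1]."""
    lowest, highest = min(values), max(values)
    if highest == lowest:
        return [0.0 for _ in values]
    return [(v - lowest) / (highest - lowest) for v in values]


# Hypothetical column with one missing value and one implausible age.
raw_ages = [34, None, 51, 29, 40, 38, 250]

ages = impute_median(raw_ages)
ages = remove_outliers(ages)
scaled = min_max_scale(ages)
print(ages)
print(scaled)
```

The order matters here: the quartiles cannot be computed while the column still contains missing values, so imputation runs first.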

4. Modeling

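In the standard CRISP-DM task list, the modeling phase involves selecting a modeling technique, designing how the model will be tested, building the model, and assessing it.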

It's important to document and explain your choices throughout this phase.
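
As a rough illustration of what this phase can look like in code, the sketch below holds out part of the data for testing, fits a very simple model, and compares it with a naive baseline. Everything here is an assumption chosen to keep the example short: the synthetic `data`, the 75/25 split, simple linear regression as the technique, and mean absolute error as the metric.

```python
import random
import statistics


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of the rows and split it into train and test parts."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]


def fit_line(rows):
    """Ordinary least squares for y = slope * x + intercept."""
    xs = [x for x, _ in rows]
    ys = [y for _, y in rows]
    mean_x = statistics.fmean(xs)
    mean_y = statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def mean_absolute_error(model, rows):
    """Average absolute difference between predictions and true values."""
    slope, intercept = model
    return statistics.fmean([abs(y - (slope * x + intercept)) for x, y in rows])


# Hypothetical data: y is roughly 2x + 1 with a little noise.
noise = [0.1, -0.2, 0.0, 0.3, -0.1, 0.2, -0.3, 0.1, 0.0, -0.1, 0.2, -0.2]
data = [(x, 2 * x + 1 + e) for x, e in zip(range(12), noise)]

# Test design: hold out a quarter of the rows.
train, test = train_test_split(data)

# Build the model on the training rows only.
model = fit_line(train)

# Assess it against a baseline that always predicts the training mean.
mean_train_y = statistics.fmean([y for _, y in train])
baseline_mae = statistics.fmean([abs(y - mean_train_y) for _, y in test])
model_mae = mean_absolute_error(model, test)
print(f"baseline MAE: {baseline_mae:.2f}, model MAE: {model_mae:.2f}")
```

Comparing against a baseline is one simple way to document why a modeling choice was made, since it shows how much the model adds over doing nothing clever.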

5. Evaluation

In the evaluation phase, we assess whether the model's results align with the business objectives defined in the first phase. Based on this assessment, we determine the next step: refine the model, try different techniques, or proceed to deployment.
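
One way to picture this decision is as a simple rule that compares a model metric with a target agreed during business understanding. The sketch below is only an illustration. The `next_step` function, the target of 0.90, and the tolerance of 0.05 are all hypothetical, and in a real project this decision also depends on judgment that a single number cannot capture.

```python
def next_step(metric_value, target, tolerance=0.05):
    """Map an evaluation result to one of the three next steps."""
    if metric_value >= target:
        return "proceed to deployment"
    if metric_value >= target - tolerance:
        # Close to the target: tuning the current model may be enough.
        return "refine the model"
    return "try different techniques"


# Hypothetical business target: at least 0.90 on the chosen metric.
for score in (0.92, 0.87, 0.70):
    print(score, "->", next_step(score, target=0.90))
```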

6. Deployment

Once the model has been validated and approved, it's deployed so that stakeholders and customers can use it. The deployment phase covers planning the deployment, planning monitoring and maintenance, producing a final report, and reviewing the project².

Conclusion

CRISP-DM is a process model for data mining consisting of six interconnected phases: business understanding, data understanding, data preparation, modeling, evaluation, and deployment. By following this framework, data science teams can take a structured, repeatable approach to their projects.

References

[1] Schröer, Christoph, et al. 2021. "A Systematic Literature Review on Applying CRISP-DM Process Model" https://www.sciencedirect.com/science/article/pii/S1877050921002416

[2] Data Science Process Alliance. "CRISP DM" https://www.datascience-pm.com/crisp-dm-2/
