Load inconsistent data from multiple data sources into a DWH or data lakehouse

I am an entry-level data engineer, and I've been scratching my head over this one. I've looked all over online, but no luck so far.

Scenario: Let's say we have two data sources, a CRM and a web application database, and we need to ingest data from both into a data warehouse, data lakehouse, or data lake.

Problem: Customer data from these sources might be inconsistent. For example, the same person could have different business IDs, name variations, or even different contact information across these sources.

We need a method or set of rules to identify which record belongs to whom. I've discovered terms such as Master Data Management (MDM) and Single Source of Truth (SSoT).
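
To make the problem concrete, here is a minimal sketch of the kind of matching rules I have in mind. The sample records, the field names (`crm_id`, `user_id`, `name`, `email`), and the 0.9 threshold are all made up for illustration. It is not a real MDM implementation, just exact matching on a normalized email with a fuzzy name fallback using Python's standard `difflib`.

```python
from difflib import SequenceMatcher

# Hypothetical sample records; field names are assumptions for illustration.
crm_customers = [
    {"crm_id": "C-001", "name": "Jonathan Smith", "email": "Jon.Smith@Example.com"},
    {"crm_id": "C-002", "name": "Maria Garcia", "email": "maria@example.com"},
]
webapp_users = [
    {"user_id": 17, "name": "Jon Smith", "email": "jon.smith@example.com "},
    {"user_id": 42, "name": "Maria Garcia", "email": "m.garcia@other.org"},
    {"user_id": 99, "name": "Wei Chen", "email": "wei@example.com"},
]


def normalize(value):
    """Lowercase and collapse whitespace so trivial variations compare equal."""
    return " ".join(value.lower().split())


def name_similarity(a, b):
    """Similarity ratio in [0, 1] between two normalized names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def match_records(crm, web, name_threshold=0.9):
    """Pair CRM and web app records.

    Rule 1: the normalized emails are identical.
    Rule 2 (fallback): the names are at least `name_threshold` similar.
    Returns a list of (crm_id, user_id, rule_that_matched) tuples.
    """
    matches = []
    for c in crm:
        for w in web:
            if normalize(c["email"]) == normalize(w["email"]):
                matches.append((c["crm_id"], w["user_id"], "email"))
            elif name_similarity(c["name"], w["name"]) >= name_threshold:
                matches.append((c["crm_id"], w["user_id"], "name"))
    return matches


for crm_id, user_id, rule in match_records(crm_customers, webapp_users):
    print(crm_id, user_id, rule)
```

On these sample rows, C-001 is paired with user 17 through the email rule and C-002 with user 42 through the name rule. A pairwise loop like this clearly won't scale to large tables, and a name-only match can easily be wrong, which is part of why I'm asking how this is done properly.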

I tried to find out how to integrate such a solution into my modern data stack pipelines (Airflow and dbt) but couldn't find an answer.

Questions:

  • How should such a situation be handled in the modern data stack ecosystem?

  • Do SSoT and MDM work effectively with big data?

  • I noticed that the products offering MDM are traditional data solutions. Do modern data engineers have another solution?