Best Practices for Product Master Data Preparation

Product data often holds a large number of attributes, which usually need to be segregated before they yield meaningful insights. And because enterprises today have a global presence, localisation is also required: support for multiple languages, currencies, units of measure and so on.

To publish consistent product information across the organisation and its distribution channels, the information must first be acquired and prepared so that it is accurate and up to date.

The goal of product master data preparation is to ensure that the data is consistent and of high quality. Inconsistent, low-quality data can create errors, slow down time-to-market and lead to a poor customer experience.

We at Onedot believe product master data preparation is ideally conducted in these eight steps:

  1. Data Ingestion
  2. Attribute Extraction
  3. Product Categorisation
  4. Schema Mapping
  5. Data Integration
  6. Attribute Normalisation
  7. Golden Record Generation
  8. Product Variant Identification

First of all, transform all your input files into a flat table with a unified schema as the base for all further data preparation.
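
As a minimal sketch of this ingestion step, the snippet below reads one CSV feed and one nested JSON feed into the same flat row format. The column names in `UNIFIED_COLUMNS`, the feed contents and the source labels are assumptions made up for illustration, not a prescribed schema.

```python
import csv
import io
import json

# Hypothetical unified schema; the column names are assumptions for illustration.
UNIFIED_COLUMNS = ["source", "sku", "title", "price"]


def ingest_csv(text, source):
    """Read CSV text into flat rows restricted to the unified columns."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        row = {col: record.get(col, "") for col in UNIFIED_COLUMNS}
        row["source"] = source  # remember where each row came from
        rows.append(row)
    return rows


def ingest_json(text, source):
    """Flatten a JSON list of (possibly nested) product objects."""
    rows = []
    for record in json.loads(text):
        rows.append({
            "source": source,
            "sku": str(record.get("sku", "")),
            "title": record.get("title", ""),
            # The nested "pricing.amount" field is an assumed input layout.
            "price": str(record.get("pricing", {}).get("amount", "")),
        })
    return rows


# Two made-up input feeds in different formats.
csv_feed = "sku,title,price\nA1,Red T-Shirt M,19.90\n"
json_feed = '[{"sku": "B7", "title": "Blue Jeans", "pricing": {"amount": 49.5}}]'

flat_table = ingest_csv(csv_feed, "supplier_a") + ingest_json(json_feed, "supplier_b")
```

Keeping the `source` column from the start is a deliberate choice: it is what later allows one source to be prioritised over another.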

Second, extract additional attributes from semi-structured information such as product titles and product descriptions. This yields more structured information from your available data and further improves your ability to feed search filters, correctly categorise products and identify product variants.
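
One simple way to illustrate attribute extraction is with regular expressions over product titles. This is only a sketch: the colour list, the supported weight units and the attribute names are assumptions, and real titles typically need far richer rules or a trained model.

```python
import re

# Hypothetical vocabulary; a real catalogue would need a much larger list.
COLOURS = ("red", "blue", "green", "black", "white")
COLOUR_RE = re.compile(r"\b(" + "|".join(COLOURS) + r")\b", re.IGNORECASE)
# Matches weights such as "350 g" or "2.5kg".
WEIGHT_RE = re.compile(r"(\d+(?:\.\d+)?)\s*(kg|g)\b", re.IGNORECASE)


def extract_attributes(title):
    """Return the structured attributes that can be found in a product title."""
    attrs = {}
    colour = COLOUR_RE.search(title)
    if colour:
        attrs["colour"] = colour.group(1).lower()
    weight = WEIGHT_RE.search(title)
    if weight:
        attrs["weight_value"] = float(weight.group(1))
        attrs["weight_unit"] = weight.group(2).lower()
    return attrs


mug = extract_attributes("Ceramic Mug Blue 350 g")
dumbbell = extract_attributes("Dumbbell 2.5kg black")
```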

Before proceeding further with data integration, we recommend categorising your products, since schema mapping, attribute normalisation and product variant building are often category-specific. Ideally, you can automatically map every product to its most likely target category, with each mapping carrying a confidence level as the basis for manual validation.
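
To make the idea of a mapping with a confidence level concrete, here is a deliberately naive keyword-based sketch. The categories, keywords, scoring rule and the 0.8 review threshold are all assumptions for illustration; a production system would more likely use a trained classifier, but the output shape (category plus confidence) is the same.

```python
# Hypothetical target categories and keywords, for illustration only.
CATEGORY_KEYWORDS = {
    "apparel": {"shirt", "jeans", "jacket"},
    "kitchen": {"mug", "pan", "kettle"},
}


def categorise(title):
    """Return (best_category, confidence) for a product title.

    Confidence is the share of all keyword hits that fall into the best
    category, so a title matching several categories scores lower.
    """
    tokens = set(title.lower().replace("-", " ").split())
    scores = {cat: len(tokens & kws) for cat, kws in CATEGORY_KEYWORDS.items()}
    total = sum(scores.values())
    if total == 0:
        return None, 0.0  # nothing matched: always needs manual review
    best = max(scores, key=scores.get)
    return best, scores[best] / total


def needs_review(confidence, threshold=0.8):
    """Flag low-confidence mappings for manual validation."""
    return confidence < threshold


category, confidence = categorise("Red T-Shirt M")
```

Mappings where `needs_review` returns `True` would go to a person, while the rest can pass through automatically.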

Next, map all input attributes to your target attributes. We recommend listing attributes not covered by the target structure separately, both to avoid information loss and as a basis for potentially adapting the target schema.
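
A minimal sketch of this step, assuming a hand-maintained lookup from input attribute names to target attribute names (the names in `SCHEMA_MAP` are invented examples). Unmapped attributes are returned separately rather than dropped, so nothing is lost.

```python
# Hypothetical mapping from input attribute names to target attribute names.
SCHEMA_MAP = {
    "Farbe": "colour",
    "color": "colour",
    "Gewicht": "weight",
    "name": "title",
}


def map_schema(record):
    """Split a record into mapped target attributes and unmapped leftovers."""
    mapped, unmapped = {}, {}
    for key, value in record.items():
        target = SCHEMA_MAP.get(key)
        if target is None:
            unmapped[key] = value  # kept aside as input for schema changes
        else:
            mapped[target] = value
    return mapped, unmapped


mapped, unmapped = map_schema({"name": "Kettle", "Farbe": "rot", "Kabellänge": "0.7 m"})
```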

Once you are content with your schema mapping, integrate the individual products from your input data by mapping your attribute values either as-is or, where applicable, to predefined values. This results in a unified dataset that you could already use to fill your PIM.
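
The distinction between mapping values as-is and mapping them to predefined values could look roughly like this. The `VALUE_MAP` contents are assumed examples; attributes without an entry, and values without a predefined equivalent, pass through unchanged.

```python
# Hypothetical predefined values per target attribute.
VALUE_MAP = {
    "colour": {"navy": "blue", "crimson": "red", "rot": "red"},
}


def integrate(record):
    """Map attribute values to predefined values where a mapping exists."""
    out = {}
    for attr, value in record.items():
        allowed = VALUE_MAP.get(attr)
        if allowed is not None and isinstance(value, str):
            # Look up a cleaned-up key, but fall back to the original value.
            out[attr] = allowed.get(value.strip().lower(), value)
        else:
            out[attr] = value  # no predefined values for this attribute
    return out


coat = integrate({"colour": " Navy", "title": "Coat"})
```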

There are, however, three more steps we recommend you look into:

  1. You usually want to normalise and convert your attribute values so that your webshop's search filters work well, which drives conversion rate and improves search engine optimisation and marketing.
  2. You might have data from multiple input sources for identical products. In this case, you should aggregate the available data to one unified (“golden”) record using a dedicated strategy such as prioritising one source above others, resulting in a unified, de-duplicated set of your product master data.
  3. You might want to build product variants based on certain attributes such as colour or size, to avoid displaying products that are 95% identical multiple times on your website as individual listings.
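
The three steps above can be sketched in a few lines each. Everything named here is an assumption for illustration: the unit table, the source priority order, and the rule that products differing only in `colour` or `size` (and SKU) are variants of one another.

```python
from collections import defaultdict

# Hypothetical conversion table and source ranking (most trusted first).
TO_GRAMS = {"g": 1.0, "kg": 1000.0}
SOURCE_PRIORITY = ["manufacturer", "supplier_a", "supplier_b"]


def normalise_weight(value, unit):
    """Step 1: convert a weight to grams so filters compare like with like."""
    return value * TO_GRAMS[unit.lower()]


def golden_record(records):
    """Step 2: merge records of one product; higher-priority sources win.

    Empty values are skipped, so a lower-priority source can still fill gaps.
    """
    ordered = sorted(records, key=lambda r: SOURCE_PRIORITY.index(r["source"]))
    golden = {}
    for record in ordered:
        for attr, value in record.items():
            if attr != "source" and value not in (None, "") and attr not in golden:
                golden[attr] = value
    return golden


def group_variants(products, variant_attrs=("colour", "size")):
    """Step 3: group SKUs whose non-variant attributes are all identical."""
    groups = defaultdict(list)
    for product in products:
        key = tuple(sorted(
            (k, v) for k, v in product.items()
            if k not in variant_attrs and k != "sku"
        ))
        groups[key].append(product["sku"])
    return list(groups.values())


merged = golden_record([
    {"source": "supplier_b", "sku": "A1", "title": "Tee", "colour": "red"},
    {"source": "manufacturer", "sku": "A1", "title": "T-Shirt", "colour": ""},
])
variants = group_variants([
    {"sku": "A1", "title": "T-Shirt", "colour": "red"},
    {"sku": "A2", "title": "T-Shirt", "colour": "blue"},
    {"sku": "B1", "title": "Jeans", "colour": "blue"},
])
```

In this example the manufacturer's title wins, the colour is filled in from the lower-priority supplier, and the two T-shirts end up in one variant group while the jeans stay separate.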


Product master data preparation at scale can be a daunting task, requiring a lot of manual work over long periods of time with countless iterations and feedback loops. But choosing the right solution, one that is easy to configure, highly customisable and at the same time scalable, can have a large effect on revenue and profit. It helps speed up the data management process and cut costs.

Be(come) faster and better at reduced operational costs by applying a structured best practice process such as the above in combination with a state-of-the-art AI-powered data preparation service.