Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.
Hi folks,
Working on a first PowerBi project this weekend (having a blast with it!) but running into some gremlins, I have a few queries, but will seperate into different posts!
I am trying to cleanse / normalise around 40k rows of data that relate to sold products. An example mockup of the issue is below:
Steak | 32 Day Dry Aged | |
Ribeye Steak | Hereford | 32 Day Dry-Aged |
CentreCut Fillet | Longhorn 32 Days | |
Steak | 32 dayaged, Aberdeen Angus |
What is the best way to 'clean' this data? It currently coming from an excel sheet, but once the proof of concept works will be realtime data in an Azure SQL server or warehouse (and be a far larger data set).
In the example above I would want to be able to say '32 Day Dry-Aged' is the data I want, and '32 Day Dry Aged', '32 Days' and '32 dayaged' all mean the same thing. I would want to look for any of those values, and if they exist, create a new column, call it 'Maturation' and populate it accordingly.
I would then need to look in both columns for breed values (in this example Hereford & Longhorn), create column 'Breed' and populate accordingly.
In the real data, I have around 20 attributes (such as 32 Day Dry-Aged, 16 Day Dry-Aged etc) spread between attribute columns. There is no consistancy as to the column that an attribute will be found in, and some products have multiple attributes in the cell.
My current though process is around a method using DAX? But searching on 'data normalisation' or 'data cleansing' is not getting me very far.
Thanks in advance for any pointers!
Solved! Go to Solution.
You should definitely user Power Query and not DAX for this.
Since you have such random data it seems, what I would do is create a cross reference table. So find all possible values, using Grouping in Power Query and Load that in an Excel sheet. Then create a second column in the Excel sheet and type in what you want. For example:
Orig Data - New Data
32AgedData - 32 Days
32DayAged - 32 Days
etc. It will be tedious to get it all, but it will be far FAR easier than doing dozens/hundreds of IF/THEN type statements.
Now load this new table and merge with your source data, then use the "New Data" column as your values.
You'll need to periodically see if anything new has come up. Even typos, like 32AegdData. Add those to the table as it happens.
You should ultimately load this data into a new table in Azure SQL with the rest of the data, but to quickly get up and running, leaving it in Excel during development works fine.
DAX is for Analysis. Power Query is for Data Modeling
Proud to be a Super User!
MCSA: BI ReportingYou should definitely user Power Query and not DAX for this.
Since you have such random data it seems, what I would do is create a cross reference table. So find all possible values, using Grouping in Power Query and Load that in an Excel sheet. Then create a second column in the Excel sheet and type in what you want. For example:
Orig Data - New Data
32AgedData - 32 Days
32DayAged - 32 Days
etc. It will be tedious to get it all, but it will be far FAR easier than doing dozens/hundreds of IF/THEN type statements.
Now load this new table and merge with your source data, then use the "New Data" column as your values.
You'll need to periodically see if anything new has come up. Even typos, like 32AegdData. Add those to the table as it happens.
You should ultimately load this data into a new table in Azure SQL with the rest of the data, but to quickly get up and running, leaving it in Excel during development works fine.
DAX is for Analysis. Power Query is for Data Modeling
Proud to be a Super User!
MCSA: BI ReportingThanks Edhans - off to look at power query then!
Ok, Some progress, now I just need to work out how to follow instructions better! I have currently completed a bunch of data transofrmation in excel, and created lookup tables with distinct values - I have also removed any 'multivalue' splitting the values between the attribute columns.
This is now imported into BI and I am trying to merge this data. With the options I can see, I suspect I have misinterpreted the 'merge' instruction.
I am at this step:
Is there a turorial anywhere that would cover looking up the values in attributeOne, and Attribute Two (from ButcherAppImport), and then if a match is found between the attribute columns and the 'old attributes' (in attribute translations) populate the matchiing value from either 'maturation', 'range' or 'reared' in a new column in the original 'butcherAppImport Table (called Maturation, Range or Reared accordingly).
Cheers!
A couple of things I see that might help:
Make sense?
DAX is for Analysis. Power Query is for Data Modeling
Proud to be a Super User!
MCSA: BI ReportingCovering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City
Check out the April 2024 Power BI update to learn about new features.
User | Count |
---|---|
115 | |
100 | |
88 | |
70 | |
61 |
User | Count |
---|---|
151 | |
120 | |
103 | |
87 | |
68 |