Earn a 50% discount on the DP-600 certification exam by completing the Fabric 30 Days to Learn It challenge.
The heading might be a bit misleading, but I can't really think of another way to put it.
I have a huge dataset, all housed on Sharepoint, that is horribly structured. It is comprised of hundreds of Excel spreadsheets (multiple tabs per file). Every sheet/page is uniform in design, but the data is set out for aesthetic, 'form filling' reasons rather than data reporting. This means that data transformation is a nightmare. I can't change the data collection process as that's a different department and out of my hands (though I have suggested alterations that could be made).
After multiple specific steps of 'Remove Alternate Rows' and Field Recording to capture single values that appear above banks of data, I've got it to a semi-structured point, but am now stuck.
Essentially, after I've pulled it apart and stuck it back together, the data I'm stuck with looks a bit like this (the below is not representative of the collected data, just the structure):
Column1 | Column2 | Column3 | Column4 | Column5 | Column6 | Column7 | Column8 |
Name | null | null | Age | null | Gender | null | null |
John | null | null | 34 | null | Male | null | null |
Jenny | null | null | 35 | null | Female | null | null |
Dave | null | null | 21 | null | Male | null | null |
Polly | null | null | 28 | null | Female | null | null |
Departments | Building No. | null | null | Postcode | null | No. of Staff | null |
IT | 1 | null | null | PP1 1PP | null | 14 | null |
Finance | 2 | null | null | QQ3 3QQ | null | 12 | null |
Health | 3 | null | null | TT7 7TT | null | 5 | null |
Vehicles | No. | null | Vehicle Costs | null | null | null | Budget Code |
Transit Van | 3 | null | £2500 | null | null | null | BC1234 |
Car | 5 | null | £1200 | null | null | null | BC4567 |
The problem is that the files have merged cells which PBI doesn't like.
As this data all pertains to a specific business name which was gathered using the aforementioned Field Recording, I need to take the blocks of data containing Departments and Vehicles and convert them to columns so they sit alongside the top data block (Names). I can then delete blank columns, lift first row entries to columns headers, and wind up with something like this:
Name | Age | Gender | Departments | Building No. | Postcode | No. of Staff | Vehicles Type | Vehicle No. | Vehicle Costs | Budget Code |
John | 34 | Male | IT | 1 | PP1 1PP | 14 | Transit Van | 3 | £2,500 | BC1234 |
Jenny | 35 | Female | Finance | 2 | QQ3 3QQ | 12 | Car | 5 | £1,200 | BC4567 |
Dave | 21 | Male | Health | 3 | TT7 7TT | 5 | null | null | null | null |
Polly | 28 | Female | null | null | null | null | null | null | null | null |
Polly | 28 | Female | null | null | null | null | null | null | null | null |
This I can then work with.
Thanks to the lovely people on here teaching me how (specifically @parry2k), I'm applying the same coding to hundreds of files from a folder, which'll take ages, but will do the job in a single refresh.
Any ideas? I feel like I'm missing something obvious.
Solved! Go to Solution.
Hi @Mat42 ,
According to your needs, you can achieve the results you expect in Power Query.
I did the following test, which can be used as a reference: "Reference" a new table in the query, and then do the following steps
Then create the index column and merge the table through the index column.
Finally put the corresponding lot on the table visual to get the desired result
Here is a relative link for referene:Solved: Filter rows in Query editor - Microsoft Power BI Community
If the problem is still not resolved, please provide detailed error information and let me know immediately. Looking forward to your reply.
Best Regards,
Henry
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.
Thanks guys. All fixed and sorted.
One day I'll get to grips with this system.
Hi @Mat42 ,
According to your needs, you can achieve the results you expect in Power Query.
I did the following test, which can be used as a reference: "Reference" a new table in the query, and then do the following steps
Then create the index column and merge the table through the index column.
Finally put the corresponding lot on the table visual to get the desired result
Here is a relative link for referene:Solved: Filter rows in Query editor - Microsoft Power BI Community
If the problem is still not resolved, please provide detailed error information and let me know immediately. Looking forward to your reply.
Best Regards,
Henry
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.
@Mat42 I think I got the gist. The solution is attached, basically, I'm taking 3 blocks into separate tables and merging those together. if for some reason, performance is an issue because of merging, we can completely ignore it and create a relationship between tables but let's start with this. Test it and let me know if this is what you are looking for.
Check my latest blog post Compare Budgeted Scenarios vs. Actuals I would ❤ Kudos if my solution helped. 👉 If you can spend time posting the question, you can also make efforts to give Kudos to whoever helped to solve your problem. It is a token of appreciation!
⚡Visit us at https://perytus.com, your one-stop-shop for Power BI-related projects/training/consultancy.⚡
Subscribe to the @PowerBIHowTo YT channel for an upcoming video on List and Record functions in Power Query!!
Learn Power BI and Fabric - subscribe to our YT channel - Click here: @PowerBIHowTo
If my solution proved useful, I'd be delighted to receive Kudos. When you put effort into asking a question, it's equally thoughtful to acknowledge and give Kudos to the individual who helped you solve the problem. It's a small gesture that shows appreciation and encouragement! ❤
Did I answer your question? Mark my post as a solution. Proud to be a Super User! Appreciate your Kudos 🙂
Feel free to email me with any of your BI needs.
@Mat42 so the data is always in these block:
For example:
1st block -> user 10 rows
2nd block -> company 3 rows
3rd block -> vehicle 8 rows
so you will take a first come first serve row from company and vehicle and link with the user, correct? so first 3 rows of users will have company data and the rest 7 no company data because there are only 3 company rows and then the first 8 rows of users will have vehicle data and the remaining 2 will not have vehicle data since the vehicle block has only 8 rows. Is this correct understanding?
Now another question, what happens if the company or vehicle rows are more than user rows, does those rows get ignored?
Subscribe to the @PowerBIHowTo YT channel for an upcoming video on List and Record functions in Power Query!!
Learn Power BI and Fabric - subscribe to our YT channel - Click here: @PowerBIHowTo
If my solution proved useful, I'd be delighted to receive Kudos. When you put effort into asking a question, it's equally thoughtful to acknowledge and give Kudos to the individual who helped you solve the problem. It's a small gesture that shows appreciation and encouragement! ❤
Did I answer your question? Mark my post as a solution. Proud to be a Super User! Appreciate your Kudos 🙂
Feel free to email me with any of your BI needs.
I've been trying to reply with further examples, but the system won't let me and keeps crashing out when I try.
The 3 blocks: Name, Department, and Vehicles are limited. There are 30 rows in the Name block, 10 rows in the Department block, and 10 rows in the Vehicles block. These are the maximum that can be entered for each block. I removed the blank rows for each block.
This means that a maximum of 30 names can be entered, 10 departments, and 10 vehicles. The number of each will change on each sheet, but will never exceed the maximum for each block.
In the previous example there were 5 names. This left 25 blank rows in the Names block. There were 3 departments, which left 7 blank rows in the Departments block. There were 2 vehicles, which left 8 blank rows in the Vehicles block.
Does that make sense?
User | Count |
---|---|
92 | |
85 | |
77 | |
66 | |
62 |
User | Count |
---|---|
110 | |
95 | |
95 | |
64 | |
59 |