RafaelKnuth
Advocate I

Challenge: Transforming Data with M-Language - messy data scraped from the web

I am new to M-Language, and I am playing around with it to see how far I can get.

 

I pulled data from the web: https://www.t-systems.com/de/en/locations for the purpose of creating a clean directory of company locations.

 

After several failed attempts to get the data directly from the web into Excel / Power BI, I tried several web scrapers, of which only one worked reasonably well.

 

Finally, I got an Excel file that looks like this (screenshot not preserved):

I am trying to wrap my head around this data, and I was wondering whether it's even possible to clean it up with the help of M.

 

It's barely digestible for a human ...

 

Obviously, it would take me less time to just copy and paste the data into a text file, clean it up by hand and then load it into Excel. But I was wondering how a data pro would handle such a scenario if the data set were not just a couple dozen addresses but a volume too large to clean up manually.

 

Please bear in mind that the example above is totally arbitrary, based on publicly available data, and for educational purposes only.

I am using it to demonstrate a realistic scenario.

 

Thanks for your feedback!

2 REPLIES
v-shex-msft
Community Support

Hi @RafaelKnuth,

 

Power Query currently does not contain functions that analyze records automatically.

 

Based on your screenshots, I think you can try to get the data from a specific API, then remove the 't-system' prefix from the address column.
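For illustration, the prefix removal could be sketched in M roughly as follows. This is only a minimal sketch under assumptions: the column name `Address`, the prefix text `T-Systems`, and the sample rows are all hypothetical, since the actual layout of the scraped sheet is not shown in this thread. It strips the prefix only when a value actually starts with it, ignoring case, and leaves other values untouched.

```powerquery
let
    // Hypothetical sample rows standing in for the scraped sheet;
    // the real column name and values are assumptions.
    Source = #table(
        type table [Address = text],
        {
            {"T-Systems Example Street 1, 12345 Example City"},
            {"t-systems Sample Road 2, 67890 Sample Town"},
            {"Another Street 3, 11111 Other City"}
        }
    ),

    // Assumed prefix text; adjust to whatever the scraped data really contains.
    Prefix = "T-Systems",

    // Remove the prefix only if the value starts with it (case-insensitive),
    // then trim leftover whitespace. Nulls and other values pass through unchanged.
    RemovePrefix = (value as nullable text) as nullable text =>
        if value <> null and Text.StartsWith(value, Prefix, Comparer.OrdinalIgnoreCase)
        then Text.Trim(Text.Range(value, Text.Length(Prefix)))
        else value,

    // Apply the function to every value in the Address column.
    Cleaned = Table.TransformColumns(Source, {{"Address", RemovePrefix, type nullable text}})
in
    Cleaned
```

In a real query, `Source` would be replaced by the step that loads the Excel file. Checking with `Text.StartsWith` rather than replacing the text everywhere keeps addresses intact if the company name also appears in the middle of a value.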

 

After these steps, I think the remaining text can be assigned the address/place 'data category'. You can then use these records to create a map.

 

Reference links:

Using a REST API as a data source

5 Very Useful Text Formulas – Power Query Edition

 

Regards,

Xiaoxin Sheng

Community Support Team _ Xiaoxin
If this post helps, please consider accepting it as the solution to help other members find it more quickly.

Thank you so much, @v-shex-msft. Are you aware of any tools that analyze records automatically?

Unfortunately, that data set is not available through REST APIs. I had to scrape it from the site, which I guess is why it's so messy.
