Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.

Reply
Anonymous
Not applicable

data cleansing

I wanted to ask what the best way for data cleansing would be in relation to some sale figures and dates.I got an excel files that requires some formatting but I am unsure about the way to proceed to clean the data through the software itself and make it efficient for my data analysis.

2 ACCEPTED SOLUTIONS
Anonymous
Not applicable

The first thing to understand is that how data is stored and how data is displayed on screen is not the same.  Data cleansing is all about how the data is stored.  When its nicely stored its much easier to have it display the way you want.

 

The best kind of data is any data that is stored numerically (Whole Number, Decimal Number, Date, DateTime etc)

The worst kind of data is text based data, although it is a necessary evil.

 

When you bring data in from Excel, you want to cleanse and convert the data as best as possible into their proper datatypes.  Doing this will ensure you can do everything you need once you have it inside the data Model itself.

 

This will require you to try to anticipate things that could turn up within the data from users.  If your Excel file is being generated from a system, you can be reasonbly confident that everything will stay the same.  If your Excel file is being populated by hand, you can expect that data cleansing will be an on-going maintenance item that you will need to address  from time to time whenever someone makes a keying error.

 

Common cleansing items are:

Converting Text based data into numerical types - This can take 1 or more steps depending on how good the data is.

Aligning Text values to look nice - This could be removing common spelling errors using 'Replace Values' to chopping the data up to remove uncessary prefixs and suffixes.

Removing bad characters - This can be done from a Trim and/or Clean operation, or might require a Replace Values to remove.

View solution in original post

v-juanli-msft
Community Support
Community Support

Hi @Anonymous

Thanks for Ross73312's so detailed introduction, i would add something which you could learn how to clear with Excel or Power BI.

Reference:

Data Cleaning in Excel or in Power BI

 

Best Regards

Maggie

View solution in original post

2 REPLIES 2
v-juanli-msft
Community Support
Community Support

Hi @Anonymous

Thanks for Ross73312's so detailed introduction, i would add something which you could learn how to clear with Excel or Power BI.

Reference:

Data Cleaning in Excel or in Power BI

 

Best Regards

Maggie

Anonymous
Not applicable

The first thing to understand is that how data is stored and how data is displayed on screen is not the same.  Data cleansing is all about how the data is stored.  When its nicely stored its much easier to have it display the way you want.

 

The best kind of data is any data that is stored numerically (Whole Number, Decimal Number, Date, DateTime etc)

The worst kind of data is text based data, although it is a necessary evil.

 

When you bring data in from Excel, you want to cleanse and convert the data as best as possible into their proper datatypes.  Doing this will ensure you can do everything you need once you have it inside the data Model itself.

 

This will require you to try to anticipate things that could turn up within the data from users.  If your Excel file is being generated from a system, you can be reasonbly confident that everything will stay the same.  If your Excel file is being populated by hand, you can expect that data cleansing will be an on-going maintenance item that you will need to address  from time to time whenever someone makes a keying error.

 

Common cleansing items are:

Converting Text based data into numerical types - This can take 1 or more steps depending on how good the data is.

Aligning Text values to look nice - This could be removing common spelling errors using 'Replace Values' to chopping the data up to remove uncessary prefixs and suffixes.

Removing bad characters - This can be done from a Trim and/or Clean operation, or might require a Replace Values to remove.

Helpful resources

Announcements
Microsoft Fabric Learn Together

Microsoft Fabric Learn Together

Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City

PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.

April Fabric Community Update

Fabric Community Update - April 2024

Find out what's new and trending in the Fabric Community.

Top Solution Authors