For a beginner in Power BI and in business intelligence in general, it is not easy to understand data types and differentiate them from data format. And then comes data category... So, what's all that about, really?
It's true that in spreadsheet applications like Excel, data type is not really of a bother to the user, as there is no functionality to define the type of the data present in cells. We just enter values and it's all fine (somehow). In fact, I only started to bother about data types when starting to use Power Query, Power BI and Power Pivot (data modelling tool in Excel).
Power Query is less friendly with typeless (or rather untyped) columns. Try creating a dataflow in the Power BI Service without specifying a data type for each column, and you will get the following message.
Not cool, right? What's wrong with my data? It all looks good to me after all.
So, the Power Query team tried to make it easy by automatically detecting column types from the imported data.
Though, sometimes it is more disturbing than anything else and you end up deleting that step Power Query creates for you. After all, it's my data and I'll give it the type I want.
Back in Excel you can only disguise your data, playing with formatting options. Formatting is just a way of altering your data appearance, without changing anything of its value.
How does Excel represent data types?
In Excel, data are naturally aligned based on their type. It gives a visual clue of the data type we are dealing with.
- Left aligned => text - Right aligned => numbers and date and time - Center aligned => Boolean (logical, true/false)
NOTE: It is a good practice toNOTalter the natural alignment of data in Excel.
Notice, here I have altered the format of numbers, forcing them into text format, and thus naturally aligning to the left. Though I haven't altered its type. Beneath the disguise, they remain numbers, as we can see at the bottom that aggregation like SUM() and AVERAGE() are well performed.
Again, in Excel, data type is not well emphasized and enforced.
So, ok, in Excel we didn't really care about types, and Power Query takes care about it (in its own way), so, really, why bother? Well, because you want control over your data, yep, total control. So, you get to know about data types. And because you don't want to end up with errors like this in your data set. Your analysis would be affected.
And just in case you try to load your data into the data model of Excel...
And, actually, when looking at the data model in Power Pivot, well Excel (or Power Pivot, who knows...) got rid of your errors.
Furthermore, data type errors are common, and you will often find an error message similar to this one
In Excel, adding a number to a date would have worked without any problem. So, when used to Excel and its way of handling data, you don't really care about types, but with Power Query, you have to very much care about what type it is.
So then what are data types?
There are several types of data, but let's say the most common ones are:
Text (or sometimes referred to as string)
Number (numeric values)
Date and time
Boolean, or logical (also referred to as true/false types)
Data type is very important, and most often you are required to declare the data type you are working with.
Text data type is very simple and straight forward. It's just plain text. Nothing much to say.
With number data types, it becomes more interesting because with numbers you can use it for aggregation and extract meaningful information. You can sum it, add or subtract a value, calculate averages, round it up or down, and many more things can be done.
With date and time data type, it's a wealth of information, because you can extract the time (hours and minutes), the date alone, the month, the year, the quarter, the day, the week number, the day of the week... And, whether in Excel, in Power BI or in any other tool you always find date and time specific functions (MONTH(), YEAR(), WEEKDAY(), WEEKNUM()...)
You can add or subtract days, months, years, hours to a date. In fact, a date is just a number disguised into a date and time. If you change the format of a date in Excel, you will get a bunch of meaningless digits. The numbers to the right of the decimal represent the time. The numbers to the left of the decimal represent the number of days starting from midnight of the 1 January of 1900. And today is the 43,969 after the start of the 1st day of the first month of 1900.
A lot of information can be extracted from a date and time data type.
NOTE: Dates can be formatted differently depending on your regional settings, American standards are different from European standards for instance. => And this is often source of errors and maybe source of misinterpretation.
For Boolean values, well, those are pretty simple, it's either TRUE, either FALSE. 👌🏼
Be my type!
In, Power Query, one of the very first functions (and step) you see appearing in the formula bar isTable.TransformColumnTypes()(though you never really write it manually)
And, whether in Excel, or in Power BI (Power Query, Power Pivot), it is a good practice toNOTmix data types. Don't even dare messing up with types! Kind warning 😋
For that, Power Query offers you a variety of data types to choose from.
Notice that here we have more than the 4 different data types we discussed earlier.
Numbers are sub-divided into:
Currency (Fixed decimal number in Power Query for Power BI => Yep! Go wonder why 😕)
Date and Time format is also sub-divided into:
We can also notice a new data type => "Binary" -> 😳 What on earth is that -> Let's keep that for another time maybe. 😋
And, alsoUsing Locale, as if we didn't have enough data types, right? We can review that one when we discuss date and time data types
What about data category?
Wait! What? Data what again? You're kidding right?
Nope, not kidding. It's a real thing 😁
The list of categories available actually depends on the column data type selected.
Alright, cool! That's in Power BI. What about Power Pivot, or Excel? Power Pivot doesn't offer the option to categorize data, but Excel... hmmmm... does try something similar, though it is just bringing more to the confusion.
Yep, that's right, the guys of the Excel team decided to name that group "Data Types" => Really confusing, right? Anyway, the majority of users don't even know those exist 😝
That is it for now. That's enough for a good start. Hope you find it useful. Let me know if you have any questions, comments, or suggestions.