Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.

Reply
Anonymous
Not applicable

Use integers or not? (conflicting documentation)

Hey guys

 

I am a bit confused whether or not I should use as much integers as possible. The official Microsoft documentation suggests it as a best practice (https://docs.microsoft.com/en-us/power-bi/power-bi-reports-performance), however Chapter 17 in the Defintive Guide to DAX suggests it shouldn't make a difference: "It does not matter whether a column uses a string, as 64-bit integer, or a floating point to represent a value. All these data types can be hash encoded, providing the same performance in terms of speed of scanning and storage space".

 

Am I missing something, or is this conflicting information? And if so, which one would be correct?

 

Thanks in advance.

1 ACCEPTED SOLUTION
Greg_Deckler
Super User
Super User

First, it is tough to argue with the esteemed @marcorusso but generally it takes less space to store an integer than a string in most databases. Now, if you truly have an integer number that you are storing as text, then DAX will need to do a conversion operation if you do something like add it to something else so that could affect performance. 

 

However, since I am a stickler for this kind of thing, I created 2 pbix files with the following formulas:

 

Table = 
VAR __table = GENERATESERIES(1,1000000,1)
RETURN
SELECTCOLUMNS(__table,"Value",CONVERT([Value],STRING))

 

Table = GENERATESERIES(1,1000000,1)

 

And the results, drum roll.....

 

11/06/2019 09:35 PM 11,872,585 text.pbix

 

11/06/2019 09:34 PM 11,503,530 integers.pbix

 

So, yes, text values take up more space than integer values, in this case about 370Kb more.

 

 


@ me in replies or I'll lose your thread!!!
Instead of a Kudo, please vote for this idea
Become an expert!: Enterprise DNA
External Tools: MSHGQM
YouTube Channel!: Microsoft Hates Greg
Latest book!:
The Definitive Guide to Power Query (M)

DAX is easy, CALCULATE makes DAX hard...

View solution in original post

5 REPLIES 5

No, the number is always stored in a compressed way using a HASH or VALUE encoding, depending on which one is chosen by the engine during the process.

 

Anonymous
Not applicable

@Greg_Deckler @marcorusso thank you for clarifying this and taking the effort of running the example.

Greg_Deckler
Super User
Super User

First, it is tough to argue with the esteemed @marcorusso but generally it takes less space to store an integer than a string in most databases. Now, if you truly have an integer number that you are storing as text, then DAX will need to do a conversion operation if you do something like add it to something else so that could affect performance. 

 

However, since I am a stickler for this kind of thing, I created 2 pbix files with the following formulas:

 

Table = 
VAR __table = GENERATESERIES(1,1000000,1)
RETURN
SELECTCOLUMNS(__table,"Value",CONVERT([Value],STRING))

 

Table = GENERATESERIES(1,1000000,1)

 

And the results, drum roll.....

 

11/06/2019 09:35 PM 11,872,585 text.pbix

 

11/06/2019 09:34 PM 11,503,530 integers.pbix

 

So, yes, text values take up more space than integer values, in this case about 370Kb more.

 

 


@ me in replies or I'll lose your thread!!!
Instead of a Kudo, please vote for this idea
Become an expert!: Enterprise DNA
External Tools: MSHGQM
YouTube Channel!: Microsoft Hates Greg
Latest book!:
The Definitive Guide to Power Query (M)

DAX is easy, CALCULATE makes DAX hard...

How about those store as Whole Number?  According ot Microsoft documentation, it represents a 64 bit (eight-byte) integer value. Since it is stored as 64 bits, it is actually larger right? Thanks.

Greg is right, the number is smaller in dictionary because an integer is 8 bytes and the string is a 16-bit unicode (2 bytes per character), plus the dictionary overhead in any case. My suggestion is to not *add* a surrogate key instead of a string, because you just add another column. If you don't import the application key, then you should evaluate whether the cost to create the surrogate key is worth the effort. Finally, you should also consider other processing time and query time differences rather than just the memory saving. Also the memory saving can depend on the range of numbers involved.

Helpful resources

Announcements
Microsoft Fabric Learn Together

Microsoft Fabric Learn Together

Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City

PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.

April Fabric Community Update

Fabric Community Update - April 2024

Find out what's new and trending in the Fabric Community.

Top Solution Authors