Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Earn the coveted Fabric Analytics Engineer certification. 100% off your exam for a limited time only!

Reply
Anonymous
Not applicable

Optimizing Large Dataset with High Cardinality Fields

All,

 

I've been working on building a shared dataset to allow for our business to create a master/detail type of dashboard.  The dataset has become fairly massive as they've expanded the scope to include high cardinality text fields, which has pushed me beyond the 1GB.  I'm trying to find a means to optimize and shrink this down, but it is challenging to say the least.  The first concept I looked into was a Composite Model, but seems like I would have a row limitation of 1 million records, so couldn't go down that path.  Next, I tried to remodel my dataset to adhere to a Star Schema, but the storage increased (I thought reference tables wouldn't take up added space, but that doesn't seem to be the case).  Finally, I did some analysis using the Vertipaq Analyzer, which lead me to trying to find the best means to deal with these high cardinality fields...

Not a lot I could do with the free-form text fields, but there is a M query column that I created using the concatenation of a lengthy URL string with a numeric ID value.  I tried to convert this over to DAX instead, but there didn't seem to be any benefit there, so I'm wondering if there is a better means to do this - perhaps some other variable by splitting off the based URL?   Alternatively, I was thinking of just dropping this field altogether and moving it to the business dashboard, but then we'd have to do that for every visualization that wants that URL, which kind of defeats the purpose of a shared dataset. 

Short of going to Premium, I don't know what other options there are.  So looking for any advise from all of you 🙂

 

Thanks in advance! 

2 REPLIES 2
v-diye-msft
Community Support
Community Support

Hi ,

 

Sorry not very clear about your requirement, do you have some sample data or preferably a sample pbix file?

 

From the subject about the question, premium could be the best solution i think. 

Community Support Team _ Dina Ye
If this post helps, then please consider Accept it as the solution to help the other members find it more
quickly.
Anonymous
Not applicable

For the URL example, say I have a parameterized URL string with the common portion of

http://www.test.com/blah1/blah2/blah3&p1=

So my table is as follows with the ID column and URL being the concatenation of the aforementioned URL string and ID together

ID  URL

1    http://www.test.com/blah1/blah2/blah3&p1=1

2    http://www.test.com/blah1/blah2/blah3&p1=2

3    http://www.test.com/blah1/blah2/blah3&p1=3

4    http://www.test.com/blah1/blah2/blah3&p1=4

...

Hope that helps to clarify

 

Helpful resources

Announcements
April AMA free

Microsoft Fabric AMA Livestream

Join us Tuesday, April 09, 9:00 – 10:00 AM PST for a live, expert-led Q&A session on all things Microsoft Fabric!

March Fabric Community Update

Fabric Community Update - March 2024

Find out what's new and trending in the Fabric Community.