cancel
Showing results for 
Search instead for 
Did you mean: 
Reply
Atanas
Helper IV
Helper IV

Making Data Model From Flat File

Dear Community,

 

I have 1 flat CSV file which I'd like to break down by it's dimensions and create star schema in my data model.

I loaded the file in power query and with the "Reference" function i created the Dimension tables as shown below:

Screenshot 2.PNG

NOTE: Each one of the dimension tables was created as it follows: right click on the fact table-> reference->remove other columns->remove duplicates.

My data model looks fine and exactly how i imagined it:

Screenshot 3.PNG

The only problem i got is loading. The original (fact table) CSV is 46MB file... When i load it with the dimensions it loads up 6 tables by 46MB (the calendar one was created from blank query).Please see here:

Screenshot 1.PNG

Can anyone help me understand why this is happening? For example in the category table i have 15 rows, but it still loads 46MB. I've never experienced something like that so far (maybe because i was using smaller files).But just to create a simple measure now it takes more than 30 seconds.

 

All due respect,

 

Atanas

 

2 ACCEPTED SOLUTIONS
v-kelly-msft
Community Support
Community Support

Hi  @Atanas,

 

How about creating the dim tables with dax formular rather than in power query?

For example,my oringinal table is as below from an Excel file:

v-kelly-msft_1-1623314068549.png

To achieve what you need:

Create dim tables as below:

Table = VALUES(Sheet2[category])

Then create relationships as you need:

v-kelly-msft_2-1623314134791.png

Pls note that the direction should be from fact table side to dim table side.

Then you would see when you refresh the table,it will only load one time:

 

v-kelly-msft_0-1623313976531.png

Best Regards,
Kelly

Did I answer your question? Mark my post as a solution!

 

 

View solution in original post

edhans
Super User III
Super User III

@Atanas It is not loading 46mb. It is processing the 46mb file to load the data. The actual 15 record table is probably a few KB at most. You could validate this using DAX Studio and the Vertipaq Analyzer feature which will tell you the size of each loaded table. 

this is an issue working with flat tables. With no database on the back end power query has to fully process the entire file even if you have filtered it to just one record. 

But it is best practice to do this in PQ. Consider if your csv file has the customer number and name in it.


You could create a DIM table of that info, but by using DAX, you cannot remove that redundant customer name info from the FACT table. You can in power query. Smaller model over all.  



Did I answer your question? Mark my post as a solution!
Did my answers help arrive at a solution? Give it a kudos by clicking the Thumbs Up!

DAX is for Analysis. Power Query is for Data Modeling


Proud to be a Super User!

MCSA: BI Reporting

View solution in original post

8 REPLIES 8
Atanas
Helper IV
Helper IV

@v-kelly-msft  and @edhans  - thank you both for your information.

 

I will try it with Values. That will work perfectly. This is also what Aaron from Maven Analytics recommends in his Advanced DAX course. Using Values is easy and cool way to do it.

 

@edhans  - will try to get rid of my current labelling of the tables 🙂 I learned to arrange my tables like so from Chris Dutton of Maven Analytics.

 

Thank you one more time and wish you all the best,

 

Atanas

edhans
Super User III
Super User III

I'm glad you got an answer you can work with @Atanas but long term I would advise against working in DAX to shape and model your data. For the very simple example you have shown, VALUES() works, but when the data is more complex - for example, you have additional fields beyond the key field in the flat table, Power Query is the better practice as it can ensure uniqueness of the first key, then add the relevant fields (think Customer Key is unique, and customer name may be, but as you add more fields like state or zip code, those will not be unique), and the DAX gets a bit crazy. From the DAX master himself

The earlier you fix your data, the better. Delaying data transformation along data supply chain creates a technical debt, which you pay only on data you actually consume.

Additionally, the DAX method will not allow you to remove the redundant data from the flat table, whereas Power Query will if you use the approach I listed above - summarized:

  1. Connect to the flat table
  2. Disable load on it
  3. Create as many references as necessary for all subsequent DIM and FACT tables, with each table only keeping the necessary columns to create a Star Schema.


Did I answer your question? Mark my post as a solution!
Did my answers help arrive at a solution? Give it a kudos by clicking the Thumbs Up!

DAX is for Analysis. Power Query is for Data Modeling


Proud to be a Super User!

MCSA: BI Reporting

@edhans- Thank you very much. I will consider and experiment with your suggestion.

 

I understand VALUES is not the best option, at first i was just surprised by the total MB of the data model. Usually I am not Power BI user, but Power Pivot. And when i create similar dim tables from a flat file Excel loads only the rows I told Power Query to filter. So the cardinality of the tables is low and performance is ok. I was truly shocked to see table with 15 rows in it to be 46MB.

edhans
Super User III
Super User III

@Atanas It is not loading 46mb. It is processing the 46mb file to load the data. The actual 15 record table is probably a few KB at most. You could validate this using DAX Studio and the Vertipaq Analyzer feature which will tell you the size of each loaded table. 

this is an issue working with flat tables. With no database on the back end power query has to fully process the entire file even if you have filtered it to just one record. 

But it is best practice to do this in PQ. Consider if your csv file has the customer number and name in it.


You could create a DIM table of that info, but by using DAX, you cannot remove that redundant customer name info from the FACT table. You can in power query. Smaller model over all.  



Did I answer your question? Mark my post as a solution!
Did my answers help arrive at a solution? Give it a kudos by clicking the Thumbs Up!

DAX is for Analysis. Power Query is for Data Modeling


Proud to be a Super User!

MCSA: BI Reporting

View solution in original post

@edhans 

 

That's the exmplanation I needed.

 

Thank you very much!

 

All due respect,

 

Atanas

edhans
Super User III
Super User III

Great @Atanas - glad we finally got to your question. 😁



Did I answer your question? Mark my post as a solution!
Did my answers help arrive at a solution? Give it a kudos by clicking the Thumbs Up!

DAX is for Analysis. Power Query is for Data Modeling


Proud to be a Super User!

MCSA: BI Reporting
v-kelly-msft
Community Support
Community Support

Hi  @Atanas,

 

How about creating the dim tables with dax formular rather than in power query?

For example,my oringinal table is as below from an Excel file:

v-kelly-msft_1-1623314068549.png

To achieve what you need:

Create dim tables as below:

Table = VALUES(Sheet2[category])

Then create relationships as you need:

v-kelly-msft_2-1623314134791.png

Pls note that the direction should be from fact table side to dim table side.

Then you would see when you refresh the table,it will only load one time:

 

v-kelly-msft_0-1623313976531.png

Best Regards,
Kelly

Did I answer your question? Mark my post as a solution!

 

 

View solution in original post

edhans
Super User III
Super User III

That is how it works. Power Query doesn't start at the first query and work down, it starts at the bottom (last) query and works backwards, so 6 tables from 1 will cause it to process that first source table 6 times.

 

One thing you might want to try that may help is all of your DIM tables (except calendar) are based on the fact table, and the fact table is itself loading.

 

Instead, mark the fact table to "disable load" and create a reference from the original FACT table to a new one that is just a pointer. Probably no further transformations. If you go to Query Dependency View, you'd see your Load Disabled FACT table at the top, and all DIMS and new FACT table at the bottom. 

 

And someting to consider: Get rid of DIM and FACT in your table names. Will not help performance one bit, but it is best practice to use table and field names in the language of your audience with no tech speak or CamelCase. If two words, use a space. "Sales Data" - not SalesData, or SalesFACT or any other nerdy stuff. 😁

I love you have your DIMs at the top and FACT on the bottom. Learned from Rob Collie or Matt Allington? Filters flow downhill! 👍



Did I answer your question? Mark my post as a solution!
Did my answers help arrive at a solution? Give it a kudos by clicking the Thumbs Up!

DAX is for Analysis. Power Query is for Data Modeling


Proud to be a Super User!

MCSA: BI Reporting

Helpful resources

Announcements
PBI_User Group Leader_768x460.jpg

Manage your user group events

Check out the News & Announcements to learn more.

Get Ready for Power BI Dev Camp

Microsoft named a Leader in The Forrester Wave

Microsoft received the highest score of any vendor in both the strategy and current offering categories.

Get Ready for Power BI Dev Camp

Power BI Dev Camp - September 30th, 2021

Mark your calendars and join us for our next Power BI Dev Camp!

PowerPlatform 768x460.png

Microsoft Learn

Check out our new Discover Your Career Path blog post series and get all the details.

Top Kudoed Authors