panchag
Regular Visitor

Power BI data model for Enterprise Reporting with huge volume data from Oracle - Write-up

                   

Who said that Power BI is not the right BI tool for Enterprise Reporting needs?

Thursday, May 9, 2019

                                   

(PS: Please don’t check for spelling and grammar yet, I will edit this as soon as I find time)

 

Unfortunately, it was me who said that initially to my Manager, somewhere around the end of 2018. However, over the past 5-6 months I have watched Power BI go through a remarkable metamorphosis, and with its new Premium features it is now almost ready for Enterprise BI Reporting and to be called an 'enterprise BI' platform. I would like to mention that most of its Premium features are necessary for publishing a relational data model (similar to an SSAS Tabular model) as a dataset to a Premium Workspace on Powerbi.com, while we can still build the model in Power BI Desktop (Pro) and enable the required features there.

 

Background:

I started exploring this tool as soon as it was announced that Power BI would be the recommended enterprise BI development tool for MMS BI in the future.

In the process, I was quite astonished to find that this tool had already been widely used for years in our MMS Richmond office building by some of our Business Users. Most of these business users currently develop their reports and dashboards using native SQL (Oracle) inside Power BI and have already built some excellent dashboards.

 

Building Power BI reports that rely on SQL code alone is not recommended. Such reports cannot be re-used whenever slight changes are required, such as end users adding new measures or filters.

(If these reports/dashboards are not sourced from the MMS EDW data warehouse, then I do not find any fault with them 😊.) If they are built on a well-defined, strong Power BI data model, they can always be edited, and ad-hoc reporting becomes easy and friendly for all end users.

 

I have also seen that other departments (I often call them 'the Business', as we in IT work for their reporting needs) have hired new developers who are trying to extract SQL from their earlier SAP BusinessObjects reports. They are also tailoring that SQL for their dashboard needs in Power BI and have set up their own Premium Workspaces.

 

One of the important BI requirements we (the IT Business Intelligence team) have is to re-build the existing OBIEE Adhoc Subject Areas as Power BI datasets (models) so that they can be used for ad-hoc reporting needs.

Currently, these Subject Areas (data marts) are built using the Oracle Business Intelligence (OBIEE) tool, all of them on top of the MMS EDW Oracle data warehouse. I would say we have moderate usage of these Subject Areas, except for 'Adhoc Ops Activity' and 'Adhoc Sales', which are quite heavily queried. Their model size has also grown over time as we introduced new facts and dimensions.

The model no longer even fits on my external monitor's screen ☹, and I am not sure it is still right to call it a dimensional data warehouse model.

 

I guess we must break them into smaller models when we re-build them in Power BI in the future.

 

The screenshots below show the current usage and should give an idea. The images are not clear here, but I will showcase these in the demo.

 

 

 

 

My initial frustrating experience:

 

I began experimenting with one of the smallest OBIEE Subject Areas, called 'Inventory'. It has been queried only about 1,300 times since it was built.

 

At first, I was not able to find suitable drivers to import Oracle tables into Power BI on my laptop. I initially googled and installed the ODAC components. However, I had to virtually re-format my laptop, removing my 32-bit Oracle client drivers and Toad versions and installing 64-bit versions of everything.

 

At first, it took hours to 'import' each dimension table into Power BI, and almost 1-2 days for some medium-sized fact tables with 20-40 million rows. Of course, the screen froze and I was not able to download them completely, so I purchased additional RAM from Best Buy and installed it on my laptop, but that did not work out either. Our development leader, who handles virtual servers, gave me access to one of his Windows servers for a week and increased its RAM to 32 GB. Only then was I finally able to import some tables.

 

 

Requirement:

 

From the next day onwards, I started looking for one single solution: how to handle huge amounts of data and create an enterprise BI model from it.

 

Even today, I have not found a single bit of information about this anywhere on the net. If you don't believe me, please search for yourself; I promise you will not find the information needed to build a complex data model using Oracle.

 

It was when I started reading Marco Russo and Alberto Ferrari's (SQL Server MVPs) whitepaper on DirectQuery in SQL Server Analysis Services 2016 that I developed some hope for this tool.

 

With DirectQuery, the model behaves as a semantic layer that translates incoming queries into queries against the data source, without keeping a copy of the data in the in-memory (VertiPaq) engine.

 

I started reading related blogs from various MVPs and watching YouTube videos (Curbal, Guy in a Cube, Avi Singh, Enterprise DNA, RADACAD, Pragmatic Works, and many more).

 

I started visiting edX.org and Coursera.org frequently for Power BI courses, downloaded plenty of Amazon books, and explored almost all of the Power BI Udemy courses. I have almost crashed various search engines 😊.

 

I even installed SQL Server 2016 and Power BI Report Server on my weak laptop and started exploring various options to get an 'on-premises' experience with Tabular models.

 

I don't need to do any data transformations for my current requirement, so no 'Power Query' or 'M' is required here.

I query the EDW Oracle data warehouse, which is a very well-built relational database model.

 

As I stated earlier, Power BI has started transforming into an 'enterprise BI' platform. I see new 'Preview' features each month, which are often made generally available the next month.

Power BI Desktop and the Power BI online service release software updates every month. I also love the MVP concept ("technology experts who passionately share their knowledge with the community").

 

My search ended as soon as I found solutions to all of my requirements: handling huge amounts of data and building an enterprise BI model. Very recently, Power BI started concentrating on Powerbi.com (Power BI online) and enhanced its Premium features. I worked with our Microsoft development leaders and got our Premium Workspace (MMS Power BI group) just a couple of weeks ago; I have named it MMSBI for now.

 

Here are some of the important features that are used, or would be used, in our relational data model built from huge volumes of Oracle data. We need a Premium Workspace here, as Pro cannot handle large datasets.

 

 

Feature 1: Composite models

 

The first feature that caught my eye was Composite models (mixed mode): https://docs.microsoft.com/en-us/power-bi/desktop-composite-models

 

                            Storage Mode: Dual, Import, and DirectQuery

 

I have used Composite mode to create this 'Inventory Subject Area' model. That means I imported small tables such as DIM_PERIOD (6,576 rows), DIM_VELOCITY, and DIM_BUYER (174 rows), used Dual mode for medium-sized tables such as Dim Distribution Center (2,368 rows), and used DirectQuery mode for large tables such as Item Dist Center and Inventory Balance. I used the COUNTROWS DAX function to retrieve the row counts shown in the screenshot below.
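For example, here is a minimal sketch of the kind of row-count check I mean (the measure name is just illustrative; 'DIM_BUYER' is one of the imported tables mentioned above):

Buyer Row Count =
COUNTROWS ( 'DIM_BUYER' )    -- returns 174 for the imported buyer dimension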

To understand this feature better, watch any YouTube video from Curbal, RADACAD, or Avi Singh, or wait for my demo 😊.

 

 

 

 

Feature 2: DAX

 

We need to use DAX to create our calculated tables and measures.

DAX stands for Data Analysis Expressions, and it is the formula language used throughout Power BI.

A simple example of a DAX measure that I have used in my model:

 

Backorder Amount =
SUMX (
    'fInventory Balance',
    'fInventory Balance'[BKORDR_QTY]
        * RELATED ( 'dItem Dist Center'[DSTRBTN_CNTR_ACQ_COST_AMT] )
)

 

 

We can use DAX Studio and DAX Formatter to document our model: check out https://youtu.be/UX7WYFX8828
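For instance, here is a rough sketch of a DAX query that could be run in DAX Studio against this model to test the Backorder Amount measure; the grouping column DSTRBTN_CNTR_NM is a hypothetical name and would need to be replaced with a real column from 'dItem Dist Center':

EVALUATE
SUMMARIZECOLUMNS (
    'dItem Dist Center'[DSTRBTN_CNTR_NM],    -- hypothetical distribution-center name column
    "Backorder Amount", [Backorder Amount]   -- the measure shown in Feature 2
)
ORDER BY [Backorder Amount] DESC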

 

I have heard that Power BI will be coming out with its own built-in DAX studio soon.


To begin the DAX journey, I recommend the books 'Beginning DAX with Power BI: The SQL Pro's Guide to Better Business Intelligence' and 'Supercharge Power BI: Power BI Is Better When You Learn to Write DAX' from Amazon.

We may not need advanced DAX skills for our dimensional model here, but if you want to go deeper, check out https://www.sqlbi.com/guides/dax/ and their books too.

 

Feature 3: Aggregations

 

We will also need to use the recently released Power BI feature called Aggregations.

Christian Wade from Microsoft demonstrated unlocking petabyte-scale datasets in Azure with the aggregations feature in Power BI; you can check out his video here: https://powerbi.microsoft.com/de-de/blog/aggregations-for-petabyte-scale-bi-available-in-the-power-b...

 

If you watch this video from Alberto (https://www.sqlbi.com/tv/aggregations-in-power-bi/), you will see that he nailed this feature 😊 and noted that it is still immature by showing a small hole in it. Hopefully, that will be covered by the product team soon.

 

‘Aggregations’ can dramatically reduce the cost of unlocking large datasets for decision making.

 

I have not used this feature in my current Power BI Inventory model, as we already have an aggregated table created at the backend (SUM_INVTRY_BAL -E1-, item quantities at each distribution center, aggregated by item and distribution center for individual days). We will probably use this feature in all our other Power BI data models later for better query performance.

 

Please keep in mind that these aggregated tables need to be in Import mode while the related fact table is in DirectQuery mode.

 

When Business Users run a query from their report, Power BI first checks whether a defined aggregation can answer it; if so, the results are returned from the aggregation table.
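As a sketch only (assuming SUM_INVTRY_BAL is imported, carries a summed backorder-quantity column, and is mapped to the DirectQuery fact in the 'Manage aggregations' dialog), a plain measure like the one below is answered from the in-memory aggregation table whenever the report query is at a matching grain, and silently falls back to DirectQuery against Oracle otherwise:

Total Backorder Qty =
SUM ( 'fInventory Balance'[BKORDR_QTY] )    -- written against the detail fact; Power BI redirects it to the mapped aggregation when possible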

 

It is important that we use an optimized data refresh to keep these aggregation tables up to date.

I could also show this feature in my demo.

 

 

 

Feature 4: Incremental data refresh policies (Incremental refresh in Power BI Premium)

 

Incremental refresh enables very large datasets in the Power BI Premium service with the following benefits:

 

  • Refreshes are faster - Only data that has changed needs to be refreshed. For example, refresh only the last five days of a ten-year dataset.
  • Refreshes are more reliable - It's no longer necessary to maintain long-running connections to volatile source systems.
  • Resource consumption is reduced - Less data to refresh reduces overall consumption of memory and other resources.

 

We have to define the RangeStart and RangeEnd parameters and set up a refresh policy. We will not be enabling this feature for tables that are not imported.

 

 

 

Feature 5: Dataflows (available in the Premium version): Self-service data prep in Power BI

 

This is one of the important features that can decrease the model size by stripping the ETL part out of the data model/dataset. It is essentially Power Query Online.

 

Data is stored as entities in the Common Data Model format in Azure Data Lake Storage Gen2. Dataflows are created and managed in app workspaces using the Power BI service.

 

Curbal explains the whitepaper clearly in this video: https://youtu.be/bkFG8s_9sGE

 

I'm still waiting for this feature in our Premium Workspace ☹. It is not currently enabled for us.

 

 

 

Here are the high-level steps to create a Power BI data model from scratch using the above-mentioned new Power BI Premium features:

 

Get Data

Create Calculated Tables (see the sketch after these steps): https://docs.microsoft.com/en-us/power-bi/desktop-calculated-tables

Create Calculated Columns: https://docs.microsoft.com/en-us/power-bi/desktop-calculated-columns

Create Relationships

Create Measures: https://docs.microsoft.com/en-us/power-bi/desktop-measures

Create Key Performance Indicators

Create Perspectives

Create Hierarchies

Row-level security (RLS) with Power BI Desktop: Define roles and rules in Power BI Desktop

Publish it to Premium Workspace
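As an illustration of the 'Create Calculated Tables' step above, here is a minimal, hypothetical DAX sketch of a calculated date table (our model already has DIM_PERIOD, so this is purely illustrative; the table name and date range are made up):

dCalendar =
ADDCOLUMNS (
    CALENDAR ( DATE ( 2010, 1, 1 ), DATE ( 2025, 12, 31 ) ),    -- one row per day in the chosen range
    "Year", YEAR ( [Date] ),
    "Month", FORMAT ( [Date], "MMM YYYY" )
)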

 

 

 

1 REPLY
v-yuta-msft
Community Support

@panchag ,

 

Thanks very much for sharing. If you have any concerns or issues that are very urgent, I would suggest posting them to the issues forum and creating a support ticket here.

 

Community Support Team _ Jimmy Tao

If this post helps, then please consider accepting it as the solution to help other members find it more quickly.
