Nart
Frequent Visitor

Segmenting a query with so many columns (data from a long questionnaire)

I have a 700x800 dataset coming from a long questionnaire (700 questions x 800 submissions).

To work with the data in Power BI, I need to unpivot each group of columns that represents a single question together.
I thought it would be a bad idea to do all that unpivoting in one query, so I started referencing the main cleaned query and using "Choose Columns" to keep only a few related columns to unpivot in a separate query.

It worked great for a while, but as the number of referencing queries increased, I started noticing some performance issues, which raised several questions in my head:

 

1- Is Power BI really copying the whole table and then deleting the columns I exclude in every sub-query? And if so, is there a way to copy/reference only a selected group of columns (maybe something like DAX's SELECTCOLUMNS) without so much wasted processing time?

2- Am I mistaken to think that this segmentation is my only option here?

1 ACCEPTED SOLUTION
CNENFRNL
Community Champion

Hi, @Nart , you're only an inch away from the goal. After unpivoting the dataset, extract the serial number of each question, i.e. "Q1" from "Q1-A1", "Q1-A2", "Q1-A3", etc. The transformed dataset looks like the one below and is ready to slice and dice across quite a few dimensions, such as index, gender, age, and question.

 

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMlTSUfIFYkMzIOEM4hmAeFAamW1IBBvBj9WJVjICstxAPAuQ2UYEzMNlrwGGepDZxlCzjYxgZmMzC7vbsPMRZpvAwsQSZLYxHrOwmYnbj7GxAA==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Index = _t, gender = _t, age = _t, camp = _t, #"Q1-A1" = _t, #"Q1-A2" = _t, #"Q1-A3" = _t, #"Q1-A4" = _t, #"Q1-A5" = _t, #"Q2-A1" = _t, #"Q2-A2" = _t, #"Q2-A3" = _t, #"Q2-A4" = _t, #"Q2-A5" = _t, #"Q2-A6" = _t, #"Q3-A1" = _t, #"Q3-A2" = _t, #"Q3-A3" = _t, #"Q4-A1" = _t, #"Q4-A2" = _t, #"Q4-A3" = _t, #"Q4-A4" = _t]),
    #"Unpivoted Other Columns" = Table.UnpivotOtherColumns(Source, {"Index", "gender", "age", "camp"}, "Opt", "Answer"),
    #"Inserted Text Before Delimiter" = Table.AddColumn(#"Unpivoted Other Columns", "Question", each Text.BeforeDelimiter([Opt], "-"), type text),
    #"Changed Type" = Table.TransformColumnTypes(#"Inserted Text Before Delimiter",{{"Answer", Int64.Type}, {"Index", Int64.Type}})
in
    #"Changed Type"

 

Screenshot 2020-12-08 100259.png


Thanks to the great efforts by MS engineers to simplify syntax of DAX! Most beginners are SUCCESSFULLY MISLED to think that they could easily master DAX; but it turns out that the intricacy of the most frequently used RANKX() is still way beyond their comprehension!

DAX is simple, but NOT EASY!


9 REPLIES
Icey
Community Support

Hi @Nart ,

 

Please let us know if the replies above are helpful.

 

If they are, please accept the helpful replies as the solution to your question so that people who have the same question can find the answer directly.

 

If not, please let us know.

 

 

Best Regards,

Icey

Nart
Frequent Visitor

Nevertheless, I'm still interested in knowing whether there's a way to pick a group of columns in a new query without pulling the whole original query first. 

CNENFRNL
Community Champion

Hi, @Nart , sure thing, you might want to use Table.SelectColumns() to extract specific columns from the original dataset.
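For example, a referencing query could keep only the needed columns in a single step. This is just a sketch: the query name CleanedData and the column names are hypothetical stand-ins for the actual questionnaire query.

let
    // Reference the main cleaned query (name assumed here)
    Source = CleanedData,
    // Keep only the key column and one question's answer columns;
    // MissingField.Error surfaces a typo in a column name immediately
    Selected = Table.SelectColumns(Source, {"Index", "Q1-A1", "Q1-A2", "Q1-A3"}, MissingField.Error)
in
    Selected

Note that the "Choose Columns" button in the Power Query editor generates a Table.SelectColumns step as well, so whether this avoids materializing the full referenced table depends on how the query is evaluated, not on writing the step by hand.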



CNENFRNL
Community Champion

Hi, @Nart 

How do you define correlations among questions? I think another table is necessary for the dataset regardless of its form (a one-dimensional table after unpivoting, or the original two-dimensional table).

 

As you know, DAX runs on the VertiPaq engine, which is an in-memory columnar database. Here's a brief introduction for your reference.

https://www.microsoftpressstore.com/articles/article.aspx?p=2449192&seqNum=2

 

To my superficial understanding, VertiPaq favors a one-dimensional (long) table.

 

"...
Single-column access is very fast, because it reads a single block of memory, and then it computes whatever aggregation you need on that memory block.

If an expression uses many columns, the algorithm is more complex because it requires the engine to access different memory areas at different times, keeping track of the progress in some temporary area.

The more columns you need to compute an expression, the harder it becomes to produce a final value, up to a point where it is easier to rebuild the row storage out of the column store to compute the expression.
..."
(quotation from the aforementioned article)

 

 



Nart
Frequent Visitor

@CNENFRNL Thanks a lot! That was right to the point.

I guess I'll just create a long table and use a duplicate with questions that require cross-question filtering.

I'll let you know how that works.

Nart
Frequent Visitor

I am using the reference feature as I said, not duplicating. But still, the referencing step seems to pull the entire table into the new query while I just need a few columns. 

 

The reason I went with segmenting is that I need to unpivot groups of columns together. If I did this while all the columns were in the same query, I'd be creating thousands of new rows every time.

Here's what my dataset looks like.

questionnaire data sample.png

Each question is represented across several columns.
To visualize and analyze the answers, I'm unpivoting the columns for each question separately to get all its answers in one column.
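The per-question workflow described here could be sketched as one small query per question. The names are illustrative only: CleanedData stands for the main cleaned query, and "Q2-A1" through "Q2-A3" stand for one question's answer columns.

let
    // Reference the main cleaned query (hypothetical name)
    Source = CleanedData,
    // Keep the demographic key columns plus one question's answer columns
    Q2Only = Table.SelectColumns(Source, {"Index", "gender", "age", "Q2-A1", "Q2-A2", "Q2-A3"}),
    // Unpivot just this question's columns into a single Answer column
    Unpivoted = Table.UnpivotOtherColumns(Q2Only, {"Index", "gender", "age"}, "Opt", "Answer")
in
    Unpivoted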


Nart
Frequent Visitor

Thanks @CNENFRNL , I did try that earlier, but what I lost was the ability to filter questions by other questions to see how they correlate. Plus, wouldn't the whole (very long) table have to be loaded and filtered in every visual in that case? Wouldn't that make report pages and interactive filters load much more slowly?

Do you think it would be a reasonable performance tradeoff?

mahoneypat
Employee

Power Query can handle much more than 700 columns.  Why did you initially decide to segment?  In any case, are you using the Reference feature between your queries?

Reference vs Duplicate in Power BI; Power Query Back to Basics - RADACAD

Regards,

Pat





Did I answer your question? Mark my post as a solution! Kudos are also appreciated!

To learn more about Power BI, follow me on Twitter or subscribe on YouTube.


@mahoneypa HoosierBI on YouTube

