Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.

Reply
wi11iamr
Advocate II
Advocate II

Cartesian Join of two queries to create new table with a record for each CustomerID, for each month

Hi there,

 

I have a Customer query which has only one column, a unique list of all CustomerID's.

I then have a Calendar query which also has only one column, the first day of each month, as follows:

 

Capture.PNG

 

I wish to create a new query combining the CustomerID and the Date such that I have a record for each CustomerID, for each Date. Something like this:

Date                       CustomerID

1900/01/01            12345

1900/02/01            12345

1900/03/01            12345

1900/04/01            12345

1900/05/01            12345

1900/06/01            12345

 

1900/01/01            67890

1900/02/01            67890

1900/03/01            67890

1900/04/01            67890

1900/05/01            67890

1900/06/01            67890

 

As far as I can tell, there is no "cartesian join" functionality within PowerBI, but I believe this is in effect what I am trying to do.

 

Any ideas...?

1 ACCEPTED SOLUTION
MarcelBeug
Community Champion
Community Champion

Just add a column to the Customer table with Calendar for each Customer:

 

let
    Source = Excel.CurrentWorkbook(){[Name="Customer"]}[Content],
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Customer", Int64.Type}}),
    #"Added Custom" = Table.AddColumn(#"Changed Type", "Custom", each Calendar),
    #"Expanded Custom" = Table.ExpandTableColumn(#"Added Custom", "Custom", {"Date"}, {"Date"})
in
    #"Expanded Custom"

 

Be careful with dates prior to March 1, 1900 as these are differently numbered between Excel and Power Query / Power BI.

Excel thinks 1900 is a leap year, but Power Query / Power BI know better!

Specializing in Power Query Formula Language (M)

View solution in original post

11 REPLIES 11
MarcelBeug
Community Champion
Community Champion

Just add a column to the Customer table with Calendar for each Customer:

 

let
    Source = Excel.CurrentWorkbook(){[Name="Customer"]}[Content],
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Customer", Int64.Type}}),
    #"Added Custom" = Table.AddColumn(#"Changed Type", "Custom", each Calendar),
    #"Expanded Custom" = Table.ExpandTableColumn(#"Added Custom", "Custom", {"Date"}, {"Date"})
in
    #"Expanded Custom"

 

Be careful with dates prior to March 1, 1900 as these are differently numbered between Excel and Power Query / Power BI.

Excel thinks 1900 is a leap year, but Power Query / Power BI know better!

Specializing in Power Query Formula Language (M)
Anonymous
Not applicable

What about a cartesian join for non date values?

 

I have similar query with list of unique ID's and a seperate query with list of demographic values. I need to create a singular query where each demographic value has a corresponding unique ID. 

 

From two separate queries:

 

Unique ID.png      and       Demospng.png

 

Into something like this:

Cartesian Join.png

 

The standard Append function has not achieved the desired results. Any help would be appreciated!

Hi @Anonymous,

 

This M language code works

 

let
    Source = Excel.CurrentWorkbook(){[Name="Data"]}[Content],
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Unique ID", Int64.Type}}),
    #"Added Custom" = Table.AddColumn(#"Changed Type", "Custom", each colour[Text]),
    #"Expanded Custom" = Table.ExpandListColumn(#"Added Custom", "Custom"),
    #"Renamed Columns" = Table.RenameColumns(#"Expanded Custom",{{"Custom", "Colour"}})
in
    #"Renamed Columns"

See the result below

 

Untitled.png


Regards,
Ashish Mathur
http://www.ashishmathur.com
https://www.linkedin.com/in/excelenthusiasts/
Anonymous
Not applicable

Hi Ashish, thank you for your response. I am fairly new to the M code format so bear with me. I attempted to edit the original code from the data file load and and added some of the script you shared in Power BI. It ultimately left me with the following:

 

let
Source = Excel.Workbook(File.Contents), null, true),
#"Unique IDs_Sheet" = Source{[Item="Unique IDs",Kind="Sheet"]}[Data],
#"Promoted Headers" = Table.PromoteHeaders(#"Unique IDs_Sheet", [PromoteAllScalars=true]),
#"Changed Type" = Table.TransformColumnTypes(#"Promoted Headers",{{"Unique ID", Int64.Type}, {"Demographics", type text}}),
#"Added Custom" = Table.AddColumn(#"Changed Type", "Custom", each Table.AddColumn(#"Changed Type", "Custom", each [Demographics])),
#"Expanded Custom" = Table.ExpandListColumn(#"Added Custom", "Custom"),
#"Renamed Columns" = Table.RenameColumns(#"Expanded Custom",{{"Custom", "Demographics 2"}})
in
#"Renamed Columns"

 

I was able to get through the #Added Custom command without issue but as soon as I started both #Expanded Custom and #Renamed Columns commands the applied query ran for hours upon hours while processing multiple tens of GB's of data before I finally gave up. My original data file is over 5k MB but I expected it to be done a lot sooner. I ran a similar cartesian join in Alteryx and it finished the join in about 4 seconds. Is there an issue with my code?

 

Thanks,

Gabe

You have 2 Table.AddColumn commands in your #"Added Custom" step, which will result in a column with a nested table, which is in turn a table with a nested table, in other words: a lot of data.

 

Verify and adjust your code according to the proposed solution.

Specializing in Power Query Formula Language (M)
Anonymous
Not applicable

Thanks for catching that, adjusted code below. Expected it to run without issue but...

 

let
Source = Excel.Workbook(File.Contents(data file), null, true),
#"Unique IDs_Sheet" = Source{[Item="Unique IDs",Kind="Sheet"]}[Data],
#"Promoted Headers" = Table.PromoteHeaders(#"Unique IDs_Sheet", [PromoteAllScalars=true]),
#"Changed Type" = Table.TransformColumnTypes(#"Promoted Headers",{{"Unique ID", Int64.Type}, {"Demographics", type       text}}),
#"Added Custom" = Table.AddColumn(#"Changed Type", "Custom", each [Demographics]),
#"Expanded Custom" = Table.ExpandListColumn(#"Added Custom", "Custom"),
#"Renamed Columns" = Table.RenameColumns(#"Expanded Custom",{{"Custom", "Demographics 2"}})
in
#"Renamed Columns"

 

I now get the error message:

Error Message.png In Advanced Editor it says No Syntax Errors have been Detected but won't allow me to apply the query. 

 

In the step #"Added Custom", each [Demographics] is incorrect. It must be in the format table[field], so the table part is missing. This should be either the name of another query, or the name of another step in this query. 

Specializing in Power Query Formula Language (M)
Anonymous
Not applicable

@MarcelBeug i realize this is an old post, and there may be a better way to do this at this point, but is there a way to filter the join criteria based on two other fields in the table?  So for the below code, in the "#"Added Custom" I would only want the distinct DateMonths wich are larger than a start date and smaller than an end date in the OpportunityLineItem table.

let
Source = Salesforce.Data(),
OpportunityLineItem = Source{[Name="OpportunityLineItem"]}[Data],
#"Filtered Rows1" = Table.SelectRows(OpportunityLineItem, each [Start_Date__c] >= #date(2016, 1, 1)),
#"Added Custom" = Table.AddColumn(#"Filtered Rows1", "DateMonth", each DistinctMonths[DateMonth])
in
#"Added Custom"

 

I tried this 

Table.AddColumn(#"Filtered Rows1", "DateMonth", each DistinctMonths[DateMonth] >= Date.StartOfMonth([Start_Date__c]) DistinctMonths[DateMonth] < Date.StartOfMonth([End_Date__c]))

and i get the follwoing error:  

Expression.Error: We cannot apply operator >= to types List and Date.
Details:
    Operator=>=
    Left=List
    Right=10/1/2017

 

(the reference table DistinctMonths has a single Date formated column with months in a given date range set as the first of each month)

Anonymous
Not applicable

@MarcelBeug i realize this is an old post, and there may be a better way to do this at this point, but is there a way to filter the join criteria based on two other fields in the table?  So for the below code, in the "#"Added Custom" I would only want the distinct DateMonths wich are larger than a start date and smaller than an end date in the OpportunityLineItem table.

let
Source = Salesforce.Data(),
OpportunityLineItem = Source{[Name="OpportunityLineItem"]}[Data],
#"Filtered Rows1" = Table.SelectRows(OpportunityLineItem, each [Start_Date__c] >= #date(2016, 1, 1)),
#"Added Custom" = Table.AddColumn(#"Filtered Rows1", "DateMonth", each DistinctMonths[DateMonth])
in
#"Added Custom"

 

I tried this 

Table.AddColumn(#"Filtered Rows1", "DateMonth", each DistinctMonths[DateMonth] >= Date.StartOfMonth([Start_Date__c]) DistinctMonths[DateMonth] < Date.StartOfMonth([End_Date__c]))

and i get the follwoing error:  

Expression.Error: We cannot apply operator >= to types List and Date.
Details:
    Operator=>=
    Left=List
    Right=10/1/2017

 

(the reference table DistinctMonths has a single Date formated column with months in a given date range set as the first of each month)

Basically the solution in this topic has nothing to do with date values, so you can apply it likewise to your situation.

Specializing in Power Query Formula Language (M)

Thanks @MarcelBeug, it worked a charm!!

Helpful resources

Announcements
Microsoft Fabric Learn Together

Microsoft Fabric Learn Together

Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City

PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.

April Fabric Community Update

Fabric Community Update - April 2024

Find out what's new and trending in the Fabric Community.