Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Earn a 50% discount on the DP-600 certification exam by completing the Fabric 30 Days to Learn It challenge.

Reply
shafeequem
Regular Visitor

Compare the column from different sources

Hi Guys,

We have data from a different source

HR --> SQL DB

AD --> CSV export
Azure AD --> CSV export

 

We need to compare the attributes from each data source to identify the differences.

Currently, we are comparing the data using excel. However, it is a very time-consuming task as we need to run vlookup against each column and run some excel function to identify the difference.

 

I have completed PowerBI basic training and I am able to import the data from different sources. However, I am unable to identify a way to get the difference in the data.

 

For Example;
if you look at below example, WORK_PHONE and telephonenumber are different for some users (Same for the Email address). We need a way to identify a difference in each attribute without doing manual work every time. I would appreciate if someone could share the sample reports.

Also please let me know if there is any Article or video I can refer to learn this use case ( I understand that there are plenty of videos available on youtube and Udemy. But I am unable to find video related to my use case). Thank you very much in advance.

HR

EmpNumber

EMAIL_ADDR

DISPLAY_NAME

ACCOUNT_TYPE

WORK_PHONE

123456

firstname.lastname@domain.com

FirstName LastName

1

1111111

123457

firstname1.lastname1@domain.com

FirstName1 LastName1

2

2222222


AD

EmpNumber

mail

displayname

accounttype

telephonenumber

123456

firstname.lastname@domain.com

FirstName LastName

1

1111111

123457

WrongEmail@domain.com

FirstName1 LastName1

2

3333333


AzureAD

EmpNumber

mail

displayname

accounttype

telephonenumber

123456

firstname.lastname@domain.com

FirstName LastName

1

1111111

123457

firstname1.lastname1@domain.com

FirstName1 LastName1

2

3333333

 

 

3 REPLIES 3
Jimmy801
Community Champion
Community Champion

Hello @shafeequem 

 

a general recomandation from my side. If you need to analyse a lot of data afterwards (not visuals), Excel is the tool to be used. And as Excel has also Power Query integrated (even an older version) you should easily be able to complete your task within Excel-PowerQuery.

A solution to your problem at high level could be like that

- create 3 different queries for your 3 data sources and bring them in a comparable format (think already here about your process -meaning, that choose a public folder and name your export files always the same)

- create a new query where you join all three data sources using the EmpNumber as key

- Expand every table and add new columns for every check you want to make (A good idea here is to use the prefix-function of the expandtablecolumn in order you can clearly identify from which data source your data is comming)

 

For sure there are more satisfactory solution possibile, but I think this is a very good starting point and its better to keep it easy, in case of possible changes.


If this post helps or solves your problem, please mark it as solution (to help other users find useful content and to acknowledge the work of users that helped you)
Kudoes are nice too

Have fun

Jimmy

 

 

 

 

 

The data size is pretty huge. and we need to visualize the data for management. So we prefer PowerBI. Could someone please help me to get a good article or video for below steps

1. Join all three data sources 

2. prefix-function of the expandtablecolumn

 

Anonymous
Not applicable

Hi @shafeequem ,

 

Thank you for your quesiton, it was an intersting problem to work on :).

I think this is what you are trying to achive - sorry, this is quite a bit of coding, but, in theory, all you need is to change is the "external parameters".

 

mSource is the list that keeps references to your tables (i.e. type table), not names of the tables (type text).

mNumberOfColumnInIndex is the number of colums being evaluated. IMPORTANT! The code assumes that the fields/columns come in the same order in all columns! If not, please re-arrage it within the cource code of relevant tables. The total number of columes in each table does not matter unless it is more than or equal to mNumberOfColumnInIndex.

 

Everything else is calcualted automatically.

fBuildIndex adds a merge colums that combines values from the first mNumberOfColumnInIndex columns to serve as merge-on field.

fFilterNotMatchingRows takes a table as a parameter (well technically a table index in the mSourceList) and iteratively goes pairs it with all tables in the mSource list to find descrepancies. This can be improved as there is no reason to run this type of merge on self, but I have not coded this exception. If the overall performance is terrible, this is one of the first things to look at.

 

let
    //External parameters
    mSource = {HR, AD, #"Azure AD"},
    mNumberOfColumnInIndex = 5,

    //Calculated parameters
    mTableCount = List.Count(mSource),
    
    //Index Combine Funciton
    fBuildIndex = 
        (pTable)=>Table.AddColumn(pTable, "!MergeOn!", each Text.Combine(List.Accumulate(List.FirstN(Record.ToList(_),mNumberOfColumnInIndex), {}, (s, a)=> s & {Text.From(a)}), "<=&=>"), type text),

    PrepareTables = List.Accumulate(mSource, {}, (s,a)=> s & {fBuildIndex(a)}),

    fFilterNotMatchingRows = 
        (pIndex as number) => 
            List.Accumulate({0..mTableCount-1}, Table.FromList({}), 
                (s,a)=> 
                    let 
                        MergeAnti = Table.NestedJoin(PrepareTables{pIndex}, {"!MergeOn!"}, PrepareTables{a}, {"!MergeOn!"}, "!MergeFlag!", JoinKind.LeftAnti),
                        RemoveFields = Table.RemoveColumns(MergeAnti, {"!MergeOn!", "!MergeFlag!"}),
                        AddTableIndex = Table.AddColumn(RemoveFields, "!Difference!", each "Between table " & Number.ToText(pIndex) & " and table " & Number.ToText(a), type text),
                        AccumulatedTable = Table.Combine({s, AddTableIndex})
                    in AccumulatedTable
            ),
    
    ProcessTables = List.Accumulate({0..mTableCount-1}, Table.FromList({}), (s,a) => Table.Combine({s, fFilterNotMatchingRows(a)}))
   
in
    ProcessTables

 

 

In this version it points to the table indexes that the difference relate to, it is possible to add functionality to point to an exect row number in the main table (on the left side of the LeftAnti merge). I did not implement it, but this is reasonbly simple addition.

 

Kind regards,

JB

Helpful resources

Announcements
PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.

April Fabric Community Update

Fabric Community Update - April 2024

Find out what's new and trending in the Fabric Community.

Top Solution Authors
Top Kudoed Authors