Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.

Greg_Deckler

Good Ol' VLOOKUP - The Ultimate Guide to Lookups in Power BI

Introduction

Let's face it, our data is not always as clean as we would like it to be. There are many cases where we would like to transform our data into something more friendly to end users or standardize known data issues. For example a customer might be referenced in our data as:

  • Customer Name
  • Customer Name, Inc
  • Cust

Something like a VLOOKUP function would come in handy to transform such data. This article provides several different methods of recreating VLOOKUP functionality with Power BI. 

 

Solution Setup

To demonstrate lookups in Power BI, we will use US state names and abbreviations. To follow along with the example, create two comma-separated value (CSV) files such as:

StatePopulations.csv

Rank,State Name,Population
1,California,"37,253,956"
2,Texas,"25,145,561"
3,New York,"19,378,102"
4,Florida,"18,801,310"
5,Illinois,"12,830,632"
6,Pennsylvania,"12,702,379"
7,Ohio,"11,536,504"
8,Michigan,"9,883,640"
9,Georgia,"9,687,653"
10,North Carolina,"9,535,483"
11,New Jersey,"8,791,894"
12,Virginia,"8,001,024"
13,Washington,"6,724,540"
14,Massachusetts,"6,547,629"
15,Indiana,"6,483,802"
16,Arizona,"6,392,017"
17,Tennessee,"6,346,105"
18,Missouri,"5,988,927"
19,Maryland,"5,773,552"
20,Wisconsin,"5,686,986"
21,Minnesota,"5,303,925"
22,Colorado,"5,029,196"
23,Alabama,"4,779,736"
24,South Carolina,"4,625,364"
25,Louisiana,"4,533,372"
26,Kentucky,"4,339,367"
27,Oregon,"3,831,074"
28,Oklahoma,"3,751,351"
29,Connecticut,"3,574,097"
30,Iowa,"3,046,355"
31,Mississippi,"2,967,297"
32,Arkansas,"2,915,918"
33,Kansas,"2,853,118"
34,Utah,"2,763,885"
35,Nevada,"2,700,551"
36,New Mexico,"2,059,179"
37,West Virginia,"1,852,994"
38,Nebraska,"1,826,341"
39,Idaho,"1,567,582"
40,Hawaii,"1,360,301"
41,Maine,"1,328,361"
42,New Hampshire,"1,316,470"
43,Rhode Island,"1,052,567"
44,Montana,"989,415"
45,Delaware,"897,934"
46,South Dakota,"814,180"
47,Alaska,"710,231"
48,North Dakota,"672,591"
49,Vermont,"625,741"
50,"Washington, D. C.","601,723"
51,Wyoming,"563,626"

 

StateAbbreviations.csv

State Name,State
ALABAMA,AL
ALASKA,AK
ARIZONA,AZ
ARKANSAS,AR
CALIFORNIA,CA
COLORADO,CO
CONNECTICUT,CT
DELAWARE,DE
FLORIDA,FL
GEORGIA,GA
HAWAII,HI
IDAHO,ID
ILLINOIS,IL
INDIANA,IN
IOWA,IA
KANSAS,KS
KENTUCKY,KY
LOUISIANA,LA
MAINE,ME
MARYLAND,MD
MASSACHUSETTS,MA
MICHIGAN,MI
MINNESOTA,MN
MISSISSIPPI,MS
MISSOURI,MO
MONTANA,MT
NEBRASKA,NE
NEVADA,NV
NEW HAMPSHIRE,NH
NEW JERSEY,NJ
NEW MEXICO,NM
NEW YORK,NY
NORTH CAROLINA,NC
NORTH DAKOTA,ND
OHIO,OH
OKLAHOMA,OK
OREGON,OR
PENNSYLVANIA,PA
RHODE ISLAND,RI
SOUTH CAROLINA,SC
SOUTH DAKOTA,SD
TENNESSEE,TN
TEXAS,TX
UTAH,UT
VERMONT,VT
VIRGINIA,VA
WASHINGTON,WA
WEST VIRGINIA,WV
WISCONSIN,WI
WYOMING,WY

Create a query to import each of these as a table into your Power BI Desktop data model making sure to promote the first row as headers in each query.

 

No Code

The simplest way to solve our lookup issue doesn't involve any code but simply uses the relational capabilities of Power BI Desktop. Once both of these CSV files are imported as tables into the data model, we simply use the relationships tab to relate the two "State Name" columns to one another.

vlookup1.png

 

We can create any visualization we wish and substitute the "State" column from "StateAbbreviations" instead of the "State Name" from "StatePopulations".

vlookup2.png

 

First, one should note that when you related the two tables, Power BI figured out the relationships between the "State Name" columns even though they had different capitalization. Cool. However, we immediately see a problem. The first column in our visualization is labeled "(Blank)". This is because our population table includes "Washington DC" but our abbreviations table does not. We could fix this by putting in a lookup, but if you are using this for an "exception" list, you may only have a small number of replacement values compared to "good" values. Bottom line, you really do not have much control over the lookups when using relationships. Additional problems include.

 

First, if a lookup is inadvertently duplicated in StateAbbreviations, the entire thing breaks and it is error city. This is not a problem for our simple example, but imagine if there are a couple thousand replacement records and it could easily happen, especially if you do not control the data source. Could it be fixed by adding a "remove duplicates" step to your query. Sure, but what if you sometimes need those other replacement values? Bottom line, when you have "fuzzy" matching scenarios, using only relationships is not going to get the job done. 

 

Second, you effectively need to import all of the data into the model (assuming you have a data source that does not support DirectQuery). If your lookup table is millions of rows, this can get unwieldy and increase the overall size of your Desktop file tremendously.

 

DAX

We can build upon our relationship defined above by using DAX and the RELATED function in column formulas to gain more control.

 

In StatePopulations table, create a new custom column with the following formula:

State = RELATED(StateAbbreviations[State])

This provides the same result as our report visualization but now we have it in our data model in our fact table, which is useful for things like Sort By.

 

This also provides us the ability to wrap some logic around this lookup. For example, we can fix "Blank" values by returning the original lookup value

State2 = IF(COUNTROWS(RELATEDTABLE(StateAbbreviations)) = 0,[State Name],RELATED(StateAbbreviations[State]))

This returns the original "State Name" instead of blank for values like Washington DC.

 

If we have some known variations, we can use a formula like this one:

State3 = IF([State Name] = "Washington, D. C.","DC",RELATED(StateAbbreviations[State]))

This version provides a specific translation but uses the state abbreviations for everything else. 

 

We can even get crazy and combine these:

State4 = IF([State Name] = "Washington, D. C.","DC",IF(COUNTROWS(RELATEDTABLE(StateAbbreviations)) = 0,[State Name],RELATED(StateAbbreviations[State])))

This provides a specific translation for "Washington DC", failing that determines if an appropriate replacement exists and if not uses the original value. Whew!!

 

You can add as much logic as necessary but will get messy quickly.

 

You could create an alternate lookup table an "Enter Data" query.

 

One the Home tab of Power BI Desktop, click the Enter Data button and create the following table:

vlookup3.png

 

Now we can create a secondary relationship from StatePopulations to this table like so:vlookup4.png

 

Create a column:

 

State5 = IF(COUNTROWS(RELATEDTABLE(StateAbbreviations)) = 0,IF(COUNTROWS(RELATEDTABLE(AlternateLookup)) = 0,[State Name],RELATED(AlternateLookup[Replacement])),RELATED(StateAbbreviations[State]))

We now have a central table to control exceptions.

 

These DAX techniques help us gain more control over our lookups. We have solved the "(Blank)" issue. However, we are still relying on the 1-to-Many or 1-to-1 relationship to work and we still must import all of the lookups into our model.

 

We can eliminate our reliance on table relationships by using DAX's equivalent of VLOOKUP, LOOKUPVALUE, documented here:

https://msdn.microsoft.com/en-us/library/gg492170.aspx

 

Note however, that this still essentially requires a 1-to-1 relationship in effect as LOOKUPVALUE returns an error if a search returns multiple rows and the result column for all of those rows does not contain the same value. In addition, we are still left with the requirement to import all of our lookups into our data model.

 

Power Query "M"

We've probably taken DAX about as far as we can so let's switch to Power BI's other language, Power Query (informally known as "M"). Wonderful thing about Power BI, virtually no end to the different ways you can solve the same problem!

 

First up, can we essentially replicate the functionality that we have in DAX in Power Query? Sure we can. There's this handy little function called Table.Join that essentially replicates the functionality of a table join in SQL or our relationships in our data model. Create a new "Blank Query", go to "Advanced Editor" mode and paste in the following code:

 

let
    Source = Csv.Document(File.Contents("C:\temp\powerbi\vlookup\StatePopulations.csv"),[Delimiter=",", Columns=3, Encoding=1252, QuoteStyle=QuoteStyle.None]),
    #"Promoted Headers" = Table.PromoteHeaders(Source),
    #"Changed Type" = Table.TransformColumnTypes(#"Promoted Headers",{{"Rank", Int64.Type}, {"State Name", type text}, {"Population", Int64.Type}}),
    #"Uppercased Text" = Table.TransformColumns(#"Changed Type",{{"State Name", Text.Upper}}),
    #"Merged Queries" = Table.NestedJoin(#"Uppercased Text",{"State Name"},StateAbbreviations,{"State Name"},"NewColumn",JoinKind.LeftOuter),
    #"Expanded NewColumn" = Table.ExpandTableColumn(#"Merged Queries", "NewColumn", {"State"}, {"NewColumn.State"})
in
    #"Expanded NewColumn"

This will essentially transform your "StatePopulations" query into a query that matches up the "State Name" column from both queries and returns a single table with the replacement values in a new column called "State". Nifty. Not so nifty that Power Query automagically corrects for case, hence the "#Uppercased Text" step where we transform the "State Names" column in StatePopulations to all UPPERCASE (not yelling) so that it matches with the values in our "StateAbbreviations" query. In pseudocode, "Power Query < smart as Power BI Desktop".

 

The end result though is that we have our state abbreviations in our state populations table without having to import all of our translations. More efficient storage-wise I suppose, but what about the non-state per the Constitution "Washington, D. C."? Alas, null. Sadness. Can we fix it? After a fashion, we could add a column called "State" such as:

 

= if [NewColumn.State] = null then [State Name] else [NewColumn.State]

Can we implement the same kind of "exception" logic that we had in DAX? Sure we can. Add a State1 column with the formula:

if [State Name] = "WASHINGTON, D. C." then "DC" else if[NewColumn.State] = null then [State Name] else [NewColumn.State]

Presto chango, we have our translation for "Washington, D. C.". And hey, looky there, nested if statements work in "M". Good to know!

 

Thus far, however, we are still sort of, kind of relying on the "cleanliness" of our data. What if someone adds a duplicate value to our StateAbbreviations table? Back to error city. Sadness. Must rectify. In developer terms, our lookups are "fragile". That's pronounced "Frah GEEL lay" from where I'm from. Must be Italian. Basically, we are not accounting for the "edge conditions". Sot what if we get absolutely bananas and get rid of our reliance on "relationships" all together? Can we do that and what would that look like?

 

Well, as it turns out, we are consummate BI professionals with the sheer awesomeness of Power BI at our disposal, of course we can do that. Let's start with a basic example of a non-relation based lookup and see where it goes. Create a Blank Query. Make sure to name this new query "fnLookup" before you save it!! Go to "Advanced Editor" mode and paste in the following code:

 

 

let
fnLookup = (input) =>

let

values = {

{"ALABAMA","AL"},
{"ALASKA","AK"},
{"ARIZONA","AZ"},
{"ARKANSAS","AR"},
{"CALIFORNIA","CA"},
{"COLORADO","CO"},
{"CONNECTICUT","CT"},
{"DELAWARE","DE"},
{"FLORIDA","FL"},
{"GEORGIA","GA"},
{"HAWAII","HI"},
{"IDAHO","ID"},
{"ILLINOIS","IL"},
{"INDIANA","IN"},
{"IOWA","IA"},
{"KANSAS","KS"},
{"KENTUCKY","KY"},
{"LOUISIANA","LA"},
{"MAINE","ME"},
{"MARYLAND","MD"},
{"MASSACHUSETTS","MA"},
{"MICHIGAN","MI"},
{"MINNESOTA","MN"},
{"MISSISSIPPI","MS"},
{"MISSOURI","MO"},
{"MONTANA","MT"},
{"NEBRASKA","NE"},
{"NEVADA","NV"},
{"NEW HAMPSHIRE","NH"},
{"NEW JERSEY","NJ"},
{"NEW MEXICO","NM"},
{"NEW YORK","NY"},
{"NORTH CAROLINA","NC"},
{"NORTH DAKOTA","ND"},
{"OHIO","OH"},
{"OKLAHOMA","OK"},
{"OREGON","OR"},
{"PENNSYLVANIA","PA"},
{"RHODE ISLAND","RI"},
{"SOUTH CAROLINA","SC"},
{"SOUTH DAKOTA","SD"},
{"TENNESSEE","TN"},
{"TEXAS","TX"},
{"UTAH","UT"},
{"VERMONT","VT"},
{"VIRGINIA","VA"},
{"WASHINGTON","WA"},
{"WEST VIRGINIA","WV"},
{"WISCONSIN","WI"},
{"WYOMING","WY"}

},

Result = List.First(List.Select(values, each _{0}=input)){1}

in

Result
in
fnLookup

We can now create a new column "State2" in our "StatePopulations" query with the formula:

 

=fnLookup([State Name])

Wait a minute, what is this "List" thing showing in our column? Hit the "expand" arrows in the column header. Problem solved. Hey, how about that? A lookup that is NOT dependent on any kind of rigid "relationship" between any tables! But what about "Washington, D. C."? Boo! Error.

 

Let's fix that. 

 

Change State2 to a formula of:

=try fnLookup([State Name]) otherwise 0

Then create a new column "State3" with the formula:

=if [State2] = 0 then [State Name] else [State2]

Hmm, error handling in Power Query, nifty. Sure, it might take two columns for the price of one, but useful none-the-less. We can even get down with our fancy selves and change State2's formula to be:

 

=if [State Name] = "WASHINGTON, D. C." then "DC" else if[State2] = 0 then [State Name] else [State2]

Nested if statements in Power Query...still work.

 

OK, we have apparently solved the problem of our lookups being "Fra GEEL lay" so why are you looking at me like that? Don't look at me like that, I can feel your look. You are saying to yourself. "Good job Greg, you've solved the 'Fra GEEL lay' problem but now I have my lookups in some namby pamby Power Query "M" code that I have to maintain. Fail!

 

Fair enough, we don't do "namby pamby" here, we serve up the awesomesauce...

 

Edit your "fnLookup" query and paste in the following (note that you will likely have to change the path to the CSV file in the Source line):

let fnLookup = (input) =>

let
    Source = Csv.Document(File.Contents("C:\temp\powerbi\vlookup\StateAbbreviations.csv"),[Delimiter=",", Columns=2, Encoding=1252, QuoteStyle=QuoteStyle.None]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", type text}, {"Column2", type text}}),
    #"Promoted Headers" = Table.PromoteHeaders(#"Changed Type"),
    Record = Table.First(Table.SelectRows(#"Promoted Headers", each [State Name] = input)),
    Result = Record.Field(Record,"State")
in
    Result
in
    fnLookup

You will probably see some errors related to referencing external stuff, yadda, yadda, yadda. To fix that, open up your "StatePopulations" query in Advanced Editor and paste in:

 

let
    Source = Csv.Document(File.Contents("C:\temp\powerbi\vlookup\StatePopulations.csv"),[Delimiter=",", Columns=3, Encoding=1252, QuoteStyle=QuoteStyle.None]),
    #"Promoted Headers" = Table.PromoteHeaders(Source),
    #"Changed Type" = Table.TransformColumnTypes(#"Promoted Headers",{{"Rank", Int64.Type}, {"State Name", type text}, {"Population", Int64.Type}}),
    #"Uppercased Text" = Table.TransformColumns(#"Changed Type",{{"State Name", Text.Upper}}),
    #"Added Custom4" = Table.AddColumn(#"Uppercased Text", "State5", each fnLookup([State Name]))
in
    #"Added Custom4"

This clears out all of the various columns we have been using during testing and replaces them with our clean new function which does not require us to keep that list of lookup and replacement values in the query itself, but rather in some source that we have defined (in this case a CSV file).

 

Uh oh, back to "Error" for "Washington D.C.", we already know how to fix that (see above) so I won't belabor the point here.
 
So, let's take stock of where we are. First, using "M" code, we have solved the problem with having to store all of the lookup and replacement values in our data model. Second, we know that we have some logical control over the process of lookups and replacements to account for lookups that do not exist and/or specific exceptions that we want to enforce. Granted, this method is going to put more stress on our source system in terms of data access as each function call is presumably going to hit our data source but such is the price for saving space in our data model and ultimate control over our lookups.
 
However, make no mistake, we can make yet another refinement to our lookup code. Currently, the query we have for "fnLookup" returns ALL of the data from the lookup source and THEN refines it down to the specific value we are looking for. That's a bit inefficient. We know that we want a single value from the lookup source, can we query the source in such a way that we only return the particular value we are interested in from the source when we query it.
 
Stop me if you have heard this one before. Of course we can.
 
Replace the code in "fnLookup" with the following:

 

let fnLookup = (input) =>

let
    Source = Csv.Document(File.Contents("C:\temp\powerbi\vlookup\StateAbbreviations.csv"),[Delimiter=",", Columns=2, Encoding=1252, QuoteStyle=QuoteStyle.None]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", type text}, {"Column2", type text}}),
    #"Promoted Headers" = Table.PromoteHeaders(#"Changed Type"),
    Record = Table.First(Table.SelectRows(#"Promoted Headers", each ([State Name] = input))),
    Result = Record.Field(Record,"State")
in
    Result
in
    fnLookup
 

Now, we are pre-filtering the return from our function call such that we are not grabbing all of the values and THEN filtering but only grabbing the values from the data source that we care about. 

 

Conclusion

While it is impossible to cover every conceivable lookup scenario you might face, this article covers the vast majority of the basic techniques and building blocks for doing lookups in Power BI.

Comments

Tracking counter

 

Hi Seth,

Very nice read 🙂

 

Re performance issues cause "...each function call is presumably going to hit our data source but such is the price for saving space in our data model and ultimate control over our lookups...": This is not necessarily the case.

 

If you:

1) Split the access to the datasource from your function and

2) Buffer it

there won't be repeated calls to your datasource and the query would be much faster. So just create a buffered "staging query" where you access your datasource like (called "StateAbbreviations"):

 

let
    Source = Csv.Document(File.Contents("C:\temp\powerbi\vlooku​p\StateAbbreviations.csv"),[Delimiter=",", Columns=2, Encoding=1252, QuoteStyle=QuoteStyle.None]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", type text}, {"Column2", type text}}),
    #"Promoted Headers" = Table.PromoteHeaders(#"Changed Type")
in
Table.Buffer(#"Promoted Headers")

 

 

Then your speedy function would be this:

 

let fnLookup = (input) =>
let
    Record = Table.First(Table.SelectRows(StateAbbreviations, each ([State Name] = input))),
    Result = Record.Field(Record,"State")
in
Result
in 
fnLookup

 

 

Seth,

 

This is groundbreaking for people up-skilling away from traditional +20 year formulas into DAX.

A BIG thank you for sharing and collaborating to the huge Power BI community.

 

Chris Lira

I have to post this for the sake of Greg, and others that want to give him some great feedback. "smoupre" s name is Greg, and this is a fantastic write up. My name is Seth, and I did not write this 🙂

Anonymous

My relationship with table been done as many to many

 

and when i out a matrix table in the dashboard its showing correctly from both tables and the realtionship is working

 

 

but when i used the related() to get the value from another table to this its not working.... can you tell me what am i doing wrong?

 

thanks

Anonymous

This was very helpful, thank you.  Any chance you can give a quick update with an example of using two criteria for the look up.  For example, I'm trying to look up tiered pricing from a pricing list containing multiple vendors.  There is AcmeCo and SmithCo.  Both have different unit pricing based on order quantity: 0 - 1,000 units at $10 / unit,   1,001 - 2,000 units at $7 / unit,  and so on.  Using our purchasing data, I'd like to pull in the contracted unit price rates.  So I need to look up by product no and then also by vendor and then use the order qty to find the correct price.  What is a clever way of building this in "M"?

whats the best way to do a vlookup/index match when there isnt a direct relationship in the tables.

 

1. just do a indexmatch on the raw excel sheet and bring that column into PBI

2. use lookup function on dax ?

 

i also want to connect this to a live database so the report updates weekly. any disadvance of doing the vlookup or index match on the raw excel ?