Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Earn the coveted Fabric Analytics Engineer certification. 100% off your exam for a limited time only!

Reply
freelensia
Advocate II
Advocate II

Combine similar columns into one when importing multiple CSVs

Hi,

 

I am importing multiple CSVs from a folder with a function like this:

(FolderPath as text, SingleKeyword as text) =>
let
//Open the folder 
ShowFiles = Folder.Files(FolderPath),
//Filter by DateText
KeepOnlyDateText = Table.SelectRows(ShowFiles, each Text.EndsWith([Folder Path], FolderPath) and Text.Contains([Name], SingleKeyword) = true),
//Get ContentCol
AddContent = Table.AddColumn(KeepOnlyDateText, "ContentTbl", each Table.PromoteHeaders(Csv.Document([Content],[Delimiter=",", Encoding=65001, QuoteStyle=QuoteStyle.Csv])))
in
AddContent

These CSVs have 2 columns in the end that look like this:

<site-url> <language-region> Top Rank
<site-url> <language-region> Top Ranking URL

(see below)

PQ SS.png

 

When expanding the columns of these CSV files, these final 2 cols will repeat for each site-url and lang-region combinations.

 

Is there a way to tell PQ to ignore the first 2 words of each column name, and consolidate all into just 2 columns:

Top Rank
Top Ranking URL

@ImkeF @v-juanli-msft @Nolock ?

Thanks!

11 REPLIES 11
freelensia
Advocate II
Advocate II

Yes tks @Mariusz if you could kindly add the line in bold below to your original answer for the community's sake before I accept your solution.

 

let
GetFilterCSVsBySingleKeyword = FilterCSVsBySingleKeyword(RangeValue("ImportFolder"),RangeValue("KeywordsBySite")),
CallCombineTopRankCols = Table.AddColumn(GetFilterCSVsBySingleKeyword, "NewContentTbl", each CombineTopRankCols([ContentTbl])),
ExpandNewContentTbl = Table.ExpandTableColumn(CallCombineTopRankCols, "NewContentTbl", {"Top Rank", "Top Ranking URL"}, {"Top Rank", "Top Ranking URL"})
in
ExpandNewContentTbl
Mariusz
Community Champion
Community Champion

Hi @freelensia 

 

You can try function like below as an alternative to @Nolock 

( tbl as table) => let
    Source = Table.ColumnNames ( tbl ),
    topRank = { List.First( List.Select( Source, each Text.EndsWith( _, "Top Rank" ) ) ), "Top Rank" },
    topRankingURL = { List.First( List.Select( Source, each Text.EndsWith( _, "Top Ranking URL" ) ) ), "Top Ranking URL" },
    return = Table.RenameColumns(tbl, { topRank, topRankingURL } ) 
in
    return

 

Best Regards,
Mariusz

If this post helps, then please consider Accepting it as the solution.

Please feel free to connect with me.
Mariusz Repczynski




Thanks @Mariusz . I tried your approach but I got error:

Expression.Error: We cannot convert the value "" to type Function.
Details:
    Value=
    Type=Type

See code below.

let
    GetFilterCSVsBySingleKeyword = FilterCSVsBySingleKeyword(RangeValue("ImportFolder"),"keywords_by_site"),
    ExpandContentTbl = Table.ExpandTableColumn(GetFilterCSVsBySingleKeyword, "ContentTbl", {"Keyword", "Min Volume", "Max Volume", "Difficulty", "blarlo.com en-US Top Rank", "blarlo.com en-US Top Ranking URL", "cadencetranslate.com en-US Top Rank", "cadencetranslate.com en-US Top Ranking URL", "crowdworks.jp en-US Top Rank", "crowdworks.jp en-US Top Ranking URL", "expertrans.com en-US Top Rank", "expertrans.com en-US Top Ranking URL", "expotor.com en-US Top Rank", "expotor.com en-US Top Ranking URL", "freeiva.com en-US Top Rank", "freeiva.com en-US Top Ranking URL", "freelensia.com en-US Top Rank", "freelensia.com en-US Top Ranking URL", "interpreter.com en-US Top Rank", "interpreter.com en-US Top Ranking URL", "interpreter.io en-US Top Rank", "interpreter.io en-US Top Ranking URL", "kotobaito.com en-US Top Rank", "kotobaito.com en-US Top Ranking URL", "languagers.com en-US Top Rank", "languagers.com en-US Top Ranking URL", "proz.com en-US Top Rank", "proz.com en-US Top Ranking URL", "scheduleinterpreter.com en-US Top Rank", "scheduleinterpreter.com en-US Top Ranking URL", "tanner.vn en-US Top Rank", "tanner.vn en-US Top Ranking URL", "theinterpreterdirectory.com en-US Top Rank", "theinterpreterdirectory.com en-US Top Ranking URL", "tikktalk.com en-US Top Rank", "tikktalk.com en-US Top Ranking URL", "transperfect.com en-US Top Rank", "transperfect.com en-US Top Ranking URL", "traveloco.jp en-US Top Rank", "traveloco.jp en-US Top Ranking URL", "tubudd.com en-US Top Rank", "tubudd.com en-US Top Ranking URL", "tutoroo.co en-US Top Rank", "tutoroo.co en-US Top Ranking URL", "viettranslators.com en-US Top Rank", "viettranslators.com en-US Top Ranking URL"}, {"Keyword", "Min Volume", "Max Volume", "Difficulty", "blarlo.com en-US Top Rank", "blarlo.com en-US Top Ranking URL", "cadencetranslate.com en-US Top Rank", "cadencetranslate.com en-US Top Ranking URL", "crowdworks.jp en-US Top Rank", "crowdworks.jp en-US Top Ranking URL", "expertrans.com en-US Top Rank", "expertrans.com en-US Top Ranking URL", "expotor.com en-US Top Rank", "expotor.com en-US Top Ranking URL", "freeiva.com en-US Top Rank", "freeiva.com en-US Top Ranking URL", "freelensia.com en-US Top Rank", "freelensia.com en-US Top Ranking URL", "interpreter.com en-US Top Rank", "interpreter.com en-US Top Ranking URL", "interpreter.io en-US Top Rank", "interpreter.io en-US Top Ranking URL", "kotobaito.com en-US Top Rank", "kotobaito.com en-US Top Ranking URL", "languagers.com en-US Top Rank", "languagers.com en-US Top Ranking URL", "proz.com en-US Top Rank", "proz.com en-US Top Ranking URL", "scheduleinterpreter.com en-US Top Rank", "scheduleinterpreter.com en-US Top Ranking URL", "tanner.vn en-US Top Rank", "tanner.vn en-US Top Ranking URL", "theinterpreterdirectory.com en-US Top Rank", "theinterpreterdirectory.com en-US Top Ranking URL", "tikktalk.com en-US Top Rank", "tikktalk.com en-US Top Ranking URL", "transperfect.com en-US Top Rank", "transperfect.com en-US Top Ranking URL", "traveloco.jp en-US Top Rank", "traveloco.jp en-US Top Ranking URL", "tubudd.com en-US Top Rank", "tubudd.com en-US Top Ranking URL", "tutoroo.co en-US Top Rank", "tutoroo.co en-US Top Ranking URL", "freelensia.com en-US Top Rank", "freelensia.com en-US Top Ranking URL"}),
    CallCombineTopRankCols = CombineTopRankCols(ExpandContentTbl),
    DelCols = Table.RemoveColumns(CallCombineTopRankCols,{"Content", "Name", "Extension", "Date accessed", "Date modified", "Date created", "Attributes", "Folder Path"})
in
    DelCols

 

@Nolock can you explain in more detail how to I can use your solution in my steps above? The third step is @Mariusz solution so you can kindly replace it with the code you are suggesting, it would be much appreciated.

Hi @freelensia 

 

Have you invoked the function in the right place?

image.png

Result below.

image.png

 

Please let me know if you still cant get it to work.

 

Best Regards,
Mariusz

If this post helps, then please consider Accepting it as the solution.

Please feel free to connect with me.
Mariusz Repczynski

 

Mariusz
Community Champion
Community Champion

Hi @freelensia 

 

is it working for you now?

 

Best Regards,
Mariusz

Please feel free to connect with me.
Mariusz Repczynski

 

Hi @freelensia,

wait a sec. I think we have forgotten an important fact.

We are trying to map many columns to 2. It means that we need either to remove N-2 columns OR aggregate all Top Rank columns and all Top Ranking URL columns. Quore: "Is there a way to tell PQ to ignore the first 2 words of each column name, and consolidate all into just 2 columns:" Or do I understand it wrong?

If we have to aggregate, let me know what aggregation function we should use. A Sum?

Hi @Nolock there is no need to aggregate anything. I just need to write out the records in the same order as they appear in the CSV files, however the column titles need to be consolidated into Top Rank and Top Rank URL.

 

In fact, this data contains relevant keywords related to interpreters / interpretation, and returns that most relevant sub-page of a particular domain for that keyword. So for example, if I have:

 

Site            Keyword                     freelensia.com Top Rank  freelensia.com Top Rank URL                   aaa.com Top Rank aaa.com  Top Rank URL
freelensia.com  find interpreters           23                       freelensia.com/blog/topics/Finding-an-interpreter/articles/How-to-find-an-interpreter%3F
freelensia.com reserve interpreters online 1 freelensia.com
aaa.com find tour guides in tokyo 12 aaa.com/xxx
aaa.com simultaneous interpretation tips 5 aaa.com/yyy
...

 

I want it to be like this:

Site            Keyword                          Top Rank  Top Rank URL
freelensia.com  find interpreters                23        freelensia.com/blog/topics/Finding-an-interpreter/articles/How-to-find-an-interpreter%3F
freelensia.com reserve interpreters online 1 freelensia.com
aaa.com find tour guides in tokyo 12 aaa.com/xxx
aaa.com simultaneous interpretation tips 5 aaa.com/yyy
...

 

 

Anonymous
Not applicable

This is not as sophisticated as the solutions previously mentioned but this would be my naive approach. Based on the screenshot with content columns, I would expand those columns, add index column to order rows(optional), unpivot the columns with text to be removed, split by number of characters (looks like en-US precedes the URL and rank info), delete the column with unwanted text , pivot  the column with Top Rank and URL,  then top rank and url should be in your headers with values below.  Columns consolidated. 

 

Thanks for your suggestion, but the columns will change in the future (more domains added), so I need a programmatic approach to this.

Nolock
Resident Rockstar
Resident Rockstar

Hi @freelensia,

you can redefine the problem as removing HTML tags from a string. You can do that with the PQ function Html.Table and then take the first value of the table with Table.FirstValue.

An example of using Html.Table:

 

Html.Table("<site-url> <language-region> Top Rank", {{"text",":root"}})

Capture.PNG

-----

EDIT: This is WRONG, I haven't seen the image in the original post, maybe not loaded, I don't know.

Nolock
Resident Rockstar
Resident Rockstar

Hi @freelensia,

there is also another option I've forgotten to mention. 

@ImkeF has written a handy PQ query which removes HTML tags from a string: https://github.com/ImkeF/M/blob/master/Library/Text.RemoveHtmlTags.pq.

 

-----

EDIT: This is WRONG, I haven't seen the image in the original post, maybe not loaded, I don't know.

Helpful resources

Announcements
April AMA free

Microsoft Fabric AMA Livestream

Join us Tuesday, April 09, 9:00 – 10:00 AM PST for a live, expert-led Q&A session on all things Microsoft Fabric!

March Fabric Community Update

Fabric Community Update - March 2024

Find out what's new and trending in the Fabric Community.

Top Solution Authors
Top Kudoed Authors