Earn the coveted Fabric Analytics Engineer certification. 100% off your exam for a limited time only!
Hi,
I am importing multiple CSVs from a folder with a function like this:
(FolderPath as text, SingleKeyword as text) => let //Open the folder ShowFiles = Folder.Files(FolderPath), //Filter by DateText KeepOnlyDateText = Table.SelectRows(ShowFiles, each Text.EndsWith([Folder Path], FolderPath) and Text.Contains([Name], SingleKeyword) = true), //Get ContentCol AddContent = Table.AddColumn(KeepOnlyDateText, "ContentTbl", each Table.PromoteHeaders(Csv.Document([Content],[Delimiter=",", Encoding=65001, QuoteStyle=QuoteStyle.Csv]))) in AddContent
These CSVs have 2 columns in the end that look like this:
<site-url> <language-region> Top Rank <site-url> <language-region> Top Ranking URL
(see below)
When expanding the columns of these CSV files, these final 2 cols will repeat for each site-url and lang-region combinations.
Is there a way to tell PQ to ignore the first 2 words of each column name, and consolidate all into just 2 columns:
Top Rank Top Ranking URL
@ImkeF @v-juanli-msft @Nolock ?
Thanks!
Yes tks @Mariusz if you could kindly add the line in bold below to your original answer for the community's sake before I accept your solution.
let GetFilterCSVsBySingleKeyword = FilterCSVsBySingleKeyword(RangeValue("ImportFolder"),RangeValue("KeywordsBySite")), CallCombineTopRankCols = Table.AddColumn(GetFilterCSVsBySingleKeyword, "NewContentTbl", each CombineTopRankCols([ContentTbl])), ExpandNewContentTbl = Table.ExpandTableColumn(CallCombineTopRankCols, "NewContentTbl", {"Top Rank", "Top Ranking URL"}, {"Top Rank", "Top Ranking URL"}) in ExpandNewContentTbl
Hi @freelensia
You can try function like below as an alternative to @Nolock
( tbl as table) => let Source = Table.ColumnNames ( tbl ), topRank = { List.First( List.Select( Source, each Text.EndsWith( _, "Top Rank" ) ) ), "Top Rank" }, topRankingURL = { List.First( List.Select( Source, each Text.EndsWith( _, "Top Ranking URL" ) ) ), "Top Ranking URL" }, return = Table.RenameColumns(tbl, { topRank, topRankingURL } ) in return
Thanks @Mariusz . I tried your approach but I got error:
Expression.Error: We cannot convert the value "" to type Function. Details: Value= Type=Type
See code below.
let GetFilterCSVsBySingleKeyword = FilterCSVsBySingleKeyword(RangeValue("ImportFolder"),"keywords_by_site"), ExpandContentTbl = Table.ExpandTableColumn(GetFilterCSVsBySingleKeyword, "ContentTbl", {"Keyword", "Min Volume", "Max Volume", "Difficulty", "blarlo.com en-US Top Rank", "blarlo.com en-US Top Ranking URL", "cadencetranslate.com en-US Top Rank", "cadencetranslate.com en-US Top Ranking URL", "crowdworks.jp en-US Top Rank", "crowdworks.jp en-US Top Ranking URL", "expertrans.com en-US Top Rank", "expertrans.com en-US Top Ranking URL", "expotor.com en-US Top Rank", "expotor.com en-US Top Ranking URL", "freeiva.com en-US Top Rank", "freeiva.com en-US Top Ranking URL", "freelensia.com en-US Top Rank", "freelensia.com en-US Top Ranking URL", "interpreter.com en-US Top Rank", "interpreter.com en-US Top Ranking URL", "interpreter.io en-US Top Rank", "interpreter.io en-US Top Ranking URL", "kotobaito.com en-US Top Rank", "kotobaito.com en-US Top Ranking URL", "languagers.com en-US Top Rank", "languagers.com en-US Top Ranking URL", "proz.com en-US Top Rank", "proz.com en-US Top Ranking URL", "scheduleinterpreter.com en-US Top Rank", "scheduleinterpreter.com en-US Top Ranking URL", "tanner.vn en-US Top Rank", "tanner.vn en-US Top Ranking URL", "theinterpreterdirectory.com en-US Top Rank", "theinterpreterdirectory.com en-US Top Ranking URL", "tikktalk.com en-US Top Rank", "tikktalk.com en-US Top Ranking URL", "transperfect.com en-US Top Rank", "transperfect.com en-US Top Ranking URL", "traveloco.jp en-US Top Rank", "traveloco.jp en-US Top Ranking URL", "tubudd.com en-US Top Rank", "tubudd.com en-US Top Ranking URL", "tutoroo.co en-US Top Rank", "tutoroo.co en-US Top Ranking URL", "viettranslators.com en-US Top Rank", "viettranslators.com en-US Top Ranking URL"}, {"Keyword", "Min Volume", "Max Volume", "Difficulty", "blarlo.com en-US Top Rank", "blarlo.com en-US Top Ranking URL", "cadencetranslate.com en-US Top Rank", "cadencetranslate.com en-US Top Ranking URL", "crowdworks.jp en-US Top Rank", "crowdworks.jp en-US Top Ranking URL", "expertrans.com en-US Top Rank", "expertrans.com en-US Top Ranking URL", "expotor.com en-US Top Rank", "expotor.com en-US Top Ranking URL", "freeiva.com en-US Top Rank", "freeiva.com en-US Top Ranking URL", "freelensia.com en-US Top Rank", "freelensia.com en-US Top Ranking URL", "interpreter.com en-US Top Rank", "interpreter.com en-US Top Ranking URL", "interpreter.io en-US Top Rank", "interpreter.io en-US Top Ranking URL", "kotobaito.com en-US Top Rank", "kotobaito.com en-US Top Ranking URL", "languagers.com en-US Top Rank", "languagers.com en-US Top Ranking URL", "proz.com en-US Top Rank", "proz.com en-US Top Ranking URL", "scheduleinterpreter.com en-US Top Rank", "scheduleinterpreter.com en-US Top Ranking URL", "tanner.vn en-US Top Rank", "tanner.vn en-US Top Ranking URL", "theinterpreterdirectory.com en-US Top Rank", "theinterpreterdirectory.com en-US Top Ranking URL", "tikktalk.com en-US Top Rank", "tikktalk.com en-US Top Ranking URL", "transperfect.com en-US Top Rank", "transperfect.com en-US Top Ranking URL", "traveloco.jp en-US Top Rank", "traveloco.jp en-US Top Ranking URL", "tubudd.com en-US Top Rank", "tubudd.com en-US Top Ranking URL", "tutoroo.co en-US Top Rank", "tutoroo.co en-US Top Ranking URL", "freelensia.com en-US Top Rank", "freelensia.com en-US Top Ranking URL"}), CallCombineTopRankCols = CombineTopRankCols(ExpandContentTbl), DelCols = Table.RemoveColumns(CallCombineTopRankCols,{"Content", "Name", "Extension", "Date accessed", "Date modified", "Date created", "Attributes", "Folder Path"}) in DelCols
@Nolock can you explain in more detail how to I can use your solution in my steps above? The third step is @Mariusz solution so you can kindly replace it with the code you are suggesting, it would be much appreciated.
Hi @freelensia
Have you invoked the function in the right place?
Result below.
Please let me know if you still cant get it to work.
Hi @freelensia,
wait a sec. I think we have forgotten an important fact.
We are trying to map many columns to 2. It means that we need either to remove N-2 columns OR aggregate all Top Rank columns and all Top Ranking URL columns. Quore: "Is there a way to tell PQ to ignore the first 2 words of each column name, and consolidate all into just 2 columns:" Or do I understand it wrong?
If we have to aggregate, let me know what aggregation function we should use. A Sum?
Hi @Nolock there is no need to aggregate anything. I just need to write out the records in the same order as they appear in the CSV files, however the column titles need to be consolidated into Top Rank and Top Rank URL.
In fact, this data contains relevant keywords related to interpreters / interpretation, and returns that most relevant sub-page of a particular domain for that keyword. So for example, if I have:
Site Keyword freelensia.com Top Rank freelensia.com Top Rank URL aaa.com Top Rank aaa.com Top Rank URL freelensia.com find interpreters 23 freelensia.com/blog/topics/Finding-an-interpreter/articles/How-to-find-an-interpreter%3F
freelensia.com reserve interpreters online 1 freelensia.com
aaa.com find tour guides in tokyo 12 aaa.com/xxx
aaa.com simultaneous interpretation tips 5 aaa.com/yyy
...
I want it to be like this:
Site Keyword Top Rank Top Rank URL freelensia.com find interpreters 23 freelensia.com/blog/topics/Finding-an-interpreter/articles/How-to-find-an-interpreter%3F
freelensia.com reserve interpreters online 1 freelensia.com
aaa.com find tour guides in tokyo 12 aaa.com/xxx
aaa.com simultaneous interpretation tips 5 aaa.com/yyy
...
This is not as sophisticated as the solutions previously mentioned but this would be my naive approach. Based on the screenshot with content columns, I would expand those columns, add index column to order rows(optional), unpivot the columns with text to be removed, split by number of characters (looks like en-US precedes the URL and rank info), delete the column with unwanted text , pivot the column with Top Rank and URL, then top rank and url should be in your headers with values below. Columns consolidated.
Thanks for your suggestion, but the columns will change in the future (more domains added), so I need a programmatic approach to this.
Hi @freelensia,
you can redefine the problem as removing HTML tags from a string. You can do that with the PQ function Html.Table and then take the first value of the table with Table.FirstValue.
An example of using Html.Table:
Html.Table("<site-url> <language-region> Top Rank", {{"text",":root"}})
-----
EDIT: This is WRONG, I haven't seen the image in the original post, maybe not loaded, I don't know.
Hi @freelensia,
there is also another option I've forgotten to mention.
@ImkeF has written a handy PQ query which removes HTML tags from a string: https://github.com/ImkeF/M/blob/master/Library/Text.RemoveHtmlTags.pq.
-----
EDIT: This is WRONG, I haven't seen the image in the original post, maybe not loaded, I don't know.