cancel
Showing results for 
Search instead for 
Did you mean: 
Reply
Cyprien-Bariant
Frequent Visitor

Resolving Messenger's mojibake in Power Query

Hello Power BI Community,
Thank you for all the help you gave me through this forum.

As you might know, Facebook allows its users to download their personal data since 2018.

They provide JSON files that recounts all the messages & reactions that were sent in a given Messenger conversation.

The problem is that they didn't correctly encode the data, and that mojibake is everywhere in the files.
Example:

 

{
      "sender_name": "Will Smith",
      "timestamp_ms": 1573564027017,
      "content": "D\u00c3\u0089J\u00c3\u0080 VU",
      "reactions": [
        {
          "reaction": "\u00f0\u009f\u0091\u008d",
          "actor": "Cyprien Bariant"
        }
      ],
      "type": "Generic",
      "is_unsent": false
    },

 

 is meant to be written as:

Cyprien-Bariant_0-1621091113461.png

 or as:

 

{
      "sender_name": "Will Smith",
      "timestamp_ms": 1573564027017,
      "content": "D\u00C9J\u00C0 VU",
      "reactions": [
        {
          "reaction": "\uD83D\uDC4D",
          "actor": "Cyprien Bariant"
        }
      ],
      "type": "Generic",
      "is_unsent": false
    }

 

which prints the right characters when we select "Unicode (UTF-8)" as file's origin.


In Power Query, with Messenger's data, The intended "DÉJÀ VU", prints as "DÉJÀ VU" and the intended "👍" as "👍" using UTF-8 encoding.

The cause of the mojibake is explained here : https://stackoverflow.com/questions/50008296/facebook-json-badly-encoded
It was apparently originally encoded in utf-8 and then decoded in latin-1.

The only solution I found for the moment was to change every mojibaked value by hand, which is very long as 1500+ emojis are now available for reactions. It also makes the time to transform data waaaaaay longer.

I know that it is possible to solve that with some other languages like Java & Python, but my goal is to be able to input the JSON and directly get the correct encoding for my PBI report, which is more user-friendly if I share my pbix file.

If someone could be of any help, I would be very grateful for your time spent.

1 ACCEPTED SOLUTION
v-yingjl
Community Support
Community Support

Hi @Cyprien-Bariant ,

Currently in power query, you can try to create custom columns to re-encode and decode these fields and remove the previous columns, like this:

= Text.FromBinary(Text.ToBinary([content],28591),65001)

The whole query is like this:

let
    Source = Json.Document(File.Contents("C:\Users\xxx\xxx\test.json"), 1252),
    #"Converted to Table" = Table.FromRecords({Source}),
    #"Expanded reactions" = Table.ExpandListColumn(#"Converted to Table", "reactions"),
    #"Expanded reactions1" = Table.ExpandRecordColumn(#"Expanded reactions", "reactions", {"reaction", "actor"}, {"reactions.reaction", "reactions.actor"}),
    #"Changed Type" = Table.TransformColumnTypes(#"Expanded reactions1",{{"sender_name", type text}, {"timestamp_ms", Int64.Type}, {"content", type text}, {"reactions.reaction", type text}, {"reactions.actor", type text}, {"type", type text}, {"is_unsent", type logical}}),
    #"Added Custom" = Table.AddColumn(#"Changed Type", "content.1", each Text.FromBinary(Text.ToBinary([content],28591),65001)),
    #"Added Custom1" = Table.AddColumn(#"Added Custom", "reaction", each Text.FromBinary(Text.ToBinary([reactions.reaction],28591),65001)),
    #"Removed Columns" = Table.RemoveColumns(#"Added Custom1",{"content", "reactions.reaction"}),
    #"Reordered Columns" = Table.ReorderColumns(#"Removed Columns",{"sender_name", "timestamp_ms", "content.1", "reactions.actor", "reaction", "type", "is_unsent"}),
    #"Changed Type1" = Table.TransformColumnTypes(#"Reordered Columns",{{"sender_name", type text}, {"timestamp_ms", Int64.Type}, {"content.1", type text}, {"reactions.actor", type text}, {"reaction", type text}, {"type", type text}, {"is_unsent", type logical}})
in
    #"Changed Type1"

re.png

 

Best Regards,
Community Support Team _ Yingjie Li
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

View solution in original post

2 REPLIES 2
v-yingjl
Community Support
Community Support

Hi @Cyprien-Bariant ,

Currently in power query, you can try to create custom columns to re-encode and decode these fields and remove the previous columns, like this:

= Text.FromBinary(Text.ToBinary([content],28591),65001)

The whole query is like this:

let
    Source = Json.Document(File.Contents("C:\Users\xxx\xxx\test.json"), 1252),
    #"Converted to Table" = Table.FromRecords({Source}),
    #"Expanded reactions" = Table.ExpandListColumn(#"Converted to Table", "reactions"),
    #"Expanded reactions1" = Table.ExpandRecordColumn(#"Expanded reactions", "reactions", {"reaction", "actor"}, {"reactions.reaction", "reactions.actor"}),
    #"Changed Type" = Table.TransformColumnTypes(#"Expanded reactions1",{{"sender_name", type text}, {"timestamp_ms", Int64.Type}, {"content", type text}, {"reactions.reaction", type text}, {"reactions.actor", type text}, {"type", type text}, {"is_unsent", type logical}}),
    #"Added Custom" = Table.AddColumn(#"Changed Type", "content.1", each Text.FromBinary(Text.ToBinary([content],28591),65001)),
    #"Added Custom1" = Table.AddColumn(#"Added Custom", "reaction", each Text.FromBinary(Text.ToBinary([reactions.reaction],28591),65001)),
    #"Removed Columns" = Table.RemoveColumns(#"Added Custom1",{"content", "reactions.reaction"}),
    #"Reordered Columns" = Table.ReorderColumns(#"Removed Columns",{"sender_name", "timestamp_ms", "content.1", "reactions.actor", "reaction", "type", "is_unsent"}),
    #"Changed Type1" = Table.TransformColumnTypes(#"Reordered Columns",{{"sender_name", type text}, {"timestamp_ms", Int64.Type}, {"content.1", type text}, {"reactions.actor", type text}, {"reaction", type text}, {"type", type text}, {"is_unsent", type logical}})
in
    #"Changed Type1"

re.png

 

Best Regards,
Community Support Team _ Yingjie Li
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

View solution in original post

Hi @v-yingjl , thank you very much for your answer!!!

I've been looking for an answer for a while now, and I'm so glad I finally got it!

 

 



Helpful resources

Announcements
PBI_User Group Leader_768x460.jpg

Manage your user group events

Check out the News & Announcements to learn more.

Get Ready for Power BI Dev Camp

Microsoft named a Leader in The Forrester Wave

Microsoft received the highest score of any vendor in both the strategy and current offering categories.

Get Ready for Power BI Dev Camp

Power BI Dev Camp - September 30th, 2021

Mark your calendars and join us for our next Power BI Dev Camp!

PowerPlatform 768x460.png

Microsoft Learn

Check out our new Discover Your Career Path blog post series and get all the details.

Top Kudoed Authors