cancel
Showing results for 
Search instead for 
Did you mean: 
Reply
Cyprien-Bariant
Frequent Visitor

Resolving Messenger's mojibake in Power Query

Hello Power BI Community,
Thank you for all the help you gave me through this forum.

As you might know, Facebook allows its users to download their personal data since 2018.

They provide JSON files that recounts all the messages & reactions that were sent in a given Messenger conversation.

The problem is that they didn't correctly encode the data, and that mojibake is everywhere in the files.
Example:

 

{
      "sender_name": "Will Smith",
      "timestamp_ms": 1573564027017,
      "content": "D\u00c3\u0089J\u00c3\u0080 VU",
      "reactions": [
        {
          "reaction": "\u00f0\u009f\u0091\u008d",
          "actor": "Cyprien Bariant"
        }
      ],
      "type": "Generic",
      "is_unsent": false
    },

 

 is meant to be written as:

Cyprien-Bariant_0-1621091113461.png

 or as:

 

{
      "sender_name": "Will Smith",
      "timestamp_ms": 1573564027017,
      "content": "D\u00C9J\u00C0 VU",
      "reactions": [
        {
          "reaction": "\uD83D\uDC4D",
          "actor": "Cyprien Bariant"
        }
      ],
      "type": "Generic",
      "is_unsent": false
    }

 

which prints the right characters when we select "Unicode (UTF-8)" as file's origin.


In Power Query, with Messenger's data, The intended "DÉJÀ VU", prints as "DÉJÀ VU" and the intended "👍" as "👍" using UTF-8 encoding.

The cause of the mojibake is explained here : https://stackoverflow.com/questions/50008296/facebook-json-badly-encoded
It was apparently originally encoded in utf-8 and then decoded in latin-1.

The only solution I found for the moment was to change every mojibaked value by hand, which is very long as 1500+ emojis are now available for reactions. It also makes the time to transform data waaaaaay longer.

I know that it is possible to solve that with some other languages like Java & Python, but my goal is to be able to input the JSON and directly get the correct encoding for my PBI report, which is more user-friendly if I share my pbix file.

If someone could be of any help, I would be very grateful for your time spent.

1 ACCEPTED SOLUTION
v-yingjl
Community Support
Community Support

Hi @Cyprien-Bariant ,

Currently in power query, you can try to create custom columns to re-encode and decode these fields and remove the previous columns, like this:

= Text.FromBinary(Text.ToBinary([content],28591),65001)

The whole query is like this:

let
    Source = Json.Document(File.Contents("C:\Users\xxx\xxx\test.json"), 1252),
    #"Converted to Table" = Table.FromRecords({Source}),
    #"Expanded reactions" = Table.ExpandListColumn(#"Converted to Table", "reactions"),
    #"Expanded reactions1" = Table.ExpandRecordColumn(#"Expanded reactions", "reactions", {"reaction", "actor"}, {"reactions.reaction", "reactions.actor"}),
    #"Changed Type" = Table.TransformColumnTypes(#"Expanded reactions1",{{"sender_name", type text}, {"timestamp_ms", Int64.Type}, {"content", type text}, {"reactions.reaction", type text}, {"reactions.actor", type text}, {"type", type text}, {"is_unsent", type logical}}),
    #"Added Custom" = Table.AddColumn(#"Changed Type", "content.1", each Text.FromBinary(Text.ToBinary([content],28591),65001)),
    #"Added Custom1" = Table.AddColumn(#"Added Custom", "reaction", each Text.FromBinary(Text.ToBinary([reactions.reaction],28591),65001)),
    #"Removed Columns" = Table.RemoveColumns(#"Added Custom1",{"content", "reactions.reaction"}),
    #"Reordered Columns" = Table.ReorderColumns(#"Removed Columns",{"sender_name", "timestamp_ms", "content.1", "reactions.actor", "reaction", "type", "is_unsent"}),
    #"Changed Type1" = Table.TransformColumnTypes(#"Reordered Columns",{{"sender_name", type text}, {"timestamp_ms", Int64.Type}, {"content.1", type text}, {"reactions.actor", type text}, {"reaction", type text}, {"type", type text}, {"is_unsent", type logical}})
in
    #"Changed Type1"

re.png

 

Best Regards,
Community Support Team _ Yingjie Li
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

View solution in original post

2 REPLIES 2
v-yingjl
Community Support
Community Support

Hi @Cyprien-Bariant ,

Currently in power query, you can try to create custom columns to re-encode and decode these fields and remove the previous columns, like this:

= Text.FromBinary(Text.ToBinary([content],28591),65001)

The whole query is like this:

let
    Source = Json.Document(File.Contents("C:\Users\xxx\xxx\test.json"), 1252),
    #"Converted to Table" = Table.FromRecords({Source}),
    #"Expanded reactions" = Table.ExpandListColumn(#"Converted to Table", "reactions"),
    #"Expanded reactions1" = Table.ExpandRecordColumn(#"Expanded reactions", "reactions", {"reaction", "actor"}, {"reactions.reaction", "reactions.actor"}),
    #"Changed Type" = Table.TransformColumnTypes(#"Expanded reactions1",{{"sender_name", type text}, {"timestamp_ms", Int64.Type}, {"content", type text}, {"reactions.reaction", type text}, {"reactions.actor", type text}, {"type", type text}, {"is_unsent", type logical}}),
    #"Added Custom" = Table.AddColumn(#"Changed Type", "content.1", each Text.FromBinary(Text.ToBinary([content],28591),65001)),
    #"Added Custom1" = Table.AddColumn(#"Added Custom", "reaction", each Text.FromBinary(Text.ToBinary([reactions.reaction],28591),65001)),
    #"Removed Columns" = Table.RemoveColumns(#"Added Custom1",{"content", "reactions.reaction"}),
    #"Reordered Columns" = Table.ReorderColumns(#"Removed Columns",{"sender_name", "timestamp_ms", "content.1", "reactions.actor", "reaction", "type", "is_unsent"}),
    #"Changed Type1" = Table.TransformColumnTypes(#"Reordered Columns",{{"sender_name", type text}, {"timestamp_ms", Int64.Type}, {"content.1", type text}, {"reactions.actor", type text}, {"reaction", type text}, {"type", type text}, {"is_unsent", type logical}})
in
    #"Changed Type1"

re.png

 

Best Regards,
Community Support Team _ Yingjie Li
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

View solution in original post

Hi @v-yingjl , thank you very much for your answer!!!

I've been looking for an answer for a while now, and I'm so glad I finally got it!

 

 



Helpful resources

Announcements
PBI User Groups

Welcome to the User Group Public Preview

Check out new user group experience and if you are a leader please create your group!

MBAS on Demand

2021 Release Wave 2 Plan

Power Platform release plan for the 2021 release wave 2 describes all new features releasing from October 2021 through March 2022.

July 2021 Update 768x460.png

Check it out!

Click here to read more about the July 2021 Updates

Top Solution Authors
Top Kudoed Authors