Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Earn the coveted Fabric Analytics Engineer certification. 100% off your exam for a limited time only!

Reply
Cyprien-Bariant
Frequent Visitor

Resolving Messenger's mojibake in Power Query

Hello Power BI Community,
Thank you for all the help you gave me through this forum.

As you might know, Facebook allows its users to download their personal data since 2018.

They provide JSON files that recounts all the messages & reactions that were sent in a given Messenger conversation.

The problem is that they didn't correctly encode the data, and that mojibake is everywhere in the files.
Example:

 

{
      "sender_name": "Will Smith",
      "timestamp_ms": 1573564027017,
      "content": "D\u00c3\u0089J\u00c3\u0080 VU",
      "reactions": [
        {
          "reaction": "\u00f0\u009f\u0091\u008d",
          "actor": "Cyprien Bariant"
        }
      ],
      "type": "Generic",
      "is_unsent": false
    },

 

 is meant to be written as:

Cyprien-Bariant_0-1621091113461.png

 or as:

 

{
      "sender_name": "Will Smith",
      "timestamp_ms": 1573564027017,
      "content": "D\u00C9J\u00C0 VU",
      "reactions": [
        {
          "reaction": "\uD83D\uDC4D",
          "actor": "Cyprien Bariant"
        }
      ],
      "type": "Generic",
      "is_unsent": false
    }

 

which prints the right characters when we select "Unicode (UTF-8)" as file's origin.


In Power Query, with Messenger's data, The intended "DÉJÀ VU", prints as "DÉJÀ VU" and the intended "👍" as "👍" using UTF-8 encoding.

The cause of the mojibake is explained here : https://stackoverflow.com/questions/50008296/facebook-json-badly-encoded
It was apparently originally encoded in utf-8 and then decoded in latin-1.

The only solution I found for the moment was to change every mojibaked value by hand, which is very long as 1500+ emojis are now available for reactions. It also makes the time to transform data waaaaaay longer.

I know that it is possible to solve that with some other languages like Java & Python, but my goal is to be able to input the JSON and directly get the correct encoding for my PBI report, which is more user-friendly if I share my pbix file.

If someone could be of any help, I would be very grateful for your time spent.

1 ACCEPTED SOLUTION
v-yingjl
Community Support
Community Support

Hi @Cyprien-Bariant ,

Currently in power query, you can try to create custom columns to re-encode and decode these fields and remove the previous columns, like this:

= Text.FromBinary(Text.ToBinary([content],28591),65001)

The whole query is like this:

let
    Source = Json.Document(File.Contents("C:\Users\xxx\xxx\test.json"), 1252),
    #"Converted to Table" = Table.FromRecords({Source}),
    #"Expanded reactions" = Table.ExpandListColumn(#"Converted to Table", "reactions"),
    #"Expanded reactions1" = Table.ExpandRecordColumn(#"Expanded reactions", "reactions", {"reaction", "actor"}, {"reactions.reaction", "reactions.actor"}),
    #"Changed Type" = Table.TransformColumnTypes(#"Expanded reactions1",{{"sender_name", type text}, {"timestamp_ms", Int64.Type}, {"content", type text}, {"reactions.reaction", type text}, {"reactions.actor", type text}, {"type", type text}, {"is_unsent", type logical}}),
    #"Added Custom" = Table.AddColumn(#"Changed Type", "content.1", each Text.FromBinary(Text.ToBinary([content],28591),65001)),
    #"Added Custom1" = Table.AddColumn(#"Added Custom", "reaction", each Text.FromBinary(Text.ToBinary([reactions.reaction],28591),65001)),
    #"Removed Columns" = Table.RemoveColumns(#"Added Custom1",{"content", "reactions.reaction"}),
    #"Reordered Columns" = Table.ReorderColumns(#"Removed Columns",{"sender_name", "timestamp_ms", "content.1", "reactions.actor", "reaction", "type", "is_unsent"}),
    #"Changed Type1" = Table.TransformColumnTypes(#"Reordered Columns",{{"sender_name", type text}, {"timestamp_ms", Int64.Type}, {"content.1", type text}, {"reactions.actor", type text}, {"reaction", type text}, {"type", type text}, {"is_unsent", type logical}})
in
    #"Changed Type1"

re.png

 

Best Regards,
Community Support Team _ Yingjie Li
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

View solution in original post

2 REPLIES 2
v-yingjl
Community Support
Community Support

Hi @Cyprien-Bariant ,

Currently in power query, you can try to create custom columns to re-encode and decode these fields and remove the previous columns, like this:

= Text.FromBinary(Text.ToBinary([content],28591),65001)

The whole query is like this:

let
    Source = Json.Document(File.Contents("C:\Users\xxx\xxx\test.json"), 1252),
    #"Converted to Table" = Table.FromRecords({Source}),
    #"Expanded reactions" = Table.ExpandListColumn(#"Converted to Table", "reactions"),
    #"Expanded reactions1" = Table.ExpandRecordColumn(#"Expanded reactions", "reactions", {"reaction", "actor"}, {"reactions.reaction", "reactions.actor"}),
    #"Changed Type" = Table.TransformColumnTypes(#"Expanded reactions1",{{"sender_name", type text}, {"timestamp_ms", Int64.Type}, {"content", type text}, {"reactions.reaction", type text}, {"reactions.actor", type text}, {"type", type text}, {"is_unsent", type logical}}),
    #"Added Custom" = Table.AddColumn(#"Changed Type", "content.1", each Text.FromBinary(Text.ToBinary([content],28591),65001)),
    #"Added Custom1" = Table.AddColumn(#"Added Custom", "reaction", each Text.FromBinary(Text.ToBinary([reactions.reaction],28591),65001)),
    #"Removed Columns" = Table.RemoveColumns(#"Added Custom1",{"content", "reactions.reaction"}),
    #"Reordered Columns" = Table.ReorderColumns(#"Removed Columns",{"sender_name", "timestamp_ms", "content.1", "reactions.actor", "reaction", "type", "is_unsent"}),
    #"Changed Type1" = Table.TransformColumnTypes(#"Reordered Columns",{{"sender_name", type text}, {"timestamp_ms", Int64.Type}, {"content.1", type text}, {"reactions.actor", type text}, {"reaction", type text}, {"type", type text}, {"is_unsent", type logical}})
in
    #"Changed Type1"

re.png

 

Best Regards,
Community Support Team _ Yingjie Li
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

Hi @v-yingjl , thank you very much for your answer!!!

I've been looking for an answer for a while now, and I'm so glad I finally got it!

 

 



Helpful resources

Announcements
April AMA free

Microsoft Fabric AMA Livestream

Join us Tuesday, April 09, 9:00 – 10:00 AM PST for a live, expert-led Q&A session on all things Microsoft Fabric!

March Fabric Community Update

Fabric Community Update - March 2024

Find out what's new and trending in the Fabric Community.

Top Solution Authors
Top Kudoed Authors