Chris_Wilson
New Member

Power Query Python dataset converting columns to strings

I have a simple test script as follows:

 

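# 'dataset' holds the input data for this script (supplied by Power Query)
import pandas as pd

dataframe_info = pd.DataFrame(dataset.dtypes)  # dtype reported for each column
dataframe_copy = pd.DataFrame(dataset)  # the whole table as a new DataFrame
dataframe_obj = pd.DataFrame(dataset.loc[:, "Object_Value"])  # the Object_Value column only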

 

My table (a data source table from SQL Server) contains a variety of column types, including string, integer, real, and binary. When it is placed into the "dataset" dataframe, all the column types are converted to strings. I know that we can check a box to infer data types, but that doesn't work for my binary fields, which all get converted to the string "System.Byte[]".

 

dataframe_info shows all the correct column types, but dataframe_copy has converted all the columns to strings.

 

Why are the native types in the source table not preserved in the dataframe "dataset"? When I use Python ODBC to read the same table into a dataframe, all the types are preserved, including binary.
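For comparison, the direct-read path mentioned above can be sketched as follows. This is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module with an in-memory table as a stand-in for SQL Server and ODBC, and the table name and every column other than Object_Value are made up.

```python
import sqlite3

import pandas as pd

# Stand-in for the SQL Server table: an in-memory SQLite database.
# The table name and all columns except Object_Value are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE test_table "
    "(Name TEXT, Quantity INTEGER, Reading REAL, Object_Value BLOB)"
)
conn.execute(
    "INSERT INTO test_table VALUES (?, ?, ?, ?)",
    ("sample", 3, 1.5, b"\x00\x01\xff"),
)

# Reading through a database connection hands pandas the driver's own Python values.
direct_df = pd.read_sql_query("SELECT * FROM test_table", conn)
conn.close()

print(direct_df.dtypes)
print(type(direct_df.loc[0, "Object_Value"]))
```

With this stand-in, the numeric columns keep numeric dtypes and the binary cell stays a Python bytes object rather than a type-name string.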

 

Thanks in advance.

v-juanli-msft
Community Support

Hi @Chris_Wilson 

When data is imported with Power BI, the original types are preserved.

From what I have found, pd.DataFrame reportedly can't read binary data correctly.

We need to find some Python functions to deal with binary-type data.

As a temporary workaround, you could first convert the binary column in Power BI, then use Python scripts.

[Attached screenshots: Capture3.JPG, Capture4.JPG]
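On the Python side, that workaround might look like the sketch below. It assumes the binary column was first converted to Base64 text in Power Query (for example with Binary.ToText), so that each cell arrives as an ordinary string. The helper name decode_base64_column is made up for illustration, and the sample frame only stands in for the dataset that Power BI supplies.

```python
import base64

import pandas as pd


def decode_base64_column(df, column):
    """Return a copy of df with the Base64 text in `column` decoded back to bytes."""
    result = df.copy()
    result[column] = result[column].map(
        # Decode text cells; leave nulls as they are.
        lambda value: base64.b64decode(value) if isinstance(value, str) else value
    )
    return result


# Stand-in for the 'dataset' frame that Power BI passes to the script.
sample = pd.DataFrame(
    {"Object_Value": [base64.b64encode(b"\x00\x01\xff").decode("ascii"), None]}
)
decoded = decode_base64_column(sample, "Object_Value")
print(decoded["Object_Value"].tolist())
```

Inside the Run Python script step the call would be decode_base64_column(dataset, "Object_Value"). Judging by the behaviour described in this thread, bytes handed back to Power Query may again show up as text, so the actual processing of the binary content probably has to happen inside the script.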

 

Best Regards
Maggie
Community Support Team _ Maggie Li

Maggie:

 

I don't know how to attach images to a post without uploading to a URL, so I'll have to describe what I'm seeing.

 

  1. Start PBI (latest version), select "Get data" then SQL Server, enter the server name (I'm using localhost\sqlexpress) and database, and choose Import
  2. Select table then Load
  3. Click Transform data to open Power Query editor
  4. Table headings show correct field type (string, date-time, float, binary). I am aware at this point I could click "Combine files" in the binary column and PQ would be able to interpret selected binary formats like PDF.
  5. Select Transform group then Run Python script
  6. Enter this script: 
    # 'dataset' holds the input data for this script
    import pandas as pd
    dataset_types = pd.DataFrame(dataset.dtypes)
    dataset_copy = dataset.copy()
  7. The result shows dataset, dataset_copy, dataset_types. Click dataset: a new step called dataset is created as well as Changed Type (which I know is configurable)
  8. All the columns in dataset are strings, and every cell of the binary column contains "System.Byte[]"
  9. The Changed Type step does type conversions for the non-null columns, but of course doesn't fix the binary column.
  10. Interestingly, when you look at the dataset_types dataframe, the types look correct. It appears that the process of operating on the dataset dataframe screws up the types.
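One way to check from inside the script what the cells actually contain, as opposed to what the result preview shows, is sketched below. The helper name summarize_cell_types is hypothetical, and the sample frame is only a stand-in for dataset with made-up columns apart from Object_Value.

```python
import pandas as pd


def summarize_cell_types(df):
    """Return one row per column: its pandas dtype and the Python types found in its cells."""
    rows = []
    for column in df.columns:
        # Collect the distinct Python type names of the non-null cells.
        type_names = sorted({type(value).__name__ for value in df[column].dropna()})
        rows.append(
            {
                "column": column,
                "dtype": str(df[column].dtype),
                "cell_types": ", ".join(type_names),
            }
        )
    return pd.DataFrame(rows)


# Stand-in for 'dataset'; in Power Query, call summarize_cell_types(dataset) instead.
sample = pd.DataFrame(
    {"Name": ["a", "b"], "Quantity": [1, 2], "Object_Value": ["System.Byte[]", None]}
)
cell_type_report = summarize_cell_types(sample)
print(cell_type_report)
```

Because the report is itself a DataFrame, it should appear in the result list of the Run Python script step next to dataset, which makes it possible to see whether the binary cells reached Python as bytes or already as text.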

I don't expect PQ to decipher my binary contents (that's why I'm writing the Python code).

 

Thanks,

 

Chris
