New Member

Power Query Python dataset converting columns to strings

I have a simple test script as follows:

 

import pandas as pd

dataframe_info = pd.DataFrame(dataset.dtypes)
dataframe_copy = pd.DataFrame(dataset)
dataframe_obj = pd.DataFrame(dataset.loc[:,"Object_Value"])

 

My table (a data source table from SQL Server) contains a variety of column types, including string, integer, real, and binary. When it is placed into the "dataset" dataframe, all the column types are converted to strings. I know that we can check a box to infer data types, but that doesn't work for my binary fields, which all get converted to the string "System.Byte[]".

 

dataframe_info shows all the correct column types, but dataframe_copy has converted all the columns to strings.

 

Why are the native types in the source table not preserved in the dataframe "dataset"? When I use ODBC from Python to read the same table into a dataframe, all the types are preserved, including binary.

 

Thanks in advance.

Community Support

Re: Power Query Python dataset converting columns to strings

Hi @Chris_Wilson 

When the table is imported into Power BI, the original column types are preserved.

From what I could find, pd.DataFrame reportedly cannot read binary data correctly.

We would need to find Python functions that can handle the binary data.

As a temporary workaround, you could convert the binary column in Power BI first, then run the Python script.
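
One possible shape for this workaround, offered as an assumption and not necessarily what the screenshots showed: convert the binary column to Base64 text in Power Query before the Run Python script step (for example with Binary.ToText and BinaryEncoding.Base64), then decode the text back to bytes in Python. The column name Object_Value comes from the original post; the sample frame and the decode_base64_column helper are hypothetical stand-ins for the dataset that Power BI supplies.

```python
import base64

import pandas as pd


def decode_base64_column(frame, column):
    """Return a copy of frame with Base64 text in `column` decoded to bytes.

    Cells that are not strings (for example missing values) become None.
    """
    result = frame.copy()
    result[column] = result[column].map(
        lambda text: base64.b64decode(text) if isinstance(text, str) else None
    )
    return result


# Hypothetical stand-in for the 'dataset' frame Power BI passes to the script,
# assuming the binary column was already converted to Base64 text upstream.
sample = pd.DataFrame({
    "Id": [1, 2, 3],
    "Object_Value": [
        base64.b64encode(b"\x00\x01\x02").decode("ascii"),
        base64.b64encode(b"%PDF-1.4").decode("ascii"),
        None,
    ],
})

decoded = decode_base64_column(sample, "Object_Value")
```

Inside Power BI the last line would instead be something like decoded = decode_base64_column(dataset, "Object_Value"). The decoded bytes are only useful within the script; returning them to Power Query would presumably hit the same string conversion described above.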

[Attached screenshots: Capture3.JPG, Capture4.JPG]

 

Best Regards
Maggie
Community Support Team _ Maggie Li

New Member

Re: Power Query Python dataset converting columns to strings

Maggie:

 

I don't know how to attach images to a post without uploading to a URL, so I'll have to describe what I'm seeing.

 

  1. Start PBI (latest version), select "Get data" then SQL Server, enter the server name (I'm using localhost\sqlexpress) and database, and choose Import
  2. Select table then Load
  3. Click Transform data to open Power Query editor
  4. Table headings show correct field type (string, date-time, float, binary). I am aware at this point I could click "Combine files" in the binary column and PQ would be able to interpret selected binary formats like PDF.
  5. Select Transform group then Run Python script
  6. Enter this script: 
    # 'dataset' holds the input data for this script
    import pandas as pd
    dataset_types = pd.DataFrame(dataset.dtypes)
    dataset_copy = dataset.copy()
  7. The result shows dataset, dataset_copy, dataset_types. Click dataset: a new step called dataset is created as well as Changed Type (which I know is configurable)
  8. All the columns in dataset are strings, and every cell of the binary column contains "System.Byte[]"
  9. The Changed Type step does type conversions for the non-null columns, but of course doesn't fix the binary column.
  10. Interestingly, when you look at the dataset_types dataframe, the types look correct. It appears that the process of operating on the dataset dataframe screws up the types.

I don't expect PQ to decipher my binary contents (that's why I'm writing the Python code).
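
To narrow down where the types change, it may help to look at the Python type of each cell and not only the column dtype, since a pandas object column can hold str, bytes, or anything else. A minimal sketch, using a hypothetical sample frame in place of dataset and a hypothetical helper name (cell_type_summary):

```python
import pandas as pd


def cell_type_summary(frame):
    """Return one row per column: its dtype and the Python types of its non-null cells."""
    rows = []
    for name in frame.columns:
        non_null = frame[name].dropna()
        # Collect the distinct Python type names found in this column.
        type_names = sorted({type(value).__name__ for value in non_null})
        rows.append({
            "column": name,
            "dtype": str(frame[name].dtype),
            "cell_types": ", ".join(type_names),
        })
    return pd.DataFrame(rows)


# Hypothetical stand-in for the 'dataset' frame; the column names are illustrative.
sample = pd.DataFrame({
    "Name": ["a", "b"],
    "Amount": [1.5, 2.5],
    "Object_Value": [b"\x00\x01", b"\x02\x03"],
})

dataset_cell_types = cell_type_summary(sample)
```

In the Run Python script step, calling cell_type_summary(dataset) and leaving the result in a variable should make it appear in the result list next to dataset, in the same way dataset_types does in step 7. If the binary column reports str there, the values were already text when they reached the script.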

 

Thanks,

 

Chris
