Anonymous
Not applicable

Inconsistent CSV Update/Overwrite with Python

Context: my goal is to have two types of Excel files/CSVs:

  • Current: Always have updated information (many files)
  • Ledger: A log of past information from the above files (using date modified as date) (one file)

I got this to happen once by doing the following:

  1. Checked out this site: https://www.thebiccountant.com/2018/12/29/export-data-from-power-bi-to-csv-using-python/
  2. Pulled in Current files and combined in PowerBI
  3. Pulled in Ledger
  4. In python, checked if Current files rows were in Ledger already
    1. If they weren't, added to Ledger in PowerBI (not yet on my computer)
  5. Wrote Ledger to the same filename and filepath as the original Ledger (overwrite, now on my computer), using Power Query and Python code similar to the link above (a rough sketch of that write step is below).
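For reference, the write step follows roughly the pattern from that article. This is only a sketch: "dataset" is the name Power BI gives the incoming table inside a Run Python script step (you can see it in the error traceback further down), and the output path here is just a placeholder for the real OneDrive location.

import csv

# "dataset" is the pandas DataFrame that Power BI passes into the
# Run Python script step; the path below is only a placeholder.
out_path = r"C:\OneDrive\Placeholder\Provider_Log.csv"

# Overwrite the Ledger CSV on disk
dataset.to_csv(out_path, mode="w", index=False, header=True, quoting=csv.QUOTE_ALL)

# Any DataFrame left in scope (here, dataset itself) is returned to
# Power Query as a table, so the query still loads normally.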

 

I know I got it to happen because my Ledger originally only had one Test row, and it now additionally has every Current row.

 

HOWEVER, it will not happen fully again. Here is what's up:

  1. I change one line of one file in Current files
  2. This DOES change that line in my direct PowerBI pull; the combined Current files query is updated
  3. This DOES NOT change the output of my python code, the Ledger
  4. Therefore, the Ledger on my computer gets overwritten with an identical copy of itself

 

Here is my python code for reference, with some extra notes:

 

import pandas as pd
import numpy as np
import datetime

log = dataset

# Returns the Unique ID of a row
def un_id(series):
    return ((str(series["Therapist"]) + "_" + str(series["Agency"]) + "_" + str(series["Provider Office"])).replace(" - ", "-").replace(" ", "_"))

# Gives both the log (Ledger) table and the prov (Current combined) table an ID column
log.loc[:, "ID"] = log.apply(un_id, 1)
prov.loc[:, "ID"] = prov.apply(un_id, 1)

# Dates were taken in as strings; change them to timestamps
log.loc[:, "Date"] = log["Date"].map(pd.to_datetime)
prov.loc[:, "Date"] = log["Date"].map(pd.to_datetime)

# Checks whether a row [in prov/Current combined] is already in Ledger: finds Ledger rows
# with the same Unique ID and, if none of them has the same Date, returns the series
def not_in_log(series):
    matches = log[log["ID"] == series["ID"]]
    if series["Date"] not in list(matches["Date"]):
        return series
# Goes through all prov/Current combined rows and keeps the ones not in Ledger
new_rows = prov.apply(not_in_log, 1)
new_rows = new_rows.dropna(how="all")
new_rows = pd.DataFrame(new_rows)

# Adds the new rows to the Ledger and keeps the Ledger's column order
log = log.append(new_rows, ignore_index=True)
log = log.loc[:, list(prov.columns)]
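
To help spot why nothing new shows up, a small diagnostic table like the one below could be tacked onto the end of the script. The names follow the code above; the diagnostics frame itself is just an extra debugging aid, not part of my original script.

# Optional debugging aid: summarizes what the check above actually found.
# Power BI exposes this DataFrame as an extra table in the step output.
diagnostics = pd.DataFrame({
    "current_rows": [len(prov)],
    "new_rows_found": [len(new_rows)],
    "ledger_rows_after": [len(log)],
})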

Thanks in advance for the help!

v-juanli-msft
Community Support

Hi @Anonymous 

It seems the first goal is achieved with your code.

  • Current: Always have updated information (many files)
  • Ledger: A log of past information from the above files (using date modified as date) (one file)

The second goal means storing historic data in files, and those historic files should not be overwritten.

If so, please refer to a brilliant article: Storing and using information from a dynamic data source using PBI desktop.

 

Best Regards
Maggie
Community Support Team _ Maggie Li
If this post helps, then please consider accepting it as the solution to help other members find it more quickly.

 

Anonymous
Not applicable

Thanks for the pointer!

 

I could certainly run my Python in a standalone script that I schedule with Task Scheduler; in this scenario, I wouldn't even need to go through PowerBI for the updates (a rough sketch of that is below).
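
A minimal sketch of what that standalone script might look like, assuming the same columns as in my code above. The file locations (LEDGER_PATH, CURRENT_GLOB) are placeholders, and the merge-based check is an alternative to my apply-based one, not what I currently run.

import glob
import pandas as pd

LEDGER_PATH = r"C:\OneDrive\Placeholder\Provider_Log.csv"    # placeholder path
CURRENT_GLOB = r"C:\OneDrive\Placeholder\Current\*.csv"      # placeholder path

# Same Unique ID as in the report script
def un_id(series):
    return ((str(series["Therapist"]) + "_" + str(series["Agency"]) + "_" + str(series["Provider Office"])).replace(" - ", "-").replace(" ", "_"))

log = pd.read_csv(LEDGER_PATH, parse_dates=["Date"])
prov = pd.concat((pd.read_csv(p, parse_dates=["Date"]) for p in glob.glob(CURRENT_GLOB)), ignore_index=True)

log["ID"] = log.apply(un_id, axis=1)
prov["ID"] = prov.apply(un_id, axis=1)

# Keep only Current rows whose (ID, Date) pair is not already in the Ledger
known = log[["ID", "Date"]].drop_duplicates()
flags = prov.merge(known, on=["ID", "Date"], how="left", indicator=True)
new_rows = prov[flags["_merge"].eq("left_only").to_numpy()]

# Append the new rows and overwrite the Ledger file
updated = pd.concat([log, new_rows], ignore_index=True)[list(prov.columns)]
updated.to_csv(LEDGER_PATH, index=False)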

 

However, because of the data hosting/security context I'm in, it is preferable that everything stays in PowerBI (and OneDrive, where the files are stored). 

 

I'm confused because I was able to get the process to happen once (the initial ledger "fill"), but it won't happen a second time to update.

 

I've also gotten this message, which I don't know how to fix; I think it has something to do with security (filepath removed for privacy):

 

DataSource.Error: ADO.NET: Python script error.
Traceback (most recent call last):
  File "PythonScriptWrapper.PY", line 17, in <module>
    dataset.to_csv(path, mode = 'w', index = None, header = True, quoting = csv.QUOTE_ALL, chunksize = None, decimal= '.')
  File "C:\Users\kwilliams\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\core\generic.py", line 3228, in to_csv
    formatter.save()
  File "C:\Users\kwilliams\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\io\formats\csvs.py", line 183, in save
    compression=self.compression,
  File "C:\Users\kwilliams\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\io\common.py", line 399, in _get_handle
    f = open(path_or_buf, mode, encoding=encoding, newline="")
PermissionError: [Errno 13] Permission denied: 'FILEPATH/Provider_Log.csv'

Details:
    DataSourceKind=Python
    DataSourcePath=Python
    Message=Python script error.
Traceback (most recent call last):
  File "PythonScriptWrapper.PY", line 17, in <module>
    dataset.to_csv(path, mode = 'w', index = None, header = True, quoting = csv.QUOTE_ALL, chunksize = None, decimal= '.')
  File "C:\Users\kwilliams\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\core\generic.py", line 3228, in to_csv
    formatter.save()
  File "C:\Users\kwilliams\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\io\formats\csvs.py", line 183, in save
    compression=self.compression,
  File "C:\Users\kwilliams\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\io\common.py", line 399, in _get_handle
    f = open(path_or_buf, mode, encoding=encoding, newline="")
PermissionError: [Errno 13] Permission denied: 'FILEPATH/Provider_Log.csv'

    ErrorCode=-2147467259
    ExceptionType=Microsoft.PowerBI.Scripting.Python.Exceptions.PythonScriptRuntimeException
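
The failing call is the plain to_csv overwrite shown in the traceback. One thing that might be worth trying (just an idea on my part, not something from the article) is writing to a temporary file and retrying on PermissionError, in case OneDrive sync or Power BI itself briefly holds a lock on the Ledger. The safe_write helper below is hypothetical:

import csv
import os
import time

def safe_write(df, path, retries=3, delay=2):
    # Write next to the target, then swap it into place with os.replace;
    # retry a few times in case the file is temporarily locked.
    tmp_path = path + ".tmp"
    for attempt in range(retries):
        try:
            df.to_csv(tmp_path, index=False, quoting=csv.QUOTE_ALL)
            os.replace(tmp_path, path)
            return
        except PermissionError:
            time.sleep(delay)
    raise PermissionError("Could not write " + path)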
