teng1
Regular Visitor

Get Notion databases into power bi using python script

Dear,

 

I created a generic Python script that runs and returns results in a Python IDE for various databases (dynamic columns and data types). The script loops through a list of database IDs and prints out the results.

 

When I put that script into Power BI, it only prints out the last record.

How do I get different databases into different datasets, using a loop or any other method?

 

my script:

 

import requests
import pandas as pd

# replace with your own integration token
token ='secret_******'

payload_dname = {
    "filter": {
        "value": "database",
        "property": "object"
    },
    "page_size": 100
}


headers = {
    "Authorization": f"Bearer {token}",  # Notion expects a Bearer token
    "Notion-Version": "2022-02-22",
    "Content-Type": "application/json"
}


class NotionSync:
    def __init__(self):
        pass


    # search for databases shared with the integration
    def notion_search(self, integration_token=token):
        url = "https://api.notion.com/v1/search"
        response = requests.post(url, json=payload_dname, headers=headers)

        if response.status_code != 200:
            return 'Error: ' + str(response.status_code)
        return response.json()

    # query database details
    def notion_db_details(self, database_id, integration_token=token):
        url = f"https://api.notion.com/v1/databases/{database_id}/query"
        response = requests.post(url, headers=headers)

        if response.status_code != 200:
            return 'Error: ' + str(response.status_code)
        return response.json()

    # to get databases id and name
    def get_databases(self,data_json):
        databaseinfo = {}
        databaseinfo["database_id"] = [data_json["results"][i]["id"].replace("-","")
                                                for i in range(len(data_json["results"])) ]

        databaseinfo["database_name"] = [data_json["results"][i]["title"][0]["plain_text"]
                                                  if data_json["results"][i]["title"]
                                                  else ""
                                                  for i in range(len(data_json["results"])) ]

        databaseinfo["url"] = [ data_json["results"][i]["url"]
                                         if data_json["results"][i]["url"]
                                         else ""
                                         for i in range(len(data_json["results"])) ]
        return databaseinfo


    # to get the column titles of the table (note: fails if the database has no rows)
    def get_tablecol_titles(self,data_json):
        return list(data_json["results"][0]["properties"].keys())
    
    # to get column data type for processing by type due to data structure is different by column type
    def get_tablecol_type(self,data_json,columns_title):
        type_data = {}
        for t in columns_title:
            type_data[t] = data_json["results"][0]["properties"][t]["type"]
        return type_data

    # to get table data by column type
    def get_table_data(self,data_json,columns_type):
        table_data = {}
        for k, v in columns_type.items():
            # to check column type and process by type
            if v in ["checkbox","number","email","phone_number"]:
                table_data[k] = [ data_json["results"][i]["properties"][k][v]
                                    if data_json["results"][i]["properties"][k][v]
                                    else ""
                                    for i in range(len(data_json["results"]))]
            elif v == "date":
                table_data[k] = [ data_json["results"][i]["properties"][k]["date"]["start"]
                                    if data_json["results"][i]["properties"][k]["date"]
                                    else ""
                                    for i in range(len(data_json["results"])) ]
            elif v == "rich_text" or v == 'title':
                table_data[k] = [ data_json["results"][i]["properties"][k][v][0]["plain_text"]
                                    if data_json["results"][i]["properties"][k][v]
                                    else ""
                                    for i in range(len(data_json["results"])) ]
            elif v == "files":
                table_data[k + "_FileName"] = [ data_json["results"][i]["properties"][k][v][0]["name"]
                                                if data_json["results"][i]["properties"][k][v]
                                                else ""
                                                for i in range(len(data_json["results"])) ]
                table_data[k + "_FileUrl"] = [ data_json["results"][i]["properties"][k][v][0]["file"]["url"]
                                                if data_json["results"][i]["properties"][k][v]
                                                else ""
                                                for i in range(len(data_json["results"])) ]
            elif v == "select":
                table_data[k] = [data_json["results"][i]["properties"][k][v]["name"]
                                    if data_json["results"][i]["properties"][k][v]
                                    else ""
                                    for i in range(len(data_json["results"]))]
            elif v == "people":
                table_data[k + "_Name"] = [ [data_json["results"][i]["properties"][k][v][j]["name"]
                                                if data_json["results"][i]["properties"][k][v]
                                                # to check if key 'name' exists in the list
                                                and "name" in data_json["results"][i]["properties"][k][v][j].keys()
                                                else ""
                                                for j in range(len(data_json["results"][i]["properties"][k][v]))]
                                                for i in range(len(data_json["results"])) ]
            elif v == "multi_select":
                table_data[k] = [ [data_json["results"][i]["properties"][k][v][j]["name"]
                                  if data_json["results"][i]["properties"][k][v]
                                  else ""
                                  for j in range(len(data_json["results"][i]["properties"][k][v]))]
                                  for i in range(len(data_json["results"])) 
                                ]

        return table_data    


if __name__=='__main__':
    nsync = NotionSync()

    # to search all databases.
    data = nsync.notion_search()

    # to get database id and name.
    dbid_name = nsync.get_databases(data)

    #convert dictionary to dataframe.
    df = pd.DataFrame.from_dict(dbid_name)

    # convert to bool and then drop records with an empty database name.
    df = df[df['database_name'].astype(bool)]
    print (df)


    # to loop through database id and get the database details.
    for d in dbid_name["database_id"]:
        # notion given another API to get the details of databases by database id. search API does not return databases details.
        dbdetails = nsync.notion_db_details(d)

        # get column title
        columns_title = nsync.get_tablecol_titles(dbdetails)

        # get column type
        columns_type = nsync.get_tablecol_type(dbdetails,columns_title)

        # get table data
        table_data = nsync.get_table_data(dbdetails,columns_type)

        #convert dictionary to dataframe
        df1 = pd.DataFrame.from_dict(table_data)
        print(df1)
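One thing to watch: the script sets page_size to 100 but never follows Notion's cursor pagination, so databases with more than 100 rows would be truncated. A minimal sketch of cursor-following, based on Notion's documented has_more / next_cursor / start_cursor fields (collect_pages is a hypothetical helper; wire it to the query endpoint with a small lambda):

```python
def collect_pages(fetch_page, page_size=100):
    """Accumulate results across Notion's cursor-based pages.

    fetch_page(payload) should POST the payload to the query (or search)
    endpoint and return the decoded JSON body.
    """
    results = []
    payload = {"page_size": page_size}
    while True:
        data = fetch_page(payload)
        results.extend(data["results"])
        if not data.get("has_more"):
            return results
        # resume the next request where this page left off
        payload["start_cursor"] = data["next_cursor"]

# wiring it into the loop above (hypothetical):
# rows = collect_pages(lambda p: requests.post(
#     f"https://api.notion.com/v1/databases/{d}/query",
#     json=p, headers=headers).json())
```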

 

 

 

Thanks and regards

1 ACCEPTED SOLUTION
teng1
Regular Visitor

Kinda found the answer. Power BI imports each database's results by variable name. In other words, change the variable name inside the loop for each and every database:

 

this

#convert dictionary to dataframe
df1 = pd.DataFrame.from_dict(table_data)
print (df1)

change to

#convert dictionary to dataframe
globals()[f"df{d}"] = pd.DataFrame.from_dict(table_data)
print (globals()[f"df{d}"])

 

The change above lets Power BI import different databases into different datasets on the first run of the script. However, if new databases are added to the Notion page after that first run, they will not automatically appear as new datasets; simply clicking the refresh button will not pick them up.
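A small refinement on the same trick: the raw database id produces variable names like df59abc0f1..., which are hard to tell apart in Power BI. A sketch that derives the variable name from the database name instead (safe_var_name is a hypothetical helper; it assumes the entries in dbid_name["database_name"] line up with the ids):

```python
import re

def safe_var_name(name, prefix="df_"):
    """Turn a database name into a valid Python identifier for globals()."""
    cleaned = re.sub(r"\W+", "_", name).strip("_")
    return prefix + (cleaned or "unnamed")

# inside the loop, pairing each id with its name:
# for d, name in zip(dbid_name["database_id"], dbid_name["database_name"]):
#     ...
#     globals()[safe_var_name(name)] = pd.DataFrame.from_dict(table_data)
```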


2 REPLIES

amitchandak
Super User

@teng1 , the way you are doing it, you can only add one table at a time. You need to explore Power BI cmdlets, the Power BI REST APIs, and the Power BI XMLA endpoint:

 

https://docs.microsoft.com/en-us/power-bi/enterprise/service-premium-connect-tools

https://docs.microsoft.com/en-us/powershell/power-bi/overview?view=powerbi-ps
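To create datasets programmatically instead of through the Python data source, the Power BI REST API exposes a push-datasets endpoint (POST https://api.powerbi.com/v1.0/myorg/datasets). A rough Python sketch that builds one push-dataset body per Notion database from the pandas dtypes; dataset_definition is a hypothetical helper, and acquiring the Azure AD access token is not shown:

```python
import pandas as pd

# rough mapping from pandas dtypes to Power BI column types
_PBI_TYPES = {"int64": "Int64", "float64": "Double",
              "bool": "Boolean", "datetime64[ns]": "DateTime"}

def dataset_definition(name, df):
    """Build a Push-dataset definition body from a DataFrame's dtypes."""
    columns = [{"name": col, "dataType": _PBI_TYPES.get(str(dtype), "String")}
               for col, dtype in df.dtypes.items()]
    return {"name": name, "defaultMode": "Push",
            "tables": [{"name": name, "columns": columns}]}

# posting it (access-token acquisition not shown):
# requests.post("https://api.powerbi.com/v1.0/myorg/datasets",
#               json=dataset_definition("notion_db", df1),
#               headers={"Authorization": f"Bearer {access_token}"})
```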



