Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.

Reply
harsha99
Frequent Visitor

Proper way of interacting with Redshift inside private subnet in AWS

So our Redshift is out DWH located inside a private subnet in AWS. We want to connect from PowerBI cloud service to Redshift.

 

So from my previous reading and understanding from https://community.powerbi.com/t5/Integrations-with-Files-and/Connect-data-from-Amazon-AWS-Services/m... we plan to do that by launching a windows server instance in AWS VPC and installing a On Premise Data Gateway, which can access the private subnet.

 

Basically On Premise Gateway is kind of proxy to access my private DB. That sound perfectly good as my DB is not being exposed out.

 

Is this the best way of dealing this or are there any better approaches ?.

1 ACCEPTED SOLUTION

Finally i figured it out and made it working, here is my solution for people who has their redshift not available to public internet.

 

So our redshift cannot be accessible from outside, it is only available to those machine which are on same network (AWS VPC). Since it is not accessible outside, exsisting redshift connector won't work. 

 

So i launched a windows instance within the same network where redshift lives. Made a ODBC connection with redshift. installed appropriate odbc drivers on gateway.

 

Published the pbix file to PowerBI service, configured odbc gateway in powerbi service and i was able to refresh data from redshift cluster.

View solution in original post

14 REPLIES 14
v-yuezhe-msft
Employee
Employee

@harsha99,

Gateway is used to connect and refresh on-premises datasets in Power BI Service. In your scenario, as Amazon Redshift is a cloud service, gateway is not required.

Moreover, to create reports by consuming AWS Redshift data, you would  need to firstly use Amazon Redshift connector in Power BI Desktop to connect to the data source, then create reports and publish report to Power BI Service.


Reference:
https://powerbi.microsoft.com/en-us/blog/announcing-support-for-amazon-redshift-reports-in-the-power...



Regards,
Lydia

Community Support Team _ Lydia Zhang
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

Sure if Redshift is the only source you are connecting to, you don't need a gateway.  But my understanding from my team is that when we combined OnPrem data with Redshift data in the same dataset, we now have to let the on prem data gateway handle all the connections not just the on prem ones ... which means routing all the data from redshift to the on prem data gateway and back out to powerbi.com service.  Is this correct?  Or is there another/better way? While inconvient and slower, the biggest problem is that On Prem generally means a firewall.  Our Network Admins whitelist the IP address range to get to Redshift, but the range seems to change randomly resulting in a refresh that was previously working suddenly sending a "Connection reset by peer" message because it can't get through the firewall.  My networking knowledge is slim.  How have other companies solved this?

We have reached the same conclusion within our organization.  Our Redshift cluster is also White-listed by IP address so the direct-connectivity option isn't viable for us when IP's for the Power BI Service change regularly.  We have implemented the On-premise Gateway on and EC2 instance on the same VPC as our Redshift cluster.  It would be nice if MSFT would work to provide a solution for direct Redshift connectivity that is viable as I doubt any organization has their Redshift cluster completely open to the internet.  We also constantly struggle with data set size limitations with Pro-level licensing and are continually having to trim data sets.  It would be nice if we could simply pay for larger data sets without having to jump to Enterprise licensing.  

@v-yuezhe-msft I agree with what you had mentioned, that looks absolutley true if amazon redshift has publicly accessible connection.

 

But what if Redshift is only accesible inside AWS VPC, i.e no connection to public internet. Then probably there is no way for the PowerBI service to access it right ?

 

In such cases is gatway the only option ?

@harsha99,

I am not familiar about AWS VPC, when you publish PBIX file that only contains Redshift data source to Power BI Service, does it prompt you to install gateway?

Regards,
Lydia

Community Support Team _ Lydia Zhang
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

Finally i figured it out and made it working, here is my solution for people who has their redshift not available to public internet.

 

So our redshift cannot be accessible from outside, it is only available to those machine which are on same network (AWS VPC). Since it is not accessible outside, exsisting redshift connector won't work. 

 

So i launched a windows instance within the same network where redshift lives. Made a ODBC connection with redshift. installed appropriate odbc drivers on gateway.

 

Published the pbix file to PowerBI service, configured odbc gateway in powerbi service and i was able to refresh data from redshift cluster.

Anonymous
Not applicable

Thank you very much for comming back and posting the solution @harsha99 ! This helped me today.

Anonymous
Not applicable

Hello Harsha,

I have similar Requirement :

 

1) Set up On-Premises Data Gateway between Amazon Redshift Clusters hosted in Private subnet VPC to Power BI Service .

2) Do i need to install Amazon Redshift ODBC Driver (64-bit) on gateway machine ?

any addiional steps need to set up in order to work my solution correctly.

I would love and like connect , if revert on asrinivasa576@gmail.com or 9866828867

Regards,

Srinivas

 

 

 

 

Regards,

Srinivas

@harsha99, thanks a lot for sharing the solution! After following all the steps, my report cannot refresh on Power BI Service (online) because it cannot find an appropriate data source in a gateway. According to

 

https://docs.microsoft.com/en-us/power-bi/service-gateway-enterprise-manage-scheduled-refresh#using-...

 

the trick is that the server and database names must match in the local ODBC used in Power BI desktop and the one used to configure the data source available through the gateway. Still, it fails to find a match.

 

Checklist of what I have set up:

* Windows 64-bit machine in EC2 with:

** Amazon's ODBC driver installed.

** System DSN successfully connected to our Redshift cluster.

** Public IP with no inbound/outbound networking restrictions (this is temporary for testing!).

** Power BI on-premise gateway setup (not personal).

* Local machine with:

** Amazon's ODBC driver installed.

** System DSN successfully connected to our Redshift cluster.

** Power BI desktop.

** Power BI report using ODBC to successfully retrieve data from Redshift.

* Power BI service (online) with:

** On-premise gateway set up with ODBC data source successfully connected.

** Power BI report published from desktop.

 

Are there any specific details of your setup that you think could make it work?

 

Thanks a lot!

Check if driver in ODBC strings is same on service and desktop, driver on my config is "Amazon Redshift (x64)"

 

On desktop when creating a Data source(DSN) instead of selecting "Amazon RedShift ODBC DSN", select none and configure the Connection String with the same Connection String you see on the power BI online service.

 

Hope that helps. if it works feel free to update as comment to my answer here https://stackoverflow.com/questions/48773043/connecting-to-private-amazon-redshift-from-powerbi-serv....

Thanks so much for @harsha99 for hinting the answer! Actually, the DSN cannot be configured via a connection string since only a form is provided. However, the datasource on Power BI service uses a connection string to be set up. The difference was that the connection string generated by the personal-mode gateway not only contained server and database but also the DSN name! This seemingly redundant configuration did the trick! So, to recap: when defining the datasource in Power BI service, make sure its connection string looks like:

driver={Amazon Redshift (x64)};server=<server>;dsn=<data source name>

where the name of the data source must match that of the ODBC system data source defined in the Windows machine where the On-premise Data Gateway is installed.

Anonymous
Not applicable

Hi,

 

we followed the same steps as given above, but were not able to connect powerbi desktop to private redshift.

error:- connection timeout.

please suggest  on how to connect to AWS redhsift inside a VPC from power bi desktop tool. we used odbc connector (get data -> odbc)  to enter redshift connection details.

As redhsift is inside a private n/w, will odbc connector work? do we need to use custom connectors?


@Facundo_Affectv wrote:

Thanks so much for @harsha99 for hinting the answer! Actually, the DSN cannot be configured via a connection string since only a form is provided. However, the datasource on Power BI service uses a connection string to be set up. The difference was that the connection string generated by the personal-mode gateway not only contained server and database but also the DSN name! This seemingly redundant configuration did the trick! So, to recap: when defining the datasource in Power BI service, make sure its connection string looks like:

driver={Amazon Redshift (x64)};server=<server>;dsn=<data source name>

where the name of the data source must match that of the ODBC system data source defined in the Windows machine where the On-premise Data Gateway is installed.



@Facundo_Affectv wrote:

Thanks so much for @harsha99 for hinting the answer! Actually, the DSN cannot be configured via a connection string since only a form is provided. However, the datasource on Power BI service uses a connection string to be set up. The difference was that the connection string generated by the personal-mode gateway not only contained server and database but also the DSN name! This seemingly redundant configuration did the trick! So, to recap: when defining the datasource in Power BI service, make sure its connection string looks like:

driver={Amazon Redshift (x64)};server=<server>;dsn=<data source name>

where the name of the data source must match that of the ODBC system data source defined in the Windows machine where the On-premise Data Gateway is installed.


@harsha99 Replicating those steps for the personal-mode gateway works (running on the same machine where the enterprise runs), but the enterprise doesn't.

Hi,

 

Dashboards/Reports can be published to the Power BI Cloud which uses a connection to the On-Premise gateway to get data.  However, what if the report has on-presmises and Reshift data sources - how does the Power BI Cloud connect to AWS Redshift via the cloud?

 

Summary:

1. The on-premise data can be accessed from the cloud via the on-premise gateway

2. How will the Redshift data be accessble from the power bi cloud service?

Helpful resources

Announcements
Microsoft Fabric Learn Together

Microsoft Fabric Learn Together

Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City

PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.

April Fabric Community Update

Fabric Community Update - April 2024

Find out what's new and trending in the Fabric Community.

Top Solution Authors
Top Kudoed Authors