Is a direct connection to the Insight backend right for you?
You may have existing reporting/visualization/statistical analysis tools that you wish to use to track, analyze, or report on your OpenClinica data. The most common method for doing this is using downloadable data extracts from the OpenClinica user interface. However several other options exist, including using OpenClinicas API, using the OpenClinica Insight reporting tools, using Insights API, or by establishing a direct read-only connection to the Insight data warehouse.
This guide provides an overview of accessing data through a direct read-only connection to the Insight data warehouse.
For existing institutional toolchains, such as SAS, Qlik, Tableau, SSRS, Shiny, Jupyter, Stata, R, Python, Power BI, etc, the customer can establish a direct, live, readonly connection with the Insight database. This method can also be used to incorporate your OpenClinica data into other databases via Postgres Foreign Data Wrappers (FDWs, or linked tables).
Since the Insight back-end is a PostgreSQL database, allowing connections is a matter of setting up a secure, read-only database connection.
Pros:
- Works where SSH does: Linux, Windows (PuTTY), etc.
- Use the tool of your choice, so long as it supports remote connections
- Live access to data, not snapshots as in data extracts (can be pro & con)
Cons:
- Mostly static data permissions (schema-level)
- User management is separate from the OpenClinica and Insight web interfaces
- OpenClinica team will support setup and maintenance of your connection however assistance with connectivity and use of your particular tool (such as SAS or Tableau) is best-effort. We are not able to support the wide variety of possible clients (SAS, Tableau, Power BI, etc) and cannot guarantee end-to-end connectivity, functionality, or level of service.
There are three main aspects to the set up:
- Securing the connection itself
- Authenticating the user that is connecting, and
- Defining what the user is authorised to do.
We do this with a secure, read-only, direct interface to the Insight Data Warehouse, via an SSH tunnel. It will require the involvement of your internal IT team
Technical Information
Connections to the Insight back-end (PostgreSQL) database are secured by using a SSH tunnel via a jump host.
A “jump host” means that your Insight server is not directly reachable from the Internet, so instead connections are made via an intermediate server that is reachable from the Internet but from which connections are allowed to the Insight server.
A “SSH tunnel” means that the nature of the connection to Insight will be via a SSH (secure shell) session between your computer (or server) and the jump host. This SSH session is then configured to forward database connection requests to a certain local port (such as 65432) to the Insight server’s and database port (such as 5432). Database traffic forwarded in this way is secured.
In order to configure back-end connections on the OpenClinica side, 3 items are required:
- The IP addresses from which you will connect. This could be your computer IP, or a server IP. This is used to restrict access to the jump host to only those nominated IP addresses, for additional security.
- The user names and a “PEM” format SSH public key (see below) for the user(s) that will connect. If you are setting this up for a server then there could be just one “service account” type user, or there could be multiple individual users.
- For each user, which study environment(s) / schemata that they should have access to. The Insight “databases” shown in the Metabase front-end are actually database schemata. The default is access to all schemata, but it is possible to assign read access to one schema or many.
The exact configuration of back-end connections on your side depends on the intended setup. A SSH client will be required on your computer or server. On Linux, SSH is installed by default (ssh). On Windows, there is a SSH tool called “PuTTY” which has a user friendly interface. Alternatively on Windows, “Git for Windows” (Git is a change tracking tool) comes with a Windows version of Linux’s SSH tool.
The following links describe how to create SSH keys (for step 2):
- Linux/Mac: https://www.digitalocean.com/docs/droplets/how-to/add-ssh-keys/create-with-openssh/
- Windows: https://www.digitalocean.com/docs/droplets/how-to/add-ssh-keys/create-with-putty/
Please be sure to use a passphrase for your keys so that they cannot be used if they fall out of your possession. Once you have generated the key(s) please provide the public key(s), will be the file(s) whose name ends in “.pub”. Please do not share the private key(s).
The steps to set up the SSH tunnel is specific to the tool being used, so these can be provided when you let us know what you will be using. For either PuTTY or ssh, it is reasonably straightforward.
Once the SSH tunnel is in place, it will be possible to connect to the Insight back-end PostgreSQL server. If using ODBC, the “psqlODBC” driver will need to be installed on the connecting computer/server (https://odbc.postgresql.org/). This driver allows applications to talk to PostgreSQL in the ODBC protocol. Then, either a DSN or connection string can be set up in the desired application. We can provide examples of these with your settings.
Setup Procedure
Share the overview; ensure the customer understands the solution and that its right for them
Share the overview with the customer and speak with the customer to ensure they understand what theyll get out of setting up the connection, and what the pros and cons are. Identify the stakeholders who need to be involved (customers IT and end user such as a statistician or data management lead, at minimum).
Secure the connection
Prerequisite: Completed OC Insight installation in AWS
- Collect external IP addresses or address range from customers IT.
- Configure the Jump host with access allowed only from customer-specified IP’s on port 22 using SSH-RSA keys for authentication
- Set up AWS Security Group to allow connections on to the Insight PostgreSQL instance on port 5432 from the jump host
- Configure local firewall on Insight host has been configured to accept connections from the internal IP of the jump host on port 5432
- Configure PostgreSQL on the Insight host to allow connections from the internal IP of the jump host
- Configure the “external_service_account” for the customer and added to LastPass, to be provided to the customer
Authenticate users that are connecting
On customers machine:
- Set up pgadmin
- Set up port forwarding and SSH tunnel
- Create general SSH Key
- Configure pgadmin to connect
- Provide assistance configuring SAS or other client software to connect
Define what users are authorized to do
- Local firewall on Insight host has been configured to accept connections from the internal IP of the jump host on port 5432
- PostgreSQL on the Insight host has been configured to allow connections from the internal IP of the jump host