Introduction
What is Superset
- Superset is a Data Visualization tool which is cloud-native, highly available and scalable as it works very well with containers.
- You can easily install it on a Kubernetes cluster using helm charts and then customize it to your needs
- It was developed at Airbnb by the creator of Airflow, Maxime Beauchemin and became a top level project in Apache foundation in 2021.
– Significant contributions were made by Lyft and Dropbox.
– It is also available as a managed service by Maxime’s company Preset - Superset is currently run at scale at many companies. For example, Superset is run in Airbnb’s production environment inside Kubernetes and serves 600+ daily active users viewing over 100K charts a day.
What this article will cover
This article will cover steps to locally install superset on Windows with DB drivers for MS SQL Server, Dremio, MySQL and Pyodbc. While performing this installation I had to dig a lot to get it working. The end result is the article below especially the Dockerfile below which will be very helpful for you.
Prerequisites
Docker Desktop
If you do not already have Docker Desktop installed please follow the steps mentioned in the link below.
https://docs.docker.com/desktop/windows/install/
Once everything is installed correctly you should see a Docker icon in your system tray
MSSQL Server
This is only needed if you are using local version of MSSQL server.
For installing MSSQL Server Download the server installation from below link
https://go.microsoft.com/fwlink/?linkid=866662
Once server is installed install the SQL Server Management Studio (SSMS) from below link from Microsoft
Once SSMS is installed, connect to the server using localhost and Windows Authentication mode using SSMS.
After you are connected, create a new user login and call it anything you want. I named it superset. Important thing to note is to enable “SQL Server and Windows Authentication mode” on server properties (as shown below). You can skip all these steps if you already have a SQL Server available anywhere on your network.
Dremio
To install Dremio you will need to download the docker image from docker hub
https://hub.docker.com/r/dremio/dremio-oss/
The way you can do is from your command prompt on Windows.
Step1:- Open Command Prompt
Step2:- Run the following command
docker pull dremio/dremio-oss
Step3:- Then run below docker command to start the container
docker run -p 9047:9047 -p 31010:31010 -p 45678:45678 dremio/dremio-oss
Once the command completes check your Docker containers list as shown in Docker step above (in Docker Desktop section), you should see a Dremio container running.
Step4:- Then go to the page localhost:9047 on your web browser and for the first time it will ask you to setup a new user account. Create the account by following the instructions on the screen and keep a note of the username and password. You will need them while connecting from Superset to Dremio
Step5:- Once you are logged in to Dremio, upload a sample csv file to Dremio with data which you would like to visualize on superset. As you can see in below screenshot I have uploaded WheatWorldPSD.csv file using the button highlighted in red
Now you are all set to install Superset along with necessary drivers
Custom Dockerfile for Superset
I have spent lots of hours in getting this right so that everything works. Please use below Dockerfile I used to install custom DB drivers for MSSQL, MYSQL and DREMIO.
FROM apache/superset # Switching to root to install the required packages USER root # install FreeTDS and dependencies RUN apt-get update \ && apt-get install unixodbc -y \ && apt-get install unixodbc-dev -y \ && apt-get install freetds-dev -y \ && apt-get install freetds-bin -y \ && apt-get install tdsodbc -y \ && apt-get install --reinstall build-essential -y \ && apt-get install rpm2cpio -y \ && apt-get install cpio -y # populate "ocbcinst.ini" RUN echo "[FreeTDS]\n\ Description = FreeTDS unixODBC Driver\n\ Driver = /usr/lib/x86_64-linux-gnu/odbc/libtdsodbc.so\n\ Setup = /usr/lib/x86_64-linux-gnu/odbc/libtdsS.so" >> /etc/odbcinst.ini # Install Dremio ODBC driver RUN pip install pyodbc RUN wget https://download.dremio.com/odbc-driver/1.5.4.1002/dremio-odbc-1.5.4.1002-1.x86_64.rpm RUN rpm2cpio dremio-odbc-1.5.4.1002-1.x86_64.rpm | cpio -idmv RUN mv opt/dremio-odbc /opt/dremio-odbc # Example: installing the MySQL driver to connect to the metadata database # if you prefer Postgres, you may want to use `psycopg2-binary` instead RUN pip install mysqlclient # Example: installing a driver to connect to Redshift # Find which driver you need based on the analytics database # you want to connect to here: # https://superset.apache.org/installation.html#database-dependencies #RUN pip install pydobc RUN pip install sqlalchemy-redshift RUN pip install sqlalchemy-dremio RUN pip install pymssql # Switching back to using the `superset` user USER superset
How to run the above docker file
Open the above dockerfile in Visual Studio code
Build the image (may take some time) – remember the dot “.” is important at the end of command. Do not miss it!. Run below command in the terminal window of your VS code
docker build -t custom_superset .
Run Docker container
docker run -d -p 8080:8088 --name superset custom_superset
Create Superset admin account
docker exec -it superset superset fab create-admin --username admin --firstname Superset --lastname Admin --email admin@superset.com --password admin
Migrate local DB to latest
docker exec -it superset superset db upgrade
Load Examples
docker exec -it superset superset load_examples
Initialize Superset
docker exec -it superset superset init
Then login at below URL with username/password as admin/admin
http://localhost:8080/login/ — u/p: [admin/admin]
There you have it Superset running as a docker container on your local machine 🙂 You are wlecome!
Next time you want to run it just start it from the Container list window on your Docker Desktop. All your connectivity settings will remain intact.
Connectivity to MS SQL server
Step1: Goto Data->Databases dropdown and select Add Database as shown in the screenshot below
Step2:- Choose Microsoft SQL Server from the Supported Databases dropdown in the popup as shown below. You can select Other if you don’t see it as an option.
Step3:- On the next screen give a meaningful name to your connection and enter the below string as SQLALCHEMY URI. Replace the username and password which is superset and superset90 in my case.
Also as superset is running under docker container and my MS SQL Server is running on local host, I needed to use the host.docker.internal as hostname. You can replace it with your SQL server hostname.
Connection String
mssql+pymssql://superset:superset90@host.docker.internal:1433/RawData/
Connectivity to Dremio
Steps for Dremio are similar to as shown above for MSSQL server. Just use the below connection string.
Connection String
dremio://raghav.ds:dremio9003@172.17.0.3:31010/dremio?SSL=1
Again, here also replace your own username and password. Mine were raghav.ds and dremio9003 respectively.
To replace the hostname (172.17.0.33) get the bridge network IP address that Docker has assigned to your Dremio container using the below command
docker network inspect bridge
Conclusion
I had to research and try a lot of things to get everything working. I am sharing the result of my hard work here. If you like this please do share it with people who need it. Also let me know if you have any questions.
Great! It’s work fine.
thank you
I’m sorry, the connection with sql server doesn’t work fine.
After test connection:
ERROR: (pymssql._pymssl.OperationalError) (18456, b’DB-Lib error message 200009, severity 9:\Unable to connect: Adaptive Server is unavailable or does not exist (localhost)\Nel-Lib Error during Connection refused (111)…