What is Databricks? Databricks is an American software company founded by the creators of Apache Spark. It provides a web-based platform of the same…
Category: Data Engineering
Introduction: Redis is an open-source, in-memory data structure store. Its name is short for Remote Dictionary Server (Redis). It is used as a distributed in-memory…
In the last post, I described how to create a GIS shapefile for the region of interest for which you need to extract the…
Creating Shapefiles: In the first part, we discussed how to download the NetCDF files containing the weather data from NOAA’s website using Python. In…
NOAA is a US government agency that forecasts weather and monitors oceanic and atmospheric conditions. It is one of the biggest weather agencies in the…
This article covers two of the most important concepts related to the execution of code in Apache Spark. They are crucial for your understanding of Spark…
Introduction to Spark: Spark is a unified engine for distributed data processing. It supports both on-premises and cloud installations. Applications written in Spark can…
Introduction: What is Superset? Superset is a data visualization tool that is cloud-native, highly available, and scalable, as it works very well with containers. You…
Introduction: Fixed-width files, which do not have any column delimiters, are common in the financial industry, especially in ETL jobs that extract data from mainframe systems. In…
Scraping product price data from Amazon is very helpful if you are building a price comparison utility or want to be alerted when…