Tag : python

In this post “Python use case – Save each worksheet as a separate excel workbook“, we are going to learn that how we can create a separate workbook for each worksheet of a given excel file. We will be copying data, values, formatting and all other settings of the sheet in the newly created workbook. Assume that we have an excel workbook having five sheets named “Sheet1“, “Sheet2“, “Sheet3“, “Sheet4“, and “Sheet5” which is as this.

Now, we want to create 5 different excel workbooks (one for each worksheet).

We are going to use Excel application to copy data, values, formatting and other settings of each sheet as a new workbook in a python script. Below is the python code.

To execute the above python script, we can call this .py file using command prompt window as this.… More

In previous post, we created our first Machine Learning model using Logistic Regression to solve a classification problem. We used “Wisconsin Breast Cancer dataset” for demonstration purpose. Now, in this post “Building Decision Tree model in python from scratch – Step by step”, we will be using IRIS dataset which is a standard dataset that comes with Scikit-learn library. Let’s have a quick look at IRIS dataset.

The IRIS dataset

The IRIS dataset is a multi-class classification dataset introduced by British statistician and biologist Ronald Fisher in 1936. This dataset has 150 observations which consists 50 samples of each of three species of Iris flower which are “setosa“, “versicolor” or “virginica“. It is a standard, cleansed and preprocessed multivariate dataset which comes preloaded with Scikit-learn library. Each sample has four input features which are:

When we start working on a Machine Learning/Data Science project, first we need to perform some data cleaning and data transformation to get a tidy dataset. Then, we need to perform some EDA(Exploratory Data Analysis) to find trends, patterns, and outliers in the given data. Once, we have a machine interpretable data in place, we choose an algorithm and train the model. Then, we evaluate it on the test data. Next, we can tune the hyper parameters of the model and retrain it to get a robust model. Once the model performance is acceptable, we deploy it to make predictions. Typically, we follow these steps in a Machine Learning model creation:

In this post “Building first Machine Learning model using Logistic Regression in Python“, we are going to create our first machine learning predictive model in a step by step way. We will be using scikit-learn library … More

In the previous post, “Tidy Data in Python – First Step in Data Science and Machine Learning”, we discussed the importance of the tidy data and its principles. In a Machine Learning project, once we have a tidy dataset in place, it is always recommended to perform EDA (Exploratory Data Analysis) on the underlying data before fitting it into a Machine Learning model. Let’s start understanding the importance of EDA and some basic EDA techniques which are very useful.

What is Exploratory Data Analysis (EDA)

Exploratory Data Analysis or EDA, is the process of organizing, plotting and summarizing the data to find trends, patterns, and outliers using statistical and visual methods. It takes input data from a tabular format and represents it in a graphical format which makes it more human interpretable. It is an important step in a Machine Learning/Data Science project which should be performed before … More

Resampling time series data in SQL Server using Python’s pandas library

In this post, we are going to learn how we can use the power of Python in SQL Server 2017 to resample time series data using Python’s pandas library. Sometimes, we get the sample data (observations) at a different frequency (higher or lower) than the required frequency level. In such kind of scenarios, we need to modify the frequency of the given samples as per the frequency of the required outcome. Modifying the frequency of time series data using T-SQL query becomes a tedious task especially when we need to perform upsampling as we need to generate more rows than what we have in the sample dataset. The Python’s pandas module has in-built capabilities for frequency conversion. With the help of pandasresample method, we can increase or decrease the time series observation frequencies with only few lines of … More

If we need to import data from an excel file into SQL Server, we can use these methods:

SQL Server Import Export Wizard

Create an SSIS package to read excel file and load data into a SQL Server table

Use T-SQL OPENROWSET query

Use the read_excel method of Python’s pandas library (Only available in SQL Server 2017 onwards)

In this post “Python use case – Import data from excel to sql server table – SQL Server 2017”, we are going to learn that how we can use the power of Python in SQL Server 2017 to read a given excel file in a SQL table directly. With the integration of Python in SQL Server 2017, we can use the pandas read_excel method to read a given excel file with lots of customizations in SQL Server.

Import zipped CSV file without unzipping it in SSIS using SQL Server 2017

SQL Server Integration Services (SSIS) is one of the most popular ETL tools. It has many built-in components which can be used in order to automate the enterprise ETL(Extract, Transform, and Load). Also, if we need a customized component which is not available in SSIS, we can simply create it by writing our own piece of code in C# using Script Task or Script Component.

In this post, we are going to explore that how we can read and load a zipped CSV file in SQL Server without unzipping it using SSIS along with SQL Server 2017. Reading a zipped file directly (without unzipping it) will save some time required in order to write the text file on the physical disk and then reading it from there. As of now, we don’t have any built-in component in … More

In this post, we are going to learn how we can leverage python in SQL server to generate comma separated values.

If we want to combine all values of a single column it is fairly easy as we can use COALESCE function to do that. Here is a reference to the already existing post. But have you ever thought what would happen if we needed a comma separated value in a column along with other columns? In that scenario, this approach would not work.

We can get comma separated values in a column along with other columns using FOR XML PATH query wrapped inside a sub-query, but there also we would need to take care of HTML encoded characters like < and >.

In this post, we are going to learn how we can leverage the power of Python’s pandas module in SQL Server 2017. pandas is an open source Python library providing data frame as data structure similar to the SQL table with the vectorized operation support for high performance. To know more about pandas, you can click here.

Let’s discuss the problem we face while using the SQL UNPIVOT clause especially when we have a large number of columns. We can use UNPIVOT clause in SQL Server to convert the columns as row values and normalize the output result set. To use the UNPIVOT command, we need to specify each column name as a fixed value while writing the T-SQL query. However, this becomes annoying if we need to specify a large number of columns in the UNPIVOT clause. Also, if the column names are not fixed (dynamic in nature), … More

In this post “Connecting Python 3 to SQL Server 2017 using pyodbc”, we are going to learn that how we can connect Python 3 to SQL Server 2017 to execute SQL queries. We can change the settings accordingly to connect to other versions of SQL Server also. If you are interested to know more about Python and why you should learn it, visit our post “Why Python and how to use it in SQL Server 2017“.

What is pyodbc?

pyodbc is an open source DB API 2 Python module. It provides a convenient interface to connect a database which accepts an ODBC connection. In order to use pyodbc module, firstly, we need to install it. Click here for more information on pyodbc.

pip install pyodbc module

We can use pip install command to install the pyodbc module in Python 3 on a Windows machine. Before executing the … More