Data Science with Python in Visual Studio 2017 — Quick Guide

Nishān Wickramarathna
Apr 30, 2018


We will see how to set up and run a small but fairly advanced regression analysis from Visual Studio 2017, and how you can develop apps faster using templates with the new Cookiecutter Explorer. Linear regression is perhaps one of the best-known and best-understood algorithms in statistics and machine learning. You can learn more about linear regression here. We will walk through a quick data science example that uses open data to predict future stock prices and evaluates different regression models, using the environments and samples that come with the data science workload.

First, we need Python support in Visual Studio, which comes with the data science workload. Open the Visual Studio Installer from the Start menu, or go to the Control Panel, find Visual Studio, and click ‘Change’ at the top.

1

Once it opens, select either both the Python development and the Data science and analytical applications workloads, or just the Data science and analytical applications workload. Either way, make sure Python language support and Anaconda3 are selected in the left panel, then click Modify. The Anaconda Python environment comes preinstalled with over 100 Python packages for scientific computing and data science.

2

After installation, open Visual Studio, go to Tools -> Python, and select Cookiecutter Explorer.

3

In the search bar of the Cookiecutter Explorer, search for ‘sklearn-regression’. Note that you need to be connected to the internet to download templates. Under GitHub, you will get the search result ‘Microsoft/python-sklearn-regression-cookiecutter’. Select it and click Next.

4

The template will be cloned, and you can then specify the location where your project will be created. After that, click ‘Create and Open Project’.

5

After the project is created, go to Solution Explorer. Under the project ‘regression’, right-click the ‘Python Environments’ node and click ‘Add/Remove Python Environments…’.

6

If you installed them correctly, your Python environments will show up here. Select ‘Anaconda 5.x’ and click ‘OK’.

7

You will notice that it will show up in your Solution Explorer.

8

Let’s look at the code. We use four Python packages, one for each step of our algorithm.

  1. Download a data set (using pandas package)
  2. Process the numeric data (using numpy package)
  3. Train and evaluate learners (using scikit-learn package)
  4. Plot and compare results (using matplotlib package)
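Before running the project, it can help to confirm that the selected environment actually provides these four packages. The following is a minimal sketch, not part of the template: the helper name `check_packages` is hypothetical, and note that scikit-learn is imported under the name `sklearn`.

```python
import importlib.util

# Import names of the four packages used in the steps above.
# scikit-learn is imported as "sklearn".
REQUIRED = ["pandas", "numpy", "sklearn", "matplotlib"]


def check_packages(names=REQUIRED):
    """Return {import_name: bool} telling whether each package can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in check_packages().items():
        print(f"{name}: {'ok' if found else 'MISSING'}")
```

If any package is reported as missing, the project is probably not pointing at the Anaconda environment selected earlier.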

First, we download the data and read it into a pandas DataFrame.

9
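As a rough illustration of this step, the sketch below reads CSV data into a DataFrame. It is not the template's code: the column names and the values in `SAMPLE_CSV` are made up, and the helper `load_prices` is hypothetical. An in-memory string stands in for the download so the example runs offline; `pandas.read_csv` also accepts a URL or a file path in its place.

```python
import io

import pandas as pd

# Hypothetical stand-in for the downloaded file: a tiny CSV with made-up values.
SAMPLE_CSV = """Date,Open,High,Low,Close
2018-04-02,10.0,10.5,9.8,10.2
2018-04-03,10.2,10.8,10.1,10.6
2018-04-04,10.6,10.9,10.3,10.4
"""


def load_prices(source):
    """Read CSV data from a URL, file path, or file-like object into a DataFrame."""
    return pd.read_csv(source, parse_dates=["Date"], index_col="Date")


frame = load_prices(io.StringIO(SAMPLE_CSV))
print(frame.head())
```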

Next, we normalize the data and split it into two sets: a training set used to fit our models, and the entire set, against which we test the predicted values.

10
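One way to picture this step with numpy is sketched below. The details are assumptions rather than the template's exact method: z-score normalization is used here, the training set is simply the first 80% of the rows, and the data is synthetic. The function names `normalize` and `split_training` are hypothetical.

```python
import numpy as np


def normalize(values):
    """Scale each column to zero mean and unit variance (z-score)."""
    values = np.asarray(values, dtype=float)
    std = values.std(axis=0)
    std = np.where(std == 0, 1.0, std)  # leave constant columns unscaled
    return (values - values.mean(axis=0)) / std


def split_training(X, y, train_fraction=0.8):
    """Return the first train_fraction of the rows as the training set."""
    n_train = int(len(X) * train_fraction)
    return X[:n_train], y[:n_train]


# Synthetic features and target, for illustration only.
rng = np.random.default_rng(0)
X = normalize(rng.normal(size=(50, 3)))
y = X @ np.array([1.0, -2.0, 0.5])

X_train, y_train = split_training(X, y)
print(X_train.shape, X.shape)
```

The models are then fitted on `X_train` and `y_train`, while predictions are compared against the full `X` and `y`.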

Then, using scikit-learn, we train three models.

  1. Radial basis function (RBF) kernel
  2. Linear kernel
  3. Polynomial kernel

11
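The three kernels listed above can be tried side by side as in the sketch below. This assumes support vector regression (`sklearn.svm.SVR`), which is one scikit-learn estimator that offers all three kernels; the data is synthetic, the default hyperparameters are used, and none of this is taken from the template itself.

```python
import numpy as np
from sklearn.svm import SVR

# Synthetic one-feature data standing in for the normalized input (an assumption).
rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(80, 1))
y = np.sin(X).ravel()

models = {
    "RBF": SVR(kernel="rbf"),
    "Linear": SVR(kernel="linear"),
    "Polynomial": SVR(kernel="poly", degree=3),
}

# Fit each model on the first 60 rows, then score it against the entire set.
scores = {}
for name, model in models.items():
    model.fit(X[:60], y[:60])
    scores[name] = model.score(X, y)  # coefficient of determination (R^2)

for name, score in scores.items():
    print(f"{name}: R2 = {score:.3f}")
```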

Then we create a plot comparing multiple learners via matplotlib.

12
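A comparison plot of this kind could look roughly like the following sketch. The helper `plot_learners`, the axis labels, and the two placeholder learners are all hypothetical; the predictions are made-up offsets of the actual values, used only to give the plot something to draw.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np


def plot_learners(x, actual, predictions):
    """Plot actual values as points and each learner's predictions as a line."""
    fig, ax = plt.subplots()
    ax.scatter(x, actual, color="black", s=10, label="Actual")
    for name, predicted in predictions.items():
        ax.plot(x, predicted, label=name)
    ax.set_xlabel("Sample")
    ax.set_ylabel("Value")
    ax.legend()
    return fig


# Placeholder data: two pretend learners offset from the actual curve.
x = np.arange(20)
actual = np.sin(x / 3.0)
predictions = {"Learner A": actual + 0.1, "Learner B": actual - 0.2}
fig = plot_learners(x, actual, predictions)
```

In an interactive session, calling `plt.show()` after removing the `Agg` line displays the window; `fig.savefig(...)` writes the figure to a file instead.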

Let’s start the project and see the results.

13

14

Here, R2 is the coefficient of determination, which indicates how accurately the model predicted the values. The closer this value is to 1, the better the model. Of the three models here, the RBF model has the highest R2 value, so we can conclude that it is the best model for predicting this type of value, or for use in a similar context.
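To make the R2 value concrete, here is a small sketch of the standard definition, R2 = 1 - SS_res / SS_tot, written with numpy. The function name `r_squared` is hypothetical; scikit-learn's `sklearn.metrics.r2_score` computes the same quantity.

```python
import numpy as np


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    ss_res = np.sum((actual - predicted) ** 2)      # unexplained variation
    ss_tot = np.sum((actual - actual.mean()) ** 2)  # total variation
    return 1.0 - ss_res / ss_tot


actual = [1.0, 2.0, 3.0, 4.0]
perfect = r_squared(actual, actual)                     # predictions match exactly
baseline = r_squared(actual, [2.5, 2.5, 2.5, 2.5])      # always predict the mean
print(perfect, baseline)
```

A perfect prediction scores 1, always predicting the mean scores 0, and a model that does worse than the mean gets a negative value.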

You can get the full source code from this repository: https://github.com/Microsoft/python-sklearn-regression-cookiecutter
