Creating Charts using matplotlib in Python [Hands-on]

Data storytelling is an important branch of data science. Your audience may not be as fond of numbers as you are, so it's important to show them your results in a language they understand. For a language to belong in the data science world, its data processing capabilities must be strong and its data visualizations must be just as good. With packages like matplotlib, Python can compete with R on that front.

So let's try to represent the data of our previous post in terms of graphs/charts.

Problem:

Draw a bar graph from the dictionary counts that we built in our previous blog post.

Takeaways:

  • Basics of matplotlib

Approach:

As with every new package, the first job is to import matplotlib.

import matplotlib.pyplot as plt

Now let's draw a bar graph with the values (votes) of the dictionary counts:

# list() keeps this working on Python 3, where values() returns a view
plt.bar(range(len(counts)), list(counts.values()), align='center')

Our graph is ready now, but it's kind of naked (without labels ;) ). Still, let's show it!

plt.show()

But a graph with no labels makes no sense to anyone, so it's our duty to make sure that the graph's x-axis and y-axis are labelled correctly. Let's add them too! When running this as a script, these calls should come before plt.show() so that the labels appear in the displayed figure.

plt.ylabel("Votes")
plt.xticks(range(len(counts)), list(counts.keys()), rotation=90)

And here's how the bar graph looks. Beautiful, isn't it?
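Putting the steps together, here is a minimal sketch of the whole chart, written for Python 3. The helper names bar_data and plot_counts are made up for this illustration, and sample_counts holds just a few entries from the tally in the voting post.

```python
def bar_data(counts):
    """Split a {label: value} dict into bar positions, heights and tick labels."""
    labels = list(counts.keys())
    heights = [counts[label] for label in labels]
    positions = list(range(len(labels)))
    return positions, heights, labels


def plot_counts(counts):
    """Draw a labelled bar chart of the vote counts."""
    # Imported here so bar_data() stays usable without matplotlib installed.
    import matplotlib.pyplot as plt

    positions, heights, labels = bar_data(counts)
    plt.bar(positions, heights, align="center")
    plt.ylabel("Votes")
    plt.xticks(positions, labels, rotation=90)
    plt.tight_layout()  # leave room for the rotated tick labels
    plt.show()          # labels are set before the figure is shown


# A few entries from the tally in the voting post (illustration only).
sample_counts = {"Champion": 76, "French breakfast": 72, "Daikon": 63}
```

Calling plot_counts(sample_counts) should open a window with three labelled bars. Keeping the data preparation in its own function is just one way to arrange it; it makes the positions and labels easy to inspect without drawing anything.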

Download the source code here.
Sunday, August 9, 2015
Posted by Netbloggy

Solving Voting Problem from OpenTechSchool with Python [Hands-on]

The best way to learn any programming language is to solve problems with it. While documentation can teach you syntax, you get closer to the language only when you get hands-on with the code. So let's get started with the first problem in this Python journey: the Voting Problem.

Problem:

We have 300 lines of survey data in the file radishsurvey.txt. Each line consists of a name, a hyphen, and then a radish variety. Our objective is to answer the following:


  • What's the most popular radish variety?
  • What are the least popular?
  • Did anyone vote twice?

Takeaways:

In our attempt to solve this problem, we'll come across the following Python concepts:
  • Reading & Cleaning a text file
  • Basic String Operations
  • Traversing a Dictionary & List
  • Iterative Looping and Conditional Looping 
  • Defining and Calling a function
Approach:

We can open the file radishsurvey.txt as a file object and traverse it line by line, like this:

radish_contents = open("radishsurvey.txt")
for line in radish_contents:
    print line #each line still carries its trailing newline

Instead, we can call open() directly in the loop to save a step. Before that, we create an empty dictionary counts to store the vote counts and an empty list voted to track duplicate voters. Comments in the code below explain the purpose of every step.


counts = {}
voted = []
for line in open("radishsurvey.txt"):
    
    line1 = line.strip()
    #print line #remove this comment to see how the line would be printed without strip()
    name, vote = line1.split(" - ")
    vote = vote.strip().capitalize() #just to make the 'vote' elements in proper case

    vote = vote.replace("  "," ") #data cleaning: replacing two white spaces with one

    if name in voted:
        print name, "has already voted" #printing the voter's name who voted again
        continue  #skip their vote and process the next line
    voted.append(name) #for first time voters: adding their name to voted list

    if vote not in counts:
        # First vote for this variety - make a new entry in dictionary and set value to 1
        counts[vote] = 1
    else:
        # Increment the vote count as the entry is already present in the dictionary
        counts[vote] = counts[vote] + 1

We have successfully built a dictionary with the radish variety as the key and the vote count as its value. We've also handled the most important test case: printing the duplicate voters and disregarding their votes.
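For readers on Python 3, the same tally could be sketched roughly as follows. This is not the code from this post: it swaps the voted list for a set, uses collections.Counter instead of a plain dictionary, and collapses any run of whitespace rather than only double spaces. The function name tally_votes and the sample lines are made up for illustration.

```python
from collections import Counter


def tally_votes(lines):
    """Count one vote per voter; return (counts, duplicate_voters)."""
    counts = Counter()
    voted = set()      # membership checks on a set stay fast as it grows
    duplicates = []
    for line in lines:
        line = line.strip()
        if not line:
            continue   # ignore blank lines
        name, vote = line.split(" - ")
        # Normalise case and collapse any run of whitespace to one space.
        vote = " ".join(vote.split()).capitalize()
        if name in voted:
            duplicates.append(name)  # repeat voter: record and skip the vote
            continue
        voted.add(name)
        counts[vote] += 1
    return counts, duplicates


# Hypothetical sample lines in the "name - variety" format described above.
sample_lines = [
    "Phoebe Barwell - champion\n",
    "Procopio Zito - Red  King\n",
    "Phoebe Barwell - daikon\n",
]
counts, duplicates = tally_votes(sample_lines)
print(dict(counts))
print(duplicates)
```

To run it on the real survey, an open file can be passed in place of the list, for example inside a with open("radishsurvey.txt") as survey: block, which also closes the file when the loop is done.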


for item in counts:
    print item, counts[item] 

While this code gives us all the details we wanted, we still need to go through every line manually to find the most voted and least voted varieties. And we programmers, who are meant to be lazy, would want the program itself to tell us that too. Here's the code:


def find_winner(counts):
    winner = ""
    pre_vote = 0
    for vote in counts:
        if counts[vote] >= pre_vote:
            winner = vote
            pre_vote = counts[vote]
    return winner, pre_vote

def find_loser(counts):
    loser, pre_vote = find_winner(counts) #calling a function inside another fn.
    for vote in counts:
        if counts[vote] < pre_vote:
            loser = vote
            pre_vote = counts[vote]
    return loser, pre_vote

Here's the output after executing the code in Python 2:

Phoebe Barwell has already voted
Procopio Zito has already voted
White icicle 64
Snow belle 63
Champion 76
Cherry belle 58
French breakfast 72
Daikon 63
Bunny tail 72
Sicily giant 57
Red king 56
Plum purple 56
April cross 72
And the winner is Mr. Champion with 76 votes
Sorry, the loser is Mr. Red king with 56 votes
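The last two lines of the output come from calling the two functions, a step the listing above does not show. Here is one way it might look in Python 3, using the built-in max() and min() instead of the hand-written loops. The names most_voted and least_voted are made up for this sketch, and the dictionary is copied from the output above.

```python
# Tally copied from the output listed above.
counts = {
    "White icicle": 64, "Snow belle": 63, "Champion": 76, "Cherry belle": 58,
    "French breakfast": 72, "Daikon": 63, "Bunny tail": 72, "Sicily giant": 57,
    "Red king": 56, "Plum purple": 56, "April cross": 72,
}


def most_voted(counts):
    """Return (variety, votes) for the highest count; first one wins a tie."""
    winner = max(counts, key=counts.get)
    return winner, counts[winner]


def least_voted(counts):
    """Return (variety, votes) for the lowest count; first one wins a tie."""
    loser = min(counts, key=counts.get)
    return loser, counts[loser]


winner, top = most_voted(counts)
loser, bottom = least_voted(counts)
print("And the winner is Mr. {} with {} votes".format(winner, top))
print("Sorry, the loser is Mr. {} with {} votes".format(loser, bottom))
```

One difference worth noting: on a tie, max() and min() keep the first variety they meet, while find_winner above keeps the last one because it compares with >=. Red king and Plum purple are tied at 56 votes, so which of them is reported as the loser depends on the order of the dictionary.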

Our objectives are met, and we hope you've learnt something from this blog post.
Download the entire Python code here.

How to install Python on your Computer? [Tutorial]

Data science is all about making sense of the data that we have. For that purpose, two widely used languages are Python and R. So let's start with Python!



Like other interpreted languages, Python needs an interpreter on your machine to read the code (.py) and run it. To write code (that is, to create the .py file), any text editor would do the job. But since Python is indentation-sensitive, it's better to use an editor that takes care of indentation and highlights the built-in keywords. Software that does this job is called an IDE (integrated development environment), and there are many such IDEs for Python.

A typical programmer, being lazier than the average human being, should look for one package that has all of these (an interpreter, an IDE and much more) so that a single installer sets up everything related to Python on the machine. There is such a package, and it's called "Anaconda".




Whether you are running Windows, Linux or Mac, jump in here and download the appropriate package!

Double-click the downloaded Anaconda setup and proceed with installation. You are done once the installation is finished.

A few things to note:

1. Anaconda comes with a large set of Python packages that you'll need for data analysis and scientific computing.
2. Windows and Linux users: you don't need to set the environment path to access Python from any directory. Mac users might need to set the path (export PATH=~/anaconda/bin:$PATH).
3. Anaconda has a long list of FAQs, so check them if you have any trouble getting this to work.
4. After installation, open your command prompt or terminal and type spyder. If the Spyder IDE opens, the installation is complete.
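Beyond launching Spyder, a quick check from Python itself can confirm which interpreter is running and whether a few packages are importable. This is only a sketch for Python 3: the helper name is_installed is made up here, and the package names are examples to adjust as needed.

```python
import importlib.util
import sys


def is_installed(package_name):
    """Return True if the named top-level package can be found for import."""
    return importlib.util.find_spec(package_name) is not None


# Show which Python version is answering.
print("Python", sys.version.split()[0])

# Example package names; change the list to whatever you plan to use.
for name in ("numpy", "pandas", "matplotlib"):
    status = "found" if is_installed(name) else "missing"
    print(name, status)
```

If a package shows up as missing, the Anaconda FAQs mentioned above are a good first place to look.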

Happy pythoning!!!

Sunday, July 19, 2015
Posted by Netbloggy

Hello World!

We, as average internet users, consume a lot of data from the web, but the data that we (knowingly) publish online is relatively NEGLIGIBLE, apart from our daily Facebook posts. Just imagine a universe where everyone is like us, consuming data without caring about contributing online: at some point there wouldn't be any new data for us to consume.

Hence, to deviate from the crowd and become an online contributor, here's an attempt (pushed by my professor from Praxis Business School).

Hello World!

Monday, July 6, 2015
Posted by Netbloggy


- Copyright © nulldata -Metrominimalist- Powered by Blogger - Designed by Johanes Djogan -