Posted by : Netbloggy Sunday, August 9, 2015

A histogram is one of the best ways to display the frequency distribution of a dataset, and here we are going to create one. So far we've dealt with text files; now it's time to make some progress and work with real-world data, so this time it's going to be a CSV (comma-separated values) file from openflights.org.

Unlike plain text files, CSV files are best processed with Python's built-in csv module, so we need to import it. Later in the program we also need to calculate the geo distance, which differs from our normal distance calculation because it works with latitudes and longitudes. For that we have to download the Python program geo_distance and import it into our program as a module.


import matplotlib.pyplot as plt
import csv
import geo_distance  # for calculating the distance between latitude/longitude pairs
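
The geo_distance program itself isn't shown in this post. To get a feel for what such a function might compute, here is a minimal sketch based on the haversine formula, one common way to get the great-circle distance between two latitude/longitude pairs. The name haversine_km and the mean Earth radius of 6371 km are assumptions for illustration; the downloaded geo_distance may use a different formula.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius, not taken from geo_distance


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1 = math.radians(lat1)
    phi2 = math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    # Clamp to 1.0 so rounding errors can never push asin() out of its domain.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))
```

Calling haversine_km(source_lat, source_long, dest_lat, dest_long) takes the same four arguments, in the same order, as the geo_distance.distance call used later in this post.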

Let's dive deeper into the code. As you see below, we are working with two different input datasets: 1. airports.dat, to get airport details, and 2. routes.dat, to get route details. Now we have to calculate the geo distance for each route from these two datasets and record it in the list distances.

# Map each airport ID to its latitude and longitude.
latitudes = {}
longitudes = {}
distances = []
with open("airports.dat.txt") as d:  # "with" closes the file when done
    for row in csv.reader(d):
        airport_id = row[0]
        latitudes[airport_id] = float(row[6])
        longitudes[airport_id] = float(row[7])

# For every route whose two airports are both known, record the distance.
with open("routes.dat") as f:
    for row in csv.reader(f):
        source_airport = row[3]
        dest_airport = row[5]
        if source_airport in latitudes and dest_airport in latitudes:
            source_lat = latitudes[source_airport]
            source_long = longitudes[source_airport]
            dest_lat = latitudes[dest_airport]
            dest_long = longitudes[dest_airport]
            distances.append(geo_distance.distance(source_lat, source_long, dest_lat, dest_long))
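
If you want to see what csv.reader hands back before touching the real files, here is a small sketch that parses two made-up rows from memory. The rows are invented for illustration (they are not OpenFlights data); they only mimic the layout the code above relies on, with the airport ID in column 0 and latitude and longitude in columns 6 and 7.

```python
import csv
import io

# Two invented rows: ID, name, city, country, two codes, latitude, longitude.
# The second name contains a comma inside quotes to show that csv.reader
# keeps it as one field, which a plain line.split(",") would not.
sample = io.StringIO(
    '1,"Airport A","City A","Country A","AAA","XAAA",10.5,20.25\n'
    '2,"Airport B, North","City B","Country B","BBB","XBBB",-5.0,30.0\n'
)

latitudes = {}
longitudes = {}
for row in csv.reader(sample):
    airport_id = row[0]  # stays a string, e.g. "1"
    latitudes[airport_id] = float(row[6])
    longitudes[airport_id] = float(row[7])

print(latitudes)
print(longitudes)
```

Note that the airport IDs stay strings. That is fine here because the IDs read from the routes file are strings too, so the dictionary lookups match without any conversion.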

Now our data is ready and it's time for some storytelling. Let's create a histogram with hist(), splitting the distances into 100 bins.

plt.hist(distances, 100, facecolor='b')
plt.xlabel("Distance (km)")
plt.ylabel("Number of flights")
plt.show()  # open the plot window when running as a script

Once you execute the code, a beautiful bluish histogram appears:

[Figure: histogram of flight distances]
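
To make the 100 passed to hist() less mysterious, here is a rough pure-Python sketch of what binning means: the range from the smallest to the largest value is cut into equal-width intervals, and each value is counted in the interval it falls into. The function name bin_counts is made up for this sketch, and it only approximates what matplotlib does internally.

```python
def bin_counts(values, num_bins):
    """Count how many values fall into each of num_bins equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / float(num_bins)
    counts = [0] * num_bins
    for v in values:
        if width == 0:
            idx = 0  # all values identical: put everything in the first bin
        else:
            # The largest value would land one past the end, so clamp it
            # into the last bin.
            idx = min(int((v - lo) / width), num_bins - 1)
        counts[idx] += 1
    return counts


# Six sample distances split into 5 bins of width 2.
print(bin_counts([0, 1, 2, 3, 4, 10], 5))
```

Each entry of the returned list corresponds to the height of one bar in the histogram.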
Download the source code here!
