Creating Histograms using matplotlib of Python [Hands-on]
Posted by: Netbloggy
Sunday, August 9, 2015
A histogram is a great way to show how frequently values occur in a dataset, and here we are going to create one. So far we've dealt with plain text files; now it's time to show some progress and work with real-world data. This time it's going to be CSV (comma-separated values) files from openflights.org.
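Why not just split each line on commas? Because CSV fields can be quoted, and a quoted field may itself contain a comma. The sketch below uses a made-up line in the airports.dat column layout (the airport name, codes, and coordinates are hypothetical) to show how a naive `str.split(",")` shifts every later column, while Python's built-in `csv.reader` keeps the fields intact.

```python
import csv

# A hypothetical line in the airports.dat layout; the quoted name contains a comma.
line = '99,"Smith, Field","Town","Country","XXX","XXXX",40.5,-80.25'

naive = line.split(",")            # also splits inside the quoted name
parsed = next(csv.reader([line]))  # respects the quotes and strips them

print(len(naive), len(parsed))     # 9 8
print(naive[6])                    # "XXXX"  (column 6 is no longer the latitude)
print(parsed[1])                   # Smith, Field
print(float(parsed[6]), float(parsed[7]))  # 40.5 -80.25
```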
Unlike the plain text files we used before, CSV files are best read with Python's built-in csv module, so we import it. Later in the program we also need the geographic distance between two airports. This differs from an ordinary straight-line distance because the positions are given as latitudes and longitudes on the (roughly spherical) Earth, so we download the Python module geo_distance and import it into our program; its distance() function does the calculation.
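The geo_distance module itself isn't reproduced in this post, so its exact formula is unknown here. As a hedged sketch, assuming it exposes `distance(lat1, long1, lat2, long2)` taking decimal degrees and returning kilometres (the argument order used in the program), a great-circle distance can be computed with the haversine formula on a sphere. The Earth radius of 6371 km is our own assumption; the real module may use a different radius or formula.

```python
import math

# Assumed mean Earth radius in km; other implementations may use a different value.
EARTH_RADIUS_KM = 6371.0


def distance(lat1, long1, lat2, long2):
    """Great-circle distance in km between two points given in decimal degrees.

    Haversine formula: a = sin^2(dphi/2) + cos(phi1) cos(phi2) sin^2(dlambda/2),
    d = 2 R asin(sqrt(a)).
    """
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(long2 - long1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # min() guards against a rounding to slightly above 1 for near-antipodal points
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


# One degree of longitude along the equator:
print(round(distance(0, 0, 0, 1), 1))  # 111.2
```

Saved as geo_distance.py next to your script, a function like this would satisfy the `import geo_distance` line, but treat it as a stand-in rather than the original module.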
```python
import matplotlib.pyplot as plt
import csv
import geo_distance  # for calculating the distance between latitude/longitude pairs
```
Let's dive deeper into the code. As you can see below, we work with two input datasets: 1. airports.dat, for airport details (ID, latitude, longitude), and 2. routes.dat, for route details (source and destination airport IDs). For every route whose two airports are both known, we look up their coordinates, calculate the geo distance between them, and record it in the list distances.
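```python
latitudes = {}
longitudes = {}
distances = []

# airports.dat: column 0 is the airport ID, columns 6 and 7 are latitude and longitude
# (adjust the file names to match where you saved the OpenFlights downloads)
with open("airports.dat", encoding="utf-8", newline="") as d:
    for row in csv.reader(d):
        airport_id = row[0]
        latitudes[airport_id] = float(row[6])
        longitudes[airport_id] = float(row[7])

# routes.dat: columns 3 and 5 are the source and destination airport IDs
with open("routes.dat", encoding="utf-8", newline="") as f:
    for row in csv.reader(f):
        source_airport = row[3]
        dest_airport = row[5]
        # skip routes whose airports are missing from airports.dat (e.g. "\N" IDs)
        if source_airport in latitudes and dest_airport in latitudes:
            source_lat = latitudes[source_airport]
            source_long = longitudes[source_airport]
            dest_lat = latitudes[dest_airport]
            dest_long = longitudes[dest_airport]
            distances.append(geo_distance.distance(source_lat, source_long, dest_lat, dest_long))
```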
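The look-up-and-join step can also be packaged as a function, which makes it easy to try on a few rows without the full downloads. This is a hypothetical refactoring: `route_distances`, its parameters, and the sample rows are our own illustrations (the rows are made up and fill only the columns the code reads), and the distance function is passed in so you can plug in `geo_distance.distance` or any stand-in.

```python
import csv


def route_distances(airport_lines, route_lines, distance_fn):
    """Return one distance per route whose two airports both appear in airport_lines.

    airport_lines and route_lines are iterables of CSV text lines in the
    airports.dat and routes.dat column layouts used in this post;
    distance_fn(lat1, long1, lat2, long2) returns a distance (e.g. in km).
    """
    coords = {}
    for row in csv.reader(airport_lines):
        # column 0: airport ID; columns 6 and 7: latitude and longitude
        coords[row[0]] = (float(row[6]), float(row[7]))

    distances = []
    for row in csv.reader(route_lines):
        # columns 3 and 5: source and destination airport IDs
        source_airport, dest_airport = row[3], row[5]
        # routes that reference an unknown airport ID are skipped
        if source_airport in coords and dest_airport in coords:
            source_lat, source_long = coords[source_airport]
            dest_lat, dest_long = coords[dest_airport]
            distances.append(distance_fn(source_lat, source_long, dest_lat, dest_long))
    return distances


# Hypothetical sample rows (made up; only the columns the code reads matter).
sample_airports = [
    '1,"Alpha","Town","Country","AAA","AAAA",0.0,0.0',
    '2,"Beta","Town","Country","BBB","BBBB",0.0,1.0',
]
sample_routes = [
    'XX,10,AAA,1,BBB,2,,0,320',
    'XX,10,AAA,1,CCC,\\N,,0,320',  # destination ID is "\N" -> skipped
]


def toy_distance(lat1, long1, lat2, long2):
    """Toy stand-in for geo_distance.distance: absolute longitude difference only."""
    return abs(long2 - long1)


print(route_distances(sample_airports, sample_routes, toy_distance))  # [1.0]
```

Passing the distance function in keeps the parsing logic independent of geo_distance, so the join can be checked on tiny inputs before running it on the full files.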
Now our data is ready, and it's time for some storytelling. Let's plot the distances as a histogram with plt.hist(), using 100 bins.
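```python
# 100 equal-width bins; 'b' is matplotlib's shorthand for blue
plt.hist(distances, 100, facecolor='b')
plt.xlabel("Distance (km)")
plt.ylabel("Number of flights")
plt.show()  # display the figure (plt.savefig("histogram.png") would write it to a file instead)
```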
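With a bin count of 100, plt.hist splits the range from the shortest to the longest route into 100 equal-width bins and draws one bar per bin, whose height is the number of flights falling into it. To make that concrete, here is a small standard-library sketch of the same equal-width binning. `equal_width_counts` is a hypothetical helper of our own, not a matplotlib function (matplotlib itself computes the counts with NumPy's histogram routine), and for values lying exactly on a bin edge its results may differ slightly because of floating-point rounding.

```python
def equal_width_counts(values, bins):
    """Count values into `bins` equal-width bins spanning min(values)..max(values).

    Each bin is half-open [left, right) except the last, which also includes
    the maximum value. Returns (counts, edges) with len(edges) == bins + 1.
    """
    lo, hi = min(values), max(values)
    if lo == hi:  # all values equal: widen the range so it can be split
        lo, hi = lo - 0.5, hi + 0.5
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        i = int((v - lo) / width)
        counts[min(i, bins - 1)] += 1  # the maximum value goes into the last bin
    edges = [lo + k * width for k in range(bins + 1)]
    return counts, edges


counts, edges = equal_width_counts([0, 1, 2, 3, 4, 10], 5)
print(counts)  # [2, 2, 1, 0, 1]
print(edges)   # [0.0, 2.0, 4.0, 6.0, 8.0, 10.0]
```

plt.hist also returns the bar heights and bin edges as the first two items of its return value, so you can compare them with a helper like this on the distances list.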