There is a huge amount of data available on the web and from other sources (see Quandl and others). The problem is getting to the data you need to get a job done! Here we'll take a first step: filtering a set of data to remove errors and find specific values. In later labs, we'll learn to "parse" large, multi-column datasets.
You have been selected by the American Racing Pigeon Union to create a program to flag errors in their data and find the best pigeons. The numbers are in "Unirates," which is the average Universal Performance Rating (UPR). The UPR is the bird's finishing position in each race divided by the number of birds in that race (e.g., finishing first out of 100 birds gives a UPR of 1/100 = 0.01). The lower the rate, the better the bird.
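The calculation above can be sketched directly in Python. This is a minimal illustration, not the assignment solution; the function names and the sample race results are made up for the example:

```python
# Each race's UPR is the bird's finishing position divided by the
# number of birds in that race; the Unirate is the average UPR
# over all of a bird's races (lower is better).
def upr(position, field_size):
    return position / field_size

def unirate(races):
    """races: a list of (position, field_size) tuples, one per race."""
    return sum(upr(p, n) for p, n in races) / len(races)

# A hypothetical bird that placed 1st of 100, 5th of 50, and 10th of 200:
print(unirate([(1, 100), (5, 50), (10, 200)]))  # about 0.0533
```

Note the division: in Python 3, `/` is true (floating-point) division, so `1 / 100` gives `0.01` rather than `0`.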
Create a script that, for any series of numbers, will print each number and the following:
Test your code with a number of different lists (at least 3, and leave them in your code). You can download data from the American Racing Pigeon Union website, but we don't yet know how to parse it with Python. Instead, download it as a CSV file and load it into Excel, or copy and paste from the website into Excel. Then copy the data into Python and format it as a list.
Make sure to test your code (and turn it in) with the following series of values:
-10, 0, 23, 435, -321.0, 0.1, -00.2, 3.1415, 123456789
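As a starting point, here is one way a filtering script might look. This is a sketch under an assumption: since a UPR is a position divided by a field size, a valid rate should fall in the interval (0, 1], so anything outside that range is flagged as an error. The function name and the exact criteria are illustrative, not the required ones:

```python
# Split a series into valid UPR values and flagged errors,
# assuming a valid rate must be greater than 0 and at most 1.
def flag_errors(values):
    good, bad = [], []
    for v in values:
        (good if 0 < v <= 1 else bad).append(v)
    return good, bad

series = [-10, 0, 23, 435, -321.0, 0.1, -00.2, 3.1415, 123456789]
good, bad = flag_errors(series)
print("valid rates:", good)
print("flagged as errors:", bad)
if good:
    print("best (lowest) rate:", min(good))
```

Run against the series above, only `0.1` survives the filter; everything else is negative, zero, or larger than 1.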
Here is a list of the population of the US from 2010 through 2015 from the US Census (I removed the commas within each number and then replaced the spaces with commas to turn it into a list):
309346863, 311718857, 314102623, 316427395, 318907401, 321418820
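Once the data is pasted into Python as a list, a quick statistic confirms it was copied correctly. The sketch below (the variable names are my own) computes the year-over-year population increase:

```python
# US population, 2010-2015, copied from the US Census data above.
population = [309346863, 311718857, 314102623,
              316427395, 318907401, 321418820]

# Year-over-year increase: pair each year with the next and subtract.
growth = [later - earlier for earlier, later in zip(population, population[1:])]
print(growth)  # five positive increases, roughly 2.3-2.5 million each
```

A simple sanity check like "every increase is positive and of a plausible size" is often enough to catch a copy-and-paste error such as a dropped digit.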
Write another script to filter data for errors or interesting statistics. There are some ideas in the article 7 Datasets You've Likely Never Seen Before.
© Copyright 2018 HSU - All rights reserved.