Data Analysis – Finding the mean, mode and median for data in a CSV file

In this post we will find the mean, mode and median for data held in a CSV spreadsheet. I anticipate getting into machine learning and what the Python language has to offer in this realm. I have also heard a lot about the use of R in data analysis, so I may give that a go too. However, to start it all off we will begin with the basics.

Mean – This is the average value of a data set. To calculate it, add up all the numbers in the set and then divide the total by how many numbers the set contains.

Mode – This is the value (or values, if several are tied) that appears most frequently within a data set.

Median – This is the middle value of a data set. To find it, sort the data set from smallest to largest and take the central number. If the set holds an even number of values, take the average of the two central numbers.
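
To make the three definitions concrete, here is a minimal sketch that works them out by hand in plain Python, without NumPy or SciPy. The numbers in `values` are made up purely for illustration.

```python
from collections import Counter

# Illustrative values only; not taken from any real spreadsheet.
values = [4, 1, 2, 2, 3, 5, 2, 4]

# Mean: add everything up and divide by how many values there are.
mean = sum(values) / len(values)

# Median: sort, then take the middle value, or the average of the
# two middle values when the count is even.
ordered = sorted(values)
mid = len(ordered) // 2
if len(ordered) % 2 == 1:
    median = ordered[mid]
else:
    median = (ordered[mid - 1] + ordered[mid]) / 2

# Mode: count each value and keep every value that reaches the top count.
counts = Counter(values)
highest = max(counts.values())
modes = sorted(v for v, c in counts.items() if c == highest)

print(mean, median, modes)
```

Because this list has an even number of values, the median falls between the two central numbers rather than on one of them.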

In this project we assume that your data is within a CSV spreadsheet, so we begin by retrieving this data with NumPy's genfromtxt. We then hold the data in a NumPy array (genfromtxt already returns one, so the conversion is only a formality), and with NumPy we can very quickly establish the mean and median. To find the mode, however, we use SciPy.

Note that this example expects just a single column of data within the spreadsheet, i.e. Column A populated with the relevant data.
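
One thing worth checking is whether Column A starts with a header row. The sketch below assumes a hypothetical file whose first line is the text `value`; the contents are supplied through `io.StringIO` so that no real file is needed. It shows that genfromtxt turns text it cannot parse into `nan`, and that the `skip_header` argument drops such a line before parsing.

```python
import io

import numpy as np

# Hypothetical file contents: one header row, then one number per line.
csv_text = "value\n3\n1\n4\n1\n5\n"

# Without skip_header, the text header is read as nan, and that nan
# would then carry through into the mean.
with_header = np.genfromtxt(io.StringIO(csv_text), delimiter=',')

# skip_header=1 discards the first line before the numbers are parsed.
data = np.genfromtxt(io.StringIO(csv_text), delimiter=',', skip_header=1)

print(with_header.shape, np.isnan(with_header[0]))
print(data.shape, np.mean(data))
```

If your spreadsheet has no header row, the plain call without `skip_header` should be all you need.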

Full source code can be found below:

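import numpy as np
from scipy import stats

# Load the single column of numbers from the CSV file
Data = np.genfromtxt('C:/Somewhere/Data.csv', delimiter=',')
# genfromtxt already returns a NumPy array, so this simply takes a copy
DataArray = np.array(Data)

# NumPy gives the mean and median directly
Mean = np.mean(DataArray)
Median = np.median(DataArray)
# SciPy returns a result holding the mode and how often it occurs
Mode = stats.mode(DataArray)

print('Mean:', Mean)
print('Median:', Median)
print('Mode:', Mode.mode, 'Count:', Mode.count)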
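
As far as I am aware, stats.mode reports only a single value when several are tied for most frequent (the smallest of them). If you want every tied value, one alternative is to count occurrences with NumPy's `np.unique`. This is a swapped-in technique rather than what the code above does, and the numbers in `data` are made up for illustration.

```python
import numpy as np

# Illustrative data in which 2 and 3 both appear twice.
data = np.array([1.0, 2.0, 2.0, 3.0, 3.0, 4.0])

# np.unique returns the sorted distinct values and, with
# return_counts=True, how many times each one occurs.
values, counts = np.unique(data, return_counts=True)

# Keep every value whose count equals the highest count.
modes = values[counts == counts.max()]

print(modes)
```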

