Data Analysis – Finding the mean, mode and median of combined CSV files

In this post we will find the mean, mode and median of data stored in two CSV spreadsheets. The script reads each spreadsheet, concatenates the two data sets and then calculates the mean, mode and median of the combined data.

Mean – This is the average value of a data set. To calculate it, add up all the numbers in the set and divide the total by how many numbers the set contains.
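As a rough illustration of that definition, here is a minimal pure-Python sketch. The function name `mean` and the sample numbers are made up for this example; the result should agree with `np.mean`, which the script further down uses.

```python
def mean(values):
    # Add up every value, then divide by how many values there are
    return sum(values) / len(values)


data = [2, 4, 4, 5, 7]
print(mean(data))  # 4.4
```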

Mode – This is the value that appears most frequently within a data set. A data set can have more than one mode if several values tie for the highest count.
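To show how ties work, the sketch below counts each value with `collections.Counter` and returns every value that shares the highest count. The helper name `modes` and the sample lists are hypothetical. By contrast, `scipy.stats.mode` in the script further down reports a single value, which according to SciPy's documentation is the smallest one when there is a tie.

```python
from collections import Counter


def modes(values):
    # Count how often each value occurs
    counts = Counter(values)
    highest = max(counts.values())
    # Return every value that shares the highest count, smallest first
    return sorted(v for v, c in counts.items() if c == highest)


print(modes([1, 2, 2, 3, 3, 4]))  # [2, 3]
print(modes([5, 1, 5]))           # [5]
```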

Median – This is the middle value of a data set. To find it, first sort the data from smallest to largest and then take the central number. If the data set contains an even number of values, the median is the average of the two central numbers.
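The sorting step and the odd/even cases can be sketched in a few lines of plain Python. The function name `median` is only for illustration; `np.median` handles both cases the same way.

```python
def median(values):
    # Sort from smallest to largest and find the middle position
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd count: a single central value
        return ordered[mid]
    # Even count: average the two central values
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([7, 1, 5]))     # 5
print(median([7, 1, 5, 3]))  # 4.0
```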

In this project we assume that your data is stored in two CSV spreadsheets. We begin by reading the data from each file into its own NumPy array. We then concatenate the two arrays and use NumPy to quickly calculate the mean and median. To find the mode, however, we use SciPy.

Note that this example expects a single column of data in each spreadsheet, i.e. only column A is populated with the relevant data.
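To see what a single-column file looks like to NumPy, the sketch below writes two small hypothetical CSV files (one number per row) into a temporary folder, reads them back with `np.genfromtxt` and joins them with `np.concatenate`. The file contents are invented for the demonstration; a real script would point at your own files instead.

```python
import os
import tempfile

import numpy as np

# Hypothetical sample data: one number per row, i.e. only column A is filled
with tempfile.TemporaryDirectory() as folder:
    path1 = os.path.join(folder, "Data1.csv")
    path2 = os.path.join(folder, "Data2.csv")
    with open(path1, "w") as f:
        f.write("3\n7\n7\n")
    with open(path2, "w") as f:
        f.write("1\n9\n")

    data1 = np.genfromtxt(path1, delimiter=",")
    data2 = np.genfromtxt(path2, delimiter=",")

# A single-column file is read as a flat (1-D) array of floats
print(data1.shape, data2.shape)  # (3,) (2,)

# Concatenation keeps the order: all of data1, then all of data2
combined = np.concatenate((data1, data2))
print(combined)  # [3. 7. 7. 1. 9.]
```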

Full source code can be found below:

import numpy as np
from numpy import genfromtxt
from scipy import stats

# Read the single column of numbers from each CSV file into a NumPy array
Data1 = genfromtxt('C:/Data1.csv', delimiter=',')
Data2 = genfromtxt('C:/Data2.csv', delimiter=',')

# Join the two arrays end to end into one data set
Data = np.concatenate((Data1, Data2))

# Mean and median come from NumPy
Mean = np.mean(Data)
Median = np.median(Data)
# SciPy returns a ModeResult; its .mode attribute holds the most common value
# (the smallest one if several values tie)
Mode = stats.mode(Data).mode

print(Mean, Median, Mode)
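If a spreadsheet has a header row or empty cells, `genfromtxt` typically fills those positions with `nan`, and a single `nan` makes the mean and median come out as `nan` too. One option is to pass `skip_header=1` to `genfromtxt`; another, sketched below under that assumption, is to drop the `nan` entries before calculating. The helper name `clean` and the sample array are hypothetical.

```python
import numpy as np


def clean(values):
    # Keep only the entries that are real numbers (drop nan)
    return values[~np.isnan(values)]


# Simulated result of reading a file with a header row and one blank cell
raw = np.array([np.nan, 4.0, np.nan, 2.0, 6.0])
data = clean(raw)
print(np.mean(data), np.median(data))  # 4.0 4.0
```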
