Practical 1 - Number of Cyclists
How to Find Columns that are in Common Between Tables?
Use Set and set.intersection(set1, set2)
to do this. One use case is to vertically concatenate tables. Such as combine data from different years (2014, 2015, 2016) together.
Year-2014.csv
Date | Station Number | Number Cyclists |
---|---|---|
2014-01-10 | 897123 | 15 |
2014-03-09 | 362300 | 21 |
Year-2015.csv
Date | Station Number | Number Cyclists | Note |
---|---|---|---|
2015-01-10 | 897123 | 25 | good |
2015-03-09 | 362300 | 52 | nice measure |
all_columns = {"Date", "Station Number", "Number Cyclists", "Note" }
common_columns = all_columns.intersection(year_2014.columns.values,
year_2014.columns.values)
The above code will drop the Note
column.
info
Set intersection example:
first = {"c", "d", "e"}
second = {"c", "f", "g"}
all_columns = {*first, *second} # {'c', 'd', 'f', 'e', 'g'}
result = all_columns.intersection(first, second)
print(result) # 'c'
How to Convert Date String Into Datetime Object?
Use dataframe.apply(func, axis)
or pandas.to_datetime(data["Date"], format="%d/%m/%Y")
Date Information is Located in a Single Column
Date | Number Cyclists |
---|---|
2014-03-22 | 91 |
2014-03-23 | 21 |
data["Date"] = pandas.to_datetime(data["Date"], format="%Y-%m-%d")
Date Information is Spread Across Multiple Columns
Year | Month | Day | Number Cyclists |
---|---|---|---|
2014 | 03 | 22 | 91 |
2014 | 03 | 23 | 21 |
from datetime import datetime
# A new column called 'Date' will be created and its data type is datetime64
data["Date"] = data.apply(lambda row: datetime(row.Year, row.Month, row.Day), axis = 1)
How to Concatenate Tables?
There are two types of concatenation. Horizontal and Vertical.
Horizontal (left and right)
pandas.merge(left,right, on="Column Name")
Vertical (Stack)
pandas.concat([table1, table2, table3])
How to Filter Rows By Date?
For example, get the data between 2014~2018
# 2014-2018
data = data[ (2014 <= data["Date"].dt.year)
& (data["Date"].dt.year <= 2018)
]
warning
- & stands for
and
- Operands need to be surrounded by brackets.
References
Practical 1 link: https://github.com/xiaohai-huang/cab420-workspace/tree/master/work/week1/CAB420_Prac1