Estimating Demand and Supply at Different Time Intervals

Harsha Ramachandran edited this page Jul 14, 2022 · 3 revisions

As a starting point for solving the bike repositioning problem, we want to understand how the supply and demand of bikes change at particular bike stations over time. The two relevant axes to analyze over are temporal and spatial.

We have two DataFrames: the bike station spatial data, and the trip data, whose start and end locations correspond to bike stations.

Processing Demand/Supply

We need a way to estimate the demand and supply at each station at different points in time. We take two approaches:

  • Demand/Supply at the end of 24 hours
  • Demand/Supply at the end of the week

We can estimate these values in the following way.

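  • Set the demand at each station to 0.
  • Iterate over the hours of the day (0-23) for the 24-hour estimate, or over the days of the week (0-6) for the weekly estimate.
    • At each iteration, record a bike leaving a station as an increase in demand (+1) and a bike arriving at the station as a decrease in demand (-1).
    • Divide the counts by the number of days or weeks, respectively, so that each iteration adds an average rather than a total.

This will leave us with a DataFrame containing the bike stations' locations and their average net demand at the end of the specified time period. A positive value means more bikes leave the station than arrive; a negative value means the station accumulates bikes (supply).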
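The steps above can be illustrated on a toy example. This is a minimal sketch rather than the project's implementation: the helper name `net_demand` and the toy data are assumptions made for illustration, and only the column names `StartStation Id` and `EndStation Id` come from the trips DataFrame used on this page.

```python
import pandas as pd

# Toy trips (hypothetical data); column names follow the trips DataFrame.
trips = pd.DataFrame({
    "StartStation Id": [1, 1, 1, 2],
    "EndStation Id":   [2, 2, 3, 1],
})

def net_demand(trips: pd.DataFrame, periods: int) -> pd.Series:
    """Average net demand per station: (departures - arrivals) / periods."""
    departures = trips["StartStation Id"].value_counts()  # +1 per bike leaving
    arrivals = trips["EndStation Id"].value_counts()      # -1 per bike arriving
    # A station that appears on only one side contributes zero on the other.
    net = departures.sub(arrivals, fill_value=0)
    return (net / periods).sort_index()

# Assume the toy trips were observed over 2 days.
demand = net_demand(trips, periods=2)
print(demand)
```

Station 1 ends up with positive demand because more bikes leave it than arrive, while stations 2 and 3 end up negative.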

We calculate the days and weeks over the total period as follows

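import numpy as np
import pandas as pd

first = trips["Start Date"].min()
last = trips["Start Date"].max()

def diff(start, end):
    # Number of whole weeks between the first and last trip (rounded down)
    x = pd.to_datetime(end) - pd.to_datetime(start)
    return int(x / np.timedelta64(1, 'W'))

weeks = diff(first, last)
months = weeks / 4  # approximation: treats a month as four weeks
days = weeks * 7    # days covered by the whole weeks above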
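As a quick check of what this calculation produces, here is a hedged sketch on two made-up dates. The function name `period_counts` and the dates are assumptions for illustration; it uses floor division by a one-week `pd.Timedelta`, which should give the same whole-week count as the snippet above.

```python
import pandas as pd

def period_counts(first, last):
    """Return (weeks, days) covered by the data, counting whole weeks only."""
    span = pd.to_datetime(last) - pd.to_datetime(first)
    weeks = span // pd.Timedelta(weeks=1)  # floor division gives whole weeks
    days = weeks * 7                       # days in those whole weeks
    return int(weeks), int(days)

# 29 calendar days between these two (hypothetical) dates
weeks, days = period_counts("2022-01-03", "2022-02-01")
print(weeks, days)
```

Because the week count is rounded down, `days` can be smaller than the true number of calendar days (28 rather than 29 here), and a dataset shorter than one week yields 0, which would need guarding before it is used as a divisor.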

To analyze the data over time we use a reusable function with the following inputs

def processTripsOverTime(trips,bikeStations,av,range,t):

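trips and bikeStations are the relevant DataFrames. av is the number of weeks or days to average over. range is a range object giving the iterations (note that this parameter name shadows Python's built-in range inside the function). t is a string flag: "W" for the weekly calculation and "D" for the 24-hour calculation.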

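For weeks we call the function like so. Days of the week run from 0 (Monday) to 6 (Sunday), so the range has to be range(0,7) to include Sunday.

bikeStations = processOverTime.processTripsOverTime(trips,bikeStations,weeks,range(0,7),"W")

and for days like so. Hours run from 0 to 23, so the range is range(0,24).

bikeStations = processOverTime.processTripsOverTime(trips,bikeStations,days,range(0,24),"D")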

ProcessTripsOverTime Logic

The function iterates over the input range object like so

for i in range:

It then creates a time mask depending on whether it is a 24-hour or a weekly calculation

if t == "W":
    timeMaskEnd = (trips['End Date'].dt.day_of_week >= i) & (trips['End Date'].dt.day_of_week < i + 1)
    timeMaskStart = (trips['Start Date'].dt.day_of_week >= i) & (trips['Start Date'].dt.day_of_week < i + 1)
if t == "D":
    timeMaskEnd = (trips['End Date'].dt.hour >= i) & (trips['End Date'].dt.hour < i + 1)
    timeMaskStart = (trips['Start Date'].dt.hour >= i) & (trips['Start Date'].dt.hour < i + 1)
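Since `dt.hour` and `dt.day_of_week` return integers, the pair of comparisons `>= i` and `< i + 1` selects the same rows as an equality test. The sketch below shows this on toy data; the helper name `time_mask` and the timestamps are assumptions for illustration, not part of the original module.

```python
import pandas as pd

# Toy start times (hypothetical): two trips on Monday 2022-07-11, one on Tuesday.
trips = pd.DataFrame({
    "Start Date": pd.to_datetime(
        ["2022-07-11 08:15", "2022-07-11 08:50", "2022-07-12 17:05"]
    ),
})

def time_mask(dates: pd.Series, i: int, t: str) -> pd.Series:
    """Boolean mask for trips in hour i (t == "D") or on weekday i (t == "W")."""
    if t == "W":
        return dates.dt.day_of_week == i  # 0 = Monday ... 6 = Sunday
    if t == "D":
        return dates.dt.hour == i         # 0 ... 23
    raise ValueError('t must be "W" or "D"')

hour_mask = time_mask(trips["Start Date"], 8, "D")  # trips starting in hour 8
day_mask = time_mask(trips["Start Date"], 1, "W")   # trips starting on Tuesday
print(hour_mask.tolist(), day_mask.tolist())
```

Raising an error for an unknown flag is a design choice of this sketch; it avoids silently reusing a mask from a previous iteration.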

It then counts the number of trips starting and ending in that time frame and takes an average based on the number of weeks/days.

endFrame= trips[timeMaskEnd]
endFrameCounts = endFrame['EndStation Id'].value_counts().to_frame()
startFrame = trips[timeMaskStart]
startFrameCounts = startFrame['StartStation Id'].value_counts().to_frame()
endFrameCounts['EndStation Id'] = average(endFrameCounts['EndStation Id'],av )
startFrameCounts['StartStation Id'] = average(startFrameCounts['StartStation Id'],av)
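The `average` helper is not shown on this page; the sketch below assumes it simply divides the counts by `av`. It also gives the counts an explicit name, because the column name produced by `value_counts().to_frame()` changed in pandas 2.0 (it is now `count` rather than the original column name), so code that looks the column up by `'EndStation Id'` depends on the pandas version. The names `end_counts`, `end_avg`, and `arrivals` are hypothetical.

```python
import pandas as pd

# Toy subset of trips already filtered by the time mask (hypothetical data).
end_frame = pd.DataFrame({"EndStation Id": [10, 10, 11, 10]})
av = 2  # number of days or weeks to average over

# Count arrivals per station and give the result a fixed, explicit name
# so that later code does not depend on the pandas version.
end_counts = end_frame["EndStation Id"].value_counts().rename("arrivals")

# Assumed behaviour of the page's average(...) helper: divide by av.
end_avg = end_counts / av
print(end_avg.to_frame())
```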

Finally, it updates the demand column in the Bike Stations DataFrame using left merges: average arrivals are subtracted and average departures are added.

endFrameCounts.index = endFrameCounts.index.map(str)
startFrameCounts.index = startFrameCounts.index.map(str)
endFrameCounts = endFrameCounts['EndStation Id']
startFrameCounts = startFrameCounts['StartStation Id']

# Arrivals reduce demand; stations with no arrivals in this time frame get 0
bikeStations = pd.merge(bikeStations, endFrameCounts, how='left', left_on='id', right_index=True)
bikeStations['EndStation Id'] = bikeStations['EndStation Id'].fillna(0)
bikeStations['demand'] -= bikeStations['EndStation Id']
bikeStations = bikeStations.drop(columns='EndStation Id')
# Departures increase demand
bikeStations = pd.merge(bikeStations, startFrameCounts, how='left', left_on='id', right_index=True)
bikeStations['StartStation Id'] = bikeStations['StartStation Id'].fillna(0)
bikeStations['demand'] += bikeStations['StartStation Id']
bikeStations = bikeStations.drop(columns='StartStation Id')
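The same update can also be written without temporary columns. The following is a sketch of an alternative, not the function's actual code: it uses `Series.map` to look up each station's average by its `id`, and the helper name `apply_counts` and the toy values are assumptions for illustration.

```python
import pandas as pd

# Toy stations and per-station averages for one iteration (hypothetical data).
# Station ids are strings, matching the index.map(str) step on this page.
bikeStations = pd.DataFrame({"id": ["1", "2", "3"], "demand": [0.0, 0.0, 0.0]})
start_avg = pd.Series({"1": 1.5, "3": 0.5})  # average departures per station
end_avg = pd.Series({"1": 0.5, "2": 1.0})    # average arrivals per station

def apply_counts(stations, start_avg, end_avg):
    """Add average departures and subtract average arrivals from 'demand'."""
    out = stations.copy()
    # map() looks each id up in the Series index; unseen stations become 0.
    out["demand"] += out["id"].map(start_avg).fillna(0)
    out["demand"] -= out["id"].map(end_avg).fillna(0)
    return out

updated = apply_counts(bikeStations, start_avg, end_avg)
print(updated)
```

Like the left merge, this keeps every station in the result, including stations that saw no trips in the time frame.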

At the end of all the iterations we have an estimate of the average net demand (or, where negative, supply) at each bike station over the chosen time frame.