Performance enhancements of conditional logit #81
In the past, I've used Pylogit (specifically the `MNL`) on a large dataset of 200 mln rows. I noticed two bottlenecks:

1. `weights_per_obs` is not always kept sparse, causing a 200 mln x 200 mln dense NumPy array to be created; see also issue Sparse to Dense #79.
2. `dh_dv` for a conditional logit represents an identity matrix but is coded as a `csr_matrix`. This makes the calculation `dh_dv.dot(design)` relatively slow, even though its result is trivially `design`.

To remedy the first bottleneck, I used the same solution proposed in issue #79.
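To put the first bottleneck in perspective, the sketch below first computes how much memory a dense 200 mln x 200 mln `float64` array would need. It then compares, on a toy example, a dense diagonal weight matrix with a sparse one and with plain broadcasting. It assumes, hypothetically, that the per-observation weights end up on the diagonal of an observation-by-observation matrix; the variable names are illustrative and this is not Pylogit's code nor necessarily the exact fix from #79.

```python
import numpy as np
from scipy import sparse

# Memory for a dense 200 mln x 200 mln float64 array (8 bytes per entry).
n_rows = 200_000_000
dense_bytes = n_rows ** 2 * 8
print(f"{dense_bytes / 1e15:.0f} PB")  # 320 PB

# Toy example: hypothetical per-row weights and a small design matrix.
rng = np.random.default_rng(0)
weights = rng.uniform(0.5, 2.0, size=6)
design = rng.normal(size=(6, 3))

# Dense approach: materialises an n x n array (n**2 entries).
dense_weighted = np.diag(weights) @ design

# Sparse approach: stores only the n diagonal entries.
weight_matrix = sparse.diags(weights, format="csr")
sparse_weighted = weight_matrix @ design

# Cheapest of all: broadcast the weights over the rows, no matrix at all.
broadcast_weighted = weights[:, None] * design

assert np.allclose(dense_weighted, sparse_weighted)
assert np.allclose(dense_weighted, broadcast_weighted)
print(weight_matrix.nnz)  # 6
```

All three give the same weighted design matrix; only the dense version grows quadratically with the number of rows.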
For the second bottleneck, I made an efficient `identity_matrix` class (derived from scipy's `spmatrix`). When such an identity matrix `I` is multiplied with a matrix `A` using `I.dot(A)`, we simply get `A` back, without any arithmetic.

I ran a benchmark with a script that estimates an `MNL` on the usual Swiss-Metro dataset, and ran `line-profiler` on two of the critical functions, `calc_gradient` and `calc_fisher_info_matrix`. In summary, this change reduced the computation time of `calc_gradient` by 26% (from 0.080697 s to 0.059372 s) and that of `calc_fisher_info_matrix` by 99% (!) (from 0.906896 s to 0.0062323 s). Profiling results are attached.
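The idea behind the identity-matrix shortcut can be sketched as follows. `IdentityMatrix` is a hypothetical name. Unlike the class in this PR, which derives from scipy's `spmatrix`, it is a plain Python class so the example stays self-contained, and it implements only `dot` (and `@`). It is compared against an explicit `csr_matrix` identity, mirroring how `dh_dv` was stored before the change.

```python
import numpy as np
from scipy import sparse


class IdentityMatrix:
    """Hypothetical stand-in for an n x n identity matrix.

    Instead of storing n ones in a sparse format, it only records its
    shape and returns the other operand unchanged when multiplied.
    """

    def __init__(self, n):
        self.shape = (n, n)

    def dot(self, other):
        # I.dot(A) == A, so skip the multiplication and only check shapes.
        if other.shape[0] != self.shape[1]:
            raise ValueError("dimension mismatch")
        return other

    def __matmul__(self, other):
        return self.dot(other)


n_obs, n_params = 1_000, 4
rng = np.random.default_rng(0)
design = rng.normal(size=(n_obs, n_params))

# Explicit sparse identity, as dh_dv was stored before the change.
dh_dv_csr = sparse.identity(n_obs, format="csr")
via_csr = dh_dv_csr.dot(design)

# Shortcut: no arithmetic at all.
dh_dv_fast = IdentityMatrix(n_obs)
via_identity = dh_dv_fast.dot(design)

assert np.array_equal(via_csr, via_identity)
assert via_identity is design  # the very same object, not a copy
```

One design consequence: the result aliases `design`, so any caller that modifies the product in place would also modify the design matrix. Returning `other.copy()` would be the safe but slower alternative.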
profile_before.txt
profile_after.txt