Complex Backprop and Learning speed #2
Differentiability in the complex domain requires that a C-to-C function satisfy the Cauchy-Riemann conditions, which imply that a holomorphic C-to-R function is constant. Therefore there cannot be an end-to-end holomorphic learning problem, since we ultimately use some sort of non-constant real-valued fitness function. To be able to learn complex-valued networks we have to forgo the C-differentiability requirement. This can be done by using C-R (Wirtinger) calculus, which extends holomorphic calculus to non-holomorphic functions by considering C-domain functions as defined on R^2, with the real and imaginary parts (equivalently, z and its conjugate) treated as independent variables. Other references include Trabelsi et al. (2018), Zhang et al. (2014), Adali et al. (2011), Benvenuto and Piazza (1992), appendix D of Nazarov and Burnaev (2020), or a discussion here.
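For concreteness, these are the standard definitions being referred to (plain CR-calculus, nothing specific to this repo): for f(x + iy) = u(x, y) + i v(x, y), holomorphy means the Cauchy-Riemann conditions hold, and the Wirtinger derivatives treat z and its conjugate as independent variables:

```latex
% Cauchy-Riemann conditions for f(x + iy) = u(x,y) + i v(x,y):
\frac{\partial u}{\partial x} = \frac{\partial v}{\partial y}, \qquad
\frac{\partial u}{\partial y} = -\frac{\partial v}{\partial x}.
% Wirtinger derivatives (z and \bar{z} treated as independent variables):
\frac{\partial}{\partial z} = \frac{1}{2}\left(\frac{\partial}{\partial x} - i\,\frac{\partial}{\partial y}\right), \qquad
\frac{\partial}{\partial \bar{z}} = \frac{1}{2}\left(\frac{\partial}{\partial x} + i\,\frac{\partial}{\partial y}\right).
% f is holomorphic exactly when \partial f / \partial \bar{z} = 0.
```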
@ivannz Thank you for the detailed answer.
Might need to brush up on some complex analysis.
@ivannz I thought I might carry on asking questions here rather than opening separate issues, hence why I've edited the title. I've ported a MobileNetV2 network from torchvision to use modules from this repo, namely CplxConv1d, CplxBatchNorm1d and CplxAdaptiveModReLU. I'm training this network to classify some complex-valued 1D data, but it's very, very slow to learn. Have you noticed the same thing when applying these modules to your datasets? I've only tried this in one domain, so it's hard to tell whether it's the nature of the modules or the problem domain.
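For context, a minimal sketch of the kind of block being described. The constructor arguments and the Cplx input type are assumptions (mirroring the torch.nn counterparts), not the repo's documented API; check the repo for the exact signatures:

```python
import torch
from cplxmodule import Cplx
from cplxmodule.nn import CplxConv1d, CplxBatchNorm1d, CplxAdaptiveModReLU

class CplxBlock(torch.nn.Module):
    """Hypothetical conv -> batchnorm -> modReLU block on complex 1d signals."""

    def __init__(self, in_ch, out_ch, kernel):
        super().__init__()
        # Arguments assumed to mirror nn.Conv1d / nn.BatchNorm1d; the shape
        # argument of CplxAdaptiveModReLU is likewise an assumption.
        self.conv = CplxConv1d(in_ch, out_ch, kernel)
        self.bn = CplxBatchNorm1d(out_ch)
        self.act = CplxAdaptiveModReLU(out_ch)

    def forward(self, z):
        # z is a Cplx pair of (real, imag) tensors shaped (batch, channels, length)
        return self.act(self.bn(self.conv(z)))

# A batch of 2-channel complex-valued 1d signals of length 128.
z = Cplx(torch.randn(8, 2, 128), torch.randn(8, 2, 128))
out = CplxBlock(2, 16, 3)(z)
```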
I have used this module for both classification and regression tasks.
If by learning speed you mean its overall arithmetic complexity, then yes: a complex network uses 4 times as many multiplications as a real-valued network with the same number of intermediate features, i.e. the number of linear outputs or convolutional channels, not the overall parameter count. Even if one compares a complex network to a real one with the same number of floating-point parameters, the complex network is still slower, but not dramatically so. Please also bear in mind the discussion in issue #1.

If by learning speed you mean the convergence rate of gradient descent or the rate of train-loss decrease, I haven't measured it per se, but at the same time I haven't noticed anything suggesting that complex nets are slower to learn. The test performance depends on the dataset, but in my experience complex-valued networks seldom outperform real-valued networks; they are mostly on par. See Nazarov and Burnaev (2020) and references therein.

PS: I'd rather you created another issue to keep unrelated discussions separate.
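To illustrate where the factor of 4 comes from, here is a plain sketch (not the repo's implementation) of a complex linear map built from real tensors: for the same number of output features as a real layer, it needs four real matrix multiplications instead of one:

```python
import torch

def complex_linear(x_re, x_im, W_re, W_im):
    """(W_re + i W_im) applied to (x_re + i x_im): four real matmuls per complex one."""
    out_re = x_re @ W_re.t() - x_im @ W_im.t()   # real part
    out_im = x_re @ W_im.t() + x_im @ W_re.t()   # imaginary part
    return out_re, out_im

# 64 input features, 128 output features, batch of 32 -- same feature counts
# as a real nn.Linear(64, 128), but four full matmuls (and twice the parameters).
x_re, x_im = torch.randn(32, 64), torch.randn(32, 64)
W_re, W_im = torch.randn(128, 64), torch.randn(128, 64)
y_re, y_im = complex_linear(x_re, x_im, W_re, W_im)
```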
@ivannz Again, thank you for your response. I will make sure to open new issues for separate discussions.
I used a private dataset for a digital predistortion task, i.e. approximating a perturbation of an input signal so that a power amplifier would operate in a linear gain regime.
Question: In the real domain, you only require differentiability for back-propagation to work. In the complex domain, you need holomorphism. Now, PyTorch doesn't check this, because it doesn't natively support complex numbers. Do you think there could be a learning/training problem with back-propagation if some of the functions don't satisfy the Cauchy-Riemann equations?
Question: In complex analysis, a function f(z) has two derivatives: df/dz and df/dz*. If the forward passes are implemented correctly, as you have done, is back-propagation well defined? Specifically, are both derivatives back-propagated?
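As a reference point, this is how CR-calculus keeps back-propagation well defined (standard identities, not a statement about this repo's internals): both Wirtinger derivatives enter the chain rule, and for a real-valued loss they are complex conjugates of each other, so propagating one of them is enough:

```latex
% Chain rule through an intermediate w = g(z), for a loss L(w, \bar{w}):
\frac{\partial L}{\partial z}
  = \frac{\partial L}{\partial w}\frac{\partial w}{\partial z}
  + \frac{\partial L}{\partial \bar{w}}\frac{\partial \bar{w}}{\partial z},
\qquad
\frac{\partial L}{\partial \bar{z}}
  = \frac{\partial L}{\partial w}\frac{\partial w}{\partial \bar{z}}
  + \frac{\partial L}{\partial \bar{w}}\frac{\partial \bar{w}}{\partial \bar{z}}.
% For a real-valued loss L:
%   \partial L / \partial \bar{z} = \overline{\partial L / \partial z},
% and the steepest-ascent direction with respect to z is 2 \, \partial L / \partial \bar{z}.
```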