Applying COMBO to own dataset #4
Hi.
'pestcontrol' will be a good example to start with.
https://github.com/QUVA-Lab/COMBO/blob/e16358d706525874fb3b1a79c36b6200d4689ed9/COMBO/main.py#L28
To run COMBO, you need to call the function here with a properly defined 'objective' argument.
To define your own objective, you should write a class similar to
https://github.com/QUVA-Lab/COMBO/blob/e16358d706525874fb3b1a79c36b6200d4689ed9/COMBO/experiments/test_functions/multiple_categorical.py#L123
An object (not a class), e.g. PestControl(), is passed as the 'objective' argument.
As shown in
https://github.com/QUVA-Lab/COMBO/blob/e16358d706525874fb3b1a79c36b6200d4689ed9/COMBO/experiments/test_functions/multiple_categorical.py#L123
the objective class should have four members: 1) n_vertices, 2) adjacency_mat, 3) fourier_freq, 4) fourier_basis.
For example, suppose your problem has 5 categorical variables C1, ..., C5, where each Ci has Ki categories. Then:
- n_vertices <List[int]> : [K1, K2, K3, K4, K5]
- adjacency_mat <List[torch.Tensor]> : a list of adjacency matrices, with adjacency_mat[i].size() = torch.Size([Ki, Ki])
- fourier_freq <List[torch.Tensor]> : a list of eigenvalues of the corresponding graph Laplacians, with fourier_freq[i].size() = torch.Size([Ki])
- fourier_basis <List[torch.Tensor]> : a list of eigenvectors of the corresponding graph Laplacians, with fourier_basis[i].size() = torch.Size([Ki, Ki])
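To make this concrete, here is a minimal sketch of how those four members could be constructed (assuming a recent PyTorch with `torch.linalg.eigh`, and unweighted complete graphs per variable, as in the paper; `build_graph_members` is a hypothetical helper, not part of COMBO):

```python
import torch

def build_graph_members(n_categories):
    """Build the four members COMBO expects, one graph per categorical
    variable, using an unweighted complete graph for each variable
    (the paper's choice for nominal variables)."""
    n_vertices, adjacency_mat, fourier_freq, fourier_basis = [], [], [], []
    for k in n_categories:
        adj = torch.ones(k, k) - torch.eye(k)          # complete graph on k vertices
        degree = torch.diag(adj.sum(dim=0))
        laplacian = degree - adj                       # graph Laplacian L = D - A
        eigval, eigvec = torch.linalg.eigh(laplacian)  # eigendecomposition of L
        n_vertices.append(k)
        adjacency_mat.append(adj)
        fourier_freq.append(eigval)                    # size (k,)
        fourier_basis.append(eigvec)                   # size (k, k)
    return n_vertices, adjacency_mat, fourier_freq, fourier_basis

# Example: 5 categorical variables with K1, ..., K5 categories
members = build_graph_members([3, 4, 2, 5, 5])
```

For a complete graph on k vertices, the Laplacian eigenvalues are 0 and k (with multiplicity k-1), which is what fourier_freq will contain.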
The initial points from which COMBO begins can be chosen using prior knowledge if it is available; otherwise, you can set them randomly.
Likewise, any prior knowledge you have about the relation between category values can be encoded in the adjacency matrices.
You can use weighted undirected graphs, even though all experiments in the paper use UNWEIGHTED complete graphs for nominal variables and UNWEIGHTED path graphs for ordinal variables.
Where is the number of categorical variables accounted for? In the case of the pest control problem, would that correspond to PESTCONTROL_N_STAGES = 5 in the following line? It is a little unclear because there is both a notion of stations and stages in your paper for this application.
self.n_vertices = np.array([PESTCONTROL_N_CHOICE] * PESTCONTROL_N_STAGES)
In pestcontrol, there are 25 (PESTCONTROL_N_STAGES) categorical variables, each of which has 5 (PESTCONTROL_N_CHOICE) categories.
Thus self.n_vertices = np.array([5, 5, ..., 5]) with self.n_vertices.shape = (25,).
In this example the number of categories is the same for every variable, but different numbers of categories are possible.
As another example, if you have 3 categorical variables,
- C1 (Adam, RMSProp, AdaDelta)
- C2 (BatchNorm, LayerNorm, WeightNorm, SpectralNorm)
- C3 (using Dropout, not using Dropout)
then self.n_vertices = np.array([3, 4, 2]).
I used 'stage' for contamination control and 'station' for pestcontrol; both are basically the same.
I am still working with your code and would like to modify the initialization of the experiment. Specifically, I would like to initialize everything to "on", i.e. a vector of all 1s, and then explore the input space from there, rather than randomly switching some nodes on and some off at the beginning. I assume I control this in "experiment_configuration.py". I wrote a new function (quoted below), but it constantly generates all-"on" vectors during the optimization, rather than making only the first input all "on". Can you point me to how to modify the starting point? I think my confusion is stemming from my lack of understanding of the argument "n_points" and the corresponding for-loop.
This function then gets called in my class definition, but I don't understand the code entirely.
Hi, Natalie.
If you want to set your first categorical variable to '1' in all your initial points and leave the others random (for example, in the Ising problem), then you can simply insert this line:
self.suggested_init[:, 0] = 1 (making the first categorical variable '1' in every initial point)
after
https://github.com/QUVA-Lab/COMBO/blob/e16358d706525874fb3b1a79c36b6200d4689ed9/COMBO/experiments/test_functions/binary_categorical.py#L100
where the random initial points are generated.
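As a sketch (the shapes and the random initialization here are illustrative stand-ins, not COMBO's actual code):

```python
import torch

# Stand-in for the randomly generated initial points of shape (n_points, n_vars)
n_vars, n_points = 10, 20
suggested_init = torch.randint(0, 2, (n_points, n_vars)).long()

# The inserted line: fix the FIRST categorical variable to '1' in every initial point
suggested_init[:, 0] = 1
```

All other variables keep their random values; only column 0 is overridden.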
Hope this helps. Please let me know if you have further questions.
Changyong Oh
…On Mon, May 11, 2020 at 5:39 PM Natalie Barcicki ***@***.***> wrote:

def allon_init_points(n_vertices, n_points, random_seed=None):
    """
    :param n_vertices: 1D array
    :param n_points:
    :param random_seed:
    :return:
    """
    if random_seed is not None:
        rng_state = torch.get_rng_state()
        torch.manual_seed(random_seed)
    init_points = torch.empty(0).long()
    for _ in range(n_points):
        init_points = torch.cat([init_points, torch.cat([torch.randint(1, 2, (1, 1)) for elm in n_vertices], dim=1)], dim=0)  # silly way to make it a vector of all 1s
    if random_seed is not None:
        torch.set_rng_state(rng_state)
    return init_points

This function then gets called in my class definition as follows:

self.suggested_init = torch.empty(0).long()
self.suggested_init = torch.cat([self.suggested_init, allon_init_points(self.n_vertices, 20 - self.suggested_init.size(0), random_seed).long()], dim=0)
Thanks for your quick response. I think you misunderstood. Let's say I have 10 categorical variables. I want them all to take the value '1' initially, and then for the Bayesian optimization to explore the space from there. Would it simply be a matter of adding one line?
I guess part of my confusion stems from the fact that I don't understand why we specify all the initializations anyway; shouldn't the search be guided by the acquisition function? I want the algorithm to "learn" from the initial starting point of all 1s and be more inclined to stay at all 1s, since I know that the global optimum lies close to all 1s in my case. Shouldn't the acquisition function guide the next set of points to explore instead of the user?
In order to set all variables to '1' in one initial point, you can add the line
self.suggested_init[0, :] = 1 (making the first row, i.e. the first initial point, all '1's)
For example, here
https://github.com/QUVA-Lab/COMBO/blob/e16358d706525874fb3b1a79c36b6200d4689ed9/COMBO/experiments/test_functions/binary_categorical.py#L100
you can see 20 - self.suggested_init.size(0), which makes 20 initial points in total.
In the line above it (line 99), self.suggested_init is just an empty tensor, so I didn't specify any preferred initial points.
As you said, you can provide your preferred initial points in self.suggested_init (e.g. just one initial point [1, ..., 1]).
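A sketch of that (assuming binary variables and the 20-point convention from binary_categorical.py; the variable names here are illustrative):

```python
import torch

n_vars = 10
K = 2  # binary example; category values are 0 .. K-1

# One preferred initial point: all variables set to '1'
preferred = torch.ones(1, n_vars).long()

# Pad with random points up to 20 initial evaluations,
# mirroring the 20 - self.suggested_init.size(0) pattern
n_random = 20 - preferred.size(0)
random_points = torch.randint(0, K, (n_random, n_vars)).long()

suggested_init = torch.cat([preferred, random_points], dim=0)
```

The first row is the all-ones point; the remaining 19 rows are random.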
Then I guess the first several points suggested by BO will behave quite randomly, because it needs some information (bad or good) to construct a model of the function being optimized. So it is quite common to use a certain number of random points plus (if there are any) some preferred points as initial points. However, as long as you provide at least one initial point, it is OK.
COMBO does not provide a way to encourage the search in a certain region. Conceptually, the acquisition function will suggest a point with the potential to be optimal, no matter how close that point is to [1, ..., 1].
This can be indirectly encouraged by making the acquisition function start its optimization near a certain point. For example, at
https://github.com/QUVA-Lab/COMBO/blob/e16358d706525874fb3b1a79c36b6200d4689ed9/COMBO/acquisition/acquisition_optimizers/starting_points.py#L12
I use spray points, which are points near the currently available optimum, to make BO search around the current optimum. You may tweak this part to make optim_init (not BO's initial points, but the points where the acquisition optimization starts) pick points in the region of your interest.
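For illustration only (`biased_optim_init` is a hypothetical helper, not the actual starting_points.py logic), biasing the acquisition optimizer's start points toward a region of interest could look like:

```python
import torch

def biased_optim_init(n_vars, n_spray, anchor, n_flips=2, K=2):
    """Generate acquisition-optimization start points near `anchor`
    by randomly re-drawing a few coordinates, mimicking 'spray points'."""
    points = anchor.repeat(n_spray, 1).clone()
    for row in points:
        idx = torch.randperm(n_vars)[:n_flips]          # coordinates to perturb
        row[idx] = torch.randint(0, K, (n_flips,))      # random category values
    return points

# Region of interest: all variables '1'
anchor = torch.ones(10).long()
starts = biased_optim_init(10, 5, anchor)
```

Each start point differs from the anchor in at most `n_flips` coordinates, so the acquisition optimization begins its local search near the region you care about.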
Thanks.
Changyong Oh
Hi Changyong,
One bottleneck in my simulations seems to be related to posterior sampling, specifically done in this line (main.py, line 100 in e16358d):
https://github.com/QUVA-Lab/COMBO/blob/e16358d706525874fb3b1a79c36b6200d4689ed9/COMBO/main.py#L100
The sampling takes a long time, as the progress-bar timestamps show: it starts at 11:11:52, reaches 10% (1 of 10) at 11:16:58, and finishes 100% (10 of 10) at 12:04:08, i.e. roughly 5 minutes per sample. Can you explain what is happening in this series of steps and how one might be able to speed it up? When I open the corresponding function "posterior_sampling", I see there is a comment about what is good practice or not, but I am having trouble deciphering it. Thanks a lot!
Best, Natalie
Hi, Natalie
You can ignore the comment; it is about learning the underlying graph structure, which is not supported.
In your experiment, collecting 10 samples takes about 50 minutes?
COMBO uses elementwise slice sampling; 'elementwise' means that each beta is updated while the others are held fixed (like Gibbs sampling). Therefore, if you have K categorical variables, the complexity is O(K^2); this needs to be checked, but I think it is more or less true.
If your problem has a large number of categorical variables, like several hundred, it may take that long (this is a limitation of COMBO). Also, if your Python is not set up to use MKL, enabling MKL can speed things up further.
The largest number of categorical variables tested in the paper is 60.
Speeding up the sampling process itself is another piece of work and requires significant changes to the current COMBO implementation. A possible option would be marginal likelihood optimization instead of sampling, but sampling is known to perform better in terms of BO performance. You could also consider HMC or one of its variants, or non-elementwise slice sampling.
Thanks
Hi Changyong, Thanks a lot for the clarification. Unfortunately, I am indeed using a couple of hundred categorical variables, so this step is a real bottleneck. Can the elementwise slice sampling loop be parallelized using multiprocessing? I see that in each sampler, model.kernel's fourier_freq_list and fourier_basis_list are updated, so I assume not. Best,
Just following up on the above: I am not sure whether parallelizing the elementwise slice sampling loop is mathematically sound within the algorithm? Best,
The elementwise slice sampling is nothing but Gibbs sampling using a sequence of one-dimensional slice samplers. For the sampling to form a valid Markov chain, the one-dimensional slice samplers cannot be parallelized, because each one-dim sampler uses the updated values of the previous ones.
One possible option is not to use the elementwise version, which may require some modification of the code, though.
For example, suppose you have parameters b1, ..., bK. The elementwise slice sampler, assuming the parameters are updated in the given order, proceeds as
(b1, b2, ..., bK) -> (b'1, b2, ..., bK) -> (b'1, b'2, ..., bK) -> ... -> (b'1, b'2, ..., b'K)
Another option is to first choose a random direction d in R^K, and then perform a single one-dimensional slice sampling step to update
(b1, b2, ..., bK) -> (b1, b2, ..., bK) + \theta d
where \theta is chosen by the one-dimensional slice sampler. This reduces the sampling cost from O(K^2) to O(K), but I am not sure how good the samples collected with this method are compared to elementwise slice sampling.
https://github.com/QUVA-Lab/COMBO/blob/e16358d706525874fb3b1a79c36b6200d4689ed9/COMBO/graphGP/sampler/sample_posterior.py#L54
This is where the elementwise slice sampling is used; 'shuffled_beta_ind' shuffles the order in which the one-dim slice sampling is applied. You can modify this part to use so-called random-direction slice sampling.
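A minimal sketch of the random-direction idea (illustrative only, not the COMBO implementation; it assumes a log-posterior callable over the whole beta vector):

```python
import torch

def random_direction_slice_step(beta, log_prob, width=1.0, max_steps=20):
    """One update of random-direction slice sampling: pick a random unit
    direction d in R^K, then run a 1-D slice sampler on theta along
    beta + theta * d (stepping-out bracket, then shrinkage)."""
    d = torch.randn_like(beta)
    d = d / d.norm()                                    # random unit direction
    log_y = log_prob(beta) + torch.log(torch.rand(1))   # slice level under the density
    # stepping-out: expand the bracket until both ends fall off the slice
    lo = -width * torch.rand(1)
    hi = lo + width
    for _ in range(max_steps):
        if log_prob(beta + lo * d) < log_y:
            break
        lo = lo - width
    for _ in range(max_steps):
        if log_prob(beta + hi * d) < log_y:
            break
        hi = hi + width
    # shrinkage: sample theta until it lands on the slice
    while True:
        theta = lo + (hi - lo) * torch.rand(1)
        if log_prob(beta + theta * d) >= log_y:
            return beta + theta * d
        if theta < 0:
            lo = theta
        else:
            hi = theta

# Toy usage: standard normal log-density in R^5
log_prob = lambda b: -0.5 * (b ** 2).sum()
beta = torch.zeros(5)
beta = random_direction_slice_step(beta, log_prob)
```

Each call costs a handful of log-posterior evaluations regardless of K, which is where the O(K^2) -> O(K) saving comes from; the quality of the resulting chain versus the elementwise version would need to be checked empirically.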
Thanks
Changyong Oh
I would like to try your method on my own dataset and my own objective function. There isn't much documentation in the repo about the format and structure needed to do this. Could you please provide some more details on how to set this up?