Applying COMBO to own dataset #4
Hi.
'pestcontrol' will be a good example to start with.
https://github.com/QUVA-Lab/COMBO/blob/e16358d706525874fb3b1a79c36b6200d4689ed9/COMBO/main.py#L28
To run COMBO, you need to call the function here with a properly defined 'objective' argument.
To define your own objective, you should write a class similar to
https://github.com/QUVA-Lab/COMBO/blob/e16358d706525874fb3b1a79c36b6200d4689ed9/COMBO/experiments/test_functions/multiple_categorical.py#L123
An object (not a class), e.g. PestControl(), is passed as the 'objective' argument.
As shown in
https://github.com/QUVA-Lab/COMBO/blob/e16358d706525874fb3b1a79c36b6200d4689ed9/COMBO/experiments/test_functions/multiple_categorical.py#L123
the objective class should have four members: 1) n_vertices, 2) adjacency_mat, 3) fourier_freq, 4) fourier_basis.
For example, suppose your problem has 5 categorical variables C1, ..., C5, where each Ci has Ki categories. Then:
- n_vertices <List[int]> : [K1, K2, K3, K4, K5]
- adjacency_mat <List[torch.Tensor]> : a list of adjacency matrices, with adjacency_mat[i].size() = torch.Size([Ki, Ki])
- fourier_freq <List[torch.Tensor]> : a list of eigenvalues of the corresponding graph Laplacians, with fourier_freq[i].size() = torch.Size([Ki])
- fourier_basis <List[torch.Tensor]> : a list of eigenvectors of the corresponding graph Laplacians, with fourier_basis[i].size() = torch.Size([Ki, Ki])
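To make this concrete, here is a minimal sketch of how those four members could be constructed (assuming a recent PyTorch with `torch.linalg.eigh`, and unweighted complete graphs per variable, as in the paper; `build_graph_members` is a hypothetical helper, not part of COMBO):

```python
import torch

def build_graph_members(n_categories):
    """Build the four members COMBO expects, one graph per categorical
    variable, using an unweighted complete graph for each variable
    (the paper's choice for nominal variables)."""
    n_vertices, adjacency_mat, fourier_freq, fourier_basis = [], [], [], []
    for k in n_categories:
        adj = torch.ones(k, k) - torch.eye(k)          # complete graph on k vertices
        degree = torch.diag(adj.sum(dim=0))
        laplacian = degree - adj                       # graph Laplacian L = D - A
        eigval, eigvec = torch.linalg.eigh(laplacian)  # eigendecomposition of L
        n_vertices.append(k)
        adjacency_mat.append(adj)
        fourier_freq.append(eigval)                    # size (k,)
        fourier_basis.append(eigvec)                   # size (k, k)
    return n_vertices, adjacency_mat, fourier_freq, fourier_basis

# Example: 5 categorical variables with K1, ..., K5 categories
members = build_graph_members([3, 4, 2, 5, 5])
```

For a complete graph on k vertices, the Laplacian eigenvalues are 0 and k (with multiplicity k-1), which is what fourier_freq will contain.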
The initial points from which COMBO begins can be chosen using prior knowledge if it is available; otherwise, you can set them randomly.
Likewise, any prior knowledge you have about the relation between category values can be encoded in the adjacency matrices.
You can use weighted undirected graphs, even though all experiments in the paper use UNWEIGHTED complete graphs for nominal variables and UNWEIGHTED path graphs for ordinal variables.
Where is the number of categorical variables accounted for? In the case of the pest control problem, would that correspond to PESTCONTROL_N_STAGES = 5 in the following line? It is a little unclear because there is both a notion of stations and stages in your paper for this application.
self.n_vertices = np.array([PESTCONTROL_N_CHOICE] * PESTCONTROL_N_STAGES)
In pestcontrol, there are 25 (PESTCONTROL_N_STAGES) categorical variables, each of which has 5 (PESTCONTROL_N_CHOICE) categories.
Thus self.n_vertices = np.array([5, 5, ..., 5]) with self.n_vertices.shape = (25,).
In this example the number of categories is the same for every variable, but different numbers of categories are possible.
As another example, if you have 3 categorical variables,
- C1 (Adam, RMSProp, AdaDelta)
- C2 (BatchNorm, LayerNorm, WeightNorm, SpectralNorm)
- C3 (using Dropout, not using Dropout)
then self.n_vertices = np.array([3, 4, 2]).
I used 'stage' for contamination control and 'station' for pestcontrol; both are basically the same.
I am still working with your code and would like to modify the initialization of the experiment. Specifically, I would like to initialize everything to "on", i.e. a vector of all 1s, and then explore the input space from there, rather than randomly switching some nodes on and some off at the beginning. I assume I control this in "experiment_configuration.py". I wrote a new function (quoted below), but it constantly generates all-"on" vectors during the optimization, rather than making only the first input all "on". Can you point me to how to modify the starting point? I think my confusion is stemming from my lack of understanding of the argument "n_points" and the corresponding for-loop.
This function then gets called in my class definition, but I don't understand the code entirely.
Hi, Natalie.
If you want to set your first categorical variable to '1' in all your initial points and leave the others random (for example, in the Ising problem), then you can simply insert this line:
self.suggested_init[:, 0] = 1 (making the first categorical variable '1' in every initial point)
after
https://github.com/QUVA-Lab/COMBO/blob/e16358d706525874fb3b1a79c36b6200d4689ed9/COMBO/experiments/test_functions/binary_categorical.py#L100
where the random initial points are generated.
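As a sketch (the shapes and the random initialization here are illustrative stand-ins, not COMBO's actual code):

```python
import torch

# Stand-in for the randomly generated initial points of shape (n_points, n_vars)
n_vars, n_points = 10, 20
suggested_init = torch.randint(0, 2, (n_points, n_vars)).long()

# The inserted line: fix the FIRST categorical variable to '1' in every initial point
suggested_init[:, 0] = 1
```

All other variables keep their random values; only column 0 is overridden.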
Hope this helps. Please let me know if you have further questions.
Changyong Oh
…On Mon, May 11, 2020 at 5:39 PM Natalie Barcicki ***@***.***> wrote:

def allon_init_points(n_vertices, n_points, random_seed=None):
    """
    :param n_vertices: 1D array
    :param n_points:
    :param random_seed:
    :return:
    """
    if random_seed is not None:
        rng_state = torch.get_rng_state()
        torch.manual_seed(random_seed)
    init_points = torch.empty(0).long()
    for _ in range(n_points):
        init_points = torch.cat([init_points, torch.cat([torch.randint(1, 2, (1, 1)) for elm in n_vertices], dim=1)], dim=0)  # silly way to make it a vector of all 1s
    if random_seed is not None:
        torch.set_rng_state(rng_state)
    return init_points

This function then gets called in my class definition as follows:

self.suggested_init = torch.empty(0).long()
self.suggested_init = torch.cat([self.suggested_init, allon_init_points(self.n_vertices, 20 - self.suggested_init.size(0), random_seed).long()], dim=0)
Thanks for your quick response. I think you misunderstood. Let's say I have 10 categorical variables. I want them all to take the value '1' initially, and then for the Bayesian optimization to explore the space from there. Would it simply be a matter of adding one line?
I guess part of my confusion stems from the fact that I don't understand why we specify all the initializations anyway; shouldn't the search be guided by the acquisition function? I want the algorithm to "learn" from the initial starting point of all 1s and be more inclined to stay at all 1s, since I know that the global optimum lies close to all 1s in my case. Shouldn't the acquisition function guide the next set of points to explore instead of the user?
In order to set all variables to '1' in one initial point, you can add the line
self.suggested_init[0, :] = 1 (making the first row, i.e. the first initial point, all '1's)
For example, here
https://github.com/QUVA-Lab/COMBO/blob/e16358d706525874fb3b1a79c36b6200d4689ed9/COMBO/experiments/test_functions/binary_categorical.py#L100
you can see 20 - self.suggested_init.size(0), which makes 20 initial points in total.
In the line above it (line 99), self.suggested_init is just an empty tensor, so I didn't specify any preferred initial points.
As you said, you can provide your preferred initial points in self.suggested_init (e.g. just one initial point [1, ..., 1]).
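A sketch of that (assuming binary variables and the 20-point convention from binary_categorical.py; the variable names here are illustrative):

```python
import torch

n_vars = 10
K = 2  # binary example; category values are 0 .. K-1

# One preferred initial point: all variables set to '1'
preferred = torch.ones(1, n_vars).long()

# Pad with random points up to 20 initial evaluations,
# mirroring the 20 - self.suggested_init.size(0) pattern
n_random = 20 - preferred.size(0)
random_points = torch.randint(0, K, (n_random, n_vars)).long()

suggested_init = torch.cat([preferred, random_points], dim=0)
```

The first row is the all-ones point; the remaining 19 rows are random.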
Then I guess the first several points suggested by BO will behave quite randomly, because it needs some information (bad or good) to construct a model of the function being optimized. So it is quite common to use a certain number of random points plus (if there are any) some preferred points as initial points. However, as long as you provide at least one initial point, it is OK.
COMBO does not provide a way to encourage the search in a certain region. Conceptually, the acquisition function will suggest a point with the potential to be optimal, no matter how close that point is to [1, ..., 1].
This can be indirectly encouraged by making the acquisition function start its optimization near a certain point. For example, at
https://github.com/QUVA-Lab/COMBO/blob/e16358d706525874fb3b1a79c36b6200d4689ed9/COMBO/acquisition/acquisition_optimizers/starting_points.py#L12
I use spray points, which are points near the currently available optimum, to make BO search around the current optimum. You may tweak this part to make optim_init (not BO's initial points, but the points where the acquisition optimization starts) pick points in the region of your interest.
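For illustration only (`biased_optim_init` is a hypothetical helper, not the actual starting_points.py logic), biasing the acquisition optimizer's start points toward a region of interest could look like:

```python
import torch

def biased_optim_init(n_vars, n_spray, anchor, n_flips=2, K=2):
    """Generate acquisition-optimization start points near `anchor`
    by randomly re-drawing a few coordinates, mimicking 'spray points'."""
    points = anchor.repeat(n_spray, 1).clone()
    for row in points:
        idx = torch.randperm(n_vars)[:n_flips]          # coordinates to perturb
        row[idx] = torch.randint(0, K, (n_flips,))      # random category values
    return points

# Region of interest: all variables '1'
anchor = torch.ones(10).long()
starts = biased_optim_init(10, 5, anchor)
```

Each start point differs from the anchor in at most `n_flips` coordinates, so the acquisition optimization begins its local search near the region you care about.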
Thanks.
Changyong Oh
Hi Changyong,
One bottleneck in my simulations seems to be related to posterior sampling, specifically done in this line (main.py, line 100 in e16358d):
https://github.com/QUVA-Lab/COMBO/blob/e16358d706525874fb3b1a79c36b6200d4689ed9/COMBO/main.py#L100
The sampling takes a long time, as the progress-bar timestamps show: it starts at 11:11:52, reaches 10% (1 of 10) at 11:16:58, and finishes 100% (10 of 10) at 12:04:08, i.e. roughly 5 minutes per sample. Can you explain what is happening in this series of steps and how one might be able to speed it up? When I open the corresponding function "posterior_sampling", I see there is a comment about what is good practice or not, but I am having trouble deciphering it. Thanks a lot!
Best, Natalie
Hi, Natalie
You can ignore the comment; it is about learning the underlying graph structure, which is not supported.
In your experiment, collecting 10 samples takes about 50 minutes?
COMBO uses elementwise slice sampling; 'elementwise' means that each beta is updated while the others are held fixed (like Gibbs sampling). Therefore, if you have K categorical variables, the complexity is O(K^2); this needs to be checked, but I think it is more or less true.
If your problem has a large number of categorical variables, like several hundred, it may take that long (this is a limitation of COMBO). Also, if your Python is not set up to use MKL, enabling MKL can speed things up further.
The largest number of categorical variables tested in the paper is 60.
Speeding up the sampling process itself is another piece of work and requires significant changes to the current COMBO implementation. A possible option would be marginal likelihood optimization instead of sampling, but sampling is known to perform better in terms of BO performance. You could also consider HMC or one of its variants, or non-elementwise slice sampling.
Thanks
Hi Changyong, Thanks a lot for the clarification. Unfortunately, I am indeed using a couple of hundred categorical variables, so this step is a real bottleneck. Can the elementwise slice sampling loop be parallelized using multiprocessing? I see that in each sampler, model.kernel's fourier_freq_list and fourier_basis_list are updated, so I assume not. Best,
Just following up on the above: I am not sure whether parallelizing the elementwise slice sampling loop is mathematically sound within the algorithm? Best,
The elementwise slice sampling is nothing but Gibbs sampling using a sequence of one-dimensional slice samplers. For the sampling to form a valid Markov chain, the one-dimensional slice samplers cannot be parallelized, because each one-dim sampler uses the updated values of the previous ones.
One possible option is not to use the elementwise version, which may require some modification of the code, though.
For example, suppose you have parameters b1, ..., bK. The elementwise slice sampler, assuming the parameters are updated in the given order, proceeds as
(b1, b2, ..., bK) -> (b'1, b2, ..., bK) -> (b'1, b'2, ..., bK) -> ... -> (b'1, b'2, ..., b'K)
Another option is to first choose a random direction d in R^K, and then perform a single one-dimensional slice sampling step to update
(b1, b2, ..., bK) -> (b1, b2, ..., bK) + \theta d
where \theta is chosen by the one-dimensional slice sampler. This reduces the sampling cost from O(K^2) to O(K), but I am not sure how good the samples collected with this method are compared to elementwise slice sampling.
https://github.com/QUVA-Lab/COMBO/blob/e16358d706525874fb3b1a79c36b6200d4689ed9/COMBO/graphGP/sampler/sample_posterior.py#L54
This is where the elementwise slice sampling is used; 'shuffled_beta_ind' shuffles the order in which the one-dim slice sampling is applied. You can modify this part to use so-called random-direction slice sampling.
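A minimal sketch of the random-direction idea (illustrative only, not the COMBO implementation; it assumes a log-posterior callable over the whole beta vector):

```python
import torch

def random_direction_slice_step(beta, log_prob, width=1.0, max_steps=20):
    """One update of random-direction slice sampling: pick a random unit
    direction d in R^K, then run a 1-D slice sampler on theta along
    beta + theta * d (stepping-out bracket, then shrinkage)."""
    d = torch.randn_like(beta)
    d = d / d.norm()                                    # random unit direction
    log_y = log_prob(beta) + torch.log(torch.rand(1))   # slice level under the density
    # stepping-out: expand the bracket until both ends fall off the slice
    lo = -width * torch.rand(1)
    hi = lo + width
    for _ in range(max_steps):
        if log_prob(beta + lo * d) < log_y:
            break
        lo = lo - width
    for _ in range(max_steps):
        if log_prob(beta + hi * d) < log_y:
            break
        hi = hi + width
    # shrinkage: sample theta until it lands on the slice
    while True:
        theta = lo + (hi - lo) * torch.rand(1)
        if log_prob(beta + theta * d) >= log_y:
            return beta + theta * d
        if theta < 0:
            lo = theta
        else:
            hi = theta

# Toy usage: standard normal log-density in R^5
log_prob = lambda b: -0.5 * (b ** 2).sum()
beta = torch.zeros(5)
beta = random_direction_slice_step(beta, log_prob)
```

Each call costs a handful of log-posterior evaluations regardless of K, which is where the O(K^2) -> O(K) saving comes from; the quality of the resulting chain versus the elementwise version would need to be checked empirically.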
Thanks
Changyong Oh
I would like to try your method on my own dataset and my own objective function. There isn't much documentation in the repo about the format and structure needed to do this. Could you please provide some more details on how to set this up?