Fragmenting only once for a given window size #5

Open
KyleStiers opened this issue Aug 3, 2017 · 0 comments


KyleStiers commented Aug 3, 2017

I haven't thought through all the details of how this would be implemented, but re-fragmenting the entire PDB on every run seems like a waste of compute.

It seems to me the ideal approach would be to output a text file recording parameters such as the window size and when the fragments were made (i.e. whether it was less than a week ago; otherwise new PDB entries have been deposited since), and then write all the Hadoop sequence files of the fragments to a new directory. The pseudo-code could look something like this:

if param file exists:
    if same window size and less than a week old:
        pass path to stored fragments on to the remainder of fragment-search (i.e. calculations)
    elif same window size:
        fragment only the PDBs deposited since the last run and add them to the old fragments (if possible?)
    else:
        start fresh
else:
    start fresh

Some of this may be more complicated than it's worth, but minimally, checking whether the files already exist and were made with the same window size should be doable, I think.
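As a rough sketch of the minimal version of this, the check above could be a small helper that reads a JSON parameter file and classifies the cached fragments as reusable, incrementally updatable, or stale. All names here (the file layout, `fragment_plan`, the `created`/`window_size` keys) are hypothetical, not part of the project:

```python
import json
import time
from pathlib import Path

# Hypothetical parameter-file layout, e.g.:
# {"window_size": 6, "created": 1501718400, "fragment_dir": "fragments/w6"}

MAX_AGE_SECONDS = 7 * 24 * 3600  # "less than a week old"

def fragment_plan(param_path, window_size):
    """Decide what to do with previously generated fragments.

    Returns one of:
      "reuse"  - same window size, fresh: hand stored fragments to the calculations
      "update" - same window size, stale: fragment only newly deposited PDBs
      "fresh"  - no param file or different window size: re-fragment everything
    """
    path = Path(param_path)
    if not path.exists():
        return "fresh"
    params = json.loads(path.read_text())
    if params.get("window_size") != window_size:
        return "fresh"  # old fragments are unusable at a different window size
    age = time.time() - params.get("created", 0)
    if age < MAX_AGE_SECONDS:
        return "reuse"
    return "update"
```

The caller would then either pass `fragment_dir` straight to the downstream calculations, or regenerate (all or part of) the fragments and rewrite the parameter file.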
