
Store each uploaded My Clippings.txt file #16

Closed
mammuth opened this issue Sep 22, 2019 · 5 comments · Fixed by #38

Comments

mammuth (Owner) commented Sep 22, 2019

We should do this so that we can apply features like #5 retroactively to clippings that have already been uploaded.

Ideally, we'd create a model that keeps track of every upload, with a date and the txt file (associated with a user).
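
A minimal sketch of what such a model could look like, assuming Django (which the unique_together snippet later in this thread suggests); the model and field names are illustrative only, not a final design:

# hypothetical sketch -- model and field names are assumptions
from django.conf import settings
from django.db import models

class ClippingsUpload(models.Model):
    # keeps the raw "My Clippings.txt" of every upload, per user
    user = models.ForeignKey(settings.AUTH_USER_MODEL, on_delete=models.CASCADE,
                             related_name='clippings_uploads')
    uploaded_at = models.DateTimeField(auto_now_add=True)
    file = models.FileField(upload_to='clippings_uploads/')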

mammuth (Owner) commented Sep 22, 2019

It will also help with debugging issues caused by different formats (e.g. date formats on different Kindle devices).

mammuth added the "good first issue" label on May 6, 2021
mammuth mentioned this issue on May 13, 2021
mammuth (Owner) commented May 15, 2021

If we do this, we could also refactor the import process into more decoupled "jobs": the upload only stores the file, and subsequent distinct jobs parse it and store it in the DB, fetch a book cover via an external API, etc.
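
A rough sketch of how those decoupled jobs could be laid out (purely illustrative; the function names and the ClippingsUpload model come from the sketch above and are assumptions, not existing code):

# hypothetical sketch of decoupled import steps
def store_upload(user, uploaded_file):
    # Step 1: only persist the raw file, no parsing yet
    return ClippingsUpload.objects.create(user=user, file=uploaded_file)

def parse_upload(upload):
    # Step 2: parse the stored txt file and write clippings to the DB
    with upload.file.open('rb') as f:
        text = f.read().decode('utf-8-sig')
    for raw_clipping in text.split('=========='):
        ...  # parse, compute content_hash, skip duplicates

def fetch_book_covers(upload):
    # Step 3: look up cover images via an external API (could run async later)
    ...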

Obviously not needed, but it might be a nice opportunity to clean up the code a bit 😊

I think we even have Celery on the current hosting plan, so if we wanted to, we could actually run the jobs asynchronously (and if not, there might still be value in separating the process).
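
Purely as an illustration of what "asynchronously" could mean here (hypothetical, and moot given the edit below): with Celery, a job from the sketch above would just be wrapped as a task, e.g.

from celery import shared_task

@shared_task
def parse_upload_task(upload_id):
    # pass only the id so a small payload crosses the queue
    parse_upload(ClippingsUpload.objects.get(pk=upload_id))

and enqueued with parse_upload_task.delay(upload.id).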

Edit: Celery workers can only be added on the business plans, and those are way too expensive for this hobby project, so we won't get Celery unless we migrate to a different hosting solution. We could also ask them to give us Celery workers, and I'd expect that to work, but it would still mean additional costs, so it's probably not worth it.
(screenshot omitted)

JSerwatka (Contributor) commented

The idea of an async queue is great, but at this price it of course doesn't make sense.

I believe book-cover fetching will be the most time-costly part here, because to work properly it will probably need two requests (one to resolve the ISBN and one to fetch the cover).
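
To make the two-request shape concrete, a hypothetical lookup via the public Open Library API could look like this (the endpoints are Open Library's; whether the project actually uses that API is an assumption):

import requests

def fetch_cover_url(title, author):
    # Request 1: resolve the book to an ISBN via the search endpoint
    resp = requests.get('https://openlibrary.org/search.json',
                        params={'title': title, 'author': author, 'limit': 1})
    docs = resp.json().get('docs', [])
    isbns = docs[0].get('isbn', []) if docs else []
    if not isbns:
        return None
    # Request 2: the cover image itself sits behind a separate covers endpoint
    return f'https://covers.openlibrary.org/b/isbn/{isbns[0]}-M.jpg'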

JSerwatka (Contributor) commented

I've created a mind map to try to gather all possible solutions to our problem with new features and storing the latest data. You can find it here. My favorite solution is marked in green.

To compare clippings we have to use their hash; there is no other way here.

@mammuth, I'd love to hear your thoughts on that.

mammuth (Owner) commented Jul 30, 2021

> I've created a mind map to try to gather all possible solutions to our problem with new features and storing the latest data. You can find it here. My favorite solution is marked in green.

Sorry, I missed your Miro board somehow, but I just reviewed it. To be honest, I'm not 100% sure what problem we're trying to solve with your green path.
Is it about the use case where the content of a clipping changes? 🤔
Is this really a thing? For my reading workflow, it's not. I read books, make highlights, upload the highlights, and don't change the highlights afterwards. I actually don't even know how this tool behaves when adding comments; I think I never tried that 🙈

> To compare clippings we have to use their hash; there is no other way here.

Random note: We already store the hash in the DB and use it as a DB unique constraint to ensure that we're not importing a clipping twice:

unique_together = ('user', 'book', 'content_hash',)
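
For reference, a content hash along these lines would be enough for that constraint (hypothetical sketch; the project's actual hashing may differ):

import hashlib

def content_hash(content: str) -> str:
    # identical clipping text hashes to the same value, so re-imports collide
    return hashlib.md5(content.strip().encode('utf-8')).hexdigest()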
