Proposed Parallel Processing Interface #47

Open
mjallday opened this issue Feb 14, 2021 · 2 comments
Labels
api Change or addition to API triaged

Comments

@mjallday
Contributor

This issue discusses how to expose an interface in Larky for processing data in parallel. Larky is currently single-threaded, but many batch-processing files lend themselves to parallelism.

The interface would be something like multiprocessing.map(iterator, transformer), where transformer is a lambda that takes each element along with the ctx and returns the output of the transform.

operations:
- Script:
  lang: starlarky
  script: |
    load("@vgs/multiprocessing", "multiprocessing")
    def process(input, ctx):
      # apply the lambda to each line in parallel, then rejoin the results
      result = '\n'.join(multiprocessing.map(input.split('\n'), lambda x, ctx: vault.put(x[1])))
      return result, ctx

Assume input is a stream-like object for SFTP files, or an HTTP object for HTTP requests.
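
One way to read the stream-like assumption: if the input can be iterated line by line, records can be handed to the map without first materializing the whole payload with split. A minimal sketch in plain Python rather than Larky; `iter_records` is a hypothetical helper, and `io.StringIO` only stands in for the SFTP or HTTP stream.

```python
import io


def iter_records(stream):
    """Yield one record per line from a file-like object, newline stripped."""
    for line in stream:
        yield line.rstrip("\n")


# io.StringIO is a stand-in for the real stream-like input (assumption).
stream = io.StringIO("4111111111111111\n5500000000000004\n")
records = list(iter_records(stream))
```

Iterating this way also avoids the empty trailing element that `split('\n')` produces when the payload ends with a newline.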

multiprocessing.map would be an interface to an execution framework such as Spark, which would execute the lambda using the number of processes the customer has provisioned.
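
To make the proposed semantics concrete, here is a rough sketch in plain Python, not Larky and not Spark. It uses a thread pool from the standard library as a stand-in backend. `parallel_map`, `fake_vault_put`, and the `workers` key in ctx are all hypothetical names chosen for illustration; nothing here is an existing Larky API.

```python
from concurrent.futures import ThreadPoolExecutor


def parallel_map(items, transformer, ctx, max_workers=4):
    """Call transformer(item, ctx) for each item; results keep input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map returns results in the order the inputs were given.
        return list(pool.map(lambda item: transformer(item, ctx), items))


def fake_vault_put(value):
    # Hypothetical stand-in for vault.put: returns a made-up token.
    return "tok_" + value


def process(input, ctx):
    lines = input.split("\n")
    # "workers" is an assumed way to pass the provisioned process count.
    tokens = parallel_map(
        lines,
        lambda x, ctx: fake_vault_put(x),
        ctx,
        max_workers=ctx.get("workers", 2),
    )
    return "\n".join(tokens), ctx


result, ctx = process("a\nb\nc", {"workers": 2})
```

Because the same ctx is handed to every call, this sketch treats it as read-only. Whether the real interface should let transformers mutate ctx is an open design question.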

@mjallday added the api (Change or addition to API) label on Feb 14, 2021
@mjallday
Contributor Author

Here’s an example of a similar interface using Apache Beam.

[image: AB33AA52-2707-42B3-935D-C20C80865E2C]

@mjallday
Contributor Author

Here’s an example from Dask: https://docs.dask.org/en/latest/futures.html

[image: 29E0E9D8-C127-4561-B36A-F97204C74C05]
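
Dask's futures interface extends Python's `concurrent.futures`, so the submit-then-gather pattern can be sketched with the standard library alone. This is not the Dask API and not a reproduction of the attached image; `tokenize` is a hypothetical stand-in for the per-record transform.

```python
from concurrent.futures import ThreadPoolExecutor


def tokenize(value):
    # Hypothetical per-record transform used only for illustration.
    return value.upper()


with ThreadPoolExecutor(max_workers=2) as pool:
    # Submit one task per record, then collect results in submission order.
    futures = [pool.submit(tokenize, v) for v in ["a", "b", "c"]]
    gathered = [f.result() for f in futures]
```

A futures-style API gives callers per-task handles, for example for error handling on individual records, whereas a single map call hides them.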
