Tesseract.js doesn't work in CloudFlare Workers #950

Open
penkzhou opened this issue Sep 3, 2024 · 2 comments


@penkzhou

penkzhou commented Sep 3, 2024

Tesseract.js version 5.1.1

Describe the bug
Tesseract.js doesn't work on Cloudflare Workers because it treats the runtime as a browser environment.

To Reproduce
Steps to reproduce the behavior:

  1. Install Tesseract.js in a Cloudflare sample project and run it.
  2. The log shows:
✘ [ERROR] ReferenceError: Worker is not defined

      at module.exports
  (file:///Users/xxx/Workspace/WebstormProjects/podcast-backend/node_modules/.pnpm/tesseract.js@5.1.1/node_modules/tesseract.js/src/worker/browser/spawnWorker.js:14:5)
      at module.exports [as createWorker]
  (file:///Users/xxx/Workspace/WebstormProjects/podcast-backend/node_modules/.pnpm/tesseract.js@5.1.1/node_modules/tesseract.js/src/createWorker.js:46:16)
      at workerGen
  (file:///Users/xxx/Workspace/WebstormProjects/podcast-backend/src/index.ts:33:34)
      at Array.<anonymous>
  (file:///Users/xxx/Workspace/WebstormProjects/podcast-backend/src/index.ts:16:15)
      at Hono2.dispatch
  (file:///Users/xxx/Workspace/WebstormProjects/podcast-backend/node_modules/.pnpm/hono@4.5.10/node_modules/hono/dist/hono-base.js:187:37)
      at Hono2.fetch
  (file:///Users/xxx/Workspace/WebstormProjects/podcast-backend/node_modules/.pnpm/hono@4.5.10/node_modules/hono/dist/hono-base.js:213:17)
      at fetchDispatcher
  (file:///Users/xxx/Workspace/WebstormProjects/podcast-backend/.wrangler/tmp/bundle-2Cuv2V/middleware-loader.entry.ts:54:17)
      at __facade_invokeChain__
  (file:///Users/xxx/Workspace/WebstormProjects/podcast-backend/node_modules/.pnpm/wrangler@3.73.0_@cloudflare+workers-types@4.20240821.1/node_modules/wrangler/templates/middleware/common.ts:53:9)
      at Object.next
  (file:///Users/xxx/Workspace/WebstormProjects/podcast-backend/node_modules/.pnpm/wrangler@3.73.0_@cloudflare+workers-types@4.20240821.1/node_modules/wrangler/templates/middleware/common.ts:50:11)
      at jsonError
  (file:///Users/xxx/Workspace/WebstormProjects/podcast-backend/node_modules/.pnpm/wrangler@3.73.0_@cloudflare+workers-types@4.20240821.1/node_modules/wrangler/templates/middleware/middleware-miniflare3-json-error.ts:22:30)


Expected behavior
The runtime should be treated as a Node.js environment, but it is currently treated as a browser.

Device Version:

  • OS + Version: macOS 14.6.1 (23G93)


@Balearica
Member

I assume you are talking about CloudFlare Workers. CloudFlare Workers do not run on Node.js; they use a custom runtime built on V8 that implements a handful of Node.js APIs. The fact that the browser code path is selected in a CloudFlare Worker is therefore probably an intentional design decision on CloudFlare's part.

The Workers runtime also implements a subset of Node.js APIs as an optional, opt-in compatibility layer.
https://developers.cloudflare.com/workers/runtime-apis/

Regardless of the browser vs. Node.js issue, the code in question is attempting to spawn a worker, and I do not believe it is possible to spawn a worker from within a CloudFlare Worker. For context, the Tesseract.js createWorker function spawns a worker that executes Tesseract jobs on a separate thread. This uses web workers in the browser and the worker_threads module in Node.js. My understanding is that CloudFlare Workers are inherently single-threaded, so they do not support spawning additional threads.
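As a rough illustration of why the reported `ReferenceError: Worker is not defined` appears, the check below tests whether a runtime exposes a `Worker` constructor before anything tries to spawn one. The helper names `canSpawnWebWorker` and `describeRuntime` are hypothetical and are not part of Tesseract.js; this is only a sketch of the kind of guard application code could add.

```javascript
// Hypothetical helper (not a Tesseract.js API): returns true when the given
// global-like object exposes a Worker constructor.
// `env` defaults to globalThis so the function can be tested with fake objects.
function canSpawnWebWorker(env = globalThis) {
  return typeof env.Worker === 'function';
}

// Hypothetical helper: a coarse label for how jobs could be executed.
function describeRuntime(env = globalThis) {
  return canSpawnWebWorker(env) ? 'web-worker' : 'single-thread';
}

// Example: fail early with a readable message instead of a ReferenceError.
function assertCanSpawn(env = globalThis) {
  if (!canSpawnWebWorker(env)) {
    throw new Error('This runtime has no Worker constructor; threaded OCR is unavailable.');
  }
}
```

Calling `assertCanSpawn()` before `createWorker` would not make OCR work in such a runtime, but it would surface the limitation with a clearer message.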

To use Tesseract.js within a single thread, the worker code would need to be executed directly. Making the worker scripts run standalone would likely not be that difficult. It would presumably just require using src/worker-script/index.js directly, after deleting the exports.dispatchHandlers code that communicates with the main thread.
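The general idea of running worker code inline can be sketched as follows. This is not the Tesseract.js worker script: `handleJob`, `runInline`, the `echo` action, and the job/result shapes are all hypothetical stand-ins, used only to contrast message passing with a direct function call.

```javascript
// Hypothetical job handler. In a threaded design this logic would live inside
// the worker and be triggered by a message from the main thread.
async function handleJob(job) {
  switch (job.action) {
    case 'echo':
      // Placeholder for real work; simply returns the payload.
      return { jobId: job.jobId, status: 'resolve', data: job.payload };
    default:
      throw new Error(`Unknown action: ${job.action}`);
  }
}

// Threaded wiring, shown only for contrast (requires a Worker constructor):
//   worker.postMessage(job);
//   worker.onmessage = ({ data }) => { /* handle result */ };

// Single-threaded wiring: call the handler directly and await the result,
// converting thrown errors into a 'reject' result instead of a message.
async function runInline(job) {
  try {
    return await handleJob(job);
  } catch (err) {
    return { jobId: job.jobId, status: 'reject', data: err.message };
  }
}
```

With this shape, a request handler could `await runInline({ jobId: 1, action: 'echo', payload: 'hi' })` without any thread being spawned. Whether the real worker script can be adapted this way is untested here.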

@penkzhou
Author

penkzhou commented Sep 6, 2024

Thanks for your explanation. I will try your suggestion.

@Balearica changed the title from "Tesseract.js doesn't work on Cloudflare because it make it as browser env." to "Tesseract.js doesn't work in CloudFlare Workers" on Sep 25, 2024