Archiving papers using Zotero headless?

andrew0@lemmy.dbzer0.com · 5 days ago

Wero is being rolled out slowly in Western Europe. I believe it’s already available in Germany, France, and Belgium, with the Netherlands to follow soon.

andrew0@lemmy.dbzer0.com · 3 months ago

Get a dog. I’m now forced to get up early to take it out, otherwise it will pee on my bed.

(Do not actually get a pet if you cannot take care of them.)

andrew0@lemmy.dbzer0.com · 6 months ago

Be warned that their docs are so-so. Nanonets OCR, Mistral OCR and MinerU will also extract formulas and images.

One other model I forgot to mention is Docling. It is quite quick to set up in a Docker container, and it comes with a web interface ready to go where you can upload documents. It more or less follows the PaddleOCR pipeline, but also allows you to use vLMs.

Good luck!

andrew0@lemmy.dbzer0.com · 6 months ago

If you find that OCR doesn’t get you very far, maybe try a small vLM to parse PNGs of the pages. For example, Nanonets OCR will do this, although it is quite slow if you don’t have a GPU. It will give you a Markdown version of each page, which you can then translate with another tool.

PaddleOCR might also be useful, since it focuses on Chinese, but it’s more difficult to set up. Some other options are MinerU and Mistral OCR (the latter is paid, but you can test it for free if you upload your document to Mistral’s library).

andrew0@lemmy.dbzer0.com · 11 months ago

Is this artist involved with Obojima by any chance?

andrew0@lemmy.dbzer0.com · 1 year ago

Archiving papers using Zotero headless?
