ISBN, OCR, in bulk/batches

1 RABateman
Feb. 26, 2020, 3:12 pm

Hello!

I am new to LT and I am loving it. I also work as a Library Tech IRL, but not in a cataloguing capacity.

I used the app's barcode scanner to catalogue ~5000 books.

The next stage of my cataloguing mission will be to add books without barcodes, using their text-only ISBNs.

I am trying to find a way to streamline this process using OCR and figured that someone on here must have advice!

I have tried the ISBN Scan app but it does not work for me at all.

I would love to find a solution that would allow me to take multiple pics of text-only ISBNs, OCR them, and dump them in batches. Most of the OCR apps that I have found will OCR only a single photo's text, and then save that single line of text in other formats.

The best that I have come across has been using an OCR app to copy and paste (book by book) into the Google Sheets app.

Has anyone on LT mastered streamlining this process??!

Thanks so much!

Chris

2 RABateman
Mar. 29, 2020, 3:36 pm

Hi All!

Here's an update on the process that I have developed for inputting bulk batches of text-only ISBNs (no barcodes):

1) I take a bunch of photos of ISBNs from the backs of books or on the book's verso.
2) I export a batch of photos (usually anywhere from 10 to 200) as a single PDF file.
3) I then import the 10-200 page PDF into a program called OWLOCR. (I tested a number of programs and apps, and OWLOCR had some of the best OCR capabilities.)
4) After OWLOCR has "read" all the text, I click "Copy All text" and paste the text into a simple text editor.
5) In the text editor, I manually scan the document (with my eyes), deleting anything that is NOT an ISBN.
6) At the end of step 5, I am left with a .txt file with 10-200 lines of text.
7) I copy all 10-200 lines into a spreadsheet program. Here, I confirm that the number of rows (lines) equals the number of PDF pages that I imported into OWLOCR. If I have fewer rows, I accidentally deleted an ISBN in step 5. If there are more rows than photos, I included something that wasn't an ISBN. OWLOCR makes it easy to navigate through the PDF pages to compare.
8) In the spreadsheet software, I use the Find/Replace function to clean up the data as much as possible. I delete dashes, I make sure that the OCR hasn't read a "1" as an "I", etc.
9) I then export the list of 10-200 rows as a .CSV file.
10) Then I use LibraryThing's Universal Import tool to import the 10-200 lines from the .CSV file into my collection.
11) Best-case scenario: LibraryThing recognizes the same number of books as there were lines in my .CSV file. Then I am finished!
12) If that does not happen (for example, if I uploaded 150 lines and LibraryThing only recognized 100 books), I wait until the upload is completed and then go back to my spreadsheet!
13) Back in the spreadsheet from step 7: Column A holds the list of ISBNs that I have been working with. In Column C (for example), I paste the incomplete list of ISBNs that LibraryThing imported into my collection. (There are a few different ways to get that list, but I prefer to simply copy it from my Your Books page.)
14) Then I use a spreadsheet formula to compare the list of ISBNs in Column C against the list in Column A. I use Google Sheets as my spreadsheet software. If anyone is interested, I can explain the formula in a private message or email.
15) After I determine which ISBNs from Column A were omitted from the import, I do a bit of detective work to determine why! Sometimes I just need to enter the ISBN manually, and that works. Other times, I find out that OWLOCR mis"read" the numbers. In rarer cases, I have to go back to the shelf, find the book whose ISBN was not imported, and add it manually.
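
For illustration only, the comparison in step 14 can also be done outside a spreadsheet. The Python sketch below is not the Google Sheets formula mentioned in step 14; it is a rough equivalent with hypothetical function names (normalize, to_isbn13, missing_isbns). It assumes the two lists might mix ISBN-10 and ISBN-13 forms, so it converts everything to ISBN-13 before comparing.

```python
def normalize(isbn):
    """Keep only digits and a possible X check character."""
    return "".join(ch for ch in isbn.upper() if ch.isdigit() or ch == "X")


def to_isbn13(isbn):
    """Convert an ISBN-10 to its 978-prefixed ISBN-13; pass anything else through."""
    s = normalize(isbn)
    if len(s) != 10 or not s[:9].isdigit():
        return s
    core = "978" + s[:9]
    # ISBN-13 check digit: weights alternate 1, 3, 1, 3, ...
    total = sum(int(ch) * (1 if i % 2 == 0 else 3) for i, ch in enumerate(core))
    return core + str((10 - total % 10) % 10)


def missing_isbns(submitted, imported):
    """Return the submitted ISBNs (Column A) that are absent from the imported list (Column C)."""
    imported_keys = {to_isbn13(i) for i in imported}
    return [s for s in submitted if to_isbn13(s) not in imported_keys]


if __name__ == "__main__":
    column_a = ["0306406152", "080442957X", "9780131103627"]
    column_c = ["978-0-306-40615-7", "9780131103627"]
    print(missing_isbns(column_a, column_c))
```

The returned list keeps the original order of Column A, which makes it easier to walk back through the PDF pages for the detective work in step 15.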

I am sure there are some shortcuts that could be made here, but without any Python/hacking skills, this is what I have come up with.
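
For anyone who does have a little Python, steps 5 and 8 look like the most obvious candidates for a shortcut. What follows is only a minimal sketch, assuming the OCR output has been saved as plain text with at most one ISBN per line. The names (extract_isbns, OCR_FIXES, and so on) are hypothetical, and the list of OCR confusions is an assumption to adapt to your own scans. It strips the "ISBN" label, swaps commonly misread letters for digits, removes dashes, and keeps only lines that pass the standard ISBN-10 or ISBN-13 check-digit test. Lines that look like an ISBN but fail the test are set aside for manual review.

```python
import re

# Matches a leading label such as "ISBN", "ISBN:", "ISBN-10:" or "ISBN-13:".
LABEL = re.compile(r"ISBN(?:-1[03])?:?", re.IGNORECASE)
# Assumed OCR confusions: letters that are often misread digits.
OCR_FIXES = str.maketrans({"I": "1", "l": "1", "O": "0", "o": "0"})


def is_valid_isbn10(s):
    """ISBN-10 check: weights 10 down to 1, sum divisible by 11 (X counts as 10)."""
    if not re.fullmatch(r"\d{9}[\dX]", s):
        return False
    total = sum((10 - i) * (10 if ch == "X" else int(ch)) for i, ch in enumerate(s))
    return total % 11 == 0


def is_valid_isbn13(s):
    """ISBN-13 check: weights alternate 1 and 3, sum divisible by 10."""
    if not re.fullmatch(r"\d{13}", s):
        return False
    total = sum(int(ch) * (1 if i % 2 == 0 else 3) for i, ch in enumerate(s))
    return total % 10 == 0


def extract_isbns(ocr_text):
    """Return (valid, suspect): cleaned ISBNs, and raw lines that need a human look."""
    valid, suspect = [], []
    for line in ocr_text.splitlines():
        # Remove the label first, so the "I" in "ISBN" is not turned into a 1.
        cleaned = LABEL.sub("", line).translate(OCR_FIXES)
        candidate = re.sub(r"[^0-9Xx]", "", cleaned).upper()
        if is_valid_isbn10(candidate) or is_valid_isbn13(candidate):
            valid.append(candidate)
        elif len(candidate) >= 9:
            # Long enough to be an ISBN, but the check digit does not match.
            suspect.append(line.strip())
    return valid, suspect


if __name__ == "__main__":
    sample = "ISBN O-3O6-4O6I5-2\nPrinted in Canada\n978-0-306-40615-8\n"
    good, review = extract_isbns(sample)
    print(good)
    print(review)
```

Because the check digit catches most single-character misreads, the "suspect" list is a reasonable stand-in for the row-count comparison in step 7: anything in it is worth checking against the photo before importing.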

Hit me up if you have any questions/comments.

:)

3 gilroy
Mar. 29, 2020, 5:08 pm

You could just put the ISBNs into an Excel file and upload it. That allows it to use the sources and tends to be much faster...

4 RABateman
Mar. 29, 2020, 6:07 pm

You mean a .xlsx file specifically? Good to know!

5 RABateman
Edited: Apr. 7, 2020, 6:12 pm

LibraryThing does not seem to support uploading .xlsx files. Did I misinterpret your suggestion?

6 jjwilson61
Apr. 7, 2020, 10:35 pm

Try saving the file as a CSV file.

7 RABateman
Apr. 9, 2020, 4:39 pm

That is what I do (step 9 in the 2nd post).