ISBN, OCR, in bulk/batches
1RABateman
Hello!
I am new to LT and I am loving it. I also work as a Library Tech IRL, but not in a cataloguing capacity.
I used the app to catalogue ~5000 books using the Barcode Scanner.
The next stage of my cataloguing mission will be to add books without barcodes, using their text-only ISBNs.
I am trying to find a way to streamline this process using OCR and figured that someone on here must have advice!
I have tried the ISBN Scan app but it does not work for me at all.
I would love to find a solution that would allow me to take multiple pics of text-only ISBNs, OCR them, and dump them in batches. Most of the OCR apps that I have found will OCR only a single photo's text, and then save that single line of text in other formats.
The best that I have come across has been using an OCR app to copy and paste (book by book) into the Google Sheets app.
Has anyone on LT mastered streamlining this process??!
Thanks so much!
Chris
2RABateman
Hi All!
Here's an update on the process that I have developed for inputting bulk batches of text-only ISBNs (no barcodes):
1) I take a bunch of photos of ISBNs from the backs of books or from the book's verso.
2) I export a batch of photos (anywhere from 10 to 200 usually) as a single PDF file.
3) I then import the 10-200 page PDF into a program called OWLOCR. (I tested a number of software/apps and OWLOCR had some of the best OCR capabilities.)
4) After OWLOCR has "read" all the text, I click "Copy All text" and paste the text into a simple text editor.
5) In the text editor, I manually scan (with my eyes) the document, deleting anything that is NOT an ISBN.
6) At the end of step 5, I am left with a .txt file with 10-200 lines of text.
7) I copy all 10-200 lines into a spreadsheet program. Here, I confirm that the number of rows (lines) is equal to the number of PDF pages that I imported into OWLOCR. If I have fewer rows, that means that I accidentally deleted an ISBN in step 5. If there are more rows than photos, that means I included something that wasn't an ISBN. OWLOCR makes it easy to navigate through the PDF pages to compare.
8) In the spreadsheet software, I use the Find/Replace function to clean up the data as much as possible. I delete dashes, I make sure that the OCR hasn't read a "1" as an "I", etc.
9) I then export the list of 10-200 rows as a .CSV file.
10) Then I use LibraryThing's Universal Import tool to import the 10-200 lines from the .CSV file into my collection.
11) Best case scenario: LibraryThing recognizes the same number of books as there were lines in the .CSV file. Then I am finished!
12) If that does not happen (for example, if I uploaded 150 lines and LibraryThing only recognized 100 books), I wait until the upload is completed and then I go back to my spreadsheet!
13) Back in the spreadsheet from step 7: Column A holds the list of ISBNs that I have been working with. In Column C (for example) I paste the incomplete list of ISBNs that LibraryThing imported into my collection (there are a few different ways to get that list, but I prefer to simply copy it from my Your Books page).
14) Then I use a spreadsheet formula to compare the list of ISBNs in Column C against the list of ISBNs in Column A. I use Google Sheets as my spreadsheet software. If anyone is interested, I can explain the formula in a private message/email.
15) After I determine which ISBNs from Column A were omitted from the import, I do a bit of detective work to determine why! Sometimes, I just need to enter the ISBN manually and that works. Other times, I find out that OWLOCR misread the numbers. In rarer cases, I have to go back to the shelf, find the book whose ISBN was not imported, and add it manually.
I am sure there are some shortcuts that could be made here, but without any Python/hacking skills, this is what I have come up with.
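For anyone who does have a little Python, here is a minimal sketch of what steps 5 and 8 might look like when automated. It is not part of the workflow described above, just one possible shortcut: it scans the pasted OCR text for anything shaped like an ISBN, strips dashes and spaces, swaps the look-alike letters "I", "l" and "O" for digits, and keeps only candidates whose check digit is valid. The names `extract_isbns`, `is_valid_isbn`, `CANDIDATE` and `LOOKALIKES` are made up for this example, and the pattern assumes each ISBN sits on a single line.

```python
import re

# Letters that OCR commonly confuses with digits (an assumption; extend as needed).
LOOKALIKES = str.maketrans({"I": "1", "l": "1", "O": "0", "o": "0"})

# Something shaped like an ISBN-13 (starting 978/979) or an ISBN-10, with optional
# single hyphens or spaces between characters, and not embedded in a longer word.
CANDIDATE = re.compile(
    r"(?<![0-9A-Za-z])"
    r"(?:97[89][- ]?(?:[0-9IlOo][- ]?){9}[0-9IlOo]"
    r"|(?:[0-9IlOo][- ]?){9}[0-9IlOoXx])"
    r"(?![0-9A-Za-z])"
)


def is_valid_isbn(code: str) -> bool:
    """Check the ISBN-10 or ISBN-13 check digit of a cleaned-up code."""
    if len(code) == 10:
        if not code[:9].isdigit() or not (code[9].isdigit() or code[9] == "X"):
            return False
        # ISBN-10: weights 10 down to 1, "X" counts as 10, sum divisible by 11.
        total = sum((10 - i) * (10 if ch == "X" else int(ch))
                    for i, ch in enumerate(code))
        return total % 11 == 0
    if len(code) == 13 and code.isdigit():
        # ISBN-13: alternating weights 1 and 3, sum divisible by 10.
        total = sum(int(ch) * (1 if i % 2 == 0 else 3)
                    for i, ch in enumerate(code))
        return total % 10 == 0
    return False


def extract_isbns(ocr_text: str) -> list[str]:
    """Return the valid ISBNs found in OCR text, in order, without repeats."""
    found = []
    for match in CANDIDATE.finditer(ocr_text):
        code = re.sub(r"[- ]", "", match.group()).translate(LOOKALIKES).upper()
        if is_valid_isbn(code) and code not in found:
            found.append(code)
    return found
```

Calling `extract_isbns(text)` on the text copied out of the OCR program would give a list to paste into the spreadsheet. The row-count check from step 7 is still worth doing: a page that prints both an ISBN-10 and an ISBN-13 yields two entries, and a badly misread ISBN fails the check digit and is dropped silently.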
Hit me up if you have any questions/comments.
:)
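The comparison in step 14 can also be sketched in Python. This is not the Google Sheets formula mentioned above (that one is not given in the post), only a rough equivalent. It assumes both lists have already been cleaned so they contain nothing but ISBNs, and it converts ISBN-10s to ISBN-13s before comparing, in case the catalogue shows a different form from the one that was submitted. `to_isbn13` and `missing_isbns` are hypothetical helper names.

```python
def to_isbn13(isbn: str) -> str:
    """Strip separators and convert an ISBN-10 to its 978-prefixed ISBN-13."""
    code = isbn.replace("-", "").replace(" ", "").strip().upper()
    if len(code) == 10:
        # Drop the old check digit, add the 978 prefix, recompute the check digit.
        core = "978" + code[:9]
        total = sum(int(ch) * (1 if i % 2 == 0 else 3)
                    for i, ch in enumerate(core))
        code = core + str((10 - total % 10) % 10)
    return code


def missing_isbns(submitted, imported):
    """Return the submitted ISBNs (Column A) that are absent from imported (Column C)."""
    imported_keys = {to_isbn13(item) for item in imported}
    return [item for item in submitted if to_isbn13(item) not in imported_keys]
```

For example, `missing_isbns(column_a, column_c)` returns the entries of Column A that still need the detective work from step 15, in their original order.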
3gilroy
You could just put the ISBNs into an Excel file and upload it. That allows it to use the sources and tends to be much faster...
5RABateman
LibraryThing does not seem to support uploading .xlsx files. Did I misinterpret your suggestion?
6jjwilson61
Try saving the file as a CSV file.
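If the list of ISBNs already exists outside a spreadsheet, the CSV can also be written directly. Below is a minimal sketch using Python's standard `csv` module. The function name `write_isbn_csv` and the "ISBN" header row are assumptions made for the example, not something LibraryThing is known to require, so drop the header if the importer does not want one. Writing the values as text also avoids a common spreadsheet problem where an ISBN-10 with a leading zero is treated as a number and loses that zero.

```python
import csv


def write_isbn_csv(isbns, path):
    """Write one ISBN per row to a CSV file, under a single header row."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["ISBN"])  # hypothetical header; remove if not wanted
        for isbn in isbns:
            writer.writerow([isbn])
```

Something like `write_isbn_csv(["0306406152", "9780306406157"], "isbns.csv")` would produce a three-line file ready to upload.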