Slightly better languages

1timspalding
Edited: Aug 1, 2018, 5:41 pm

Short version: LT cataloging is getting a little better at languages.

Longer version: LibraryThing's language system is based on the MARC Language Codes ( http://www.loc.gov/marc/languages/language_name.html ), a long-established system of languages that rules the roost in library records.

(FWIW: There are many good arguments to change the MARC codes here and there, or even utterly, but we are committed to the system, which undergirds the best book data out there. There is value to using a system, even an imperfect one, and there's no way to change it that's fully backwards-compatible, not to mention

That said, our Amazon book data, and our soon-to-come ProQuest/Bowker book data, give us languages by name, which we have to map to MARC language codes. Mostly they have mapped well, but Amazon doesn't enforce a list of languages, and there's clearly been some drift over time. It's also become more common for language names to be expressed in another language. (Cue headache.) So I've added more mapping variants. Here are some recent ones that appeared, and their new mapping to MARC.

Simplified Chinese (chi2)
Russisch (rus)
Niederländisch (dut)
Englisch (eng)
Español (spa)
None (zxx)
French, Middle (frm)
Gaelic (gle)
Greek, Ancient (to 1453) (grc)
Anglais (eng)
Middle English (enm)
Multilingual (mul)
Mandarin Chinese (chi)
Castilian (spa)
Deutsch (ger)
Schwedisch (swe)
Portuguese Brazilian (por2)
Unknown (und)
Français (fre)
Slavic (chu)
German, Middle High (gmh)
Acholi (ach)
Romance (roa)
French, Old (fro)
English, Middle (enm)
Germanic (gem)
Flemish (dut)
Greek, Classical (grc)
Persian, Modern (per)

You'll note chi1, chi2 and por1, por2 (Traditional Chinese, Simplified Chinese, Portuguese Portuguese and Brazilian Portuguese). They are the only deviations from the MARC list I've ever implemented.

As with other changes, we're not making it retroactive, but it's something to consider as an option for the future.
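For readers who want to picture the name-to-code mapping described above, here is a minimal sketch. It is not LibraryThing's actual code: the table is a small subset of the variants listed in this post, and the names `VARIANTS`, `normalize`, and `to_marc` are made up for illustration. It assumes lookups should ignore case, surrounding whitespace, and Unicode composition differences.

```python
import unicodedata
from typing import Optional

# Illustrative subset of the variants listed in the post (hypothetical table,
# not LibraryThing's real data).
VARIANTS = {
    "Simplified Chinese": "chi2",
    "Russisch": "rus",
    "Niederländisch": "dut",
    "Englisch": "eng",
    "Anglais": "eng",
    "Castilian": "spa",
    "Deutsch": "ger",
    "Français": "fre",
    "Flemish": "dut",
    "Portuguese Brazilian": "por2",
    "Unknown": "und",
}


def normalize(name: str) -> str:
    """Fold case, trim whitespace, and compose accents so variants compare equal."""
    return unicodedata.normalize("NFC", name).strip().casefold()


# Build the lookup table once, keyed by the normalized name.
NAME_TO_MARC = {normalize(name): code for name, code in VARIANTS.items()}


def to_marc(name: str) -> Optional[str]:
    """Return the MARC-style code for a language name, or None if unmapped."""
    return NAME_TO_MARC.get(normalize(name))


print(to_marc("  niederländisch "))
print(to_marc("FLEMISH"))
print(to_marc("Klingon"))
```

Returning None for an unmapped name, rather than guessing, leaves it to the caller to decide whether a new variant needs to be added to the table.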

2Collectorator
Aug 1, 2018, 5:49 pm

This member has been suspended.

3elenchus
Aug 1, 2018, 10:14 pm

>1 timspalding: and our soon-to-come ProQuest/Bowker book data,

Do tell?!

4lorax
Aug 2, 2018, 9:51 am

Are the character-set encoding issues like

"Español (spa)"

actually reflected in the data - that is, they'll recognize that ñ sometimes gets miscoded as ñ and map either version to "spa" - or is that a cut-and-paste issue?
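As background on the miscoding in question: "ñ" turning into "ñ" is the usual result of UTF-8 bytes being decoded as Windows-1252 (or Latin-1). The sketch below reproduces it and shows one way such a string could be repaired. The function name `fix_mojibake` is hypothetical, and this is only an illustration of the mechanism, not a description of how LibraryThing handles its data.

```python
def fix_mojibake(text: str) -> str:
    """Undo a UTF-8-read-as-Windows-1252 mix-up; return text unchanged otherwise."""
    try:
        return text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # The round trip failed, so the text was not garbled in this particular way.
        return text


original = "Español"

# Reproduce the problem: encode correctly, then decode with the wrong codec.
garbled = original.encode("utf-8").decode("cp1252")
print(garbled)

# Repair it by reversing the two steps.
print(fix_mojibake(garbled))

# Correctly encoded text passes through untouched.
print(fix_mojibake("Français"))
```

A mapping table that wanted to accept either form could run names through a repair step like this before lookup, instead of listing each garbled variant separately.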

5lorannen
Edited: Aug 2, 2018, 11:13 am

>3 elenchus: Shhh, it's a secret! (Not really—just not ready quite yet)

>4 lorax: Good question. I think it's the latter, but I'll ask Tim to confirm.

6prosfilaes
Aug 2, 2018, 12:58 pm

Can we get an update to the most recent MARC data? Klingon and Lojban have been added to the list since LibraryThing imported it. (They're in that list you link to.)

7prosfilaes
Aug 2, 2018, 1:20 pm

chi1, chi2 and por1, por2 would be more standard as chi-Hant, chi-Hans, por-PT, por-BR. RFC 5646 specifies two-letter codes for Chinese and Portuguese, but if you're already using three-letter codes from MARC, it would still be more standard to add the subtags.

Script subtags (ISO 15924) might be useful for more than just Chinese Traditional versus Simplified. Chinese can also be written in Latin (-Latn), Bopomofo (-Bopo), or Han with Bopomofo (-Hanb). The codes let you distinguish Fraktur (-Latf) from normal Latin (-Latn), and the various scripts Sanskrit is written in, be it Devanagari (-Deva), Brahmi (-Brah), Latin (-Latn) or any number of others.
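To make the suggestion concrete, here is a small sketch of how the four custom codes could be rewritten as a MARC code plus a script or region subtag. The table and the names `LT_CUSTOM`, `make_tag`, and `retag` are hypothetical; the subtag casing (title-case script, upper-case region) follows the usual RFC 5646 convention.

```python
from typing import Optional

# Hypothetical mapping from the four custom codes to (language, script, region).
LT_CUSTOM = {
    "chi1": ("chi", "Hant", None),  # Traditional Chinese
    "chi2": ("chi", "Hans", None),  # Simplified Chinese
    "por1": ("por", None, "PT"),    # Portuguese as used in Portugal
    "por2": ("por", None, "BR"),    # Brazilian Portuguese
}


def make_tag(lang: str, script: Optional[str] = None,
             region: Optional[str] = None) -> str:
    """Join language, optional script, and optional region with hyphens."""
    parts = [lang.lower()]
    if script:
        parts.append(script.title())  # script subtags are written title-case
    if region:
        parts.append(region.upper())  # region subtags are written upper-case
    return "-".join(parts)


def retag(code: str) -> str:
    """Rewrite a custom code as a tagged one; leave plain MARC codes alone."""
    if code in LT_CUSTOM:
        return make_tag(*LT_CUSTOM[code])
    return code


for code in ("chi1", "chi2", "por1", "por2", "eng"):
    print(code, "->", retag(code))
```

The same `make_tag` helper would cover the other cases mentioned above, for example `make_tag("chi", "Latn")` for Chinese written in Latin script.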

8timspalding
Aug 3, 2018, 2:40 am

>4 lorax:

No, sorry, it's the encoding of the display, not the data.