Inconsistent use of slash / solidus #11
Comments
Good catch. This must, I am afraid, be handled on the basis of individual languages. There are two ways to proceed:
The annotation procedure would be that one resolves the slash. E.g., if it is a morpheme boundary, replace by a |
Yes, knowledge of the languages is required in order to know what the slash is supposed to mean in each instance. I don't think there are too many cases of slashes throughout, so I think it might be best to make a list of all forms with slashes and go through them all.
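A minimal sketch of how such a list could be pulled together, assuming the entries are available as (language, concept, form) rows. The row layout, the `forms_with_slash` helper and the "Example language" row are illustrative assumptions, not the actual ABVD schema; the slashed forms are the ones quoted in this issue.

```python
# Illustrative rows only: (language, concept, form).
# The real database layout is not shown in this thread, so this is an assumption.
rows = [
    ("Rarotongan", "vomit", "rua/ki"),
    ("Malagasy (Sakalava)", "twenty", "roa/pòlo"),
    ("Malagasy (Tandroy)", "fifty", "lima/m/pòlo"),
    ("Example language", "hand", "lima"),  # hypothetical entry without a slash
]


def forms_with_slash(rows):
    """Return only the rows whose form contains a forward slash."""
    return [row for row in rows if "/" in row[2]]


# Print a tab-separated list that can be reviewed language by language.
for language, concept, form in forms_with_slash(rows):
    print(f"{language}\t{concept}\t{form}")
```

The output is only a review list; deciding what each slash means still has to be done by someone who knows the language.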
Okay, @SimonGreenhill, would you like to have a look, or should I check about slashes, once I find time?
Yes, these were things I couldn't automatically split (some are reduplications, some are variant morphemes/reconstructions)... and note that it's not just the forms that need splitting, but also the cognate set memberships, which is trickier. So I left them alone, sorry :/ N = 868. Here are all the entries in the database like this. Best to fix them directly there, so perhaps we could
Possibly @maryewal can have a quick look at the forms in the Polynesian languages and see if they need splitting (e.g. Nukumanu yes? Mooriori no?)
Yep, no worries.
yep, got it. same issue we had with them a while back (not with numeralbank, but with exporting the data).
ok, so shall I replace the / in the pollex entries with a dash - or pipe | ? (E.g. Tokelau, Kiribati, Rapanui, Uveas etc) Are the Rapa, Mooriori and Nuguria forms in the same boat?
@SimonGreenhill I think the best thing is for me to manually correct some of these and then I'll send you a list where splits need to occur. What do we want to do about the incl./excl. pronouns? I fear that will be a mess to deal with throughout...
Cool, thanks. Possibly the easiest way is to edit the large table I pasted above to change / to something like ", ", because then I can automatically split them (e.g. what was "tun/tin" is now "tun, tin").
(I can make this a CSV or spreadsheet if that's easier for you.) The same ", " system works for annotations and cognacy too, e.g. "a|b" -> "a, b". Alternatively, if you can just say things like "all entries in Nuguria should be split/changed to |/whatever", then I can do that quickly, directly in the database. Re incl/excl: yeah, these are a pain -- I wish I'd had the foresight to split them into two different words when we started. Can we annotate them? I don't want to make too much work for you!
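To make the ", " convention concrete, here is a rough sketch of the kind of automatic split it allows, pairing each variant form with its cognate set. The function name `split_entry`, and the rule that a single cognacy value is shared by all variants, are assumptions for illustration, not how the database actually does it.

```python
def split_entry(form, cognacy):
    """Split a ', '-separated form and cognacy into aligned (form, cognacy) pairs.

    Assumption: if only one cognacy value is given, it applies to every variant.
    Any other length mismatch is raised so it can be checked by hand.
    """
    forms = [f.strip() for f in form.split(",")]
    cognates = [c.strip() for c in cognacy.split(",")]
    if len(cognates) == 1:
        cognates = cognates * len(forms)
    if len(forms) != len(cognates):
        raise ValueError(f"cannot align {forms!r} with {cognates!r}")
    return list(zip(forms, cognates))


print(split_entry("tun, tin", "a, b"))
```

Entries that still contain an unresolved "/" pass through as a single form, so they stay visible for manual review.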
No worries, sounds good and I can work with the table - easy enough. Re incl/excl, it's a good idea to annotate. Happy to get this cleaned up and CLDF-compatible(ish)! Might not get to it until the end of the week, but not difficult.
Thanks Mary. Longer term we should sit down and discuss ABVD2.0, and one thing to do there would be to split out incl and excl forms into their own words, and sort out the mess that is 'you' sg/pl.
Hello! I was just wondering what the status of these pesky slashes is. Has there been an update somewhere that deals with them? (Also, where do the most current data files live?) Thanks!
No, I've fixed some but left the ones I can't tell for sure. If people can tell me how to fix them, I can fix them. @maryewal, looking at the discussion above, you suggest that a lot of the PN '/'s are from Pollex. Should I replace these with "-" or remove them, e.g. rua/ki -> "rua-ki" or "ruaki"? That would clear a lot of the mess.
@barlowrussell -- the data files are still in the ABVD server database; this repository is generated from it (so when I say "fixed", they're fixed on abvd.eva.mpg.de, and this will percolate here when the lexibank dataset gets rebuilt).
@SimonGreenhill yes, let's go with removing them entirely. The morpheme boundary decisions aren't always clear, so better to remove them altogether and I can reassess as needed.
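For reference, a small sketch of what "removing them entirely" could look like if it is limited to the languages whose slashes come from Pollex. The language set below and the helper name `strip_boundary_slashes` are placeholders; the actual list of affected languages would need to be confirmed.

```python
# Placeholder set: which languages really carry Pollex-style slashes is an assumption.
POLLEX_LANGUAGES = {"Rarotongan", "Tokelau", "Rapanui"}


def strip_boundary_slashes(language, form, languages=POLLEX_LANGUAGES):
    """Drop '/' from forms in the listed languages; leave all other forms untouched."""
    if language in languages:
        return form.replace("/", "")
    return form


print(strip_boundary_slashes("Rarotongan", "rua/ki"))
```

Restricting the change by language keeps slashes that mark genuine alternate forms elsewhere from being merged by accident.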
It seems like the forward slash is mainly used to indicate alternate forms for a given concept, but it also crops up in other places, perhaps to indicate morpheme boundaries (?), as in 'twenty' and 'fifty' in Malagasy (Sakalava) [1184] and Malagasy (Tandroy) [1186]: <roa/pòlo> and <lima/m/pòlo>. A few words for 'vomit' also seem to have slashes, e.g., Rarotongan <rua/ki>. This of course results in the problematic interpretation that <rua> and <ki> are both forms for 'vomit', whereas there's really just one form /ruaki/.