
Sentiment analysis #13

Open
aserebrenik opened this issue Oct 14, 2020 · 1 comment

Comments

@aserebrenik

researchers are starting to collate software engineering specific training data _1134_

Recently several articles and corresponding datasets have been released in this field:

  • N. Novielli, F. Calefato, D. Dongiovanni, D. Girardi, F. Lanubile. “Can We Use SE-specific Sentiment Analysis Tools in a Cross-Platform Setting?”. In Proceedings of the 17th International Conference on Mining Software Repositories (MSR ’20), October 5-6, 2020 – DOI: https://doi.org/10.1145/3379597.3387446. Dataset
  • D. Gachechiladze, F. Lanubile, N. Novielli, A. Serebrenik. “Anger and Its Direction in Collaborative Software Development.” In Proceedings of ICSE 2017, the IEEE/ACM 39th International Conference on Software Engineering: New Ideas and Emerging Results (NIER) Track, pp. 11-14, 2017. Dataset

Related work discusses confusion:

  • F. Ebert, F. Castor, N. Novielli, A. Serebrenik. “Confusion Detection in Code Reviews.” In Proceedings of the 33rd International Conference on Software Maintenance and Evolution – New Ideas and Emerging Results track (ICSME-NIER), 2017. Dataset
@Derek-Jones
Owner

Thanks for posting details of these papers, and particularly the datasets. I'm always keen to see new data.

My experience with analysis of natural language is that it is very hard to do anything meaningful. It takes a lot of work to do even the simplest of tasks.

I was once excited by the possibility of sentiment analysis, then I tried to use the predictions (in a non-software engineering context). I relearned how important word order is to meaning, and that what people mean can be the opposite of the words they use. Building a software engineering sentiment analysis dataset is one thing, doing anything useful with it is another.
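The word-order problem can be illustrated with a toy bag-of-words sentiment scorer (the lexicon and scores below are entirely made up for illustration): two phrases containing exactly the same words, but with opposite readings, receive identical scores because word order is invisible to the model.

```python
# Hypothetical toy lexicon; real sentiment lexicons are much larger.
LEXICON = {"good": 1, "not": -1, "bad": -1}

def bow_score(text):
    """Sum per-word lexicon scores, ignoring word order entirely."""
    return sum(LEXICON.get(w, 0) for w in text.lower().split())

a = "good, not bad".replace(",", "")   # reads as positive
b = "not good, bad".replace(",", "")   # reads as negative
print(bow_score(a), bow_score(b))      # same words -> same score
```

Any permutation of the same words gets the same score, which is exactly why meaning that hinges on order (negation, scope, sarcasm) is lost.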

"Anger and its direction": I once did some work for a charity trying to extract information on torture. The main problem we had was figuring out whether somebody was talking about being tortured or talking about others doing torture (e.g., saying that torture is a bad thing). Again the problem was context, plus the fact that the tools we had were too primitive (and those of us involved were not experts in linguistic processing).

"Confusion detection": The more interesting dataset would be the initial confusion/non-confusion classification made by each annotator. This might be used to get some idea of what level of confusion exists about software sentences. If this exercise were rerun, I would expect the gold set produced to be very different from the one produced by this work. People really are very different.
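How much the per-annotator labels disagree could be quantified before any gold set is built, e.g. with Cohen's kappa over two annotators' confusion/non-confusion judgments. A minimal sketch (the labels below are hypothetical, not from the paper's dataset):

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: chance-corrected agreement between two annotators."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    # Observed agreement: fraction of items labelled identically.
    observed = sum(x == y for x, y in zip(labels_a, labels_b)) / n
    # Expected agreement if both annotators labelled at random,
    # keeping their individual label frequencies.
    ca, cb = Counter(labels_a), Counter(labels_b)
    expected = sum(ca[k] * cb[k] for k in set(ca) | set(cb)) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical confusion/non-confusion labels for eight sentences:
a = ["conf", "conf", "none", "none", "conf", "none", "none", "conf"]
b = ["conf", "none", "none", "none", "conf", "none", "conf", "conf"]
print(round(cohens_kappa(a, b), 2))  # prints 0.5
```

Low kappa on the raw annotations would support the point that a rerun would produce a noticeably different gold set.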
