Question

I have a string that contains a paragraph, possibly with line breaks. I want to extract only the first sentence from it. I thought I would try

indexOf(". ") 

that is, a dot followed by a space.

The problem is that this won't work on a line such as firstName. LastName.

I'm using .NET. Is there a good method available to achieve this? I'm also tagging Java to see if I can narrow down my search.


Solution

What you need is a Natural Language Processing (NLP) toolkit. It's very hard to write one yourself, as it requires a lot of research and data collection, but luckily it has already been done for you.

.NET

SharpNLP is a collection of natural language processing tools written in C#. Currently it provides the following NLP tools:

  • a sentence splitter
  • ...

Java
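If pulling in a full NLP toolkit is more than you need, the JDK itself ships a rule-based sentence boundary detector, java.text.BreakIterator. The sketch below is only an illustration of that class, not the toolkit this answer has in mind; the class name FirstSentenceDemo and the helper firstSentence are made up for the example. Keep in mind that BreakIterator works from generic rules and has no abbreviation dictionary, so it may still split after something like "Mr.".

```java
import java.text.BreakIterator;
import java.util.Locale;

class FirstSentenceDemo {
    // Returns the text up to the first sentence boundary reported by BreakIterator.
    static String firstSentence(String text) {
        BreakIterator it = BreakIterator.getSentenceInstance(Locale.ENGLISH);
        it.setText(text);
        int start = it.first();
        int end = it.next();
        if (end == BreakIterator.DONE) {
            // No boundary after the start (e.g. empty input): return what we have.
            return text.trim();
        }
        // The boundary includes trailing whitespace, so trim it off.
        return text.substring(start, end).trim();
    }

    public static void main(String[] args) {
        System.out.println(firstSentence("Hello world. This is example. "));
    }
}
```

For text with many abbreviations or unusual punctuation, a trained sentence splitter from an NLP library is likely to do better than this.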

OTHER TIPS

You need to somehow mark the end of a sentence. As you already noted, a "." doesn't do that reliably, since it can be used in other ways ("Hi, my name is Mr. Pudelhund."). If possible, I would recommend using a marker character that won't otherwise appear in the text.

Edit: The other method is good as well, but way more complicated. If you can't edit the string you are using though, that method beats mine ;)
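To make the marker idea concrete, here is a minimal sketch. It assumes you control how the string is produced and can append a marker after each sentence. The marker chosen here (the ASCII record separator, U+001E) and the names MarkerSplitDemo and END_OF_SENTENCE are just picks for the example; any character that never occurs in your text would do.

```java
class MarkerSplitDemo {
    // Hypothetical marker: any character guaranteed not to occur in the text works.
    static final char END_OF_SENTENCE = '\u001E';

    // Returns everything before the first marker, or the whole text if there is none.
    static String firstSentence(String marked) {
        int end = marked.indexOf(END_OF_SENTENCE);
        return end < 0 ? marked : marked.substring(0, end);
    }

    public static void main(String[] args) {
        String marked = "Hi, my name is Mr. Pudelhund." + END_OF_SENTENCE
                + "Nice to meet you." + END_OF_SENTENCE;
        System.out.println(firstSentence(marked));
    }
}
```

Because the split no longer depends on ".", the period in "Mr." is not mistaken for a sentence end.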

This can be done with a very simple implementation using String.substring():

String example = "Hello world. This is example. ";
System.out.print(example.substring(0, example.indexOf(".")+1)); // --> Hello world.
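Note that the snippet above cuts at the very first period, so it would stop after "Mr." in "Hi, my name is Mr. Pudelhund.", and it prints an empty string when there is no period at all. A rough way to soften both problems is to skip periods that follow a known abbreviation. The sketch below is only a heuristic: the abbreviation list is a tiny hypothetical sample, the names are made up for the example, and it does nothing for a case like firstName. LastName. from the question. It uses Set.of, so it assumes Java 9 or later.

```java
import java.util.Set;

class HeuristicFirstSentence {
    // Hypothetical, deliberately tiny list; a usable one would be much longer.
    static final Set<String> ABBREVIATIONS = Set.of("Mr", "Mrs", "Dr");

    static String firstSentence(String text) {
        int from = 0;
        while (true) {
            int dot = text.indexOf('.', from);
            if (dot < 0) {
                return text; // no usable period: treat the whole text as one sentence
            }
            // Walk back over letters to find the word directly before the period.
            int wordStart = dot;
            while (wordStart > 0 && Character.isLetter(text.charAt(wordStart - 1))) {
                wordStart--;
            }
            String word = text.substring(wordStart, dot);
            if (!ABBREVIATIONS.contains(word)) {
                return text.substring(0, dot + 1);
            }
            from = dot + 1; // period belongs to an abbreviation, keep searching
        }
    }

    public static void main(String[] args) {
        System.out.println(firstSentence("Hi, my name is Mr. Pudelhund. Nice to meet you."));
    }
}
```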
Licensed under: CC-BY-SA with attribution