
An efficient way to search for strings in large text with Java

Today we faced a technical question while preparing a solution for a prospect. We need to do some background processing in Java for the requirements below:

+ Get some text data from social networks.

+ Search that text to count the occurrences of keywords (from a pre-defined list) before producing the final results.

Certainly, the first quick-and-dirty answer is: “use the String indexOf function”. We all know that this is not good for performance on large texts, since every keyword in the list triggers another scan of the whole text.
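For reference, the quick-and-dirty approach could look like the sketch below. It is only an illustration: the class name IndexOfKeywordCounter and the method countKeywords are made up for this post, and we assume that non-overlapping, case-sensitive matches are what we want to count.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for this post; names and API are illustrative only.
class IndexOfKeywordCounter {

    // Counts non-overlapping, case-sensitive occurrences of each keyword.
    // Every keyword causes its own pass over the text via String.indexOf.
    static Map<String, Integer> countKeywords(String text, List<String> keywords) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String keyword : keywords) {
            int count = 0;
            if (!keyword.isEmpty()) {
                int from = 0;
                int pos;
                while ((pos = text.indexOf(keyword, from)) >= 0) {
                    count++;
                    from = pos + keyword.length(); // continue after this match
                }
            }
            counts.put(keyword, count);
        }
        return counts;
    }

    public static void main(String[] args) {
        String text = "java is fun; java search is fun too";
        System.out.println(countKeywords(text, Arrays.asList("java", "fun", "python")));
    }
}
```

With k keywords and a text of n characters, this makes k separate passes over the text, which is the cost we would like to avoid.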

We did some quick research on this topic. Below is the list of solutions we are analyzing:

+ Using full-text search.

+ Changing to another string search algorithm.

We have not decided on the final solution yet. However, below is the list of string search algorithms we found:


Below is the benchmark chart of these algorithms:

[Figure: benchmark]

You can see more details about the benchmark in the article here.

Boyer-Moore and BNDM are the preferred ones. However, the String indexOf implementation in Java uses a brute-force algorithm.

We also found Boyer-Moore/BNDM implementations here.
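To give a feeling for how this family of algorithms skips ahead in the text, here is a minimal sketch of Boyer-Moore-Horspool, a simplified variant of Boyer-Moore. It is not the implementation we found, and it is neither full Boyer-Moore nor BNDM. The class name HorspoolSearch and its methods are made up for this post, and we again assume non-overlapping, case-sensitive counting.

```java
import java.util.Arrays;

// Hypothetical sketch of Boyer-Moore-Horspool; names are illustrative only.
class HorspoolSearch {

    // For each character bucket, how far the window may move when that
    // character is aligned with the last position of the pattern.
    // Characters are folded into 256 buckets with (c & 0xFF); collisions
    // can only make a shift smaller, so no match is skipped.
    static int[] buildShiftTable(String pattern) {
        int m = pattern.length();
        int[] shift = new int[256];
        Arrays.fill(shift, m);
        for (int i = 0; i < m - 1; i++) {
            shift[pattern.charAt(i) & 0xFF] = m - 1 - i;
        }
        return shift;
    }

    // Counts non-overlapping occurrences of pattern in text.
    static int countOccurrences(String text, String pattern) {
        int n = text.length();
        int m = pattern.length();
        if (m == 0 || m > n) {
            return 0;
        }
        int[] shift = buildShiftTable(pattern);
        int count = 0;
        int pos = 0;
        while (pos <= n - m) {
            // Compare the window with the pattern from right to left.
            int j = m - 1;
            while (j >= 0 && text.charAt(pos + j) == pattern.charAt(j)) {
                j--;
            }
            if (j < 0) {
                count++;
                pos += m; // full match: continue after it
            } else {
                // Mismatch: shift by the table entry of the window's last character.
                pos += shift[text.charAt(pos + m - 1) & 0xFF];
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countOccurrences("java is fun; java search is fun too", "java"));
    }
}
```

The point of the shift table is that a mismatch can move the search window by up to the full keyword length instead of a single character, so longer keywords tend to be found with fewer comparisons. Whether that beats indexOf on our data is exactly what the tests have to show.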

Anyway, we need to do some tests before giving the final answer.
