Pages

Tuesday, March 10, 2020

Java Program To Count Duplicate Characters In String (+Java 8 Program)

1. Introduction


In this article, We'll learn how to find the duplicate characters in a string using a java program. This java program can be done using many ways. But, we will focus on using the Brute-force search approach, HashMap or LinkedHashMap, Java 8 compute() and Java 8 functional style.

Time complexity will be O(n) for brute force, for others will be O(nlogn).

At last, But important to know how to handle the Unicode characters. For this problem statement solution, New Java 8 String api provided the solution with the codepoints concepts.

Let us start with the HashMap first.

Coming to the problem statement, We have to find how many times each is repeated in the input string. The input may contain special characters such as '!', '#', '@', '$'.. etc. All characters must be handled by the algorithm.

Java Program To Count Duplicate Characters In String (+Java 8 Program)




Input :

String str = "aaaabbbcccccdddddd";

Output:

a  4
b  3
c  5
d  6

2. Using a Brute-force attack


This is the simplest solution and easy to understand. We run two for loops. Here the first loop takes each character from the input string and compares with each character in the entire string in the second loop. If it finds any match then the count will be incremented by 1. The same process will be repeated for each char in the string.

Here if any character appears more than once, then it will print the same char in output twice with the same count. To avoid the same character processing again and printing in the output, Implemented another utility method named "isProcessedChar" that takes char as input and returns boolean true if already processes otherwise false.

package com.javaprogramto.w3schools.engineering.string;

import java.util.ArrayList;
import java.util.List;

/**
 *
 *
 * @version JavaProgramTo.com
 */
public class StringCountDuplicateCharactersBruteForce {

    private static List chars = new ArrayList();

    public static void main(String[] args) {
        System.out.println("---------Input 1 ------");
        printCountOfDuplicateCharacter("JavaProgramTo.com");
        System.out.println("---------Input 2 ------");
        printCountOfDuplicateCharacter("aaaabbbcccccdddddd");

    }

    // print count of each character in a given string.
    private static void printCountOfDuplicateCharacter(String input) {

        for (int i = 0; i < input.length(); i++) {
            char ch = input.charAt(i);

            if (isProcessedChar(ch))
                continue;

            int count = 0;
            for (int j = 0; j < input.length(); j++) {
                if (ch == input.charAt(j)) {
                    count++;
                }
            }
            System.out.println(ch + " : " + count);
        }
        chars.clear();
    }

    // checking whether the given char is already processed or not.
    private static boolean isProcessedChar(char ch) {

        if (chars.contains(Character.toString(ch))) {
            return true;
        } else {
            chars.add(Character.toString(ch));
        }
        return false;
    }

}

Output:

---------Input 1 ------
J : 1
a : 3
v : 1
P : 1
r : 2
o : 3
g : 1
m : 2
T : 1
. : 1
c : 1
---------Input 2 ------
a : 4
b : 3
c : 5
d : 6

3. Using HashMap or LinkedHashMap


HashMap takes a key-value pair and here our case, the key will be character and value will be the count of char as an integer. first, we will take a character from string and place the current char as key and value will be 1 in the map. Next, take the second character. If the char is already present in the map using containsKey() method, then simply increase the current value by 1. Repeat the same for each character.

At end print the map that holds occurrences of each character.

package com.javaprogramto.w3schools.engineering.string;

import java.util.LinkedHashMap;
import java.util.Map;

/**
 *
 *
 * @version JavaProgramTo.com
 */
public class StringCountDuplicateCharactersHashMap {

    public static void main(String[] args) {
        printCountOfDuplicateCharacterUsingHashMap("bbbcccccddddddaaaa");
        printCountOfDuplicateCharacterUsingHashMap("##^$!%^$!^%@!$^@!kds");

    }

    // Using hashmap : print count of each character in a given string.
    private static void printCountOfDuplicateCharacterUsingHashMap(String input) {

        Map<Character, Integer> output = new LinkedHashMap<Character, Integer>();
        for (int i = 0; i < input.length(); i++) {
            char ch = input.charAt(i);
            if (output.containsKey(ch)) {
                output.put(ch, output.get(ch) + 1);
            } else {
                output.put(ch, 1);
            }

        }

        System.out.println(output);

    }

}

Output:

In the above program, we used LinkedHashMap instead of HashMap because LinkedHashMap preserves the order of how the characters are added to the map.

The below output using LinkedHashMap.

{b=3, c=5, d=6, a=4}
{#=2, ^=4, $=3, !=4, %=2, @=2, k=1, d=1, s=1}

Change the map implementation to HashMap.

Map<Character, Integer> output = new HashMap<Character, Integer>();

HashMap Output:

{a=4, b=3, c=5, d=6}
{@=2, !=4, #=2, s=1, $=3, d=1, %=2, k=1, ^=4}

Compares both Map's output and understands how they hold the data.

4. Using Java 8 Map compute() method


Map implementations are introduced with a new method compute() in java 8 api. Now let us use the compute() method and simplify the logic of the above program.

compute() method takes key and BiFunction as arguments. BiFunction takes a key, value and returns value type value.

See the below program on how the code is reduced to a single line inside for loop using compute().

java 8 compute program:

package com.javaprogramto.w3schools.engineering.string;

import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Java 8 map compute() method example.
 *
 * @version JavaProgramTo.com
 */
public class StringCountDuplicateCharJava8Compute {

    public static void main(String[] args) {
        printCountOfDuplicateCharMapCompute("bbbcccccddddddaaaa");
        printCountOfDuplicateCharMapCompute("##^$!%^$!^%@!$^@!kds");

    }

    // Using hashmap : print count of each character in a given string.
    private static void printCountOfDuplicateCharMapCompute(String input) {

        Map<Character, Integer> output = new LinkedHashMap<Character, Integer>();
        for (int i = 0; i < input.length(); i++) {
            char ch = input.charAt(i);

            output.compute(ch, (key, value) -> (value == null) ? 1 : value + 1);

        }

        System.out.println(output);

    }

}

Output:

{b=3, c=5, d=6, a=4}
{#=2, ^=4, $=3, !=4, %=2, @=2, k=1, d=1, s=1}


5. Java 8 Stream API


Now, We will see how to achieve the problem using java 8 stream api concepts and its methods.

This is a three-step process. The first two steps are to transform the input string into the Stream of characters. The final step to group and count the characters.

A) Invoke the chars() method on the input string and which returns the IntStream instance. This int stream holds the integer representation of each character in the string.
B) Need to convert IntStream to Char stream using the mapToObj() method.
C) Last, need to group by characters by calling Collectors.groupingBy() and to count call Collectors.counting() method.

The last step returns a map that stores the character and its count.

Java 8 Stream Collectors program:

package com.javaprogramto.w3schools.engineering.string;

import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import java.util.stream.Stream;

/**
 * Java 8 Stream api collectors example
 *
 * @version JavaProgramTo.com
 */
public class StringCountDuplicateCharJava8Streams {

    public static void main(String[] args) {
        printCountOfDuplicateCharJava8Stream("bbbcccccddddddaaaa");
        printCountOfDuplicateCharJava8Stream("##^$!%^$!^%@!$^@!kds");

    }

    // Using hashmap : print count of each character in a given string.
    private static void printCountOfDuplicateCharJava8Stream(String input) {

        // Step 1
        IntStream intStream = input.chars();

        // Step 2
        Stream<Character> charsStream = intStream.mapToObj(ch -> (char) ch);

        // Step 3
        Map<Character, Long> output = charsStream.collect(Collectors.groupingBy(ch -> ch, Collectors.counting()));

        System.out.println(output);

    }

}

Output:

{b=3, c=5, d=6, a=4}
{#=2, ^=4, $=3, !=4, %=2, @=2, k=1, d=1, s=1}

This program also produced the same output and all 3 steps can be rewritten into a single line as below.

Map<Character, Long> output =  input.chars().mapToObj( ch -> (char) ch ).collect(Collectors.groupingBy(ch -> ch, Collectors.counting()));

6. Working with CodePoints (UTF - Unicode 32-bit values)


Java provides supports for a wide range of characters set within range till 65,535 (0xFFFF). But, later on, lots of other characters introduced and added to the UTF character set. Maximum value has reached 1,114,111 (0x10FFFF).

Typically, We know ASCII characters and those are unprintable control codes between 0-31, printable characters between 32-127, and extended ASCII codes between 128-255.
Java stores the values in 16-bit till 65,535 and all-new characters till 1,114,111 (0x10FFFF) are fall into 32-bit (UTF 32 encoding schema). Java does not support the UTF 32.

You may have a question about what are the new characters added. If your string has characters such as '💕' or '💌' or '💔'. All of these are fall into UTF 32.

What about the remaining characters till 1,114,111 (0x10FFFF).
Java has come up with the solution 'Surrogate Pair' which uses a 2 16-bits approach.

16-bit high surrogates: 1,024 values (U+D800 to U+DBFF)
16-bit low surrogates: 1,024 values (U+DC00 to U+DFFF)

Always a high surrogate followed by a low surrogate and this is known as a surrogate pair. Surrogate pairs are used to represent values between 65,536 (0x10000) and 1,114,111 (0x10FFFF).

Java 8 string api has new methods such as codePointAt(), codePoints(), codePointCount(), and offsetByCodePoints(). We have to use codePointAt() instead of charAt() and codepoints() instead of chars() methods.


String str = String.valueOf(Character.toChars(128140));
System.out.println("128140 : " + str);

str = String.valueOf(Character.toChars(128148));
System.out.println("128148 : " + str);

str = String.valueOf(Character.toChars(128149));
System.out.println("128149 : " + str);


System.out.println("str length : " + str.length());
System.out.println("str codepoints : " + str.codePoints().count());

System.out.println("Code point at index 0 : "+str.codePointAt(0));
System.out.println("Code point at index 1 : "+str.codePointAt(1));

Output:

128140 : 💌
128148 : 💔
128149 : 💕
str length : 2
str codepoints : 1
Code point at index 0 : 128149
Code point at index 1 : 56469

From the above program, codepoints count will be 1 even though string length is 2.

6.1 Surrogate Pair - Java 8 Codepoint Example using compute()


This code problem is implemented by using codepoints().

if the char is surrogate pair then it takes two chars in the input string. But it looks like the only character in the input. So, we need to increment the counter to skip the surrogate pair. For example, the symbol '💕' looks occupied the only character but it is not. It internally takes two chars but considered as one code point.

package com.javaprogramto.w3schools.engineering.string;

import java.util.HashMap;
import java.util.Map;

/**
 * Java 8 Stream api collectors example
 *
 * @version JavaProgramTo.com
 */
public class StringCountDuplicateCharJava8Surrogate {

    public static void main(String[] args) {
        Map<String, Integer> output1 = printCountOfDuplicateCharJava8Surrogate("hello 💌💕  email"); // 💌💕
        System.out.println("output 1 : " + output1);

        Map<String, Integer> output2 = printCountOfDuplicateCharJava8Surrogate("💌💕💌💕💌💕💌💕💌💕💌💕💌💕");
        System.out.println("output 2 : " + output2);
    }

    public static Map<String, Integer> printCountOfDuplicateCharJava8Surrogate(String input) {

        Map<String, Integer> result = new HashMap<>();

        for (int i = 0; i < input.length(); i++) {

            // using codePointAt() in place of charAt() method.
            int cp = input.codePointAt(i);

            // if the char is surrogate pair then it takes two chars in the input string.
            // But it looks like only character. So, we need to increment the counter to skip the surrogate pair
            if (Character.charCount(cp) == 2) { // 2 means a surrogate pair
                i++;
            }

            String ch = String.valueOf(Character.toChars(cp));

            result.compute(ch, (k, v) -> (v == null) ? 1 : ++v);
        }

        return result;
    }

}

Output:

output 1 : { =3, a=1, 💕=1, 💌=1, e=2, h=1, i=1, l=3, m=1, o=1}
output 2 : {💕=7, 💌=7}

6.2 Surrogate Pair - Java 8 Stream API and codepoints()


Let us write the same above code using java 8 functional programming style to cover the Unicode surrogate pairs.


public static Map<String, Integer> printCountOfDuplicateCharJava8Surrogate(String input) {

        Map<String, Long> result = input.codePoints().mapToObj(ch -> String.valueOf(Character.toChars(ch)))
                .collect(Collectors.groupingBy(ch -> ch, Collectors.counting()));

        return result;
    }

Complete for loop can be replaced with the above single line code.

7. Conclusion


In this article, We've seen java programs to count each character how many times repeated in the given string. Covered example code with two for loops which takes lots of time for execution. But, LinkedHashMap usage will be effective in performance-wise.

As well, Java 8 Map API introduced with compute() method which does the containsKey() logic internally and produces the count and functional style code.

At last covered, How to work with Unicode characters if the input contains special characters such as double heart, email symbol. Example programs on Surrogate pairs using java 8 standards.

Note: if you are not sure that input string contents characters if that has Unicode characters and not covered properly then string operations will produce the wrong result. Error proned result is difficult to debug and fix. The recommendation is to use the codepoints API methods as part of the String and Character class.

No comments:

Post a Comment

Please do not add any spam links in the comments section.