Pages

Sunday, November 14, 2021

Java - How To Compare Two Strings Lexicographically | String compareTo method works internally

Java - How To Compare Two Strings Lexicographically


Today, we will learn about comparing two strings lexicographically in Java. Many of you might be aware of how to compare two strings in java but not in a manner lexicographically. Lexicographically word looks somewhat new and weird. Do not worry, we will go step by step easily understandable way and how it works internally.

First, We will go through examples of how to compare two string values are the same/equal or not.

Compare Two Strings Lexicographicall



String equals example:

package blog.java.w3schools.string.compare.lexicographically;

public class StringCompare {
 public static void main(String[] args) {
  String stringOne = "w3schools";
  String stringTwo = "java";
  String stringThree = "w3schools";

  System.out.println("Comparing stringOne and stringTwo : " + stringOne.equals(stringTwo));
  System.out.println("Comparing stringOne and stringThree : " + stringOne.equals(stringThree));
 }
}

Output:
Comparing stringOne and stringTwo : false
Comparing stringOne and stringThree : true


String class in java has a method called equals which takes input parameter Object type. It compares the contents of two strings are same. It returns true if and only if the passed value is not null value and contents are same, else false.


In our example program, 

stringOne and stringTwo values are different. So stringOne.equals(stringTwo) returns false.
stringOne and stringThree values are same. So stringOne.equals(stringThree) returns true.

Comparing two Strings lexicographically:

Comparing two strings lexicographically is done by calling compareTo method of String class which takes the method parameter type is String and it returns int type. The comparison is based on the Unicode value of each character in the strings.

For example, take two strings as below.

String s1 = "abcdf";
String s2 = "abceg";

Take the each index character value from these two strings and compare them. Assume k is index, if the difference of two numbers is zero then they are said to be same. This is correct. Saying they are same in different way than comparing is using == operator.

s1.charAt(k)-s2.charAt(k)

If the difference is zero for the same index in two string then will go for the next index. This process repeats untill it finds non zero difference or if difference is zero then minimum length of string1 and string 2. This comparison process is called as Lexicographically Comparison.

Strings Lexicographical Example Simulation


This works based on Unicode value. For easy understanding just like dictionary or alphabetical order.

ASCII code of a = 97, b = 98, c = 99, d = 100, e = 101

s1[0] - s2[0] ==> a - a => 97 - 97 ==> 0
s1[1] - s2[1] ==> b - b => 98 - 98 ==> 0
s1[2] - s2[2] ==> c - c => 99 - 99 ==> 0
s1[3] - s2[3] ==> d - e => 100 - 101 ==> -1


It stops comparing at this point. If diff is non zero then return the difference. It will not check for remaining indexes even though same or not.

Syntax: public int compareTo(String anotherString)

ComareTo method internal code


This code is from java String api class internal code for better visibility how it has been implemented.
Try to understand code below.

public int compareTo(String anotherString) {
        int len1 = value.length;
        int len2 = anotherString.value.length;
        int lim = Math.min(len1, len2);
        char v1[] = value;
        char v2[] = anotherString.value;

        int k = 0;
        while (k < lim) {
            char c1 = v1[k];
            char c2 = v2[k];
            if (c1 != c2) {
                return c1 - c2;
            }
            k++;
        }
        return len1 - len2;
}



String class has char array variable named value. Lexicographically means comparing two strings character by character and when it mismatches then it return difference of two characters.

Read more about How to make a class Immutable in java.

CompareTo Step By Step internal Explanation


How String compareTo method works internally in java.

String s1 = "abcd";
String s2 = "abce";

Step 1:
CompareTo method works always on two strings.
s1.compareTo(s2);

Step 2:
If any one of String is null then will throw runtime exception saying NullPointerException.
Exception in thread "main" java.lang.NullPointerException
  at java.lang.String.compareTo(Unknown Source)
  at blog.java.w3schools.string.compare.lexicographically.StringCompareLexicographically.main(StringCompareLexicographically.java:10)

Step 3:
String class has a instance variable that is named "value" which is char[] array.

Step 4:
Get the length of two strings.
 int len1 = s1.value.length;
        int len2 = s2.value.length;

Step 5:
Get the minimum length from len1, len2. This is said to be limit for the comparison.
int lim = Math.min(len1, len2);

Step 6:
Get two arrays.
 char v1[] = s1.value;
        char v2[] = s2.value;

Step 7:
loop from index 0 to lim for both strings s1, s2.

Step 8:
Compare each index value. If not equal then return diff.

Step 9:
If both strings lengths are same then comparison should be done till last index or till not same.

Step 10:
If lenghts are not same then compare the min lenght index and if both string are same till min length then return diff len1 - len2.
  String s1 = "pqrs";
  String s2 = "pqrstuv";
  
  lenght = min (s1 length, s2 length) ==> (4, 3) ==> first four characters are same ==> then compare len1 - len2 ==> 4-7 ==> -3


Compare two String Lexicographically Example

package blog.java.w3schools.string.compare.lexicographically;

public class StringCompareLexicographically {

 public static void main(String[] args) {

  String s1 = "abcd";
  String s2 = "abce";

  int diff = s1.compareTo(s2);
  System.out.println("Compare Lexicographically diff : " + diff);
 }
}

output:
Compare Lexicographically diff : -1

Summary: 

 

Comparing two strings lexicographically plays a significant role in string sorting techniques in java. lexicographically means comparing each character in both strings index by index. It uses unicode of character (ASCII). If all characters are the same then it returns 0. If any mismatch is found then returns the difference.

Please leave your understanding, Any comments, Questions in the comments box.


No comments:

Post a Comment

Please do not add any spam links in the comments section.