
Thursday, February 21, 2019

String codePointAt() method in Java with example - Internal Implementation

String codePointAt() in Java:

The codePointAt(int index) method of the String class takes an index as a parameter and returns the Unicode code point of the character at that index. In other words, codePointAt() returns the "Unicode number" of the character at that position. The index refers to char values (Unicode code units) and must lie between 0 and length() - 1.

If the char value present at the given index lies in the high-surrogate range, the following index is less than the length of this sequence, and the char value at the following index is in the low-surrogate range, then the supplementary code point corresponding to this surrogate pair is returned. Otherwise, the char value at the given index is returned.
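The surrogate-pair rule above is easier to see with a supplementary character. The following is a minimal sketch; the class name SurrogatePairDemo and the sample text are assumptions made for illustration. The string holds U+1F600 written explicitly as the surrogate pair \uD83D\uDE00.

```java
class SurrogatePairDemo {
    // "A", then U+1F600 as a high/low surrogate pair, then "B": 4 char values in total.
    static final String TEXT = "A\uD83D\uDE00B";

    public static void main(String[] args) {
        System.out.println(TEXT.length());       // 4
        System.out.println(TEXT.codePointAt(0)); // 65, a plain BMP char
        System.out.println(TEXT.codePointAt(1)); // 128512 (0x1F600), the whole surrogate pair
        System.out.println(TEXT.codePointAt(2)); // 56832 (0xDE00), the low surrogate on its own
        System.out.println(TEXT.codePointAt(3)); // 66
    }
}
```

Index 2 points at the low surrogate, which has no high surrogate at the same index to pair with, so the char value itself is returned.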




Syntax:


public int codePointAt(int index)



Return value: int

Note: The index must always be between 0 and length() - 1. If the index argument is negative or not less than the length of this string, an IndexOutOfBoundsException is thrown.

codePointAt() method Example 1:


Example program to find the code point at a given index in a string.
package examples.java.w3schools.string;

public class StringcodePointAtExample {
 public static void main(String[] args) {

  String input = "VENKATESH";
  int value = input.codePointAt(4);
  System.out.println("Code point value at index 4 is "+value);
 }
}


Output:

Code point value at index 4 is 65



The char A is at index 4 of the input string, and the code point returned for that index is 65, which is also the ASCII code of A.



codePointAt() method Example 2:


Example program to get the code point for each character in the input string.

package examples.java.w3schools.string;

public class StringcodePointAtExample2 {
 public static void main(String[] args) {

  String input = "JAVA-W3SCHOOLS";

  for (int i = 0; i < input.length(); i++) {
   System.out.println("Code point value at index " + i + " is " + input.codePointAt(i));
  }
 }
}


Output:



Code point value at index 0 is 74
Code point value at index 1 is 65
Code point value at index 2 is 86
Code point value at index 3 is 65
Code point value at index 4 is 45
Code point value at index 5 is 87
Code point value at index 6 is 51
Code point value at index 7 is 83
Code point value at index 8 is 67
Code point value at index 9 is 72
Code point value at index 10 is 79
Code point value at index 11 is 79
Code point value at index 12 is 76
Code point value at index 13 is 83
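The loop in Example 2 advances one char at a time, which is fine for this input because every character fits in a single char. For text that may contain supplementary characters, a common variation is to advance by Character.charCount(cp) so that a surrogate pair is visited only once. The sketch below assumes a hypothetical helper named describe and a sample string chosen for illustration.

```java
class CodePointLoopDemo {
    // Hypothetical helper: lists "index=codePoint" for each code point in the input.
    static String describe(String input) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < input.length(); ) {
            int cp = input.codePointAt(i);
            out.append(i).append('=').append(cp).append(' ');
            // Move forward by 1 char for BMP characters, 2 for a surrogate pair.
            i += Character.charCount(cp);
        }
        return out.toString().trim();
    }

    public static void main(String[] args) {
        // "J", then U+1F600 as a surrogate pair, then "A".
        System.out.println(describe("J\uD83D\uDE00A")); // 0=74 1=128512 3=65
    }
}
```

Index 2 is skipped because it holds the low surrogate that was already consumed as part of the code point at index 1.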




codePointAt() method Example 3:




Example program to see what happens when the index is out of range (not between 0 and length-1).


package examples.java.w3schools.string;

public class StringcodePointAtExample3 {
 public static void main(String[] args) {

  String input = "JAVA-W3SCHOOLS";
  input.codePointAt(input.length()+1);

 }
}



Output:
Exception in thread "main" java.lang.StringIndexOutOfBoundsException: index 15,length 14
 at java.base/java.lang.String.checkIndex(String.java:3278)
 at java.base/java.lang.String.codePointAt(String.java:723)
 at w3schools/examples.java.w3schools.string.StringcodePointAtExample3.main(StringcodePointAtExample3.java:7)


StringIndexOutOfBoundsException is thrown because we passed index 15, which is greater than the largest valid index, 13 (the string length is 14).

Here is the place in the internal String class code where the exception is thrown.

  static void checkIndex(int index, int length) {
        if (index < 0 || index >= length) {
            throw new StringIndexOutOfBoundsException("index " + index +
                                                      ",length " + length);
        }
    }
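A caller that cannot guarantee a valid index can apply the same range test before calling codePointAt(). The sketch below uses a hypothetical helper, codePointAtOrMinusOne, which is not part of the JDK; returning -1 for an invalid index is an assumption made for illustration, chosen because -1 is never a valid code point.

```java
class SafeCodePointDemo {
    // Hypothetical helper, not a JDK method: returns -1 instead of throwing.
    static int codePointAtOrMinusOne(String s, int index) {
        // Same condition as checkIndex, applied up front.
        if (index < 0 || index >= s.length()) {
            return -1;
        }
        return s.codePointAt(index);
    }

    public static void main(String[] args) {
        String input = "JAVA-W3SCHOOLS";
        System.out.println(codePointAtOrMinusOne(input, 4));                  // 45
        System.out.println(codePointAtOrMinusOne(input, input.length() + 1)); // -1
    }
}
```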


Internal Implementation code:


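This section shows how the codePointAt() method works internally and how it is implemented.
Below is the internal code from the String class. Always remember that the String class handles two cases, depending on how the string is stored internally, and checks the index in both:

1) Latin-1: checkIndex, then read a single byte
2) UTF16: checkIndex, then delegate to StringUTF16.codePointAt
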
public int codePointAt(int index) {
        if (isLatin1()) {
            checkIndex(index, value.length);
            return value[index] & 0xff;
        }
        int length = value.length >> 1;
        checkIndex(index, length);
        return StringUTF16.codePointAt(value, index, length);
    }


Returns the character (Unicode code point) at the specified index. The index refers to char values (Unicode code units) and ranges from 0 to length() - 1.

If the char value specified at the given index is in the high-surrogate range, the following index is less than the length of this String, and the char value at the following index is in the low-surrogate range, then the supplementary code point corresponding to this surrogate pair is returned. Otherwise, the char value at the given index is returned.

Internal code checking mechanism:


1) Check whether the string is stored with the Latin-1 coder (isLatin1()).
2) If it is Latin-1, then
a) Check that the index is within range. If not, StringIndexOutOfBoundsException is thrown.
b) Get the byte at the given index and perform a bitwise AND with 0xff, which turns the signed byte into a value between 0 and 255.
c) Return the value from step b (above).
3) If it is UTF16, then
a) Compute the length in chars by shifting the byte array length right by 1 (value.length >> 1), because each UTF16 char takes two bytes.
b) Check that the index is within range. If not, StringIndexOutOfBoundsException is thrown.
c) Return the code point value for the given index by calling StringUTF16.codePointAt(value, index, length);
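The steps above can be sketched as a standalone program. This is not the JDK source: StringUTF16 is internal to java.lang and cannot be called directly, so the class CodePointAtSketch, its charAt helper, and the big-endian byte layout are all assumptions made for illustration. The coder constants 0 (Latin-1) and 1 (UTF16) mirror the coder value mentioned in step 3a.

```java
class CodePointAtSketch {
    static final byte LATIN1 = 0;
    static final byte UTF16 = 1;

    static void checkIndex(int index, int length) {
        if (index < 0 || index >= length) {
            throw new StringIndexOutOfBoundsException("index " + index + ",length " + length);
        }
    }

    // Assumption: two bytes per char, stored big-endian in this sketch.
    static char charAt(byte[] value, int index) {
        return (char) (((value[2 * index] & 0xff) << 8) | (value[2 * index + 1] & 0xff));
    }

    static int codePointAt(byte[] value, byte coder, int index) {
        if (coder == LATIN1) {
            checkIndex(index, value.length);
            return value[index] & 0xff;      // step 2b: unsigned value of the byte
        }
        int length = value.length >> 1;      // step 3a: bytes to chars
        checkIndex(index, length);           // step 3b
        char c1 = charAt(value, index);
        if (Character.isHighSurrogate(c1) && index + 1 < length) {
            char c2 = charAt(value, index + 1);
            if (Character.isLowSurrogate(c2)) {
                return Character.toCodePoint(c1, c2); // combine the surrogate pair
            }
        }
        return c1;                           // otherwise the char value itself
    }

    public static void main(String[] args) {
        byte[] latin1 = {(byte) 0xE9};                                      // U+00E9
        byte[] utf16 = {(byte) 0xD8, (byte) 0x3D, (byte) 0xDE, (byte) 0x00}; // U+1F600
        System.out.println(codePointAt(latin1, LATIN1, 0)); // 233
        System.out.println(codePointAt(utf16, UTF16, 0));   // 128512
    }
}
```

The Latin-1 case shows why the AND with 0xff matters: the byte 0xE9 is -23 as a signed Java byte, and masking it yields 233.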

Conclusion:


We have learnt how to use the codePointAt() method of the String class with examples, and how it is implemented internally.

