Pages

Footer Pages

Spring Boot

Java String API

Java Conversions

Kotlin Programs

Kotlin Conversions

Java Threads Tutorial

Java 8 Tutorial

Wednesday, December 11, 2019

Java Program to "Run Length Encoding (RLE)" - String Compression

1. Introduction


In today's article, You will learn What is the "Run Length Encoding" technique which is heavily used in string data compression. This is an experienced java interview question for 4-6 years developers. Earlier days this is used to compress the black and white photos.


Run Length Encoding is abbreviated as "RLE".

Run-length encoding (RLE) is a form of lossless data compression in which runs/flows of data are stored as a single data value and count, rather than as the original run.
Java Program to "Run Length Encoding (RLE)" - String Compression



This works for only sequences in which the same data value occurs in many consecutive data elements.

Examples:

Input: aaabbccccd
Output:a3b2c4d1

Input: xxxyyziiii
Output: x3y2z1i4

Input: kkkkkkiiiiiiuuuuasdfgllll
Output: k6i6u4a1s1d1f1l4

All of the above examples are valid. If the string "aaabba" is invalid because character 'a' is repeated in non-consecutive which is appeared after character 'b'. If the string is invalid then it may produce unexpected results. Please keep in mind this note before implementing it in real-time applications.



2. Algorithm


Now it's time to think about the algorithm that produces the desired output.

Follow these steps to learn and understand the algorithm.

A) Create the output string which is empty and count variable which value is 0 for now.
B) Take the first character and compare it with the next character. If matches, increment count by 1. For each successful match, the count will be incremented by 1. Then take the next character and do compare until it finds no match.
C) Once you find the no match, add the current character and count to the output string.
D) Take the next unmatched character and repeat steps B and C.
F) After completing all characters finally, the output string will have the compressed string.

3. Java Program on String Compression using Run Length Encoding


package com.java.w3schools.blog.java.programs;

public class StringCompression {

 public static void main(String[] args) {

  String inputString = "aaabbccccd";
  String outputString = "";

  int count = 1;
  for (int i = 0; i < inputString.length(); i++) {

   // resetting to 1 for every new character (counting current character).
   count = 1;
   while (i < inputString.length() - 1 && inputString.charAt(i) == inputString.charAt(i + 1)) {
    count++;
    i++;
   }
   outputString = outputString + inputString.charAt(i) + count;
  }
  System.out.println("Input data string : " + inputString);
  System.out.println("Output data string after applying data compression technique : " + outputString);

 }

}

Output:

Input data string : aaabbccccd
Output data string after applying data compression technique : a3b2c4d1

This works well for longer inputs.

Input data string : jjjjjjjjjdddddddddiiiuusskkpppuuutttrewqnbhyj
Output data string after applying data compression technique : j9d9i3u2s2k2p3u3t3r1e1w1q1n1b1h1y1j1

4. Time Complexity


This program runs a loop from index 0 to length on the input string. You have seen that the above program has one for loop and nested while loop. Looks time complexity O(n2) but it is not. Actually, it is O(n) only. The reason behind this is the outer loop run for unique character and inner loop for repeated characters.

So in our example, input string "aaabbccccd" outer for loop runs for the character a,b,c,d and inner while loop runs for only repeated characters such as "aa b ccc".

Outer loop: 4 times
Inner loop: 6 times

Input String length: 10

input length = outer loop length + inner loop length. So it is O(n). 

5. Conclusion


In this article, you have learned how to use the "Run Length Encoding" compression technique on a string. But this compression is applicable only for the inputs which as many consecutive data elements.

Share this article with your friends and post your questions to me.

3 comments:

  1. Hello! Why would it be (i < inputString.length() - 1) in the inner loop? Why is it not (i < inputString.length()) ? Thanks!

    ReplyDelete
    Replies
    1. Hello KB, Nice question.

      It should be i < inputString.length() - 1 because doing the i + 1 in the next step as below.

      inputString.charAt(i) == inputString.charAt(i + 1)

      If you use i < inputString.length() this, then it will throw IndexOutOfBoundsException at runtime.

      Hope, this helpful.

      Delete
  2. Hi, how would you create a run length decoder?

    ReplyDelete

Please do not add any spam links in the comments section.