LeetCode 187 – Repeated DNA Sequences (Java Solution with Sliding Window and HashSet)

Step-by-Step Explanation with Optimized Approach and Time Complexity Analysis

LeetCode 187 – Repeated DNA Sequences (Java Solution with Sliding Window and HashSet)

Step-by-Step Explanation with Optimized Approach and Time Complexity Analysis

Krishna Shrivastava

February 22, 2026

10 views

Java Sliding Window Medium

Introduction

In this article, we will solve LeetCode 187: Repeated DNA Sequences using Java. This is a popular string problem that tests your understanding of the sliding window technique and efficient use of hash-based data structures.

DNA sequences are composed of four characters:

A (Adenine)
C (Cytosine)
G (Guanine)
T (Thymine)

The goal is to identify all 10-letter-long substrings that appear more than once in a given DNA string.

You can try solving the problem directly on LeetCode here:

https://leetcode.com/problems/repeated-dna-sequences/

Problem Statement

Given a string s that represents a DNA sequence, return all the 10-letter-long substrings that occur more than once.

Example 1

Input:

s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT"

Output:

["AAAAACCCCC", "CCCCCAAAAA"]

Example 2

Input:

s = "AAAAAAAAAAAAA"

Output:

["AAAAAAAAAA"]

Key Observations

We only need substrings of fixed length 10.
The maximum length of the string can be up to 10^5.
A brute-force solution checking all substrings multiple times would be inefficient.
This problem can be solved efficiently using a sliding window and hash-based data structures.

Approach 1: Sliding Window with HashSet (Given Solution)

Idea

Use two pointers (i and j) to maintain a sliding window.
Build a substring of size 10 dynamically.
Store previously seen substrings in a HashSet.
If a substring is already present in the set:
Check if it is already in the result list.
If not, add it to the result list.
Slide the window forward and continue.

Java Code (Your Implementation)

class Solution {

public List<String> findRepeatedDnaSequences(String s) {

HashSet<String> ms = new HashSet<>();

List<String> lis = new ArrayList<>();

int i = 0;

int j = 0;

String tes = "";

while (j < s.length()) {

tes += s.charAt(j);

if (j - i + 1 < 10) {

j++;

} else {

if (j - i + 1 == 10) {

if (ms.contains(tes)) {

boolean fl = false;

for (String a : lis) {

if (a.equals(tes)) {

fl = true;

}

if (!fl) {

lis.add(tes);

}

} else {

ms.add(tes);

}

tes = tes.substring(1);

i++;

j++;

}

return lis;

}

Explanation

The variable tes maintains the current substring.
ms stores all previously seen substrings of length 10.
If a substring already exists in ms, we manually check whether it has already been added to the result list.
This avoids duplicate entries in the final output.

Time Complexity

Sliding through the string: O(n)
Checking duplicates in the result list: O(n) in the worst case
Overall worst-case complexity: O(n²)

Space Complexity

HashSet storage: O(n)

Limitation of Approach 1

The manual duplicate check using a loop inside the result list introduces unnecessary overhead. This makes the solution less efficient.

We can improve this by using another HashSet to automatically handle duplicates.

Approach 2: Optimized Solution Using Two HashSets

Idea

Use one HashSet called seen to track all substrings of length 10.
Use another HashSet called repeated to store substrings that appear more than once.
Iterate from index 0 to s.length() - 10.
Extract substring of length 10.
If adding to seen fails, it means it has appeared before.
Add it directly to repeated.

This removes the need for a nested loop.

Optimized Java Code

class Solution {

public List<String> findRepeatedDnaSequences(String s) {

Set<String> seen = new HashSet<>();

Set<String> repeated = new HashSet<>();

for (int i = 0; i <= s.length() - 10; i++) {

String substring = s.substring(i, i + 10);

if (!seen.add(substring)) {

repeated.add(substring);

}

return new ArrayList<>(repeated);

}

Why This Approach is Better

No manual duplicate checking.
Cleaner and more readable code.
Uses HashSet properties efficiently.
Each substring is processed only once.

Time Complexity (Optimized)

Single traversal of the string: O(n)
Substring extraction of fixed length 10: O(1)
Overall time complexity: O(n)

Space Complexity

Two HashSets storing substrings: O(n)

Conclusion

LeetCode 187 is a classic example of combining the sliding window technique with hash-based data structures.

The first approach works but has unnecessary overhead due to manual duplicate checks.
The second approach is more optimal, cleaner, and recommended for interviews.
Always leverage the properties of HashSet to avoid redundant checks.

This problem highlights the importance of choosing the right data structure to optimize performance.

Minimum Changes to Make Alternating Binary String – LeetCode 1758 Explained

Understand how to convert a binary string into an alternating pattern with the minimum number of flips using a simple greedy approach.

Read Article

Longest Palindrome – Building the Maximum Length from Given Letters (LeetCode 409)

Using Frequency Counting to Construct the Largest Possible Palindrome

Read Article

Length of Longest Subarray With At Most K Frequency – Sliding Window Approach

Learn how to find the longest contiguous subarray where each element occurs at most K times using HashMap and sliding window.

Read Article

Maximum Average Subarray I — Efficient Solution Using Sliding Window (LeetCode 643)

How to optimize subarray average calculation from brute force to an O(n) sliding window approach

Read Article

Kode$word

LeetCode 187 – Repeated DNA Sequences (Java Solution with Sliding Window and HashSet)

LeetCode 187 – Repeated DNA Sequences (Java Solution with Sliding Window and HashSet)

Introduction

Problem Statement

Example 1

Example 2

Key Observations

Approach 1: Sliding Window with HashSet (Given Solution)

Idea

Java Code (Your Implementation)

Explanation

Time Complexity

Space Complexity

Limitation of Approach 1

Approach 2: Optimized Solution Using Two HashSets

Idea

Optimized Java Code

Why This Approach is Better

Time Complexity (Optimized)

Space Complexity

Conclusion

Related Articles

Minimum Changes to Make Alternating Binary String – LeetCode 1758 Explained

Longest Palindrome – Building the Maximum Length from Given Letters (LeetCode 409)

Length of Longest Subarray With At Most K Frequency – Sliding Window Approach

Maximum Average Subarray I — Efficient Solution Using Sliding Window (LeetCode 643)

Ai Assistant Kas

Ai Assistant Kas