DSA C++ — Strings Module

01

Foundations

Introduction to Strings

🔤 What is a String?

A string is a sequence of characters stored contiguously in memory. In C++, strings can be represented in two fundamentally different ways — the old C-style approach and the modern C++ std::string class.

Unlike arrays of integers, strings carry the implicit convention that they represent human-readable text. Every character maps to an ASCII value — a critical insight for hashing and frequency-count problems.

💡

ASCII Insight: Lowercase letters 'a'–'z' map to ASCII values 97–122. The difference ch - 'a' gives you an index 0–25, which is the foundation of character frequency arrays.

🧠 How Strings Are Stored in Memory

Memory layout for "hello":

C-string

'h'

'e'

'l'

'o'

'\0'

x

[0]

[1]

[2]

[3]

[4]

[5]

std::string

ptr

size

cap

→ heap-allocated char array

C-strings require a null terminator '\0' — its absence causes undefined behaviour in strlen/strcpy. std::string manages this internally and tracks size and capacity separately for amortised O(1) appends.

⚠️ C-Style Strings

char str[] = "hello";

Array of chars, null-terminated. Manual memory management. Functions: strlen, strcpy, strcmp. Prone to buffer overflow bugs. Used in competitive programming for speed.

✅ C++ std::string

string str = "hello";

Dynamic, safe, rich API. Automatically handles memory. Supports +, ==, range-based for, iterators. The correct choice for interviews and real code.

C++ — String Basics

// C-style string
char cstr[] = "hello";        // stored as h,e,l,l,o,\0
int len = strlen(cstr);       // O(n) — scans until \0

// C++ std::string
string s = "hello";
string s2(5, 'a');           // "aaaaa" — fill constructor
string s3 = s.substr(1, 3); // "ell"  — O(k) copy

// Traversal — multiple ways
for (char c : s) cout << c;   // range-for
for (int i = 0; i < s.size(); i++) cout << s[i];

// Conversion
string ns = to_string(42);   // int → string
int    ni = stoi("42");      // string → int

02

Fundamentals

Basic Operations

Traversal is O(n). Under the hood, s[i] is a direct memory access via pointer arithmetic — same as arrays. The range-for compiles to iterator-based traversal, equally fast.

C++

string s = "hello";

// Method 1: index-based
for (int i = 0; i < (int)s.size(); i++) {
    cout << s[i] << " ";
}

// Method 2: range-based (preferred)
for (char c : s) cout << c << " ";

// Method 3: iterator
for (auto it = s.begin(); it != s.end(); it++)
    cout << *it;

⚠️

Cast s.size() to int when comparing with signed integers. size() returns size_t (unsigned) — i < s.size() with a negative i causes wrap-around bugs.

cin >> s reads until whitespace. For multi-word input, use getline — but after reading an integer with cin, the newline stays in the buffer. Always cin.ignore() before getline.

C++

string s;
cin >> s;                    // reads "hello" (stops at space)

string line;
getline(cin, line);          // reads full line with spaces

// When mixing cin and getline:
int n; cin >> n;
cin.ignore();               // flush newline from buffer
getline(cin, line);          // now safe to read line

s.size() and s.length() are identical for std::string — both return the stored length in O(1) from an internal field. strlen() for C-strings is O(n) — it scans until '\0'.

C++

string s = "hello";
cout << s.size();     // 5 — O(1)
cout << s.length();   // 5 — O(1), identical

char cs[] = "hello";
cout << strlen(cs);   // 5 — O(n)! scans each char

// Empty check
if (s.empty()) { ... }       // prefer over s.size() == 0

The + operator creates a new string each time — O(n). Repeated concatenation in a loop becomes O(n²). Use += or append() for in-place modification — amortised O(1) per character.

C++

string a = "hello", b = " world";

string c = a + b;      // creates NEW string — O(n)
a += b;                // in-place — amortised O(1)
a.append(b);          // same as += for strings
a += '!';             // append single char

// ❌ BAD — O(n²) in a loop
string result = "";
for (char c : chars) result = result + c;

// ✅ GOOD — O(n)
string result = "";
for (char c : chars) result += c;

Comparison operators work lexicographically — character by character via ASCII values. == checks length first (short-circuit), then each char. All comparisons are O(n) in the worst case.

C++

string a = "apple", b = "banana";

if (a == b) { ... }    // false  — lexicographic equality
if (a < b)  { ... }    // true   — 'a' < 'b'

// compare() returns: <0, 0, or >0
int r = a.compare(b);  // negative → a comes first

// Sorting strings uses lexicographic order by default
vector<string> v = {"cat", "apple", "bat"};
sort(v.begin(), v.end());  // apple, bat, cat

03

Core Techniques

Important String Patterns

👆

Two Pointer on Strings

Converging pointers — left from start, right from end

O(n) time · O(1) space

🎯

When to use: Any problem where you process characters from both ends simultaneously — palindrome check, reversing, comparing mirrored positions. The key insight: you meet in the middle after n/2 iterations.

🔁 Palindrome Check

C++

bool isPalindrome(string& s) {
    int l = 0, r = s.size() - 1;
    while (l < r) {
        if (s[l] != s[r])
            return false;
        l++; r--;
    }
    return true;
}
// "racecar" → true
// "hello"   → false

🔄 Reverse a String

C++

void reverseStr(string& s) {
    int l = 0, r = s.size() - 1;
    while (l < r) {
        swap(s[l], s[r]);
        l++; r--;
    }
}
// Or simply: reverse(s.begin(), s.end());
// Both are O(n/2) = O(n)

📊

Frequency Count (Character Hashing)

26-bucket array indexed by character — the backbone of string problems

O(n) time · O(1) space

💡

Core Insight: Since there are only 26 lowercase letters (or 128 ASCII chars), a fixed-size array acts as a perfect hash map. freq[c - 'a'] maps 'a'→0, 'b'→1, … 'z'→25. Building this array is O(n), and querying any char is O(1).

📝 Character Count

C++

string s = "hello";
int freq[26] = {0};   // all zeros

for (char c : s)
    freq[c - 'a']++;

// freq['h'-'a'] = 1
// freq['l'-'a'] = 2
// freq['e'-'a'] = 1

for (int i = 0; i < 26; i++)
    if (freq[i])
        cout << (char)('a'+i)
             << ": " << freq[i] << "\n";

🔀 Anagram Detection

C++

bool isAnagram(string& a, string& b) {
    if (a.size() != b.size()) return false;
    int freq[26] = {0};
    for (int i=0; i<a.size(); i++) {
        freq[a[i]-'a']++;
        freq[b[i]-'a']--;
    }
    for (int x : freq)
        if (x != 0) return false;
    return true;
}
// "listen" & "silent" → true

🪟

Sliding Window

Maintain a dynamic window — expand right, shrink left — for substring problems

O(n) time

🎯

Why it works: Instead of checking all O(n²) substrings, we maintain a window with two pointers. We expand right always, and shrink left only when a constraint is violated. Each element is added and removed at most once — giving O(n) total.

Problem: Longest Substring Without Repeating Characters

C++ — Sliding Window

int lengthOfLongestSubstring(string s) {
    unordered_map<char, int> lastSeen;  // char → last index
    int left = 0, maxLen = 0;

    for (int right = 0; right < s.size(); right++) {
        char c = s[right];

        // If char seen & is inside our window
        if (lastSeen.count(c) && lastSeen[c] >= left) {
            left = lastSeen[c] + 1;  // shrink window
        }

        lastSeen[c] = right;           // update position
        maxLen = max(maxLen, right - left + 1);
    }
    return maxLen;
}
// "abcabcbb" → 3 ("abc")
// "pwwkew"   → 3 ("wke")

Problem: Minimum Window Substring (Conceptual)

📐

Strategy: Build frequency maps for both strings. Expand right until window contains all required chars. Then shrink left while it still satisfies the condition. Track the minimum valid window size throughout. Time: O(n+m).

C++ — Minimum Window

string minWindow(string s, string t) {
    unordered_map<char,int> need, have;
    for (char c : t) need[c]++;

    int formed=0, required=need.size();
    int l=0, minLen=INT_MAX, start=0;

    for (int r=0; r<s.size(); r++) {
        have[s[r]]++;
        // Check if this char satisfies its need
        if (need.count(s[r]) && have[s[r]]==need[s[r]])
            formed++;

        while (formed == required) {  // valid window
            if (r-l+1 < minLen) { minLen=r-l+1; start=l; }
            have[s[l]]--;
            if (need.count(s[l]) && have[s[l]]<need[s[l]])
                formed--;
            l++;
        }
    }
    return minLen==INT_MAX ? "" : s.substr(start, minLen);
}

04

Advanced

Important Algorithms

KMP Algorithm O(n + m)

Knuth-Morris-Pratt — Failure Function + Pattern Matching

🧠

Intuition: Naïve pattern matching retries from scratch on mismatch — O(nm). KMP pre-processes the pattern to build a failure function (LPS array) that stores the longest proper prefix which is also a suffix. On mismatch, instead of restarting, we jump to the position the LPS array tells us — avoiding redundant comparisons.

C++ — KMP

// Step 1: Build LPS (Longest Prefix Suffix) array
vector<int> buildLPS(string& pat) {
    int m = pat.size();
    vector<int> lps(m, 0);
    int len = 0, i = 1;
    while (i < m) {
        if (pat[i] == pat[len]) lps[i++] = ++len;
        else if (len) len = lps[len-1];  // jump
        else lps[i++] = 0;
    }
    return lps;
}

// Step 2: Pattern search using LPS
void KMPSearch(string& txt, string& pat) {
    int n=txt.size(), m=pat.size();
    auto lps = buildLPS(pat);
    int i=0, j=0;
    while (i < n) {
        if (txt[i]==pat[j]) { i++; j++; }
        if (j==m) {
            cout << "Found at: " << i-j << "\n";
            j = lps[j-1];
        } else if (i<n && txt[i]!=pat[j]) {
            if (j) j = lps[j-1];
            else i++;
        }
    }
}

Rabin-Karp Algorithm O(n + m) avg

Rolling Hash — compare hash values before comparing characters

🎲

Intuition: Compute a hash for the pattern. Use a rolling hash to slide over the text — removing the leftmost character and adding the next one in O(1). Only compare characters when hashes match. Worst case O(nm) on hash collisions, but average O(n+m).

C++ — Rabin-Karp

void rabinKarp(string& txt, string& pat) {
    const int BASE=31, MOD=1e9+7;
    int n=txt.size(), m=pat.size();

    // Compute pattern hash & first window hash
    long long patHash=0, winHash=0, power=1;
    for (int i=0; i<m; i++) {
        patHash = (patHash*BASE + pat[i]) % MOD;
        winHash = (winHash*BASE + txt[i]) % MOD;
        if (i) power = power*BASE % MOD;
    }

    for (int i=0; i<=n-m; i++) {
        if (winHash==patHash && txt.substr(i,m)==pat)
            cout << "Match at: " << i << "\n";
        if (i < n-m)  // rolling hash
            winHash = (winHash - txt[i]*power%MOD + MOD)
                      *BASE % MOD + txt[i+m];
    }
}

Z Algorithm O(n)

Z-array — z[i] = length of longest string starting from s[i] which matches a prefix of s

⚡

Intuition: Concatenate pattern + '$' + text. Build the Z-array in O(n). Any position where z[i] == len(pattern) is a match. The '$' separator prevents overlap between the pattern and text.

C++ — Z Algorithm

vector<int> buildZ(string& s) {
    int n=s.size(); vector<int> z(n, 0);
    z[0]=n;
    int l=0, r=0;
    for (int i=1; i<n; i++) {
        if (i<r) z[i]=min(r-i, z[i-l]);
        while (i+z[i]<n && s[z[i]]==s[i+z[i]]) z[i]++;
        if (i+z[i]>r) { l=i; r=i+z[i]; }
    }
    return z;
}

void zSearch(string& txt, string& pat) {
    string concat = pat + "$" + txt;
    auto z = buildZ(concat);
    for (int i=pat.size()+1; i<concat.size(); i++)
        if (z[i]==(int)pat.size())
            cout << "Found at: " << i-pat.size()-1 << "\n";
}

05

Reference

STL String Functions

The C++ Standard Library provides a rich set of string utilities. Understanding what happens internally helps you reason about time complexity and avoid surprises.

Function	Signature	Complexity	Internal Behaviour
`substr()`	`s.substr(pos, len)`	O(len)	Allocates a new string and copies len characters. Expensive in tight loops — avoid inside O(n) iteration.
`find()`	`s.find(pattern)`	O(n·m)	Naive search by default. Returns `string::npos` if not found. Use KMP or Rabin-Karp for repeated searches.
`stoi()`	`stoi(str, &pos)`	O(n)	Parses string to int. Optional second arg stores chars consumed. Throws `invalid_argument` on failure. Use try-catch.
`to_string()`	`to_string(num)`	O(digits)	Converts numeric types to string. Equivalent to sprintf internally. Allocates a new string object.
`reverse()`	`reverse(s.begin(), s.end())`	O(n/2)	In-place swap using two pointers under the hood. No extra allocation. Works on any bidirectional iterator.
`sort()`	`sort(s.begin(), s.end())`	O(n log n)	Sorts characters by ASCII value. Quick way to check anagram: sort both strings and compare.
`erase()`	`s.erase(pos, len)`	O(n)	Removes characters and shifts remainder. Very expensive in a loop — prefer building a new string.
`replace()`	`s.replace(pos, len, str)`	O(n)	Replaces a range with another string. May cause reallocation if new string is larger than capacity.

C++ — STL in action

string s = "hello world";

// substr — extract "world"
string w = s.substr(6, 5);        // "world"

// find — locate "world"
int pos = s.find("world");       // 6
if (pos == string::npos) // not found

// stoi — parse "42" → 42
int num = stoi("42");

// to_string — 3.14 → "3.14"
string pi = to_string(3.14);

// reverse in-place
reverse(s.begin(), s.end());     // "dlrow olleh"

// Check anagram via sort
string a="listen", b="silent";
sort(a.begin(),a.end());
sort(b.begin(),b.end());
bool anagram = (a == b);          // true

06

Interview Focus

Important Problems

Easy Warm-up — must solve without hesitation

E1

Reverse String

Reverse a string in-place. Use two-pointer swap. No extra space needed.

Two Pointer O(n)

E2

Valid Palindrome

Given a string, considering only alphanumeric characters and ignoring cases, determine if it's a palindrome. Filter non-alphanumeric, convert to lowercase, then two-pointer check.

Two Pointer O(n)

E3

Valid Anagram

Check if two strings are anagrams. Build frequency array, increment for s, decrement for t, check all zeros.

Hashing O(n)

E4

Reverse Words in a String

Split by spaces, reverse the word array, rejoin. Alternatively, reverse the whole string then reverse each word in-place.

Two Pointer O(n)

Medium Core interview — demonstrates pattern mastery

M1

Longest Substring Without Repeating Characters

Classic sliding window. Use a map to track last-seen index of each character. Expand right; when a repeat is found, jump left past the previous occurrence.

Sliding Window O(n)

M2

Group Anagrams

For each string, sort its characters to create a canonical key. Group strings with the same sorted key using an unordered_map. Sort-based key: O(n·k·log k). Frequency-count key: O(n·k).

Hashing O(n·k·log k)

M3

Longest Palindromic Substring

Expand-around-center approach. For each character (and each pair), expand outward while characters match. O(n²) time, O(1) space. Manacher's algorithm does it in O(n).

Expand Around Center O(n²)

M4

String to Integer (atoi)

Handle: leading whitespace, optional sign, digit characters, overflow. Process left-to-right, clamp to INT_MIN/INT_MAX on overflow.

Parsing O(n)

Hard Senior interviews — tests algorithmic depth

H1

Minimum Window Substring

Two sliding window pointers with frequency maps. Track how many unique characters are currently satisfied. Expand right; when all satisfied, try shrinking left. Record minimum valid window.

Sliding Window O(n+m)

H2

Regular Expression Matching

Dynamic programming. dp[i][j] = true if s[0..i-1] matches p[0..j-1]. Handle '.' (any char) and '*' (zero or more of preceding). Key transitions involve the '*' cases.

DP O(n·m)

H3

Substring with Concatenation of All Words

Sliding window with step size equal to word length. Maintain two hashmaps — expected and current. Slide across all offsets 0..wordLen-1. Time: O(n·wordLen).

Sliding Window Hashing

07

Watch Out

Common Mistakes

❌

Index Out of Bounds with s.size()

s.size() returns an unsigned type. Comparing i < s.size() - 1 when s is empty causes underflow to a huge number, not -1. Always cast: (int)s.size() or use s.empty() guard first.

❌

Mutating a String During Iteration

Calling s.erase() or s.insert() inside a loop invalidates iterators and shifts indices. Build a result string separately and return it, or use a two-pointer in-place approach only when safe.

❌

O(n²) Concatenation in a Loop

Using result = result + c inside a loop creates a new O(n) string each iteration. Total cost: O(n²). Use result += c for amortised O(1) per append, or use stringstream for complex formatting.

❌

Using substr() Inside Sliding Window

s.substr(l, r-l+1) is O(window size). Calling it inside every iteration of a sliding window loop makes the total complexity O(n²). Store the window's start/length; extract only the final answer.

❌

Not Handling Non-Lowercase Characters

Using freq[c - 'a'] when the string may contain uppercase, digits, or spaces causes negative array indices and undefined behaviour. Always validate the character range or use a 128-element ASCII array.

❌

Confusing find() return type

s.find() returns size_t (unsigned). Checking pos == -1 will never be true. Always check pos == string::npos. Alternatively, store in int after explicit cast.

08

Robustness

Edge Cases

⚠️

Interview Tip: After solving the main logic, always run through these edge cases mentally. Interviewers often ask "what if the string is empty?" immediately after you write your solution.

🟡 Empty String ("")

Most operations should return 0, false, or "". Guard with if (s.empty()) return .... Two-pointer: l < r fails immediately for empty strings — correct behaviour. Sliding window: loop never executes.

🟡 Single Character

A single char is always a palindrome. Frequency count has only one non-zero entry. Sliding window: window of size 1 is always valid. Reverse: no-op. Ensure your algorithm doesn't access s[1] when size is 1.

🟡 All Same Characters ("aaaa")

Palindrome check: trivially true. Longest without repeating: length 1. Anagram: any permutation equals the same string. KMP failure function will have predictable values. Frequency map: one key with count n.

🟡 Spaces and Special Characters

Valid palindrome with spaces: filter first. Frequency count: use 128-size ASCII array. getline vs cin: spaces end cin-read strings. Ensure your char arithmetic handles non-alphabetic chars safely.

🟡 Very Large Input (n = 10⁵–10⁶)

O(n²) approaches will TLE. Never use nested loops without careful analysis. Avoid repeated substr() calls. Use reserve() for string building: result.reserve(n) prevents repeated reallocations.

🟡 Mixed Case ("Hello")

Anagram checks must normalise case. Use tolower(c) during traversal. "Hello" and "hello" are NOT anagrams of each other unless case-insensitive. Palindrome problems often say "ignore case".

C++ — Edge Case Guards

// Always add these guards at the top of string functions
if (s.empty()) return 0;         // or "", or false
if (s.size() == 1) return true;   // single char is palindrome

// Normalise case before processing
string norm;
for (char c : s)
    if (isalnum(c)) norm += tolower(c);

// Safe char-to-index with bounds check
auto idx = [](char c) { return c - 'a'; };  // only for lowercase
auto idxASCII = [](char c) { return (int)c; }; // for all chars

09

Analysis

Time Complexity Insights

⏱️ Why String Operations Are Often O(n)

Unlike arrays of integers, string operations inherently depend on the length of the string. Even "simple" operations like equality check, hashing, and copying require touching every character. This is because strings are variable-length data — the CPU cannot know the result without reading all n bytes.

Operation	Complexity	Reason / Notes
s[i] — random access	`O(1)`	Direct pointer arithmetic, like arrays
s.size() / s.length()	`O(1)`	Stored as a field — doesn't scan the string
s += c (single char)	`O(1) amortised`	Doubling strategy; occasional O(n) realloc
s + t (string concat)	`O(n + m)`	Creates new string, copies both
s == t (equality)	`O(n)`	Compares char-by-char after length check
s.find(pattern)	`O(n·m)`	Naïve — use KMP or Z for O(n+m)
s.substr(pos, len)	`O(len)`	Allocates and copies len characters
reverse(begin, end)	`O(n/2)`	n/2 swaps — still O(n) asymptotically
sort(begin, end)	`O(n log n)`	std::sort — introsort (quicksort + heapsort)
Sliding window	`O(n)`	Each char added/removed at most once
Nested loop on string	`O(n²)`	Every pair of positions — avoid when possible

⚠️

O(n²) Danger Zone: Substring extraction inside a loop, naive pattern matching, nested comparisons. Look for these patterns in your code and replace with sliding window or hashing.

✅

O(n) Techniques: Sliding window, frequency arrays, two-pointer, KMP/Z for matching. These are the gold standard for competitive programming and interviews.

10

Quick Reference

Quick Revision Cheat Sheet

🗂️ Strings — Complete Reference

Core Patterns

Two Pointer: converge from both ends — palindrome, reverse
Freq Array: freq[c-'a']++ — O(1) lookup per char
Sliding Window: expand right, shrink left on violation
Sort-based key: anagram grouping, canonical form
Rolling Hash: Rabin-Karp for O(n) average matching
KMP LPS: skip redundant comparisons on mismatch
Z-Array: longest prefix matching at each position

Key Code Snippets

reverse(s.begin(), s.end()) — in-place O(n)
sort(s.begin(), s.end()) — anagram key
s.find(p) != string::npos — check existence
tolower(c), isalnum(c) — normalisation
stoi(s), to_string(n) — type conversion

Complexity Quick-Look

Access s[i] → O(1)
s.size() → O(1)
s += c → O(1) amort.
s + t → O(n+m)
s.find(p) → O(n·m) naive
KMP / Z → O(n+m)
Sliding Window → O(n)
sort string → O(n log n)

Edge Cases Checklist

☐ Empty string — return base case
☐ Single character — trivially valid
☐ All same chars — verify loop behaviour
☐ Mixed case — normalise with tolower()
☐ Spaces / special chars — isalnum() guard
☐ s.size() unsigned — cast before subtraction
☐ find() → check against string::npos

When To Use Which Pattern

TWO POINTER

Palindrome, Reverse, Compare from ends, Remove duplicates in-place

FREQ ARRAY

Anagram, Character count, Find first unique, Check all chars present

SLIDING WINDOW

Longest/shortest substring with K distinct, Min window, No-repeat substring

C++ — Template Starter

// ═══════════════════════════════════════════
// String Problem Starter Template
// ═══════════════════════════════════════════
#include <bits/stdc++.h>
using namespace std;

void solve(string& s) {
    int n = s.size();
    if (n == 0) return;        // ① empty guard

    // ② Frequency array (lowercase only)
    int freq[26] = {0};
    for (char c : s) freq[c-'a']++;

    // ③ Two pointer skeleton
    int l = 0, r = n - 1;
    while (l < r) {
        // process s[l] and s[r]
        l++; r--;
    }

    // ④ Sliding window skeleton
    unordered_map<char,int> win;
    int left = 0, best = 0;
    for (int right = 0; right < n; right++) {
        win[s[right]]++;
        while (/* constraint violated */ false) {
            win[s[left]]--;
            if (!win[s[left]]) win.erase(s[left]);
            left++;
        }
        best = max(best, right - left + 1);
    }
}

int main() {
    ios_base::sync_with_stdio(false);
    cin.tie(NULL);
    string s; cin >> s;
    solve(s);
    return 0;
}

── END OF MODULE 02 ──

DSA C++ Revision · Strings

MasterStrings

Introduction to Strings

Basic Operations

Important String Patterns

Important Algorithms

STL String Functions

Important Problems

Common Mistakes

Edge Cases

Time Complexity Insights

Quick Revision Cheat Sheet

Master
Strings