[Leetcode][686. 重复叠加字符串匹配] 3种方法：暴力，字符串哈希，KMP算法

发表于 2021-12-22 更新于 2024-03-05 分类于 Leetcode Waline：阅读次数：本文字数： 6.8k 阅读时长 ≈ 6 分钟

By Long Luo

686. 重复叠加字符串匹配

给定两个字符串a和b，寻找重复叠加字符串a的最小次数，使得字符串b成为叠加后的字符串a的子串，如果不存在则返回-1。

注意：字符串"abc"重复叠加0次是""，重复叠加1次是"abc"，重复叠加2次是"abcabc"。

示例 1：
输入：a = "abcd", b = "cdabcdab"
输出：3
解释：a 重复叠加三遍后为"ab**cdabcdab**cd", 此时b是其子串。

示例 2：
输入：a = "a", b = "aa"
输出：2

示例 3：
输入：a = "a", b = "a"
输出：1

示例 4：
输入：a = "abc", b = "wxyz"
输出：-1
 
提示：
1 <= a.length <= 10^4
1 <= b.length <= 10^4
a 和 b 由小写英文字母组成

分析

可以分为下面\(3\)种情况：

\(\textit{A}\) ==\(\textit{B}\)，复制 \(1\) 次；
\(lenA > lenB\)，直觉是只需要复制 \(2\) 次；
\(lenA < lenB\)，首先至少将 \(\textit{A}\) 复制几次后长度 \(>=lenB\)，然后至多再复制 \(\textit{A}\) 一次。

下面简单证明下，也就是复制次数的下限和上限：

至少将字符串 \(\textit{A}\) 复制 \(N\) 次之后主串 \(\textit{S}\)，那么 \(lenS >= lenB\) ，才有可能匹配，至此我们得到了复制次数的下限：\(\frac{lenB}{lenA}+1\)；
如果从主串 \(\textit{S}\) 中找到子串 \(\textit{B}\)，因此可以明确子串的起始位置，不会超过 \(lenA\)；因为如果起始匹配位置 \(>= lenA\)，那么说明在之前已经被匹配过了。

综上分析可知：

\[ \frac{lenB}{lenA} + 1 \le repeatedTimes \le \frac{lenB}{lenA} + 2 \]

方法一：暴力

思路与算法：

首先想到的是暴力法。最开始没有去分析复制次数上下限时，使用了一个朴素的想法：

\(\textit{B}\) 中含有 \(\textit{A}\) 中没有的字母，那么直接返回 \(-1\)；
最少复制次数：\(\textit{B}\) 中字母出现次数 \(\textit{A}\) 中字母次数的最大值。

使用 \(\texttt{HashMap}\) 存储字母出现次数。

代码如下所示：

public int repeatedStringMatch(String a, String b) {
    if (a.equals(b) || a.contains(b)) {
        return 1;
    }

    int min = 1;
    Map<Character, Integer> mapA = new HashMap<>();
    Map<Character, Integer> mapB = new HashMap<>();
    for (char ch : a.toCharArray()) {
        mapA.put(ch, mapA.getOrDefault(ch, 0) + 1);
    }
    for (char ch : b.toCharArray()) {
        mapB.put(ch, mapB.getOrDefault(ch, 0) + 1);
    }
    
    for (char ch : mapB.keySet()) {
        if (!mapA.containsKey(ch)) {
            return -1;
        } else {
            int times = mapB.get(ch) / mapA.get(ch);
            min = Math.max(min, times);
        }
    }

    StringBuilder res = new StringBuilder(a);
    for (int i = 2; i <= min + 1; i++) {
        res.append(a);
        if (res.toString().contains(b)) {
            return i;
        }
    }

    return -1;
}

因为字符串均由小写英文字母组成，所以可以直接用数组进行计数，速度提高了很多。

public int repeatedStringMatch(String a, String b) {
    if (a.equals(b) || a.contains(b)) {
        return 1;
    }

    int[] cntA = new int[26];
    int[] cntB = new int[26];

    for (char ch : a.toCharArray()) {
        cntA[ch - 'a']++;
    }
    for (char ch : b.toCharArray()) {
        cntB[ch - 'a']++;
    }

    int min = 1;

    for (int i = 0; i < 26; i++) {
        if (cntB[i] > 0 && cntA[i] == 0) {
            return -1;
        }

        if (cntA[i] > 0) {
            int times = cntB[i] / cntA[i];
            min = Math.max(min, times);
        }
    }

    StringBuilder res = new StringBuilder(a);
    for (int i = 2; i <= min + 1; i++) {
        res.append(a);
        if (res.toString().contains(b)) {
            return i;
        }
    }

    return -1;  
}

值得一提的是，如果不用比较 \(\textit{A}\), \(\textit{B}\) 字母的话，直接进行比较的话，那么耗时会成倍增加，所以预先排除一些test case有很大帮助。

public int repeatedStringMatch(String a, String b) {
    if (a.equals(b) || a.contains(b)) {
        return 1;
    }

    int minRepeatTimes = Math.max(1, b.length() / a.length() + 1);

    StringBuilder res = new StringBuilder(a);
    for (int i = 2; i <= minRepeatTimes + 1; i++) {
        res.append(a);
        if (res.toString().contains(b)) {
            return i;
        }
    }

    return -1;
}

复杂度分析：

时间复杂度：\(O(n + m + nm)\)，其中 \(n\) 为 \(\textit{A}\) 的长度，\(m\) 为 \(\textit{B}\) 的长度。
空间复杂度：\(O(26)\)。

方法二：字符串哈希

思路与算法：

将 \(\textit{A}\) 复制 \(min\) 次后得到 \(\textit{S}\)，从 \(\textit{S}\) 中检测是否存在子串为 \(\textit{B}\)，那么 \(\textit{B}\) 哈希值为 \([lenS - m + 1, lenS]\) 部分，记为 \(\textit{target}\)。

然后在 \([1, n]\) 范围内枚举起点，尝试找长度为 \(m\) 的哈希值与 \(\textit{target}\) 相同的哈希值。

代码如下所示：

public int repeatedStringMatch(String a, String b) {
    if (a.equals(b) || a.contains(b)) {
        return 1;
    }

    int lenA = a.length();
    int lenB = b.length();

    StringBuilder sb = new StringBuilder(a);
    int ans = 1;
    while (sb.length() < b.length()) {
        ans++;
        sb.append(a);
    }
    
    int lenS = sb.length();
    sb.append(a);
    for (int i = 0; i < lenA; i++) {
        if (i + lenB <= sb.length() && sb.substring(i, i + lenB).hashCode() == b.hashCode()) {
            return i + lenB <= lenS ? ans : ans + 1;
        }
    }

    return -1;
}

上述是利用Java的 \(\texttt{hashCode()}\) 取巧的方法，如果需要自定义 \(\texttt{hashCode()}\)，可以参考 3种方法：从暴力到二分，二分搜索 + 字符串哈希，后缀数组和 Rabin-Karp 算法。

class Solution {
    static final int kMod1 = 1000000007;
    static final int kMod2 = 1337;

    public int repeatedStringMatch(String a, String b) {
        int an = a.length(), bn = b.length();
        int index = strStr(a, b);
        if (index == -1) {
            return -1;
        }
        if (an - index >= bn) {
            return 1;
        }
        return (bn + index - an - 1) / an + 2;
    }

    public int strStr(String haystack, String needle) {
        int n = haystack.length(), m = needle.length();
        if (m == 0) {
            return 0;
        }

        int k1 = 1000000009;
        int k2 = 1337;
        Random random = new Random();
        int kMod1 = random.nextInt(k1) + k1;
        int kMod2 = random.nextInt(k2) + k2;

        long hashNeedle = 0;
        for (int i = 0; i < m; i++) {
            char c = needle.charAt(i);
            hashNeedle = (hashNeedle * kMod2 + c) % kMod1;
        }
        long hashHaystack = 0, extra = 1;
        for (int i = 0; i < m - 1; i++) {
            hashHaystack = (hashHaystack * kMod2 + haystack.charAt(i % n)) % kMod1;
            extra = (extra * kMod2) % kMod1;
        }
        for (int i = m - 1; (i - m + 1) < n; i++) {
            hashHaystack = (hashHaystack * kMod2 + haystack.charAt(i % n)) % kMod1;
            if (hashHaystack == hashNeedle) {
                return i - m + 1;
            }
            hashHaystack = (hashHaystack - extra * haystack.charAt((i - m + 1) % n)) % kMod1;
            hashHaystack = (hashHaystack + kMod1) % kMod1;
        }
        return -1;
    }
}

复杂度分析：

时间复杂度：\(O(n + m)\)，其中 \(n\) 为 \(\textit{A}\) 的长度，\(m\) 为 \(\textit{B}\) 的长度。
空间复杂度：\(O(1)\)。

方法三：KMP算法

思路与算法：

使用 \(\textit{KMP}\) 算法来实现字符串匹配的功能，\(\textit{KMP}\) 算法实现可参考理解KMP算法。

代码如下所示：

  public int repeatedStringMatch(String a, String b) {
      if (a.equals(b) || a.contains(b)) {
          return 1;
      }

      int lenA = a.length();
      int lenB = b.length();
      int idx = strStr(a, b);
      if (idx == -1) {
          return -1;
      }

      if (lenA - idx >= lenB) {
          return 1;
      }

// here lenB - (lenA - idx) but in case of more so minus 1.
      return (lenB - (lenA - idx) - 1) / lenA + 2;
  }

  public int strStr(String haystack, String needle) {
      if (needle == null || needle.length() == 0) {
          return 0;
      }

      int sLen = haystack.length();
      int pLen = needle.length();

      int[] next = new int[pLen];

      // build next array
      for (int right = 1, left = 0; right < pLen; right++) {

          while (left > 0 && needle.charAt(left) != needle.charAt(right)) {
              left = next[left - 1];
          }

          if (needle.charAt(left) == needle.charAt(right)) {
              left++;
          }

          next[right] = left;
      }

      for (int i = 0, j = 0; i - j < sLen; i++) {
          while (j > 0 && haystack.charAt(i % sLen) != needle.charAt(j)) {
              j = next[j - 1];
          }

          if (haystack.charAt(i % sLen) == needle.charAt(j)) {
              j++;
          }

          if (j == pLen) {
              return i - pLen + 1;
          }
      }

      return -1;
  }

复杂度分析

时间复杂度：\(O(n+m)\)，其中 \(n\) 为 \(\textit{A}\) 的长度，\(m\) 为 \(\textit{B}\) 的长度，\(\textit{KMP}\) 算法的时间复杂度为 \(O(n+m)\)。
空间复杂度：\(O(m)\)。\(\textit{KMP}\) 算法需要 \(O(m)\) 的空间来保存 \(\textit{next}\) 数组。

All suggestions are welcome. If you have any query or suggestion please comment below. Please upvote👍 if you like💗 it. Thank you:-)

Explore More Leetcode Solutions. 😉😃💗