Data Structures for Range Sorted Consecutive Occurrence Queries

Waseem Akram, Takuya Mieno

公開日: 2024/11/18

Abstract

The string indexing problem is a fundamental computational problem with numerous applications, including information retrieval and bioinformatics. It aims to efficiently solve the pattern matching problem: given a text T of length n for preprocessing and a pattern P of length m as a query, the goal is to report all occurrences of P as substrings of T. Navarro and Thankachan [CPM 2015, Theor. Comput. Sci. 2016] introduced a variant of this problem called the gap-bounded consecutive occurrence query, which reports pairs of consecutive occurrences of P in T such that their gaps (i.e., the distances between them) lie within a query-specified range [g_1, g_2]. Recently, Bille et al. [FSTTCS 2020, Theor. Comput. Sci. 2022] proposed the top-k close consecutive occurrence query, which reports the k closest consecutive occurrences of P in T, sorted in non-decreasing order of distance. Both problems are optimally solved in query time with O(n \log n)-space data structures. In this paper, we generalize these problems to the range query model, which focuses only on occurrences of P in a specified substring T[a.. b] of T. Our contributions are as follows: (1) We propose an O(n \log^2 n)-space data structure that answers the range top-k consecutive occurrence query in O(m + \log\log n + k) time and can be built in O(n \log^3 n) time; and (2) We propose an O(n \log^{2+\epsilon} n)-space data structure that answers the range gap-bounded consecutive occurrence query in O(m + \log\log n + output) time and can be constructed in O(n \log^{5/2}n) time, where \epsilon is a positive constant and output denotes the number of outputs. As by-products, we present algorithms for geometric problems involving weighted horizontal segments in a 2D plane, which are of independent interest. Furthermore, we observe that consecutive occurrences are related to closed substrings of a string.

Data Structures for Range Sorted Consecutive Occurrence Queries | SummarXiv | SummarXiv