REGEXP_SUBSTR

REGEXP_SUBSTR is a SQL function, available in dialects such as Oracle and MySQL, that returns the substring of a source string that matches a regular expression pattern. It is useful for extracting fields, tokens, or other structured fragments from text data. Two variants of its syntax are described below.

REGEXP_SUBSTR(source_column, pattern [, start_position [, occurrence [, match_parameter]]])

  • source_column: The string column or expression to search.
  • pattern: The regular expression pattern to search for in source_column.
  • start_position: Optional. The character position in source_column at which the search starts. The default is 1, the first character.
  • occurrence: Optional. Which occurrence of the pattern to return. The default is 1, the first occurrence.
  • match_parameter: Optional. A string of flags that control case sensitivity, multi-line handling, and related matching behavior; for example, 'i' requests case-insensitive matching. If omitted, the search is case-sensitive under default settings and treats the source string as a single line.
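
As a minimal sketch of how match_parameter changes the result, the queries below assume Oracle syntax (as the use of dual in the example that follows suggests). The input string 'Hello World' and the column aliases are made up for illustration.

```sql
-- 'i' requests case-insensitive matching, so 'world' matches 'World'.
SELECT REGEXP_SUBSTR('Hello World', 'world', 1, 1, 'i') AS ci_match FROM dual;  -- World

-- 'c' forces case-sensitive matching; there is no lowercase 'world',
-- so the function returns NULL.
SELECT REGEXP_SUBSTR('Hello World', 'world', 1, 1, 'c') AS cs_match FROM dual;  -- NULL
```

Note that start_position and occurrence must be supplied (here 1, 1) before match_parameter can be passed, because the arguments are positional.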

Example

SELECT REGEXP_SUBSTR('123@456@789', '[^@]+', 1, 2) FROM dual;

Output

456

Explanation

The example uses REGEXP_SUBSTR to extract one field from an '@'-delimited string. The pattern '[^@]+' matches one or more consecutive characters other than '@'. The integers that follow (1, 2) are the start position and the occurrence, respectively: the search starts at position 1 and returns the second match of the pattern, which is '456'.
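
To illustrate how the start position and occurrence arguments interact, here is a short sketch that reuses the same string, again assuming Oracle syntax. The column aliases are illustrative only.

```sql
SELECT REGEXP_SUBSTR('123@456@789', '[^@]+', 1, 1) AS first_part,   -- 123
       REGEXP_SUBSTR('123@456@789', '[^@]+', 1, 3) AS third_part,   -- 789
       -- Starting at position 5 skips '123@', so the first match is '456'.
       REGEXP_SUBSTR('123@456@789', '[^@]+', 5, 1) AS from_pos_5,   -- 456
       -- There is no fourth field, so the result is NULL.
       REGEXP_SUBSTR('123@456@789', '[^@]+', 1, 4) AS fourth_part   -- NULL
FROM dual;
```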

REGEXP_SUBSTR(expr, pat[, pos[, occurrence[, match_type]]])

  • expr: The source string expression to search.
  • pat: The regular expression pattern to find in expr.
  • pos: Optional. The character position in expr at which the search starts. The default is 1, the beginning of the string.
  • occurrence: Optional. Which occurrence of the match to return. The default is 1, the first match found.
  • match_type: Optional. A string of flags that controls how matching is performed; for example, 'i' enables case-insensitive matching and 'c' enables case-sensitive matching. If omitted, case sensitivity in MySQL follows the collation of expr and pat, so matching is case-insensitive under a case-insensitive collation.
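
The optional arguments can be sketched as follows, assuming MySQL 8.0 or later. The input strings and column aliases are made up for illustration.

```sql
-- occurrence = 3 returns the third run of letters.
SELECT REGEXP_SUBSTR('abc def ghi', '[a-z]+', 1, 3) AS third_word;      -- ghi

-- pos = 5 starts the search at 'd', skipping 'abc '.
SELECT REGEXP_SUBSTR('abc def ghi', '[a-z]+', 5) AS from_pos_5;         -- def

-- match_type 'c' forces case-sensitive matching, so 'Fox' is skipped.
SELECT REGEXP_SUBSTR('Fox fox FOX', 'fox', 1, 1, 'c') AS exact_case;    -- fox

-- match_type 'i' forces case-insensitive matching, so 'Fox' is returned.
SELECT REGEXP_SUBSTR('Fox fox FOX', 'fox', 1, 1, 'i') AS any_case;      -- Fox
```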

Example

SELECT REGEXP_SUBSTR('The quick brown fox jumps over the lazy dog', '[a-z]{4}') AS Result;

Output

+--------+
| Result |
+--------+
| quic   |
+--------+

Explanation

In this example, REGEXP_SUBSTR returns the first substring of 'The quick brown fox jumps over the lazy dog' that matches the regular expression [a-z]{4}. The pattern matches four consecutive letters in the range a to z; it does not match spaces and is not limited to whole words. 'The' has only three letters before the space, so the first match is 'quic', the first four letters of 'quick'. The result is the same whether the comparison is case-sensitive or case-insensitive.
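
If the intent is to match a whole four-letter word rather than any four consecutive letters, word boundaries can be added to the pattern. The sketch below assumes MySQL 8.0 or later with default string escaping, where a backslash in a string literal must be doubled.

```sql
-- \b matches a word boundary, so only complete four-letter words qualify.
-- 'The' and 'fox' are too short, 'quick', 'brown' and 'jumps' are too long,
-- so the first whole-word match is 'over'.
SELECT REGEXP_SUBSTR('The quick brown fox jumps over the lazy dog',
                     '\\b[a-z]{4}\\b') AS Result;
```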
