REGEXP_REPLACE
REGEXP_REPLACE is a function in SQL used to replace a substring that matches a regular expression pattern with a specific string. It is mainly applied in order to modify and return a new string based on pattern matching.
REGEXP_REPLACE(expr, pat, repl[, position[, occurrence[, match_type]]])
- expr: This is the string or column of a table that is to be searched for a pattern.
- pat: The pattern to be searched within the expr parameter. It is the regular expression that identifies what will be replaced.
- repl: A string to replace the matched patterns in the expr.
- position: An optional parameter specifying the position in the expr where the function starts the search for the matching pattern. The default position is 1, indicating the beginning of the string.
- occurrence: Also optional, it indicates which occurrence of the matched pattern will be replaced. If omitted, all occurrences will be replaced.
- match_type: Another optional parameter which determines how the function matches the pattern. It can be ‘c’ (case sensitive), ‘i’ (case insensitive), ‘m’ (multi-line mode), ‘n’ (UNIX line endings), ‘u’ (Unicode), or ‘x’ (ignore whitespace and comments in the pattern). The default match type is ‘c’.
Example
Output
Explanation
The REGEXP_REPLACE function in this example finds every group of two digits in the string ‘12345678’, denoted by ’(\d2)’, and replaces them with the same number followed by a dash, denoted by ‘\1-’. The result is the original number separated into pairs by dashes.
REGEXP_REPLACE(source_string, pattern, replacement_string, optional_flags)
- source_string: This is the original string that will be searched for a pattern match. It is the input where the regular expression operation will be performed.
- pattern: Defined as a regular expression. This pattern will be searched within the source string.
- replacement_string: This is the text that will replace the matched portion of the source string.
- optional_flags: Used to control the regex operation. They consist of standard regular expression flags which include case sensitivity, global searching, multiline mode and more. This parameter is optional.
Example
Output
Explanation
In the provided example, the REGEXP_REPLACE
function is used to remove all digits from the input string ‘123abc’. The ‘\d’ is the pattern for any digit and ” means replace the digits with nothing (i.e., deleting them). The ‘g’ flag means to replace globally in the entire string, not just the first instance. Therefore, the output is ‘abc’.
REGEXP_REPLACE(source_string, pattern [,replace_string [,position [,occurrence [,match_parameter]]]])
- source_string: This is the original string that will be evaluated. The REGEXP_REPLACE function will search this string for instances of the defined pattern.
- pattern: This is the regular expression pattern which we want to find in the source_string.
- replace_string: This is the optional parameter. If the pattern is found in the source_string, it will be replaced with the replace_string. If not specified, occurrences of the pattern are removed from the source_string.
- position: This is an optional parameter that specifies the position in the source_string where the function is to begin searching. The position is based on a 1-based index, meaning counting starts at 1. If not specified, the function starts at the beginning of the source_string.
- occurrence: This is an optional parameter that specifies which occurrence of the pattern to search for. If not specified, the function defaults to 1, meaning it will replace the first occurrence of the pattern.
- match_parameter: This is an optional parameter that determines how the function handles case sensitivity and multilingual comparisons. Possible values include ‘i’ for case insensitivity, ‘c’ for case sensitivity, ‘n’ to allow . (period) to match the newline character, and ‘m’ for multiline mode.
Example
Output
Explanation
The REGEXP_REPLACE function in this context replaces all non-numeric characters (using the notation [^0-9]
) in the string '123 ABC 456 DEF'
with nothing (i.e., removes them), resulting in the string ‘123456’.