1. Overview
In scripting and programming, extracting specific portions of text from larger strings or files is a common task. In this article, we will look at different ways to get the text between two strings in Bash using grep, sed, awk, and Bash parameter expansion.
2. Introduction to Problem Statement
We are given a string, and we need to find the text between two given strings in Bash.
For example:
Input String: Start text[Extract this]End text
Output String: [Extract this]
Our goal is to find the text between Start text and End text.
3. Using grep Command with -oP option
grep is a useful command for searching for matching patterns in a file or input. It becomes more powerful with the -o and -P options.
    #!/bin/bash

    string="Start text[Extract this]End text"
    extracted_text=$(echo "$string" | grep -oP '(?<=Start text).*?(?=End text)')
    echo "$extracted_text"

Output:

    [Extract this]
The -o option tells grep to output only the matched portion of each line, and the -P option enables Perl-compatible regular expressions (PCRE) for pattern matching. Note that -P is a GNU grep extension and may not be available in other grep implementations.
Now let’s understand the regular expression:
- (?<=Start text): This is a positive lookbehind assertion. It matches a position in the string preceded by the literal string "Start text", but it does not include "Start text" in the match.
- .*?: This pattern matches any character (except a newline) zero or more times lazily. The ? makes the * quantifier lazy, matching as little text as possible.
- (?=End text): This is a positive lookahead assertion. It matches a position in the string followed by the literal string "End text", but it does not include "End text" in the match.
In other words, this regular expression searches for the text that comes after "Start text" and before "End text" in the input string. The command substitution syntax $() then captures the output of the pipeline and assigns it to the extracted_text variable, which is printed on the screen using the echo command.
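To see why the lazy quantifier matters, here is a minimal sketch that compares .*? with the greedy .* on a line containing two marker pairs. The sample string and the variable names lazy and greedy are made up for illustration, and the script assumes GNU grep built with PCRE support.

```bash
#!/bin/bash

# Hypothetical input with two Start text/End text pairs on one line.
string="Start text[first]End text and Start text[second]End text"

# Lazy: each match stops at the nearest "End text",
# so grep -o prints one line per pair.
lazy=$(echo "$string" | grep -oP '(?<=Start text).*?(?=End text)')

# Greedy: the match runs to the last "End text" on the line.
greedy=$(echo "$string" | grep -oP '(?<=Start text).*(?=End text)')

echo "$lazy"
echo "$greedy"
```

With the lazy form, [first] and [second] are printed on separate lines. With the greedy form, a single match spans everything from [first] up to [second], including the markers in between.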
4. Using sed Command
The sed (Stream Editor) is a powerful and versatile text processing tool that performs text transformations on an input stream (a file or input from a pipeline).
Let’s use sed to get text between two Strings.
    #!/bin/bash

    string="Start text[Extract this]End text"
    extracted_text=$(echo "$string" | sed -n 's/Start text\(.*\)End text/\1/p')
    echo "$extracted_text"

Output:

    [Extract this]
Here, the -n option suppresses automatic printing. By default, sed prints every line of the input; with -n, a line is printed only when we explicitly ask for it.
Now let’s break down 's/Start text\(.*\)End text/\1/p' to understand what it means:
- s: It represents the substitution command.
- Start text: It matches the literal string "Start text".
- \(.*\): Uses escaped parentheses to capture the text between "Start text" and "End text". The captured text is saved in a group. Because .* is greedy, the capture extends to the last "End text" on the line.
- End text: Matches the literal string "End text".
- /\1/: This is the replacement part of the sed command. The \1 refers to the first captured group in the pattern, so the whole match is replaced with the captured text between "Start text" and "End text".
- p: This flag specifies that the result should be printed.
Simply put, sed uses the substitution command s to match the given pattern and replaces the match with the text captured in parentheses, referenced by the \1 backreference. Finally, the modified line is printed because of the p flag.
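One thing to keep in mind is that the substitution replaces only the part of the line that the pattern matches, so any text outside the markers stays in the output. The sketch below illustrates this with a made-up input that has extra text on both sides. The variable names markers_only and whole_line are only for illustration.

```bash
#!/bin/bash

# Hypothetical input with text before and after the markers.
string="prefix Start text[Extract this]End text suffix"

# Pattern covers only the markers: the surrounding text survives.
markers_only=$(echo "$string" | sed -n 's/Start text\(.*\)End text/\1/p')

# Adding .* on both sides makes the pattern cover the whole line,
# so only the captured group remains after the substitution.
whole_line=$(echo "$string" | sed -n 's/.*Start text\(.*\)End text.*/\1/p')

echo "$markers_only"   # prefix [Extract this] suffix
echo "$whole_line"     # [Extract this]
```

If the input may contain text around the markers, the second form is usually the safer choice.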
5. Using awk Command
awk is a powerful scripting language for text processing and is typically used for data extraction. The idea is to use awk with Start text and End text as delimiters and return the second field using {print $2}.
    #!/bin/bash

    string="Start text[Extract this]End text"
    extracted_text=$(echo "$string" | awk -F 'Start text|End text' '{print $2}')
    echo "$extracted_text"

Output:

    [Extract this]
Here, -F specifies the field separator for awk. Because the separator is longer than one character, awk treats it as a regular expression, so each input line is split into fields at every occurrence of either "Start text" or "End text".
For the input string "Start text[Extract this]End text", the fields would be:
- Field 1: ""
- Field 2: "[Extract this]"
- Field 3: ""
Note that the field separator pattern does not include the actual separators as part of the fields. It is used to define the boundaries for field splitting.
After that, {print $2} is used to print field 2, which is the required text between the two strings.
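The field layout described above can be checked with a short sketch that loops over all fields and prints each one with its number. The variable name fields and the output format are chosen here for illustration only.

```bash
#!/bin/bash

string="Start text[Extract this]End text"

# Print every field awk produces, numbered, so the empty first and
# third fields around the separators become visible.
fields=$(echo "$string" | awk -F 'Start text|End text' '{
    for (i = 1; i <= NF; i++) printf "Field %d: \"%s\"\n", i, $i
}')

echo "$fields"
```

This should list three fields, with only the second one non-empty. Since the separator is a regular expression, markers that contain characters such as [ or . would need to be escaped.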
6. Using Bash Parameter Expansion
Bash parameter expansion offers string manipulation capabilities directly in the shell without calling external commands, which can be efficient for simple operations.
Let’s use Bash parameter expansion to achieve our goal.
    #!/bin/bash

    string="Start text[Extract this]End text"
    extracted_text="${string#*Start text}"
    extracted_text="${extracted_text%%End text*}"
    echo "$extracted_text"

Output:

    [Extract this]
The ${string#*Start text} expansion removes the shortest leading portion of the string that ends with the start boundary. Then, ${extracted_text%%End text*} removes the longest trailing portion that begins with the end boundary, that is, everything from the first "End text" onwards. After that, the echo command displays the required text between the two strings.
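The two expansions can also be wrapped in a reusable function. The following is a minimal sketch: text_between is a hypothetical helper name (not a Bash builtin), and the check that both markers are present is an addition not found in the script above. Quoting "$start" and "$end" inside the expansions makes Bash treat them as literal text rather than glob patterns.

```bash
#!/bin/bash

# text_between STRING START END
# Hypothetical helper: prints the text between the first START
# and the first END that follows it.
text_between() {
    local string=$1 start=$2 end=$3

    # Return a non-zero status if the markers do not appear in order.
    [[ $string == *"$start"*"$end"* ]] || return 1

    # Drop everything up to and including the first START.
    local rest=${string#*"$start"}

    # Drop everything from the first END onwards and print the rest.
    printf '%s\n' "${rest%%"$end"*}"
}

result=$(text_between "Start text[Extract this]End text" "Start text" "End text")
echo "$result"
```

Without the presence check, a string that lacks the start marker would be returned unchanged, which is easy to mistake for a successful extraction.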
7. Conclusion
Extracting text between two strings in Bash can be achieved through various methods, each with its own advantages. grep with -oP is powerful for regex-based matching, sed excels in stream editing, awk is great for field-based text processing, and Bash parameter expansion offers a built-in solution.