Get Text Between Two Strings in Bash [4 Ways]

Table of Contents

1. Overview
2. Introduction to Problem Statement
3. Using grep Command with -oP option
4. Using sed Command
5. Using awk Command
6. Using Bash Parameter Expansion
7. Conclusion

1. Overview

In scripting and programming, extracting specific portions of text from larger strings or files is a common task. In this article, we will see different ways to get the text between two strings using grep, sed, awk, and Bash parameter expansion.

2. Introduction to Problem Statement

We are given a string, and we need to extract the text that lies between two given strings in Bash.
For example:

Input String: Start text[Extract this]End text
Output String: [Extract this]
Our goal is to find the text between Start text and End text.

3. Using grep Command with -oP option

grep is a command that searches for matching patterns in a file or input. The -o and -P options make it well suited to extracting text between two markers.


#!/bin/bash
string="Start text[Extract this]End text"
extracted_text=$(echo "$string" | grep -oP '(?<=Start text).*?(?=End text)')
echo "$extracted_text"

[Extract this]

The -o option tells grep to output only the matched portion of each line, and the -P option enables Perl-compatible regular expressions (PCRE) for pattern matching.

Now let’s understand the regular expression:

(?<=Start text): This is a positive lookbehind assertion. It matches a position in the string preceded by the literal string "Start text", but it does not include "Start text" in the match.
.*?: This pattern matches any character (except a newline) zero or more times lazily. The ? makes the * quantifier lazy, matching as little text as possible.
(?=End text): This is a positive lookahead assertion. It matches a position in the string followed by the literal string "End text", but it does not include "End text" in the match.

In other words, this regular expression matches the text that comes after "Start text" and before "End text" in the input string. The command substitution syntax $() captures the output of the pipeline and assigns it to the extracted_text variable, which the echo command then prints on the screen.
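The effect of the lazy quantifier is easiest to see when the markers occur more than once on a line. The following is a minimal sketch, assuming a grep build with -P support (such as GNU grep); the input string and the lazy and greedy variable names are made up for illustration.

```bash
#!/bin/bash
# Hypothetical input with two marker pairs on one line.
string="Start text[first]End text and Start text[second]End text"

# Lazy quantifier (.*?): each match stops at the nearest "End text",
# so grep -o reports one match per marker pair, one per line.
lazy=$(echo "$string" | grep -oP '(?<=Start text).*?(?=End text)')

# Greedy quantifier (.*): a single match that runs up to the last "End text".
greedy=$(echo "$string" | grep -oP '(?<=Start text).*(?=End text)')

echo "$lazy"
echo "$greedy"
```

With the lazy form, the two bracketed values are printed on separate lines. The greedy form prints a single line that also swallows the markers in the middle, which is usually not what is wanted.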

4. Using sed Command

sed (stream editor) is a versatile text processing tool that performs text transformations on an input stream (a file or input from a pipeline).

Let’s use sed to get the text between two strings.


#!/bin/bash
string="Start text [Extract this] End text"
# The spaces around the capture group keep them out of the extracted text
extracted_text=$(echo "$string" | sed -n 's/Start text \(.*\) End text/\1/p')
echo "$extracted_text"

[Extract this]

Here, the -n option suppresses automatic printing. By default, sed prints every input line; with -n, a line is printed only when a command or flag such as p requests it.

Now let’s break down 's/Start text \(.*\) End text/\1/p' to understand what it means:

s: It represents the substitution command.
Start text: It matches the literal string "Start text" and the space that follows it.
\(.*\): Uses escaped parentheses to capture the text between the two markers. The captured text is saved in group 1.
End text: Matches the literal string " End text", including the leading space.
/\1/: This is the replacement part of the sed command. The \1 refers to the first captured group in the pattern, so the whole match is replaced with the captured text between "Start text" and "End text".
p: This specifies that the result should be printed.

Put simply, sed uses the substitution command s to match the given pattern and replace it with the text captured by the parentheses, referenced through the \1 backreference. The p flag then prints the modified line.
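Because s replaces only the part of the line that the pattern matches, any text outside the markers stays in the output. The sketch below illustrates this with a made-up input that has extra text on both sides; the partial and exact variable names are hypothetical.

```bash
#!/bin/bash
# Hypothetical input with extra text around the markers.
string="prefix Start text[Extract this]End text suffix"

# Only the matched part is replaced, so "prefix " and " suffix" remain.
partial=$(echo "$string" | sed -n 's/Start text\(.*\)End text/\1/p')

# A leading and trailing .* make the pattern cover the whole line,
# so only the captured group is left after the substitution.
exact=$(echo "$string" | sed -n 's/.*Start text\(.*\)End text.*/\1/p')

echo "$partial"
echo "$exact"
```

The first command prints "prefix [Extract this] suffix", while the second prints only "[Extract this]". Note that .* is greedy in sed, so if the markers appear several times on a line, the second form selects the text after the last "Start text".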

5. Using awk Command

awk is a scripting language for text processing and is typically used for data extraction. The idea is to use Start text and End text as field delimiters and return the second field using {print $2}.


#!/bin/bash
string="Start text[Extract this]End text"
extracted_text=$(echo "$string" | awk -F 'Start text|End text' '{print $2}')
echo "$extracted_text"

[Extract this]

Here, -F is used to specify the field separator for awk. Each input line is divided into separate fields based on the occurrences of either "Start text" or "End text" as separators.

For the input string "Start text[Extract this]End text", the fields would be:

Field 1: ""
Field 2: "[Extract this]"
Field 3: ""

Note that the field separator pattern does not include the actual separators as part of the fields. It is used to define the boundaries for field splitting.

After that, {print $2} prints field 2, which is the required text between the two strings.
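To check how awk splits the line, one option is to print every field together with its index. This is a small sketch for inspection only; the fields variable name and the printf format are chosen here for illustration.

```bash
#!/bin/bash
string="Start text[Extract this]End text"

# Loop over all fields (1..NF) and print each one with its index,
# quoted so that empty fields are visible.
fields=$(echo "$string" | awk -F 'Start text|End text' \
  '{ for (i = 1; i <= NF; i++) printf "Field %d: \"%s\"\n", i, $i }')

echo "$fields"
```

The output should mirror the field list given above, with the wanted text in field 2. The same loop is handy for finding the right field number when the input contains more than one pair of markers.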

6. Using Bash Parameter Expansion

Bash parameter expansion offers string manipulation capabilities directly in the shell without calling external commands, which can be efficient for simple operations.

Let’s use Bash parameter expansion to achieve our goal.


#!/bin/bash
string="Start text[Extract this]End text"
extracted_text="${string#*Start text}"
extracted_text="${extracted_text%%End text*}"
echo "$extracted_text"

[Extract this]

${string#*Start text} removes the shortest leading portion of the string that ends with the start boundary. Then, ${extracted_text%%End text*} removes the longest trailing portion that begins with the end boundary. After that, the echo command displays the required text between the two strings.
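The choice between the single and double operators matters once the markers appear more than once. The following sketch, using a made-up input and the hypothetical variable names rest, first, and last, shows how # with %% selects the first pair and ## with % selects the last one.

```bash
#!/bin/bash
# Hypothetical input with two marker pairs.
string="Start text[first]End text Start text[second]End text"

# '#' strips the shortest prefix matching "*Start text";
# '%%' strips the longest suffix matching "End text*".
rest="${string#*Start text}"
first="${rest%%End text*}"

# '##' strips the longest prefix instead, which skips to the last pair;
# '%' then strips the shortest suffix matching "End text*".
rest="${string##*Start text}"
last="${rest%End text*}"

echo "$first"
echo "$last"
```

This prints "[first]" and then "[second]". Since no external command is started, this approach works in any Bash session without depending on the installed versions of grep, sed, or awk.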

7. Conclusion

Extracting text between two strings in Bash can be achieved through various methods, each with its own advantages. grep with -oP is powerful for regex-based matching, sed excels in stream editing, awk is great for field-based text processing, and Bash parameter expansion offers a built-in solution.
