Remove Duplicate Lines in Bash
Duplicate lines in input files can cause a variety of problems for the Bash scripts that process them, such as incorrect or inconsistent results, and they make the data harder to maintain. Removing duplicates is often necessary to avoid these problems, and Bash offers several ways to do it with standard command-line tools.
Remove Duplicate Lines in Bash Using sort and uniq
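One way to remove duplicate lines is to combine the sort and uniq commands. The sort command sorts the input lines (in ascending order by default), and the uniq command then filters out repeated lines. The sort step is needed because uniq only removes duplicates that are adjacent to each other, and sorting places identical lines next to each other.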
The data.txt file used in this article's examples contains the following lines:
arg1
arg2
arg3
arg2
arg2
arg1
To remove duplicate entries from the above file, you can use the following command:
sort data.txt | uniq > data-unique.txt
Contents of data-unique.txt:
arg1
arg2
arg3
This command sorts data.txt in ascending order (the default) and pipes the result to uniq, which removes the repeated lines and writes the unique lines to a new file named data-unique.txt. The original data.txt is left unchanged.
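To see why the sort step matters, the following self-contained sketch recreates the sample file in a temporary directory (created with mktemp purely for illustration) and compares uniq on its own with sort | uniq and with sort -u, which sorts and removes duplicates in a single command.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Work in a throwaway directory so the example does not touch real files.
# The directory is left in place so the files can be inspected afterwards.
workdir=$(mktemp -d)
cd "$workdir"

# Recreate the article's sample file.
printf '%s\n' arg1 arg2 arg3 arg2 arg2 arg1 > data.txt

# uniq alone only merges *adjacent* duplicates, so repeated lines that are
# not next to each other survive: arg1 arg2 arg3 arg2 arg1
uniq data.txt > uniq-only.txt

# Sorting first groups identical lines together, so uniq removes all repeats.
sort data.txt | uniq > data-unique.txt

# sort -u produces the same result in one command.
sort -u data.txt > data-sort-u.txt

echo "uniq only:";   cat uniq-only.txt
echo "sort | uniq:"; cat data-unique.txt
echo "sort -u:";     cat data-sort-u.txt
```

In practice sort -u is the shorter choice when only the unique lines are needed; the sort | uniq pipeline is useful when you also want uniq's options such as -c or -d.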
The uniq command has several options that control its behavior, such as -d, which prints one copy of each line that appears more than once, and -c, which prefixes each line with the number of times it appears in the input. For example, to print the number of times each line appears in the data.txt file, you can use the following command:
sort data.txt | uniq -c
This is the same pipeline as before, with the -c option added to uniq. It prints each distinct line prefixed by the number of times it appears in the input.
For data.txt, the output looks like this (the amount of leading whitespace before each count varies between uniq implementations):
2 arg1
3 arg2
1 arg3
This output shows that arg1 occurs twice, arg2 three times, and arg3 once.
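The other uniq options also operate on sorted input. This sketch recreates data.txt in a temporary directory, then prints only the lines that are repeated (-d), only the lines that occur exactly once (-u), and the -c counts ordered from most to least frequent by piping them through sort -rn.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Recreate the article's sample file in a temporary directory.
cd "$(mktemp -d)"
printf '%s\n' arg1 arg2 arg3 arg2 arg2 arg1 > data.txt

# -d: print one copy of each line that appears more than once (arg1, arg2).
sort data.txt | uniq -d

# -u: print only the lines that appear exactly once (arg3).
sort data.txt | uniq -u

# -c followed by a reverse numeric sort lists the most frequent lines first.
sort data.txt | uniq -c | sort -rn
```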
Remove Duplicate Lines in Bash Using the awk Command
Another way to remove duplicate lines is the awk command, a powerful text-processing tool that can perform many kinds of operations on text files. awk has built-in associative arrays, which can store a value for each distinct input line, such as the number of times that line has been seen.
For example, to remove duplicate entries from the same file as before, you can use the following command:
awk '!a[$0]++' data.txt > data-unique.txt
Contents of data-unique.txt:
arg1
arg2
arg3
This command runs awk on data.txt and evaluates the pattern !a[$0]++ for each input line. Here $0 is the entire current line, and a is an associative array indexed by line content. The post-increment a[$0]++ returns the element's current value and then adds 1 to it, so each element ends up counting how many times its line has been read.
The first time a line is read, its element does not exist yet and evaluates to 0; the ! operator negates that value, so the pattern is true. On every later occurrence the value is 1 or more, so the pattern is false. Because the pattern has no action attached, awk applies its default action and prints the line whenever the pattern is true. As a result, only the first occurrence of each line is printed, and the output is redirected to a new file named data-unique.txt. Unlike sort | uniq, this keeps the lines in the order in which they first appear.
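The one-liner is compact; the sketch below spells out the same logic with an explicit if statement, which can be easier to read. The array name seen is an arbitrary choice for this illustration (the one-liner uses a), and the in operator tests whether a line has already been stored as a key.

```shell
#!/usr/bin/env bash
set -euo pipefail

cd "$(mktemp -d)"
printf '%s\n' arg1 arg2 arg3 arg2 arg2 arg1 > data.txt

# Long-hand equivalent of: awk '!a[$0]++' data.txt
awk '
{
    if (!($0 in seen))   # true only the first time this exact line is read
        print            # first occurrence: print it
    seen[$0]++           # count occurrences (the count is not used for output)
}
' data.txt > data-unique.txt

cat data-unique.txt
```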
The awk command also provides several options and features that you can use to control its behavior and customize its output. For example, you can use the -F option to specify a different field separator, or the -v option to assign a value to an awk variable before the program runs.
You can also use the printf function to format the output of the awk command in various ways.
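The following sketch applies those features to deduplication. The file users.csv and its name,role layout are hypothetical, made up for this illustration, as is the awk variable name want. -F',' splits each line on commas so that duplicates can be judged by the first field alone, -v passes a shell value into the awk program, and printf formats a per-line count similar to uniq -c.

```shell
#!/usr/bin/env bash
set -euo pipefail

cd "$(mktemp -d)"

# Hypothetical comma-separated input in the form name,role.
printf '%s\n' 'alice,admin' 'bob,dev' 'alice,dev' 'carol,ops' 'bob,dev' > users.csv

# -F',' makes $1 the first comma-separated field, so a line counts as a
# duplicate when its name was already seen, regardless of the role.
awk -F',' '!seen[$1]++' users.csv

# -v sets the (hypothetical) awk variable want from the shell; keep only
# unique lines whose second field matches it.
role=dev
awk -F',' -v want="$role" '$2 == want && !seen[$0]++' users.csv

# printf formats the output; count each distinct line and print "count line".
# for-in order is unspecified in awk, so the result is sorted by the line text.
awk '{ count[$0]++ } END { for (line in count) printf "%3d %s\n", count[line], line }' users.csv | sort -k2
```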
The sort and uniq commands are simple and effective tools for removing duplicate lines, although their output is always sorted. The awk command keeps lines in their original order and provides more advanced features and options for customizing the output and behavior of the script.
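One practical difference to check before choosing a method is output order: sort | uniq returns the unique lines in sorted order, whereas the awk one-liner keeps each line at the position of its first occurrence. The sketch below uses a small made-up input, fruits.txt, where the two orders differ.

```shell
#!/usr/bin/env bash
set -euo pipefail

cd "$(mktemp -d)"

# Made-up input whose first-occurrence order is not alphabetical.
printf '%s\n' cherry apple cherry banana apple > fruits.txt

# sort | uniq: duplicates removed, output in sorted order (apple banana cherry).
sort fruits.txt | uniq

# awk: duplicates removed, first-occurrence order kept (cherry apple banana).
awk '!a[$0]++' fruits.txt
```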