[A Case for Data Analysis with Awk] Trying custom sorting with gawk (GNU AWK)
2020/02/08
2023/06/26
Introduction: if you are on mawk, move to gawk
$ awk -W version
mawk 1.3.3 Nov 1996, Copyright (C) Michael D. Brennan
compiled limits:
max NF 32767
sprintf buffer 1020
$ gawk --version
GNU Awk 4.2.1, API: 2.0 (GNU MPFR 4.0.2, GNU MP 6.1.2)
Copyright (C) 1989, 1991-2018 Free Software Foundation.
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 3 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program. If not, see http://www.gnu.org/licenses/.
$ sudo apt-get install gawk
$ gawk --version
GNU Awk 4.2.1, API: 2.0 (GNU MPFR 4.0.2, GNU MP 6.1.2)
Copyright (C) 1989, 1991-2018 Free Software Foundation.
#👇 Installing the package with apt also replaces the default awk automatically
$ awk --version
GNU Awk 4.2.1, API: 2.0 (GNU MPFR 4.0.2, GNU MP 6.1.2)
Copyright (C) 1989, 1991-2018 Free Software Foundation.
Functions such as asorti are gawk extensions and are not available in mawk.
Note for Mac users: moving from nawk to gawk
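$ brew install gawk
#👇 The gnubin directory lives under the Homebrew prefix, not under the path of the gawk binary
$ BREW_PREFIX=$(brew --prefix)
#👇 To write it to .zshrc
$ echo 'PATH="'$BREW_PREFIX'/opt/gawk/libexec/gnubin:$PATH"' >> ~/.zshrc
$ source ~/.zshrc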
Custom sorting with awk
The custom sorting features described below are gawk extensions.
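Custom sorting with PROCINFO["sorted_in"]
The PROCINFO array can control the order in which a for ... in loop visits the elements of an array. The script below sorts the sample data EXAM_SCORE by value.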
1.sh
#!/bin/bash
#👇 Column 1 is used as the key, column 2 as the value
EXAM_SCORE=$(cat << EOF
Ichiro 84
Bob 25
Wakame 76
Hanako 56
Alice 90
Kabao 53
Jam 43
Tarao 25
Cheese 12
Kasuo 47
Piyoko 88
Ikura 29
EOF
)
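echo "$EXAM_SCORE" | awk '
    # Comparator for PROCINFO["sorted_in"]: i1/i2 are indices, v1/v2 are values.
    # A negative result puts the first element first (ascending by value).
    # This version never returns 0, so the order of equal scores is left to gawk.
    function cmp_num_val(i1, v1, i2, v2) {
        if (v1 < v2) {
            return -1;
        } else {
            return 1;
        }
    }
    {
        data[$1] = $2;
    }
    END {
        PROCINFO["sorted_in"] = "cmp_num_val";
        for (j in data) {
            printf("data[%s] = %s\n", j, data[j]);
        }
    }
'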
$ chmod +x 1.sh
$ ./1.sh
data[Cheese] = 12
data[Tarao] = 25
data[Bob] = 25
data[Ikura] = 29
data[Jam] = 43
data[Kasuo] = 47
data[Kabao] = 53
data[Hanako] = 56
data[Wakame] = 76
data[Ichiro] = 84
data[Piyoko] = 88
data[Alice] = 90
The key part is the PROCINFO["sorted_in"] setting:
#.......
function cmp_num_val(i1, v1, i2, v2) {
    if (v1 < v2) {
        return -1;
    } else {
        return 1;
    }
}
#.......
END {
    PROCINFO["sorted_in"] = "cmp_num_val";
    for (j in data) {
        printf("data[%s] = %s\n", j, data[j]);
    }
}
#.......
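Setting PROCINFO["sorted_in"] changes the order in which for ... in array visits the elements. The value assigned to PROCINFO["sorted_in"] is the name of a user-defined comparison function of the form function_name(i1, v1, i2, v2), where i1 and i2 are the indices of the two elements being compared and v1 and v2 are their values. A negative return value places the first element before the second, and a positive one places it after.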
Reversing the comparison gives descending order by value:
function cmp_num_val(i1, v1, i2, v2) {
    if (v1 > v2) {
        return -1;
    } else {
        return 1;
    }
}
2.sh
#!/bin/bash
EXAM_SCORE=$(cat << EOF
Ichiro 84
Bob 25
Wakame 76
Hanako 56
Alice 90
Kabao 53
Jam 43
Tarao 25
Cheese 12
Kasuo 47
Piyoko 88
Ikura 29
EOF
)
echo "$EXAM_SCORE" | awk '
    function cmp_str_ind(i1, v1, i2, v2) {
        if (i1 < i2) {
            return -1;
        } else {
            return 1;
        }
    }
    {
        data[$1] = $2;
    }
    END {
        PROCINFO["sorted_in"] = "cmp_str_ind";
        for (j in data) {
            printf("data[%s] = %s\n", j, data[j]);
        }
    }
'
$ chmod +x 2.sh
$ ./2.sh
data[Alice] = 90
data[Bob] = 25
data[Cheese] = 12
data[Hanako] = 56
data[Ichiro] = 84
data[Ikura] = 29
data[Jam] = 43
data[Kabao] = 53
data[Kasuo] = 47
data[Piyoko] = 88
data[Tarao] = 25
data[Wakame] = 76
Here the comparison uses the indices i1 and i2, so PROCINFO["sorted_in"] orders the loop by key instead of by value.
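Custom sorting with the asort/asorti functions
Besides PROCINFO["sorted_in"], gawk provides the asort and asorti functions.
Sorting values: asort
To sort the values of an array, use asort:
asort(source array, result array, comparison function)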
3.sh
#!/bin/bash
EXAM_SCORE=$(cat << EOF
Ichiro 84
Bob 25
Wakame 76
Hanako 56
Alice 90
Kabao 53
Jam 43
Tarao 25
Cheese 12
Kasuo 47
Piyoko 88
Ikura 29
EOF
)
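echo "$EXAM_SCORE" | awk '
    function cmp_num_val(i1, v1, i2, v2) {
        if (v1 < v2) {
            return -1;
        } else {
            return 1;
        }
    }
    {
        data[$1] = $2;
    }
    END {
        # asort returns the number of elements; the result is indexed 1..n
        n = asort(data, sorted_data, "cmp_num_val");
        # Walk 1..n explicitly: the order of for (j in ...) is not guaranteed
        for (j = 1; j <= n; j++) {
            printf("sorted_data[%s] = %s\n", j, sorted_data[j]);
        }
    }
'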
$ chmod +x 3.sh
$ ./3.sh
sorted_data[1] = 12
sorted_data[2] = 25
sorted_data[3] = 25
sorted_data[4] = 29
sorted_data[5] = 43
sorted_data[6] = 47
sorted_data[7] = 53
sorted_data[8] = 56
sorted_data[9] = 76
sorted_data[10] = 84
sorted_data[11] = 88
sorted_data[12] = 90
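The values come out in the same order as in 1.sh, which used PROCINFO["sorted_in"]. Note that asort renumbers the result with integer indices starting at 1, so the original keys (the names) are not kept in sorted_data.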
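Sorting keys (indices): asorti
To sort the keys of an array, use asorti:
asorti(source array, result array, sort method)
The script below is the asorti counterpart of 2.sh.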
4.sh
#!/bin/bash
EXAM_SCORE=$(cat << EOF
Ichiro 84
Bob 25
Wakame 76
Hanako 56
Alice 90
Kabao 53
Jam 43
Tarao 25
Cheese 12
Kasuo 47
Piyoko 88
Ikura 29
EOF
)
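echo "$EXAM_SCORE" | awk '
    function cmp_str_ind(i1, v1, i2, v2) {
        if (i1 < i2) {
            return -1;
        } else {
            return 1;
        }
    }
    {
        data[$1] = $2;
    }
    END {
        # asorti returns the number of elements; sorted_data[1..n] holds the keys
        n = asorti(data, sorted_data, "cmp_str_ind");
        # Walk 1..n explicitly: the order of for (j in ...) is not guaranteed
        for (j = 1; j <= n; j++) {
            printf("sorted_data[%s] = %s\n", j, sorted_data[j]);
        }
    }
'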
$ chmod +x 4.sh
$ ./4.sh
sorted_data[1] = Alice
sorted_data[2] = Bob
sorted_data[3] = Cheese
sorted_data[4] = Hanako
sorted_data[5] = Ichiro
sorted_data[6] = Ikura
sorted_data[7] = Jam
sorted_data[8] = Kabao
sorted_data[9] = Kasuo
sorted_data[10] = Piyoko
sorted_data[11] = Tarao
sorted_data[12] = Wakame
The keys appear in the same order as in 2.sh, which used PROCINFO["sorted_in"].
Summary
This article covered two ways to do custom sorting in gawk: PROCINFO["sorted_in"] and the asort/asorti functions.
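About the author
Jack-of-all-trades engineer
I mostly do front-end development with Angular. My development environment is mainly Linux, so I also make heavy use of shell commands. I enjoy programming steadily, one step at a time.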