Shower Thought: Don't write Bash, generate it!
While we're just on a Saturday night posting spree, here's another old post that I'm dusting off.
This is another one of those posts that UNIX oldheads will probably be rolling their eyes at, because I imagine at least one person somewhere has done this before and it's completely obvious to them. But it wasn't obvious to me.
I had a problem the other day where I needed to move multiple git repositories to a different server. The problem was this, then:
- Find a git repository in an arbitrarily deep folder structure.
- Check if the git repo's origin was at `git.example.com`.
- If so, replace the remote with `git2.example.com`.
- Keep going.
This seemed like a task that should not be hard in Bash. And really, it isn't. All I have to do is...
```bash
find . -type d -name ".git" -exec git remote set-url origin ...
```
But wait, git commands need to be run from the repository folder. And also this won't let me filter out only repos from `example.com`.
Sigh.
Ok, let's try a loop:
```bash
find . -type d -name ".git" -print0 | while read -d $'\0' -r repo; do
    cd "$repo/..";
    ORIGIN=$(git remote get-url origin);
    if [[ "$ORIGIN" ~= ".*git.example.com.*" ]]; then
        git remote set-url origin $(echo "$ORIGIN" | sed 's|git\.example\.com|git2.example.com|');
    fi
    cd -;
done
```
But wait, what's this?
```
zsh: condition expected: ~=
```
Sigh, alright.
```bash
# ...
if [[ "$ORIGIN" =~ ".*git.example.com.*" ]]; then
    git remote set-url origin $(echo "$ORIGIN" | sed 's|git\.example\.com|git2.example.com|');
# ...
```
But wait, now the first `cd` is failing, because of... permissions errors? What? Ok, not sure how that happened, but when it does, the `git` commands fail and then the final `cd -` pops us out of the directory tree, ruining all subsequent commands.
Try again...
```bash
find . -type d -name ".git" -print0 | while read -d $'\0' -r repo; do
    cd "$repo/.." || continue;
    # ...
done
```
Ok, but now the call to `git remote` is failing because some repos don't have a remote called `origin`. Ugh.
Maybe what I need to do is this:
```bash
find . -type d -name ".git" -print0 | while read -d $'\0' -r repo; do
    cd "$repo/..";
    git remote -v | while IFS=$'\n' read -r remote; do
        name=$(echo "$remote" | awk '{print $1}')
        url=$(echo "$remote" | awk '{print $2}')
        if [[ "$url" =~ ".*git.example.com.*" ]]; then
            # ...
```
You know what, this is disgusting. Maybe we just... I dunno, rewrite it in Perl?
At this point I did some Googling around and discovered the `-C` flag of `git`.
That's going to help a ton, but I was still really unhappy with how ugly this looked.
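For anyone who hasn't used it, `-C` makes git behave as if it had been started in the given directory, so no `cd` is needed at all. A quick sketch against a throwaway repo (the path and URL here are made up):

```bash
# git -C <dir> runs the git command as if we had cd'd into <dir>.
repo=$(mktemp -d)
git -C "$repo" init -q
git -C "$repo" remote add origin git@git.example.com:repos/demo.git
git -C "$repo" remote get-url origin   # prints the URL, no cd required
```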
Then I got an idea. Typing in all these commands by hand would be really easy; it would just take a long time. Is there some way I can emulate that? Of course, that's what a script is. Just because it's a script doesn't mean it has to be perfectly generic and reusable.
First, I got all the names of the repositories and stuffed them in a file:
```bash
find "$PWD" -type d -name ".git" > /tmp/repos
```
Then, cut off the trailing `.git`, and add a `cd` command:
```bash
sed -i 's|/\.git$||' /tmp/repos
sed -i 's|^|cd |' /tmp/repos
```
At this point we have a file that looks like:
```
cd ~/Work/repos/repo1
cd ~/Work/repos/repo2
...
```
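Those two `sed` passes are easy to sanity-check on a single sample line before running them over the real file (the path below is made up):

```bash
# Check both transformations on one sample path:
printf '%s\n' '/home/drew/Work/repos/repo1/.git' \
    | sed 's|/\.git$||' \
    | sed 's|^|cd |'
# → cd /home/drew/Work/repos/repo1
```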
At this point I opened my editor, selected every line, added a new cursor at the end of every line, and tacked on an extra command as a suffix:
```bash
cd ~/Work/repos/repo1; git remote -v | sed "s|^|$PWD\t\t|";
cd ~/Work/repos/repo2; git remote -v | sed "s|^|$PWD\t\t|";
...
```
We now have an unrolled bash loop, if you will. Executing this, we filter the output and place it into a new file:
```bash
bash /tmp/repos | grep git.example.com > /tmp/repos2
```
Which now contains the output:
```
/home/drew/Work/repos/repos1    origin  git@git.example.com:repos/repos1.git
/home/drew/Work/repos/repos2    origin  git@git.example.com:repos/repos2.git
...
```
From there I can open the `repos2` file and easily use multi-cursor editing to get this:
```bash
git -C /home/drew/Work/repos/repos1 remote set-url origin git@git{2,}.example.com:repos/repos1.git;
git -C /home/drew/Work/repos/repos2 remote set-url origin git@git{2,}.example.com:repos/repos2.git;
...
```
Save, run `bash /tmp/repos2`, and it completes the original task.
Even if there are errors, they are just safely ignored and it keeps chugging.
Now... I don't actually know if this was 'faster' than just writing the loop or not. Maybe a better programmer, or at least one more familiar with Bash, would find this cumbersome. But it seems like writing simple bash commands to print out complicated bash commands to a new script can be easier than writing the complicated bash commands.
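For the record, the loop version doesn't end up too bad once you know about `-C`. Something like this sketch would do it (the hostnames are from the task above; the function name and the `|| continue` error handling are my own additions):

```bash
# Find every repo under a root directory, and for any whose origin
# points at git.example.com, point it at git2.example.com instead.
migrate_remotes() {
    local root=$1 repo url
    find "$root" -type d -name ".git" | sed 's|/\.git$||' | while read -r repo; do
        # Skip repos that have no remote named origin instead of dying.
        url=$(git -C "$repo" remote get-url origin 2>/dev/null) || continue
        case "$url" in
            *git.example.com*)
                git -C "$repo" remote set-url origin "${url/git.example.com/git2.example.com}"
                ;;
        esac
    done
}
```

Paths containing newlines would still break the `read`, which is exactly the kind of edge case the generated-script approach happily ignores.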
For me the benefits are:
- I don't have to remember bash syntax or idiosyncrasies as much. Loops, who needs 'em?
- I can easily see the commands as I'm editing them. I can even see the result of commands in-between steps.
- If something goes wrong I can easily repeat only one part of the loop by deleting or commenting lines from the script file.
- If I somehow missed a repo because any of the filtering is bad, I can copy and paste a line and then edit it with the intended values.
- There's no tomfoolery with nested command languages, such as with `find ... -exec 'bash -c ...'`.
- Since everything I did (mostly) is still a script, there's potential for me to copy out of a shell history and make it repeatable. The multi-cursor tricks could be replaced with something like `awk` or `sed` and a little elbow grease.
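For instance, the final multi-cursor edit could have been one `awk` pass over the filtered file (a sketch against the sample line above; the field layout assumes the three columns shown earlier):

```bash
# Turn one line of /tmp/repos2 into the final git -C command:
printf '%s\n' '/home/drew/Work/repos/repos1 origin git@git.example.com:repos/repos1.git' \
    | awk '{ sub(/git\.example\.com/, "git2.example.com", $3);
             print "git -C " $1 " remote set-url origin " $3 ";" }'
# → git -C /home/drew/Work/repos/repos1 remote set-url origin git@git2.example.com:repos/repos1.git;
```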
Like I said, I feel a little sheepish writing this because every time I 'discover' some new feature or trick with Bash, at least one of my friends is like "LOL you didn't know about that? It's on page 1,703 of the GNU User Manual" and then I feel dumb. But in case someone somewhere reads this and thinks it's a good idea, well, I hope it helps.