Removing Mistakes With Git Rebase
Let's say you've got a public repository on GitHub where you're building a basic website hosted on AWS.
Part of your website accesses an AWS service like S3, DynamoDB or SQS, so naturally you may have your AWS access key and secret stored in a config file (we'll talk about WHY YOU SHOULD NEVER DO THIS at the end).
So you're busy coding away, get the website to a stable stage and run git add -A, then git commit -m "Oh dear!".
Suddenly you've checked in your credentials to your local branch. Did you know there are scripts running against GitHub that crawl for AWS credentials? Once they get your details they set up hundreds of servers on your account to mine bitcoins. Whilst Amazon does sometimes refund these sorts of attacks, by default you are liable for the costs, which can go over $3,000!
So this isn't good, but there is a light at the end of the tunnel; we've not pushed to the remote server yet, so we can undo the mistake.
Approach 1: Reset To A Safe State
We could run:
git reset 07784e1
This will move our current branch back to commit "07784e1" (assuming this is the last safe commit), dropping the commits made after it but keeping our new and modified files in the working directory. This would allow us to add the config file to .gitignore or clean out the credentials, then check in again.
This works; however, if we've got a lot of new commits and the unsafe one sits somewhere in the middle, we lose a lot of good context in the commit messages.
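To see what the reset actually leaves behind, here is a minimal sketch that replays the situation in a throwaway repository. The file contents, commit messages and the fake secret are all made up for illustration, and the safe commit is looked up with git rev-parse rather than typed in by hand.

```shell
#!/bin/sh
# Sketch: what "git reset <safe commit>" keeps and what it drops.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)   # throwaway repository
cd "$repo"
git init -q

echo 'placeholder=1' > Web.config
git add -A
git commit -q -m "Safe commit"
safe=$(git rev-parse HEAD)   # plays the role of 07784e1

# The mistake: a (fake) secret gets committed.
echo 'AWSSecretKey=EXAMPLE-NOT-A-REAL-SECRET' >> Web.config
git add -A
git commit -q -m "Oh dear!"

# Move the branch back to the safe commit; the working tree is left alone.
git reset -q "$safe"

commit_count=$(git rev-list --count HEAD)
secret_lines_in_worktree=$(grep -c 'AWSSecretKey' Web.config)
echo "commits on branch: $commit_count"
echo "secret lines still in working tree: $secret_lines_in_worktree"
```

The bad commit is gone from the branch, but the secret is still sitting in the file on disk, so it needs cleaning out before the next git add.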
Approach 2: Revert The Commit
We could alternatively run:
git revert 14c52c3
This will create a new commit that reverts the changes. Unfortunately the credentials are still there in the history, just not at the head revision. Anyone who looks through the commits will still be able to see them.
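A quick way to convince yourself of this is to revert in a scratch repository and then search the history. This is only a sketch; the file contents and the fake secret are invented for the demo.

```shell
#!/bin/sh
# Sketch: a revert cleans the head revision but not the history.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)   # throwaway repository
cd "$repo"
git init -q

echo 'placeholder=1' > Web.config
git add -A
git commit -q -m "Safe commit"

echo 'AWSSecretKey=EXAMPLE-NOT-A-REAL-SECRET' >> Web.config
git add -A
git commit -q -m "Oh dear!"
bad=$(git rev-parse HEAD)   # plays the role of 14c52c3

# Add a new commit that undoes the bad one.
git revert --no-edit "$bad" > /dev/null

# The file at the head revision is clean...
head_hits=$(grep -c 'AWSSecretKey' Web.config || true)
# ...but the patch that added the secret is still in the log.
history_hits=$(git log -p | grep -c -F -- '+AWSSecretKey' || true)
echo "secret lines at head: $head_hits"
echo "commits in history adding the secret: $history_hits"
```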
Approach 3: Rebase Interactive
Finally, we could instead run:
git rebase -i master
This will bring up a list of the commits made between our local master branch and the head of the NewFeature branch.
Here's the magic; rebase effectively allows you to rewrite history from a common ancestor. We can choose to reword the commit messages, change the contents of a commit or squash (combine) multiple commits together to give clearer context in the commit log.
If we had lots of commits with one offending commit in the middle, we could keep all of those commits and just edit the one in the middle. This is supremely useful outside of a security breach too, if you want to clean up a commit log, get rid of a series of "Revert x, Revert Revert x" style messages etc.
First I'll mark the offending commit with "edit":
This will then take all those commits and re-create them (with new hash ids, we'll talk about that in a sec) on top of the point you picked as a starting point (here it's where we branched off master).
Once it gets to any commits labelled reword or edit it stops like so:
Now let's go to our IDE / Text Editor and remove the offending lines from the file.
Then, open git gui with the below command and select "Amend last commit":
git gui
See what I mean about rewriting history? Now we can change this commit to our heart's content. As I've removed the lines from my Web.config I can add the file back to the commit to get this:
I'll commit that change and return to the command line. Here we run:
git rebase --continue
to move on to the next commit flagged as reword or edit. As we have none, the rebase will complete and we can now see:
Our offending commit has been modified so the credentials are no longer present. Now when I push this to remote it won't carry the credentials with it.
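The whole walkthrough can also be replayed without any clicking, which is handy for trying it out safely first. The sketch below makes a few assumptions of its own: it builds a throwaway repository with invented commits and a fake secret, uses GIT_SEQUENCE_EDITOR with sed as a stand-in for marking the commit with "edit" by hand, and amends with git commit --amend on the command line in place of git gui.

```shell
#!/bin/sh
# Sketch: scripted version of the interactive rebase walkthrough.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
export GIT_EDITOR=true   # never open an editor for commit messages

repo=$(mktemp -d)   # throwaway repository
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is "master"

echo 'placeholder=1' > Web.config
git add -A
git commit -q -m "Initial site"

git checkout -q -b NewFeature
echo 'home' > index.html
git add -A
git commit -q -m "Add home page"

# The offending commit: a genuine change plus the leaked (fake) secret.
echo 'body {}' > site.css
echo 'AWSSecretKey=EXAMPLE-NOT-A-REAL-SECRET' >> Web.config
git add -A
git commit -q -m "Oh dear!"

echo 'about' > about.html
git add -A
git commit -q -m "Add about page"
old_head=$(git rev-parse HEAD)

# Stand-in for editing the list by hand: flip "pick" to "edit" on the bad commit.
GIT_SEQUENCE_EDITOR="sed -i.bak 's/^pick \(.*Oh dear!\)/edit \1/'" git rebase -i master

# The rebase has stopped on the bad commit: strip the secret and amend it.
grep -v 'AWSSecretKey' Web.config > Web.config.clean
mv Web.config.clean Web.config
git add Web.config
git commit -q --amend --no-edit

# Replay the remaining commits on top of the amended one.
git rebase --continue

new_head=$(git rev-parse HEAD)
leaks=$(git log -p master..NewFeature | grep -c -F 'AWSSecretKey' || true)
commits=$(git rev-list --count master..NewFeature)
echo "commits on NewFeature: $commits"
echo "lines mentioning the secret in branch history: $leaks"
echo "head before: $old_head"
echo "head after:  $new_head"
```

All three commits and their messages survive, the secret is gone from the branch history, and the head hash has changed because the amended commit and everything after it were re-created.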
Important Notes on Rebase
As previously mentioned, when you rebase, the commits since the common ancestor are rewritten and given new hash ids.
You can happily rebase a branch that is already pushed to remote by running:
git rebase origin/master -i
# Do your rebase here
git push -f # Force your history rewrite onto the remote branch, you utter monster!
This will however completely rewrite the history of that particular branch. If other users have pulled that branch and you force changes to it, they will find they have to pull the whole branch down again from scratch, messing up any changes they have made.
In general it is only safe to run rebase if:
- You are rebasing a branch / set of commits you haven't pushed to remote yet
- You are 100% sure no one has pulled your branch locally
- You have already agreed with any other users that they will have to pull the whole thing again, breaking any changes they have made
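If you want to see why the force is needed, the sketch below rewrites a commit that has already been pushed. It assumes a local bare repository standing in for the remote and uses a simple amend as the history rewrite; the names and contents are invented for the demo.

```shell
#!/bin/sh
# Sketch: a rewritten branch is rejected by a normal push and needs -f.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

remote=$(mktemp -d)   # local bare repo standing in for the remote
git init -q --bare "$remote"

work=$(mktemp -d)
cd "$work"
git init -q
git symbolic-ref HEAD refs/heads/NewFeature
echo 'home' > index.html
git add -A
git commit -q -m "Add home page"
git remote add origin "$remote"
git push -q origin NewFeature
pushed=$(git rev-parse HEAD)

# Rewrite history locally; the commit gets a new hash id.
git commit -q --amend -m "Add home page (reworded)"
rewritten=$(git rev-parse HEAD)

# A normal push is refused because the remote branch is no longer an ancestor.
if git push -q origin NewFeature 2>/dev/null; then
  plain_push=accepted
else
  plain_push=rejected
fi

# Forcing replaces the remote history with the rewritten one.
git push -q -f origin NewFeature
remote_head=$(git --git-dir="$remote" rev-parse NewFeature)
echo "plain push: $plain_push"
echo "remote now points at the rewritten commit: $remote_head"
```

git push --force-with-lease is a more cautious variant of -f: it refuses to overwrite the remote branch if someone else has pushed to it since you last fetched.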
Very Important Notes on AWS credentials
So the above example was really just to demonstrate how you can run rebase and where it may be useful.
However it is also indicative of an underlying issue; there's still nothing stopping me accidentally committing those credentials again in the future, and maybe that time I'd miss them and push to a public remote ($3,000 remember!)
If you do find yourself storing AWS access keys and secrets in config, look at using profiles instead. AWS have a great in-depth guide on how to do this here.
A profile is set up on your local environment through the command line or PowerShell with:
Set-AWSCredentials -AccessKey {Key} -SecretKey {Secret} -StoreAs {MyProfileName}
Then in your config it is referenced by:
<?xml version="1.0" encoding="utf-8" ?>
<configuration>
  <appSettings>
    <add key="AWSProfileName" value="MyProfileName"/>
  </appSettings>
</configuration>
This way you can happily commit the config, knowing that it uses a profile local to your own machine and is not in your repository.
Next Steps
Atlassian have a great tutorial on rebase with diagrams explaining how it all works.