Regular expressions in Atom for code consistency

When we start working on a project that was previously developed by another company, we may find that the code was written by several developers who never agreed on common practices. That is something we can fix with some regular expressions (regex) within Atom.

In the examples below I decided how the code should look, so the regular expressions serve my own purposes, but they can be adapted to the practices that your company follows.

Here’s a piece of code with some examples:

There are some things in the code that shouldn’t be there:

  • The doc comment at the top doesn’t follow the correct format: /** */
  • The class name is glued to the opening curly bracket: MyClass{
  • The same thing happens when initializing the attribute i: i=0;
  • The if/else inside the for loop doesn’t have curly brackets.

This class file is probably not the only one in the application with bad practices, so we need a way to replace all of these bad habits with the correct form as easily as possible.

The first thing we need to do in order to start fixing code with regex in Atom is press Ctrl + F (Cmd + F on Mac). The find panel asks for two strings: the first is the text to find in the file (in our case, a regular expression), and the second is the replacement for that text.

We need to make sure that the regex and case-sensitive options are enabled:

And that’s it. We are ready to start finding and replacing.

An example: fixing the doc comment with regex

Match (before):

Find: \/\/-+\n(\s{4}|\t)?\/\/\s?([\w\d\s\.]+)\n(?:\s{4}|\t)?\/\/-+
Replace: /**\n$1 * $2\n$1 */

Result:

I’ll explain the regex used here part by part:

\/\/ We are looking for something that starts with a double slash.

-+ Then a run of one or more consecutive dashes (it doesn’t matter how many).

\n Followed by a new line.

(\s{4}|\t)? The next line starts with an optional indentation of 4 spaces or a tab (in this case it is absent).

\/\/\s? After the optional indentation comes another double slash followed by an optional space (the text could be glued to the comment slashes).

([\w\d\s\.]+) A non-optional group that corresponds to the explanation text in the doc comment. It’s wrapped in parentheses so the text can be extracted for the replacement.

\n(?:\s{4}|\t)?\/\/-+ To finish, a new line, another optional indentation, a double slash and a run of dashes. Here we added ?: to make the group non-capturing, which avoids returning an extra group (it would be the same as the first).

The replacement string is quite interesting:

/**\n We start replacing the matched string with the correct form of a doc comment, followed by a new line.

$1 * $2\n The groups wrapped in parentheses in our regex are returned as $ variables followed by a number. The first group, $1, corresponds to (\s{4}|\t)?, which is the optional indentation; we re-add it to keep the original indentation. After that comes a literal space, an asterisk *, another space, and $2, which corresponds to the second group ([\w\d\s\.]+), the explanation text, followed by a new line.

$1 */ All doc comments end with an asterisk followed by a slash. In this case we need $1 again to restore the previous indentation.
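To try the find/replace pair outside the editor, here is a minimal sketch in Python. It is only an illustration under some assumptions: the sample comment is hypothetical (the original example was a screenshot), the last indentation group is written as non-capturing as explained above, and Python's re module writes backreferences as \g<1> and \g<2> where Atom uses $1 and $2.

```python
import re

# The Find pattern, with the trailing indentation group made non-capturing.
FIND = re.compile(r"\/\/-+\n(\s{4}|\t)?\/\/\s?([\w\d\s\.]+)\n(?:\s{4}|\t)?\/\/-+")

# Atom's "/**\n$1 * $2\n$1 */" expressed with Python backreferences.
REPLACE = r"/**\n\g<1> * \g<2>\n\g<1> */"

# Hypothetical input: a dashed comment block indented by 4 spaces.
before = (
    "    //----------------------\n"
    "    // Handles the accounts\n"
    "    //----------------------\n"
)

after = FIND.sub(REPLACE, before)
print(after)
```

When the comment is not indented, the first group does not take part in the match and is substituted as an empty string, so the same pair also works at the top of a file.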

Some other examples with regex replacement:

The replacements explained below are simpler than the one above, but they are useful for large script files.

Adding spaces around a glued equals sign (e.g. a=2):

Match (before):  public Integer i=0;
Regex:  (\S)\=(\S)
Replace:  $1 = $2
Result:  public Integer i = 0;
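The same replacement can be checked with a short Python sketch (Python writes $1 as \g<1>). It also shows why a review of the matches is worthwhile: the pattern fires inside comparison operators too. The STRICTER pattern is a hypothetical variant added for illustration, not part of the original recipe.

```python
import re

# The pattern from the Regex field, unchanged.
FIND = re.compile(r"(\S)\=(\S)")
# Atom's "$1 = $2" written with Python's \g<n> backreferences.
REPLACE = r"\g<1> = \g<2>"

fixed = FIND.sub(REPLACE, "public Integer i=0;")
print(fixed)  # public Integer i = 0;

# Caution: the same pattern also matches inside a comparison operator.
broken = FIND.sub(REPLACE, "if (a==b)")
print(broken)  # if (a = =b)

# Hypothetical stricter variant: skip "=" when it touches another
# operator character such as =, !, <, >, +, -, * or /.
STRICTER = re.compile(r"([^\s=!<>+\-*/])=([^\s=])")
print(STRICTER.sub(REPLACE, "if (a==b)"))            # if (a==b)
print(STRICTER.sub(REPLACE, "public Integer i=0;"))  # public Integer i = 0;
```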

Removing glued brackets:

Match (before):  MyClass{
Regex:  (\S)\{ (any non-whitespace character followed by a literal curly bracket)
Replace:  $1 {
Result:  MyClass {

Replacing glued parentheses in for/if/while:

Match (before):  for(Account a...
Regex:  \b(for|if|while)\( (a whole keyword immediately followed by an opening parenthesis)
Replace:  $1 (
Result:  for (Account a...
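The bracket and parenthesis replacements can be chained, as in this minimal Python sketch. The class body is a hypothetical sample, and apply_rules is an illustrative helper name. The keyword pattern uses a word boundary \b in front of the keyword, a choice of this sketch, so the indentation before the keyword is never consumed by the match.

```python
import re

# (pattern, replacement) pairs; Atom's $1 becomes \g<1> in Python.
RULES = [
    (re.compile(r"(\S)\{"), r"\g<1> {"),                # MyClass{  -> MyClass {
    (re.compile(r"\b(for|if|while)\("), r"\g<1> ("),    # for(      -> for (
]

def apply_rules(source: str) -> str:
    """Apply every rule in order and return the rewritten source."""
    for pattern, replacement in RULES:
        source = pattern.sub(replacement, source)
    return source

# Hypothetical sample with a glued bracket and a glued parenthesis.
before = (
    "public class MyClass{\n"
    "    public void run() {\n"
    "        for(Account a : accounts){\n"
    "        }\n"
    "    }\n"
    "}\n"
)
print(apply_rules(before))
```

Brackets that already have a space in front of them, like the one after run(), are left untouched because \S cannot match the space.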

Adding curly brackets to if/else statements

This replacement is a bit more complex than the others, because it needs more than one replacement step.

Match before replacement:

  1. Adding curly brackets to the if: regex: if\s?\(([\w\.\s\=\<\>\!\']+)\)\s?\n(\s+|\t)([^\{\s]) , replace: if ($1) {\n$2$3
  2. Adding curly brackets to the else: regex: \n(\s+|\t)else\s?[^\{](\s+|\t)?([^\{]\s?.+)\n , replace: \n$1} else { \n$2$3$1\n$1}\n
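The two steps can be run in sequence with a small Python sketch. The if/else snippet is a hypothetical input (the original one was a screenshot), and the patterns are copied from the steps above with $n rewritten as \g<n>.

```python
import re

# Step 1: open a curly bracket after the if condition.
IF_FIND = re.compile(r"if\s?\(([\w\.\s\=\<\>\!\']+)\)\s?\n(\s+|\t)([^\{\s])")
IF_REPLACE = r"if (\g<1>) {\n\g<2>\g<3>"

# Step 2: close the if block and wrap the else body in curly brackets.
ELSE_FIND = re.compile(r"\n(\s+|\t)else\s?[^\{](\s+|\t)?([^\{]\s?.+)\n")
ELSE_REPLACE = r"\n\g<1>} else { \n\g<2>\g<3>\g<1>\n\g<1>}\n"

# Hypothetical input: an if/else without curly brackets, indented by 8 spaces.
before = (
    "        if (a.Name == 'Acme')\n"
    "            i = 1;\n"
    "        else\n"
    "            i = 2;\n"
)

step1 = IF_FIND.sub(IF_REPLACE, before)
step2 = ELSE_FIND.sub(ELSE_REPLACE, step1)
print(step2)
```

On this input the curly brackets land in the right places, although the else body loses one space of indentation and picks up trailing spaces, so a quick look at the result is still worthwhile.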

After running those 2 steps, we should have fixed the missing curly brackets in our if/else statements, which completes the fixes for this file.

Match after replacement:

Now we have clean code without the bad habits left behind by previous developers:

With a bit of practice we can use regex to make the code consistent, but we need to be very careful when doing so, because an inappropriate regex can break the code. I recommend applying the replacements file by file, at least for the most used files.
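One way to stay careful is to preview what a pattern would change before applying it. The sketch below is an assumption of how that could look outside Atom, using Python's standard re and difflib modules; preview is an illustrative helper name and the sample line comes from the equals example above.

```python
import difflib
import re

def preview(source: str, pattern: str, replacement: str) -> str:
    """Return a unified diff of what the replacement would change."""
    changed = re.sub(pattern, replacement, source)
    diff = difflib.unified_diff(
        source.splitlines(keepends=True),
        changed.splitlines(keepends=True),
        fromfile="before",
        tofile="after",
    )
    return "".join(diff)

sample = "public Integer i=0;\n"
report = preview(sample, r"(\S)\=(\S)", r"\g<1> = \g<2>")
print(report)
```

An empty report means the pattern matched nothing, which is also useful to know before moving on to the next file.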
