A Scala script for processing files

This time, we will approach Scala from a different angle: that of a script file. If you see Scala solely as a ‘normal’ programming language, this should show that it offers more possibilities than you might have expected.

The code I’ll show below was not written just for this post: it was actually used to solve a small problem in a project. This means that, even if it is not as clean and elegant as it could be, it did the job well.

First, let's describe what we are trying to do here. We had several text files in which a certain pattern (Unicode escape sequences such as \u00e9) had to be found and replaced by something else (the character each escape stands for). Of course, this job could be done manually… but with tens of files, it would take a while and be very error prone. Another option would be to write a shell script, but nobody on our team is a really good shell scripter.

So I took the opportunity to solve the problem in Scala, with a small script that scans all the files one by one, searches for the pattern and replaces it with the new value.

To write a Scala script, you just type the commands directly in the script file, instead of putting them inside a class. So the code basically consists of a few commands, followed by the function definitions those commands use. Here is the source code, with some explanation after it:

import java.io._
import scala.util.matching.Regex
import Regex._

// entry point for the script execution
processDir(new File("."))

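def processDir(dir: File): Unit = {
 for (f <- Option(dir.listFiles()).getOrElse(Array.empty[File])) { // listFiles returns null for unreadable directories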
  if (f.isDirectory) {
   processDir(f)
  } else {
   val extension = f.getName.lastIndexOf('.') match {
    case -1 => None
    case x => Some(f.getName.substring(x + 1).toLowerCase)
   }

   if (List(Some("java"), Some("sql")) contains extension) {
    processFile(f)
   }
  }
 }
}

def processFile(file: File): Unit = {
 val fileContents = getFileContents(file)
 if (fileContents != null) {
  val regex = """\\u[0-9a-fA-F]{4}""".r // a literal backslash, 'u', then four hex digits
  val newFileContents = regex.replaceAllIn(fileContents, m => Regex.quoteReplacement(replaceUnicode(m.matched)))

  if (fileContents != newFileContents) {
   val newFile = new File(file.getParentFile, file.getName + ".conv") // temporary file next to the original
   val writer = new BufferedWriter(new FileWriter(newFile))

   writer.write(newFileContents, 0, newFileContents.length)
   writer.close

   file.delete
   newFile.renameTo(file)
  }
 }
}

def getFileContents(file: File) = {
 val reader = new BufferedReader(new FileReader(file))

 var line = reader.readLine
 var content = line
 while (line != null) {
  line = reader.readLine
  if (line != null) {
   content += "\n" + line // put back the line break that readLine strips
  }
 }

 reader.close
 content
}

def replaceUnicode(s: String) = {
 val unicode = s.substring(2)
 val unicodeChars = new Array[Char](1)
 unicodeChars(0) = Integer.parseInt(unicode, 16).toChar

 new String(unicodeChars)
}

The first interesting thing to notice is that we are leveraging a lot of existing Java API knowledge, mainly the File classes from java.io. Besides that, the code is pretty simple: it consists of a few function definitions, one function passed as an argument, and some recursion (processDir calls itself for every subdirectory it finds).
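The recursion can be seen in isolation with a sketch like the one below. It is not the script's code: findFiles is a hypothetical helper that performs the same walk as processDir but collects the matching files into a List instead of processing each one on the spot. Like the corrected processDir, it guards against listFiles returning null, which java.io.File does when the path is not a readable directory.

```scala
import java.io.File

// Hypothetical helper: walks dir recursively, like processDir, but returns
// the files whose lower-cased extension is in `extensions`.
def findFiles(dir: File, extensions: Set[String]): List[File] = {
  // listFiles returns null if dir is not a directory or cannot be read
  val entries = Option(dir.listFiles()).map(_.toList).getOrElse(Nil)
  entries.flatMap { f =>
    if (f.isDirectory) findFiles(f, extensions) // recurse into the subdirectory
    else {
      val name = f.getName
      val dot = name.lastIndexOf('.')
      // keep the file only if it has an extension and that extension is wanted
      if (dot >= 0 && extensions.contains(name.substring(dot + 1).toLowerCase)) List(f)
      else Nil
    }
  }
}
```

With a helper like this, the entry point could become findFiles(new File("."), Set("java", "sql")).foreach(processFile), which separates finding the files from changing them.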

The script execution starts with the call to processDir on the current directory. An interesting part of this function is the pattern match on f.getName.lastIndexOf('.'), which computes the current file's extension as an Option: None when the name contains no dot, otherwise Some with the lower-cased text after the last dot. The if statement that follows uses this value to skip files that have no extension (i.e. whose extension is None) or whose extension is neither .sql nor .java.
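The pattern match can be tried on its own by pulling it into a small function. The names extensionOf and isWanted below are hypothetical; the body is the same match the script uses, and the filter is written exactly as in processDir.

```scala
// Hypothetical standalone version of the extension logic in processDir:
// the same match on lastIndexOf('.'), wrapped in a function.
def extensionOf(fileName: String): Option[String] =
  fileName.lastIndexOf('.') match {
    case -1 => None                                        // no dot: no extension
    case x  => Some(fileName.substring(x + 1).toLowerCase) // text after the last dot
  }

// The script's filter: keep only .java and .sql files
def isWanted(fileName: String): Boolean =
  List(Some("java"), Some("sql")).contains(extensionOf(fileName))
```

Since only the position of the last dot matters, a name like .bashrc yields Some("bashrc") and a trailing dot yields Some(""); neither passes the filter.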

For the remaining files, processFile is then called. From there on, everything is basic Java I/O, except for the call to regex.replaceAllIn in processFile. For each substring matched by the regex (a Unicode escape such as \u00e9), replaceAllIn calls the function we pass in and uses its result, the character produced by replaceUnicode, as the replacement. This is probably the most interesting line in the entire script.
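The replaceAllIn call can also be tried in isolation. The sketch below assumes, as the script's regex suggests, that the escapes have the Java form: a backslash, the letter u and four hex digits. decodeEscapes is a hypothetical name; unlike the script, it captures the hex digits in a group instead of calling substring(2), and it wraps each replacement in Regex.quoteReplacement, because replaceAllIn treats backslashes and dollar signs in the replacement string as special syntax.

```scala
import scala.util.matching.Regex

// A literal backslash, the letter u, then four hex digits (captured as group 1)
val unicodeEscape: Regex = """\\u([0-9a-fA-F]{4})""".r

// Replaces every escape with the character it encodes. The function passed to
// replaceAllIn receives each Match; quoteReplacement makes sure a decoded
// backslash or dollar sign is inserted literally instead of being interpreted
// as replacement-string syntax.
def decodeEscapes(text: String): String =
  unicodeEscape.replaceAllIn(text, (m: Regex.Match) =>
    Regex.quoteReplacement(Integer.parseInt(m.group(1), 16).toChar.toString))
```

Without quoteReplacement, the escape for the dollar sign (\u0024) or for the backslash itself (\u005c) would be handed to the regex engine as replacement syntax rather than inserted as a plain character.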


4 Responses to A Scala script for processing files

  2. weezybizzle says:

    Thanks for the post. One nifty thing is to mark getFileContents as implicit, then you can use the file as if it were the string containing the contents.
