This time, we will approach Scala from a different angle: that of a script file. If you see Scala solely as a ‘normal’ programming language, this should show that it offers more possibilities than you might expect.
The code I’ll show below was not created just for this post. It was actually used to solve a small problem in a project. This means that, even if it is not as nice and beautiful as it could be, it did the job well.
First, let’s describe what we are trying to do here. We had several text files in which a certain pattern needed to be found and replaced by something else. Of course, this job could be done manually, but with tens of files it would take a while and would be very error prone. Another option would be to write a shell script to solve the problem, but we don’t have anyone on the team who is a really good shell scripter.
So I took the opportunity to solve the problem in Scala, with a small script that would scan all the files, one by one, search for the pattern and replace it with the new value.
To write a Scala script, you just type the commands directly in the script file, instead of putting them inside a class. So the code consists basically of some commands, followed by some function definitions that are used by these commands. Here is the source code, and some explanation after it:
    import java.io._
    import scala.util.matching.Regex
    import Regex._

    // entry point for the script execution
    processDir(new File("."))

    def processDir(dir: File): Unit = {
      for (f <- dir.listFiles) {
        if (f.isDirectory) {
          processDir(f)
        } else {
          val extension = f.getName.lastIndexOf('.') match {
            case -1 => None
            case x: Int => Some(f.getName.substring(x + 1).toLowerCase)
          }
          if (List(Some("java"), Some("sql")) contains extension) {
            processFile(f)
          }
        }
      }
    }

    def processFile(file: File): Unit = {
      val fileContents = getFileContents(file)

      if (fileContents != null) {
        val regex = "\\\\u[0-9a-fA-F]{4}".r
        val newFileContents = regex.replaceAllIn(fileContents, m => replaceUnicode(m.matched))
        // only rewrite the file when something actually changed
        if (fileContents != newFileContents) {
          val newFile = new File(file.getName + ".conv")
          val writer = new BufferedWriter(new FileWriter(newFile))
          writer.write(newFileContents, 0, newFileContents.length)
          writer.close()
          file.delete()
          newFile.renameTo(file)
        }
      }
    }

    // returns the whole file as one string, or null if the file is empty
    def getFileContents(file: File) = {
      val reader = new BufferedReader(new FileReader(file))
      var line = reader.readLine
      var content = line
      while (line != null) {
        line = reader.readLine
        if (line != null) {
          content += "\n" + line
        }
      }
      reader.close()
      content
    }

    // turns a matched escape such as \u00e9 into the character it encodes
    def replaceUnicode(s: String) = {
      val unicode = s.substring(2)
      val unicodeChars = new Array[Char](1)
      unicodeChars(0) = Integer.parseInt(unicode, 16).asInstanceOf[Char]
      new String(unicodeChars)
    }
The first interesting thing to notice is that we are leveraging a lot of existing Java API knowledge, mainly File stuff from java.io. Besides that, the code is pretty simple, containing basically some function definitions, a function passed as an argument, and some recursion here and there.
On line 6 we start the script execution by calling the function processDir on the current directory. An interesting part of this function is the pattern matching on lines 13-16, where we get the current file's extension, which we use in the if statement that follows. There, we simply ignore files that have no extension (i.e. whose extension was defined as None) or that are not .sql or .java.
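To make the extension check easier to try in isolation, here is a minimal sketch of the same idea. The names `extensionOf`, `wanted` and `shouldProcess` are hypothetical and do not appear in the original script; the logic simply mirrors the match on `lastIndexOf('.')` described above.

```scala
// Hypothetical helper (not in the original script): returns the lower-cased
// extension of a file name, or None when the name contains no dot.
def extensionOf(name: String): Option[String] =
  name.lastIndexOf('.') match {
    case -1 => None
    case i  => Some(name.substring(i + 1).toLowerCase)
  }

// Hypothetical set of extensions, mirroring the script's list of java and sql.
val wanted = Set("java", "sql")

// True only when the name has an extension and it is one of the wanted ones.
def shouldProcess(name: String): Boolean =
  extensionOf(name).exists(wanted.contains)
```

Using `exists` on the Option means that a file without an extension is rejected without any special case, which is the same effect the script gets by comparing against `Some("java")` and `Some("sql")`.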
For the files that pass this check, processFile is then called. From here, all that happens is basic Java IO stuff, except for one interesting piece of code on line 29. Here, each string matched by the regex is replaced by a new version, generated by the replaceUnicode function. This is probably the most interesting line in the entire script.
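The replacement step can also be tried on a plain string, without touching any files. The sketch below is an illustration under a few assumptions: the names `unicodeEscape`, `decodeOne` and `decodeEscapes` are hypothetical, the regex uses a capture group instead of `substring(2)`, and the call to `Regex.quoteReplacement` is an addition that the original script does not make. It is there because `replaceAllIn` treats `$` and `\` in the replacement text specially.

```scala
import scala.util.matching.Regex

// Matches a literal backslash, a 'u', and exactly four hex digits.
// The four digits are captured as group 1.
val unicodeEscape: Regex = """\\u([0-9a-fA-F]{4})""".r

// Hypothetical helper: turns one matched escape into the character it encodes.
// quoteReplacement (an addition, not in the original script) keeps a decoded
// '$' or '\' from being read as a group reference or escape.
def decodeOne(m: Regex.Match): String = {
  val ch = Integer.parseInt(m.group(1), 16).toChar
  Regex.quoteReplacement(ch.toString)
}

// Applies decodeOne to every escape found in the text.
def decodeEscapes(text: String): String =
  unicodeEscape.replaceAllIn(text, m => decodeOne(m))
```

The key point is the second argument of `replaceAllIn`: it is a function from a match to its replacement, so each escape sequence can be replaced by a different character.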
Thanks for the post. One nifty thing is to mark getFileContents as implicit, then you can use the file as if it were the string containing the contents.
One nifty thing is to mark your `getContentsOfFile(file: File): String` method as implicit, then you can simply do regex.replaceAllIn(file …)
Thanks for the post 🙂
Good point. Just extra care would be needed due to the null checking, perhaps returning Option[String] instead.
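As a rough sketch of the Option idea from this reply, the reading function could look something like the following. The name `readContents` is hypothetical, and the sketch uses `scala.io.Source` and `scala.util.Try` rather than the BufferedReader loop from the script; it does not show the implicit conversion itself.

```scala
import java.io.File
import scala.io.Source
import scala.util.Try

// Hypothetical variant of getFileContents: Some(contents) on success,
// None if the file cannot be read, so callers never have to check for null.
def readContents(file: File): Option[String] =
  Try {
    val source = Source.fromFile(file)
    try source.mkString finally source.close()
  }.toOption
```

A caller could then write `readContents(file).foreach { contents => ... }` instead of the explicit null check.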