normalization-insensitive

Posted on August 11, 2016
Tags: haskell, software

Earlier this year, I went on a trip to Texas. On my Android phone, I ended up with photos with names like 20160515_201343_East César E. Chávez Boulevard.jpg. When I used the script I had written to transfer the photos to my computer, I discovered that it was re-copying some of the photos every time I ran the script, instead of recognizing that these photos had already been copied. Specifically, it was re-copying the photos whose filenames contained accent marks.

This comes down to a question of Unicode normalization. Linux (and thus Android) stores filenames as raw bytes without normalizing them, and in practice they end up in NFC; Mac OS X's HFS+ filesystem stores them in a variant of NFD. So when comparing the filenames on the phone with the filenames on the laptop, the filenames with accents never matched.
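To make the mismatch concrete, here is a minimal sketch using only the Prelude. The names nfcName, nfdName, and namesMatch are made up for illustration; the two strings render identically but contain different code points, so plain == treats them as different.

```haskell
-- "César" with a precomposed é (U+00E9), as NFC would store it.
nfcName :: String
nfcName = "C\xE9sar"

-- "César" with e followed by a combining acute accent (U+0301),
-- as NFD would store it.
nfdName :: String
nfdName = "Ce\x301sar"

-- Plain String equality compares code points one by one,
-- so the two spellings are not equal.
namesMatch :: Bool
namesMatch = nfcName == nfdName   -- False
```

Loading this in GHCi and evaluating length nfcName and length nfdName gives 5 and 6, which is another way to see that the strings differ.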

What I wanted was something like case-insensitive, but for normalization, not case. So, I fulfilled this need by “writing” the normalization-insensitive package. (I put “writing” in quotes, because all I did was make some straightforward changes after copying the case-insensitive package wholesale.)
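The idea borrowed from case-insensitive can be sketched roughly as follows: keep the original string, and compare on a normalized copy. This is only an illustration under stated assumptions, not the package's actual source. Insensitive, mkInsensitive, originalName, compareKey, and toyDecompose are hypothetical names, and toyDecompose handles just two accented letters where a real implementation would call a full Unicode normalizer.

```haskell
import Data.Function (on)

-- Toy stand-in for a real normalizer (assumption): decomposes only
-- the two accented letters that appear in the example filename.
toyDecompose :: Char -> String
toyDecompose '\xE9' = "e\x301"   -- é becomes e + combining acute
toyDecompose '\xE1' = "a\x301"   -- á becomes a + combining acute
toyDecompose c      = [c]

-- Holds the string exactly as given, plus a normalized key that is
-- used only for comparison.
data Insensitive = Insensitive
  { originalName :: String
  , compareKey   :: String
  }

-- Wrap a string, computing its comparison key once.
mkInsensitive :: String -> Insensitive
mkInsensitive s = Insensitive s (concatMap toyDecompose s)

-- Equality looks only at the normalized key, never at the original.
instance Eq Insensitive where
  (==) = (==) `on` compareKey
```

With this sketch, mkInsensitive "Ch\xE1vez" == mkInsensitive "Cha\x301vez" holds, while originalName still returns each string untouched. With the real package, the corresponding calls are NI.mk and NI.original.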

Under the hood, normalization-insensitive uses the unicode-transforms package. This avoids having to pull in any heavyweight dependencies like ICU.

Here is the new version of the android-photos script:

#!/usr/bin/env runhaskell

import Data.List
import System.Directory
import System.IO
import System.Process

import           Data.Unicode.NormalizationInsensitive  ( NI )
import qualified Data.Unicode.NormalizationInsensitive as NI

-- Photo directory on the phone, and its destination on the laptop.
remoteDir, localDir :: String
remoteDir = "/storage/extSdCard/DCIM/Camera"
localDir = "/Users/ppelleti/Pictures/Android"


getRemoteFiles :: String -> IO [String]
getRemoteFiles remoteDir = do
  files <- readProcess "adb" ["shell", "ls", remoteDir] ""
  return $ lines $ filter (/= '\r') files

getLocalFiles :: String -> IO [String]
getLocalFiles localDir = do
  files <- getDirectoryContents localDir
  return $ filter shouldKeep files
  where shouldKeep ('.':_ ) = False
        shouldKeep _ = True

copyOneFile :: String -> String -> IO ()
copyOneFile remoteFile localFile =
  cmd ["adb", "pull", "-a", remoteFile, localFile]

-- Echo a command, then run it; an empty command list is a no-op.
cmd :: [String] -> IO ()
cmd [] = return ()
cmd (exe:args) = do
  putStrLn $ unwords $ exe : map show args
  callProcess exe args

copyFiles :: String -> String -> [String] -> IO ()
copyFiles rDir lDir files =
  mapM_ cfile files
  where cfile file = copyOneFile (rDir ++ "/" ++ file) (lDir ++ "/" ++ file)

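-- Wrap the names in NI so that (\\) compares them modulo Unicode
-- normalization, then copy using the phone's original spelling.
copyNewFiles :: String -> String -> IO ()
copyNewFiles rDir lDir = do
  rFiles <- map NI.mk <$> getRemoteFiles rDir
  lFiles <- map NI.mk <$> getLocalFiles lDir
  let files = rFiles \\ lFiles
  copyFiles rDir lDir $ map NI.original files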

main :: IO ()
main = copyNewFiles remoteDir localDir