Inserting LaTeX References into Java

(with a stateful parser)

August 1, 2020

After several steps, we now have the tools to replace every occurrence of \ref and \pageref in a comment of a Java source file with the appropriate reference. All that's needed is to read in the list of possible references, examine the parsed-out comments, and make the replacement. The entire program is discussed below, with a few remarks about each piece and how it relates to earlier iterations.

The first part of this program is identical to the first part in the previous step. It loads an existing list of references.

 
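{-# LANGUAGE LambdaCase #-}

import Control.Monad (void)
import Control.Monad.Extra (allM)   -- from the extra package
import System.Directory (doesFileExist)
import System.Environment (getArgs)
import Text.Parsec
import Text.Parsec.String (GenParser)

data LabelEntry = LabelEntry {
  fileName :: String,
  labelType :: String,
  latexLabel :: String,
  section :: String,
  page :: String
} deriving (Eq)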

instance Show LabelEntry where
  show (LabelEntry f1 f2 f3 f4 f5) = unwords [f1, f2, f3, f4, f5]

-- Each database line holds five whitespace-separated fields, in the same
-- order as the LabelEntry record. A line with fewer fields makes (!!) fail.
readLabelEntry :: String -> LabelEntry
readLabelEntry s = LabelEntry
  { fileName   = xs !! 0
  , labelType  = xs !! 1
  , latexLabel = xs !! 2
  , section    = xs !! 3
  , page       = xs !! 4
  }
  where
    xs = words s

-- Given the name for a database file, read it in.
readDB :: String -> IO [LabelEntry]
readDB fname = do
  contents <- readFile fname
  return $ map readLabelEntry $ lines contents
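Because readLabelEntry relies on (!!), a blank or truncated line in the database file can crash the program as soon as that entry is examined. A minimal sketch of a total alternative, assuming the same five-field line format; parseLabelLine and readDBLenient are hypothetical names, not part of the post's program.

```haskell
import Data.Maybe (mapMaybe)

-- Same fields, in the same order, as the post's LabelEntry.
data LabelEntry = LabelEntry
  { fileName   :: String
  , labelType  :: String
  , latexLabel :: String
  , section    :: String
  , page       :: String
  } deriving (Eq, Show)

-- Hypothetical total replacement for readLabelEntry: a line with
-- exactly five whitespace-separated fields becomes an entry, and any
-- other line (blank, truncated, or with extra fields) gives Nothing.
parseLabelLine :: String -> Maybe LabelEntry
parseLabelLine s = case words s of
  [f, t, l, sec, p] -> Just (LabelEntry f t l sec p)
  _                 -> Nothing

-- Hypothetical lenient readDB: malformed lines are skipped silently.
readDBLenient :: FilePath -> IO [LabelEntry]
readDBLenient fname = mapMaybe parseLabelLine . lines <$> readFile fname
```

Skipping bad lines silently is a design choice; reporting them instead would only need the Nothing cases to be collected rather than dropped.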

A couple more trivial functions are needed to work with the list of labels:

labelMatches :: LabelEntry -> String -> Bool
labelMatches (LabelEntry _ _ comp _ _ ) s = comp == s

labelSection :: LabelEntry -> String
labelSection (LabelEntry _ _ _ s _) = s

labelPage :: LabelEntry -> String
labelPage (LabelEntry _ _ _ _ s) = s
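Since LabelEntry is declared with record syntax, Haskell already generates the accessors section and page, which do the same job as labelSection and labelPage. Combined with Data.List.find, which returns the first matching element just as filter followed by head does, the lookup the parsers perform fits in one line. A sketch under that reading; resolveLabel is a hypothetical name, and its "UNDEFINED" fallback mirrors the parsers in this post.

```haskell
import Data.List (find)

data LabelEntry = LabelEntry
  { fileName, labelType, latexLabel, section, page :: String }
  deriving (Eq, Show)

-- Hypothetical helper: find the first entry whose latexLabel matches,
-- then project out the wanted field (section for \ref, page for
-- \pageref). A missing label becomes "UNDEFINED".
resolveLabel :: (LabelEntry -> String) -> [LabelEntry] -> String -> String
resolveLabel field defs name =
  maybe "UNDEFINED" field (find ((== name) . latexLabel) defs)
```

For instance, resolveLabel section defs "sec:intro" yields the section string of the first entry labelled sec:intro.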

The Java source code parser is identical to what appeared earlier, so there's no reason to present all of it here again. The important point is that parseJava converts Java source code to a list of ParsedJava values:

data ParsedJava = SLComment String | MLComment String
                | JavaCode String | WhiteSpace String

parseJava :: String -> Either ParseError [ParsedJava]
parseJava input = parse parseJavaInput "" input
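Because main prints the pieces back out in order, the approach assumes each ParsedJava value holds its exact source text (comment delimiters included), so that concatenating the pieces reproduces the file. A small sketch of that assumption; render and mapComments are hypothetical helpers, and mapComments has the same shape as the post's output function with replaceMacro defs as its argument.

```haskell
data ParsedJava = SLComment String | MLComment String
                | JavaCode String | WhiteSpace String
  deriving (Eq, Show)

-- Assumed property: every constructor stores the exact source text,
-- so concatMap render gives back the original file.
render :: ParsedJava -> String
render (SLComment s)  = s
render (MLComment s)  = s
render (JavaCode s)   = s
render (WhiteSpace s) = s

-- Apply f to comment text only; code and whitespace pass through.
mapComments :: (String -> String) -> [ParsedJava] -> String
mapComments f = concatMap step
  where
    step (SLComment s) = f s
    step (MLComment s) = f s
    step other         = render other
```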

The earlier parser can be used as-is, but that parser doesn't parse anything inside the comments it identifies. The comments need some further parsing to do a find-and-replace. There are three commands to look for, all of which work just as they do in LaTeX.

If \ref{label} occurs somewhere in a Java comment, then the program looks up label in the list of LabelEntry values, and replaces \ref{label} with the corresponding section value from the matching LabelEntry. This field is called section because that's usually what it is – a section number, like "4.3.2" – although it may be an equation number, figure number, etc.

\pageref works the same way as \ref, except that \pageref{label} is replaced by the page value from the corresponding LabelEntry.

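Use the \verb (short for "verbatim") command to turn off find-and-replace. The first character after \verb is the delimiter: everything up to the next occurrence of that character passes through untouched, and the \verb command and both delimiters are dropped. So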

\verb|Don't parse \ref{something-or-other} please|

will pass through the parser and come out the other end as

Don't parse \ref{something-or-other} please

Here's a parser that finds and replaces the three possible commands:

parseAndReplace :: String -> [LabelEntry] -> Either ParseError String
parseAndReplace input defs = runParser replaceComment defs "" input

replaceComment :: GenParser Char [LabelEntry] String
replaceComment = do
    x <- many1 (notMacro <|> isMacro <|> falseMacro)
    return $ concat x

notMacro :: GenParser Char [LabelEntry] String
notMacro = many1 $ noneOf "\\"

isMacro :: GenParser Char [LabelEntry] String
isMacro = verbMacro <|> refMacro <|> pagerefMacro

verbMacro :: GenParser Char [LabelEntry] String
verbMacro = do
    try $ string "\\verb"
    x <- anyChar
    guts <- many $ noneOf [x]
    void $ char x
    return guts

refMacro :: GenParser Char [LabelEntry] String
refMacro = do
    try $ string "\\ref{"
    guts <- many $ noneOf ['}']
    void $ char '}'
    defs <- getState
    let xs = filter (\t -> labelMatches t guts) defs    
    if (null xs)
        then return "UNDEFINED"
        else return $ labelSection $ head xs

pagerefMacro :: GenParser Char [LabelEntry] String
pagerefMacro = do
    try $ string "\\pageref{"
    guts <- many $ noneOf ['}']
    void $ char '}'
    defs <- getState
    let xs = filter (\t -> labelMatches t guts) defs
    if (null xs)
        then return "UNDEFINED"
        else return $ labelPage $ head xs

falseMacro :: GenParser Char [LabelEntry] String
falseMacro = do
    void $ char '\\'
    return ['\\']
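Since refMacro and pagerefMacro differ only in the command they match and the field they return, they can share one parser. The sketch below is a self-contained variant of the comment parser built on that observation, not the post's own code: labelMacro is a hypothetical helper taking the command prefix and a field accessor, and the record accessors section and page stand in for labelSection and labelPage. It needs the parsec package.

```haskell
import Control.Monad (void)
import Data.List (find)
import Text.Parsec
import Text.Parsec.String (GenParser)

data LabelEntry = LabelEntry
  { fileName, labelType, latexLabel, section, page :: String }
  deriving (Eq, Show)

type CommentParser = GenParser Char [LabelEntry] String

parseAndReplace :: String -> [LabelEntry] -> Either ParseError String
parseAndReplace input defs = runParser replaceComment defs "" input

replaceComment :: CommentParser
replaceComment = concat <$> many1 (notMacro <|> isMacro <|> falseMacro)

-- Plain text: anything up to the next backslash.
notMacro :: CommentParser
notMacro = many1 (noneOf "\\")

isMacro :: CommentParser
isMacro = verbMacro
      <|> labelMacro "\\ref{" section
      <|> labelMacro "\\pageref{" page

-- \verb<d>...<d>: the first character after \verb is the delimiter.
verbMacro :: CommentParser
verbMacro = do
  void (try (string "\\verb"))
  delim <- anyChar
  guts <- many (noneOf [delim])
  void (char delim)
  pure guts

-- Hypothetical shared helper for \ref and \pageref: match the prefix,
-- read the label up to '}', and look it up in the parser state.
labelMacro :: String -> (LabelEntry -> String) -> CommentParser
labelMacro prefix field = do
  void (try (string prefix))
  name <- many (noneOf "}")
  void (char '}')
  entries <- getState
  pure (maybe "UNDEFINED" field (find ((== name) . latexLabel) entries))

-- A backslash that starts none of the known commands is kept as-is.
falseMacro :: CommentParser
falseMacro = string "\\"
```

With a single entry labelled sec:intro (section 1.2, page 7), the comment text "see \ref{sec:intro} on page \pageref{sec:intro}" becomes "see 1.2 on page 7". An unterminated "\ref{oops" makes the whole parse fail, in this sketch and in the original alike.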

Use parseAndReplace on the contents of each comment. It returns the comment text with every \ref, \pageref, and \verb command replaced.

Making the parser stateful is relatively easy. Instead of calling parse to invoke Parsec, use runParser, and provide the state. In this case, the state is fixed, and it's just a [LabelEntry] value. The functions refMacro and pagerefMacro use Parsec's getState function to access this state data.
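Here the state is only read, but Parsec also lets a parser update its user state as it goes, via modifyState. As a toy illustration of the API (not part of this program), the sketch below keeps an Int in the user state and bumps it for every \ref{...} it skips over; refCmd, countRefs, and numRefs are hypothetical names. It also shows how the two entry points relate: runParser takes an initial state, and parse is simply runParser with the unit state ().

```haskell
import Control.Monad (void)
import Text.Parsec
import Text.Parsec.String (GenParser)

-- Each complete \ref{...} adds one to the Int held in the user state.
refCmd :: GenParser Char Int ()
refCmd = do
  void (try (string "\\ref{"))
  void (many (noneOf "}"))
  void (char '}')
  modifyState (+ 1)

-- Skip all input, counting \ref commands along the way, then read
-- the final count back with getState.
countRefs :: GenParser Char Int Int
countRefs = do
  skipMany (refCmd <|> void anyChar)
  eof
  getState

-- The initial state (0) is supplied to runParser.
numRefs :: String -> Either ParseError Int
numRefs = runParser countRefs 0 ""
```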

Finally, the top level, with main:

output :: [LabelEntry] -> ParsedJava -> String
output defs (JavaCode s) = s
output defs (WhiteSpace s) = s
output defs (SLComment s) = replaceMacro defs s
output defs (MLComment s) = replaceMacro defs s

-- If a comment fails to parse (an unterminated \ref{, for example),
-- the error message takes the comment's place in the output.
replaceMacro :: [LabelEntry] -> String -> String
replaceMacro defs s = either show id (parseAndReplace s defs)

argsValid :: [String] -> IO Bool
argsValid names =
    if length names /= 2
        then return False
        else doFilesExist names

doFilesExist :: [String] -> IO Bool
doFilesExist names = allM doesFileExist names

main :: IO ()
main = do
    args <- getArgs
    -- \case requires the LambdaCase extension.
    argsValid args >>= \case
        False -> putStrLn "Give me the labels file and the Java file."
        True -> do
            macroDefs <- readDB (args !! 0)
            contents <- readFile (args !! 1)
            case parseJava contents of
                Left err -> print err
                Right valid -> mapM_ (putStr . output macroDefs) valid
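The argument check leans on allM from the extra package, and main then indexes args with (!!). A base-only sketch of an alternative, assuming only the directory package: allFilesExist replaces allM doesFileExist, and twoExistingFiles (a hypothetical name) matches on exactly two arguments so main could receive the two paths directly instead of indexing.

```haskell
import System.Directory (doesFileExist)

-- Base-only stand-in for allM doesFileExist. Unlike allM, mapM checks
-- every file rather than stopping at the first missing one.
allFilesExist :: [FilePath] -> IO Bool
allFilesExist names = and <$> mapM doesFileExist names

-- Hypothetical alternative to argsValid plus (args !! 0)/(args !! 1):
-- accept exactly two arguments naming existing files.
twoExistingFiles :: [String] -> IO (Maybe (FilePath, FilePath))
twoExistingFiles [labels, java] = do
  ok <- allFilesExist [labels, java]
  pure (if ok then Just (labels, java) else Nothing)
twoExistingFiles _ = pure Nothing
```

In main, a Just (labelsFile, javaFile) result would then feed readDB and readFile without any indexing.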
