August 1, 2020
After several steps, we now have the tools to replace every occurrence of \ref and \pageref in a comment of a Java source file with the appropriate reference. All that's needed is to read in the list of possible references, examine the parsed-out comments, and make the replacements.
The entire program is discussed below, with a few remarks about each piece and how it relates to earlier iterations.
The first part of this program is identical to the first part in the previous step. It loads an existing list of references.
    data LabelEntry = LabelEntry { fileName :: String,
                                   labelType :: String,
                                   latexLabel :: String,
                                   section :: String,
                                   page :: String
                                 } deriving (Eq)

    instance Show LabelEntry where
      show (LabelEntry f1 f2 f3 f4 f5) =
        f1 ++ " " ++ f2 ++ " " ++ f3 ++ " " ++ f4 ++ " " ++ f5

    readLabelEntry :: String -> LabelEntry
    readLabelEntry s = do
      let xs = words s
      LabelEntry {fileName = xs !! 0,
                  labelType = xs !! 1,
                  latexLabel = xs !! 2,
                  section = xs !! 3,
                  page = xs !! 4 }

    -- Given the name for a database file, read it in.
    readDB :: String -> IO [LabelEntry]
    readDB fname = do
      contents <- readFile fname
      return $ map readLabelEntry $ lines contents
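Since readLabelEntry splits each line with words, every line of the labels file is expected to hold five whitespace-separated fields, and a shorter line makes xs !! n fail at run time. The following is only a sketch of a more forgiving variant, not part of the program above: the name readLabelEntryMaybe and the sample line are made up for illustration, and Show is derived here to keep the example short.

```haskell
-- Stand-alone sketch; LabelEntry is repeated so the example compiles by itself.
data LabelEntry = LabelEntry
  { fileName   :: String
  , labelType  :: String
  , latexLabel :: String
  , section    :: String
  , page       :: String
  } deriving (Eq, Show)

-- Hypothetical total variant of readLabelEntry: Nothing when the line has
-- fewer than five fields. Extra fields are ignored, as in the original.
readLabelEntryMaybe :: String -> Maybe LabelEntry
readLabelEntryMaybe s = case words s of
  (f : t : l : sec : p : _) -> Just (LabelEntry f t l sec p)
  _                         -> Nothing

main :: IO ()
main = do
  -- The sample line is invented; real lines come from the labels file.
  print (readLabelEntryMaybe "intro.tex section sec:parser 4.3.2 17")
  print (readLabelEntryMaybe "too few fields")
```

With a function like this, readDB could drop or report malformed lines instead of crashing, at the cost of handling the Maybe.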
A couple more trivial functions are needed to work with the list of labels:
    labelMatches :: LabelEntry -> String -> Bool
    labelMatches (LabelEntry _ _ comp _ _ ) s = comp == s

    labelSection :: LabelEntry -> String
    labelSection (LabelEntry _ _ _ s _) = s

    labelPage :: LabelEntry -> String
    labelPage (LabelEntry _ _ _ _ s) = s
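Because LabelEntry is declared with record syntax, the field names section and page already act as accessor functions, so labelSection and labelPage are equivalent to them. As a rough sketch (the helpers lookupLabel and sectionFor are hypothetical names, and the sample entry is invented), a lookup can also be written with find from Data.List:

```haskell
import Data.List (find)

-- Stand-alone sketch; LabelEntry is repeated so the example compiles by itself.
data LabelEntry = LabelEntry
  { fileName   :: String
  , labelType  :: String
  , latexLabel :: String
  , section    :: String
  , page       :: String
  } deriving (Eq, Show)

-- Hypothetical helper: the first entry whose LaTeX label equals the name.
lookupLabel :: String -> [LabelEntry] -> Maybe LabelEntry
lookupLabel name = find (\e -> latexLabel e == name)

-- Hypothetical helper: the section value for a label, if the label exists.
sectionFor :: [LabelEntry] -> String -> Maybe String
sectionFor defs name = section <$> lookupLabel name defs

main :: IO ()
main = do
  -- Invented sample data.
  let defs = [LabelEntry "intro.tex" "section" "sec:parser" "4.3.2" "17"]
  print (sectionFor defs "sec:parser")
  print (sectionFor defs "sec:missing")
```

Returning Maybe leaves it to the caller to decide what to print for an unknown label.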
The Java source code parser is identical to what appeared earlier, so there's no reason to present all of it here again. The important point is that parseJava converts Java source code to a list of ParsedJava values:
    data ParsedJava = SLComment String
                    | MLComment String
                    | JavaCode String
                    | WhiteSpace String

    parseJava :: String -> Either ParseError [ParsedJava]
    parseJava input = parse parseJavaInput "" input
The earlier parser can be used as-is, but that parser doesn't parse anything inside the comments it identifies. The comments need some further parsing to do a find-and-replace. There are three commands to look for, all of which work just as they do in LaTeX.
If \ref{label} occurs somewhere in a Java comment, then the program looks up label in the list of LabelEntry values, and replaces \ref{label} with the section value from the matching LabelEntry. If no entry matches, the text UNDEFINED is substituted instead. The field is called section because that's usually what it is (a section number, like "4.3.2"), although it may be an equation number, figure number, etc.
\pageref works the same way as \ref, except that \pageref{label} is replaced by the page value from the corresponding LabelEntry.
Use the \verb (short for "verbatim") command to turn off find-and-replace. The first character after \verb is the delimiter: the run of verbatim input ends at the next occurrence of that character, and both delimiters are dropped from the output.
So
\verb|Don't parse \ref{something-or-other} please|
will pass through the parser and come out the other end as
Don't parse \ref{something-or-other} please
Here's a parser that finds and replaces the three possible commands:
    parseAndReplace :: String -> [LabelEntry] -> Either ParseError String
    parseAndReplace input defs = runParser replaceComment defs "" input

    replaceComment :: GenParser Char [LabelEntry] String
    replaceComment = do
      x <- many1 (notMacro <|> isMacro <|> falseMacro)
      return $ concat x

    -- Any run of text that contains no backslash.
    notMacro :: GenParser Char [LabelEntry] String
    notMacro = many1 $ noneOf "\\"

    isMacro :: GenParser Char [LabelEntry] String
    isMacro = verbMacro <|> refMacro <|> pagerefMacro

    -- \verb<c>...<c> : pass the enclosed text through untouched.
    verbMacro :: GenParser Char [LabelEntry] String
    verbMacro = do
      try $ string "\\verb"
      x <- anyChar
      guts <- many $ noneOf [x]
      void $ char x
      return guts

    -- \ref{label} : replace with the section of the first matching entry.
    refMacro :: GenParser Char [LabelEntry] String
    refMacro = do
      try $ string "\\ref{"
      guts <- many $ noneOf ['}']
      void $ char '}'
      defs <- getState
      let xs = filter (\t -> labelMatches t guts) defs
      if (null xs)
        then return "UNDEFINED"
        else return $ labelSection $ head xs

    -- \pageref{label} : replace with the page of the first matching entry.
    pagerefMacro :: GenParser Char [LabelEntry] String
    pagerefMacro = do
      try $ string "\\pageref{"
      guts <- many $ noneOf ['}']
      void $ char '}'
      defs <- getState
      let xs = filter (\t -> labelMatches t guts) defs
      if (null xs)
        then return "UNDEFINED"
        else return $ labelPage $ head xs

    -- A backslash that doesn't start one of the three commands.
    falseMacro :: GenParser Char [LabelEntry] String
    falseMacro = do
      void $ char '\\'
      return ['\\']
Use parseAndReplace on the contents of each comment. The function returns the same comment with any replacements made, or a ParseError if the comment can't be parsed.
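To see the three commands working together, here is a condensed, stand-alone sketch of the same idea. It is not the code above: LabelEntry is cut down to three fields, the \ref and \pageref parsers are folded into one hypothetical helper called lookupMacro (they differ only in the command name and the field they return), and the sample comment and label data are invented.

```haskell
import Text.Parsec

-- Cut-down entry for this sketch: just the label and the two values.
data LabelEntry = LabelEntry
  { latexLabel :: String, section :: String, page :: String }

-- A parser over String whose user state is the fixed list of labels.
type CommentParser = Parsec String [LabelEntry]

-- Hypothetical shared helper for \ref and \pageref: match "\name{", read
-- the label up to '}', then look it up in the parser state.
lookupMacro :: String -> (LabelEntry -> String) -> CommentParser String
lookupMacro name field = do
  _ <- try (string ("\\" ++ name ++ "{"))
  key <- many (noneOf "}")
  _ <- char '}'
  defs <- getState
  return $ case filter ((== key) . latexLabel) defs of
    (e : _) -> field e
    []      -> "UNDEFINED"

-- \verb<c>...<c> : return the enclosed text unchanged.
verbMacro :: CommentParser String
verbMacro = do
  _ <- try (string "\\verb")
  delim <- anyChar
  guts <- many (noneOf [delim])
  _ <- char delim
  return guts

replaceComment :: CommentParser String
replaceComment = concat <$> many1 piece
  where
    piece = many1 (noneOf "\\")          -- plain text
        <|> verbMacro
        <|> lookupMacro "ref" section
        <|> lookupMacro "pageref" page
        <|> string "\\"                  -- a backslash that starts no command

parseAndReplace :: String -> [LabelEntry] -> Either ParseError String
parseAndReplace input defs = runParser replaceComment defs "" input

main :: IO ()
main = do
  let defs   = [LabelEntry "sec:parser" "4.3.2" "17"]   -- invented data
      sample = "See \\ref{sec:parser} on page \\pageref{sec:parser}; "
            ++ "\\verb|\\ref{x}| stays."
  either print putStrLn (parseAndReplace sample defs)
  -- prints: See 4.3.2 on page 17; \ref{x} stays.
```

The order of the alternatives matters only in that the lone-backslash case comes last. Each command parser wraps its opening string in try, so a failed match consumes nothing and the next alternative gets a chance.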
Making the parser stateful is relatively easy. Instead of calling parse to invoke Parsec, use runParser, and provide the state. In this case, the state is fixed, and it's just a [LabelEntry] value. The functions refMacro and pagerefMacro use Parsec's getState function to access this state data.
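The difference between parse and runParser is easiest to see on a toy parser. The sketch below is only an illustration (the parser expand and its '@' placeholder are invented for it): the user state is a fixed String, and getState reads it wherever a placeholder appears. For a parser with no state, parse behaves like runParser with () as the state.

```haskell
import Text.Parsec

-- Toy parser (invented for this example): every '@' in the input is
-- replaced by the user state, which is a fixed String.
expand :: Parsec String String String
expand = concat <$> many (placeholder <|> plain)
  where
    placeholder = char '@' >> getState    -- read the state, return it
    plain       = many1 (noneOf "@")      -- everything else, unchanged

main :: IO ()
main = do
  -- The second argument of runParser is the initial user state.
  print (runParser expand "world" "" "hello @, goodbye @")
  -- prints: Right "hello world, goodbye world"

  -- With no user state, these two calls are equivalent.
  print (parse (many1 digit) "" "123")
  print (runParser (many1 digit) () "" "123")
```

Parsec also provides putState and modifyState for state that changes during the parse, which this program doesn't need.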
Finally, the top level, with main:
    -- Note: main uses \case, which needs the LambdaCase extension, and allM
    -- is not in base; it is available from, e.g., Control.Monad.Extra
    -- (package extra) or Control.Monad.Loops (package monad-loops).

    -- Code and whitespace pass through; comments get the find-and-replace.
    output :: [LabelEntry] -> ParsedJava -> String
    output defs (JavaCode s) = s
    output defs (WhiteSpace s) = s
    output defs (SLComment s) = replaceMacro defs s
    output defs (MLComment s) = replaceMacro defs s

    -- On a parse error, the error text is emitted in place of the comment.
    replaceMacro :: [LabelEntry] -> String -> String
    replaceMacro defs s = do
      let x = parseAndReplace s defs
      case x of
        Left err -> show(err)
        Right valid -> valid

    -- Exactly two arguments are required, and both files must exist.
    argsValid :: [String] -> IO Bool
    argsValid names = do
      if length names /= 2
        then return False
        else doFilesExist names

    doFilesExist :: [String] -> IO Bool
    doFilesExist names = allM (\s -> doesFileExist s) names

    main = do
      args <- getArgs
      argsValid args >>= \case
        False -> putStrLn "Give me the labels file and the Java file."
        True -> do
          macroDefs <- readDB (args !! 0)
          contents <- readFile (args !! 1)
          let rawParse = parseJava contents
          case rawParse of
            Left err -> putStrLn (show(err))
            Right valid -> mapM_ putStr $ map (output macroDefs) valid