.aux Files
July 31, 2020
These programs are moving toward the goal of allowing references from Java source code to LaTeX documents. The short program below gathers the necessary information from the LaTeX documents.
LaTeX is a wonderful thing, and if you're not familiar with it, then you should be! LaTeX also happens to be the format used for "block-style" (.lhs) literate Haskell files.
When composing a LaTeX document, the \label command allows you to mark some location so that you can refer to it later with \pageref or \ref.
For example, if the document contains \label{i-care-about-this}, then \pageref{i-care-about-this} will appear in the output file as "8" or "452", or whatever the number of the page on which the \label was defined. Exactly what \ref{i-care-about-this} becomes depends on the context in which the \label was given. Usually, it's a section number, as in "Section (3.1.2)," but it may also be an equation number, a figure number, or a number in an enumerated list.
The LaTeX document generation system works like a two-pass compiler. The first pass generates an .aux file that contains the \label values (and other information), and the second pass uses the .aux information to fill in any \ref or \pageref values. Here's a short excerpt from an .aux file:
\relax
\@writefile{toc}{\contentsline {section}{\numberline {1}Introduction}{1}}
\newlabel{becomes-a-note}{{2}{2}}
\newlabel{goals-and-concerns}{{4}{8}}
\@writefile{toc}{\contentsline {subsection}{\numberline {5.3}Implementation}{13}}
\newlabel{five-atomic-steps}{{5.4.2}{14}}
\@writefile{toc}{\contentsline {subsubsection}{\numberline {5.4.7}Global Undo}{19}}
All kinds of stuff can appear in an .aux file; e.g., the lines above that start with \@writefile have to do with generating a table of contents. For the current task, the only lines we care about start with \newlabel. For example, the line
\newlabel{five-atomic-steps}{{5.4.2}{14}}
means that the original document had \label{five-atomic-steps} somewhere in Section (5.4.2), which was on page 14. Those two pieces of information, and the label itself, are what we care about.
.aux File
The program below parses out the three parts of a \newlabel, and folds them into a larger "aux database" file. The idea is to run the program on a series of .aux files to create a single file that contains the label definitions from all of the files. The aux database file is just a flat file; it consists of a series of lines, each of which has the format
file_name label_type latex_label section page
where latex_label, section and page are taken directly from the .aux file, file_name is the name of the file from which these values came, and label_type doesn't serve any purpose yet (it's there for future-proofing).
The entire program is available here, and each part is described below. First, the aux database file is described by
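-- One line of the aux database file.
data LabelEntry = LabelEntry
  { fileName   :: String
  , labelType  :: String
  , latexLabel :: String
  , section    :: String
  , page       :: String
  } deriving (Eq)

-- Render an entry as one space-separated line of the database file.
instance Show LabelEntry where
  show (LabelEntry f1 f2 f3 f4 f5) =
    f1 ++ " " ++ f2 ++ " " ++ f3 ++ " " ++ f4 ++ " " ++ f5

-- Parse one line of the database file. This assumes the line has at
-- least five space-separated fields; (!!) fails on a shorter line.
readLabelEntry :: String -> LabelEntry
readLabelEntry s =
  LabelEntry
    { fileName   = xs !! 0
    , labelType  = xs !! 1
    , latexLabel = xs !! 2
    , section    = xs !! 3
    , page       = xs !! 4
    }
  where
    xs = words s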
and this "database" is read into memory by
-- Given the name for a database file, read it in.
readDB :: String -> IO [LabelEntry]
readDB fname = do
  contents <- readFile fname
  return $ map readLabelEntry $ lines contents
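Because readLabelEntry indexes into the result of words with !!, a malformed line in the database file raises an exception once the missing field is used. As a hedged alternative, the sketch below uses a hypothetical parseLabelEntry (not part of the original program) that returns Nothing for any line without exactly five fields, and a hypothetical parseDB that skips such lines. The LabelEntry type is repeated so that the block stands alone.

```haskell
import Data.Maybe (mapMaybe)

-- Same record as in the program above, repeated so this sketch compiles alone.
data LabelEntry = LabelEntry
  { fileName   :: String
  , labelType  :: String
  , latexLabel :: String
  , section    :: String
  , page       :: String
  } deriving (Eq)

-- Same space-separated rendering as the original Show instance.
instance Show LabelEntry where
  show (LabelEntry f1 f2 f3 f4 f5) = unwords [f1, f2, f3, f4, f5]

-- Hypothetical total parser: accept a line only if it has exactly five
-- space-separated fields, otherwise report failure with Nothing.
parseLabelEntry :: String -> Maybe LabelEntry
parseLabelEntry s = case words s of
  [f, t, l, sec, p] -> Just (LabelEntry f t l sec p)
  _                 -> Nothing

-- Hypothetical pure counterpart of readDB: parse every line of the
-- database contents and silently drop the lines that do not parse.
parseDB :: String -> [LabelEntry]
parseDB = mapMaybe parseLabelEntry . lines
```

One consequence of this choice: if a label happens to contain a space, its database line has six fields and is dropped instead of being misread. Whether dropping or reporting such lines is preferable depends on how the database is used.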
The individual .aux files are read into memory and parsed by
-- Requires: import Data.List (isPrefixOf)

-- Given an .aux file name, parse the file.
readAuxFile :: String -> IO [LabelEntry]
readAuxFile fname = do
  contents <- readFile fname
  -- Parse out the items we care about.
  let rawEntries =
        -- Get rid of the blank items.
        map (filter (\s -> length s > 0)) $
        -- Each line of input becomes a list of strings, some of which
        -- are blank or empty.
        map parseItem $
        -- Drop the first 10 characters from each line (i.e., "\newlabel{")
        map (drop 10) $
        -- Only those lines that start with "\newlabel".
        filter (\s -> isPrefixOf "\\newlabel" s) $
        (lines contents)
  return $ map (rawToLabelDB fname "arbitrary") rawEntries

-- Breaks a line from an .aux file into the three pieces we care about. It is
-- assumed that we have stringXstringX, etc., where X is some combination of
-- '{' and '}'. We want to split on any combination of these. This generates
-- lots of empty strings which need to be filtered out.
parseItem :: String -> [String]
parseItem "" = []
parseItem s = firstString : (parseItem rest)
  where
    firstString = takeWhile (\c -> (c /= '{') && (c /= '}')) s
    rest = drop (length firstString + 1) s

-- Convert already parsed data from an .aux file to a LabelEntry value.
rawToLabelDB :: String -> String -> [String] -> LabelEntry
rawToLabelDB fname labelType xs =
  LabelEntry
    { fileName   = fname
    , labelType  = labelType
    , latexLabel = xs !! 0
    , section    = xs !! 1
    , page       = xs !! 2
    }
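To make the brace splitting concrete, here is a small sketch that runs parseItem on the sample \newlabel line from earlier. parseItem is copied from the program; newlabelFields and sampleLine are hypothetical names introduced only for this illustration, and the stricter check for the prefix \newlabel{ (with the brace) is an assumption of the sketch rather than what the original does.

```haskell
import Data.List (isPrefixOf)

-- Copied from the program above so this sketch compiles on its own.
parseItem :: String -> [String]
parseItem "" = []
parseItem s = firstString : parseItem rest
  where
    firstString = takeWhile (\c -> c /= '{' && c /= '}') s
    rest = drop (length firstString + 1) s

-- Hypothetical helper: the non-empty pieces of one \newlabel line
-- (label, section, page), or Nothing for any other kind of line.
newlabelFields :: String -> Maybe [String]
newlabelFields line
  | "\\newlabel{" `isPrefixOf` line =
      Just (filter (not . null) (parseItem (drop 10 line)))
  | otherwise = Nothing

-- The sample line discussed earlier in the article.
sampleLine :: String
sampleLine = "\\newlabel{five-atomic-steps}{{5.4.2}{14}}"
```

Evaluating parseItem (drop 10 sampleLine) gives ["five-atomic-steps","","","5.4.2","","14",""], which shows where the empty strings come from: every brace ends a piece, so adjacent braces produce empty pieces. Evaluating newlabelFields sampleLine gives Just ["five-atomic-steps","5.4.2","14"].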
Pulling it all together, main takes two file names as command-line arguments: the database file and an .aux file. It reads the data from each of these files, combines it, and overwrites the database file with the combined data.
-- Requires the LambdaCase extension, plus getArgs (System.Environment),
-- union (Data.List), when (Control.Monad), doesFileExist (System.Directory)
-- and allM (e.g., from Control.Monad.Extra in the extra package).
main :: IO ()
main = do
  args <- getArgs
  argsValid args >>= \case
    False -> putStrLn "Provide database file, then aux file."
    True -> do
      knownReferences <- readDB (args !! 0)
      newReferences <- readAuxFile (args !! 1)
      let combinedReferences = union newReferences knownReferences
      -- Careful here since Haskell's lazy IO can cause problems
      -- reading/writing to the same file. One way to deal with this
      -- would be to write to a temporary file, then copy to the final
      -- destination. That's safest since nothing is lost in a crash.
      -- Another way (done here) is to do something that requires that
      -- combinedReferences is complete.
      let totSize = length combinedReferences
      when (totSize > 0) $
        writeFile (args !! 0) (unlines $ map show combinedReferences)
      putStrLn $ "total number of labels: " ++ show totSize

-- Exactly two arguments are required, and both files must already exist.
argsValid :: [String] -> IO Bool
argsValid names =
  if length names /= 2
    then return False
    else doFilesExist names

doFilesExist :: [String] -> IO Bool
doFilesExist names = allM doesFileExist names
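The comment in main mentions writing to a temporary file as the safer option. A minimal sketch of that idea follows, assuming the temporary file can live next to the database file. writeFileViaTemp and the .tmp suffix are hypothetical, and the sketch renames the temporary file over the destination rather than copying it.

```haskell
import System.Directory (renameFile)

-- Hypothetical helper (not in the original program): write the new contents
-- to a sibling temporary file, then rename it over the destination.
-- Writing the temporary file forces the whole string, so any lazy read of
-- the destination has finished before the destination is replaced.
writeFileViaTemp :: FilePath -> String -> IO ()
writeFileViaTemp path contents = do
  let tmp = path ++ ".tmp"  -- assumed name; should be on the same filesystem
  writeFile tmp contents
  renameFile tmp path
```

In main, this would stand in for the writeFile call, as in writeFileViaTemp (args !! 0) (unlines $ map show combinedReferences). If the program crashes part way through the write, the old database file is still intact and only the .tmp file is incomplete.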