HIW 2020 - Exactprint in GHC

Goals of an Exact Printer

  • Accurately reproduce the source code, from the ParsedSource
  • More important, allow changes to be made
    • purely as transformations of the AST,
    • such that when printed out the source still reflects the original as much as possible
    • and the code still compiles.

Complications

  • Haskell is layout sensitive
    • moving things around
    • changing names to be longer or shorter

      baz = bar where x = 1
                      bar = 2
      
  • order in lists - SrcSpans are no longer meaningful
  • comments

ghc-exactprint

  • Initial commit Sep 2014
  • Story starts earlier, getting rid of the panic "blah" entries in the GHC AST
  • But the AST did not actually reflect the parsed source
    • information loss during initial parsing
    • Literals.
  • By GHC 7.10 the ParsedSource accurately reflected the state of the original source

ghc-exactprint is successful

  • Many tools built on top of it
    • HaRe (now dormant)
    • apply-refact (for hlint, now moved to GHC AST too)
    • multiple formatters
    • retrie
    • others

But

  • There are problems
    • Annotations added and maintained in GHC, but the library using them is separate
    • ghc-exactprint is only updated when release time gets near
  • This makes it hard for GHC developers to check changes to API Annotations

Proposed Solution

  • Move the exact printing into the GHC source tree
    • Culmination of multi-year effort
    • Started with getting Trees That Grow into the AST

Work has started

  • There has been some sporadic discussion about this on ghc gitlab.
  • But some things have changed
    • parser cleanup / reorganise
    • recent XRec ping-pong landed
  • Note:
    • This is a proposal, for discussion
    • It is based on actual work, to see real world implications of the decisions

Technical approach

  • Two main goals
    • Get rid of the SrcSpan indexing, by moving the annotations into the ParsedSource
    • Make the annotations properly typed

      type ApiAnnKey = (SrcSpan, AnnKeywordId)
      type ApiAnns = ( Map ApiAnnKey [SrcSpan]
                     , Map SrcSpan [Located AnnotationComment])
      data AnnKeywordId
          = AnnAnyclass -- 'anyclass'
          | AnnAs       -- 'as'
          | AnnAt       -- 'at'
      

How does it work for ghc-exactprint?

  • Input: ParsedSource, and separate annotations
  • Convert the original annotations into a delta format, which tracks the annotation location wrt preceding output.
  • Keep an explicit list of the order of items, so we do not have to use the SrcSpan order, for adding, or moving items around.
  • These are still in a separate data structure

In-tree annotations

  • The annotations are inside the tree, using the TTG extension points, for GhcPs. There is one per AST element.
  • Each annotation has a standard structure part, and a context-sensitive part.

Detail

  • So we have

    data ApiAnn' ann
      = ApiAnn { anchor   :: RealSrcSpan
               , anns     :: ann
               , comments :: [RealLocated AnnotationComment]
               }
      | ApiAnnNotUsed
    
    • all the annotations are used relative to the original anchor, regardless of where it is used in the tree now.

Example

data ApiAnnHsCase = ApiAnnHsCase
      { hsCaseAnnCase :: RealSrcSpan -- 'case' location
      , hsCaseAnnOf   :: RealSrcSpan -- 'of' location
      , hsCaseAnnsRest :: [AddApiAnn]
      }
data AddApiAnn = AddApiAnn AnnKeywordId RealSrcSpan

Attached as

  | HsCase      (XCase p) -- TTG extension point
                (LHsExpr p)
                (MatchGroup p (LHsExpr p))

type instance XCase GhcPs = ApiAnn' ApiAnnHsCase -- TTG usage

Concrete Example

-- 123456789012345
043  case  x  of
044    1 -> True
045    ..
(HsCase
 (ApiAnn
  { (43,3)-(45,14) }                        -- anchor
  (ApiAnnHsCase { 43:3-6 } { 43:12-13 } []) -- anns
  [])                                       -- comments
 (L (SrcSpanAnn (ApiAnnNotUsed) { 43:9 })
  (HsVar .. {OccName: x}))
 (MG
  (NoExtField)
  (L (SrcSpanAnn (ApiAnn { (44,5)-(45,14) ...))))))
(DP (0,0),"case") -- (43, 3)
(DP (0,2),"x")    -- (43, 9)
(DP (0,2),"of")   -- (43,12)
(DP (1,2),"1")    -- (44, 5) wrt (43,3) anchor

Located annotations

  • Some annotations need to apply to all constructors of a data type.
  • They are needed for specific purposes
    • RdrName decorations: `foo`, ':, (&),
    • Contextual usage
      • trailing ,, ;, |
  • We use the fact that AST elements are Located to piggy-back annotations.

XRec Locations

type family XRec p a = r | r -> a
-- | We can strip off the XRec to access the underlying data.
class UnXRec p where
  unXRec :: XRec p a -> a
type instance XRec (GhcPass p) a = Located a
type LHsExpr p = XRec p (HsExpr p)
  • This mimics the "old" scheme where everything is located.
  • For exactprint in GHC we adapt it as

    type instance XRec (GhcPass p) a = GenLocated (Anno a) a
    
    type family Anno a = b
    
  • It is still located, but each AST element has a knob to set precisely what location type to use.

(Thanks Zubin Duggal for helping me with this)

Kinds of location

There is a regular structure for this

data SrcSpanAnn' a = SrcSpanAnn { ann :: a, locA :: SrcSpan }

Example usage

type SrcSpanAnnA    = SrcSpanAnn' (ApiAnn' AnnListItem)
type SrcSpanAnnName = SrcSpanAnn' (ApiAnn' NameAnn)
data AnnListItem
  = AnnListItem {
      lann_trailing  :: [TrailingAnn]
      }
data TrailingAnn
  = AddSemiAnn RealSrcSpan
  | AddCommaAnn RealSrcSpan
  ..

In "normal" usage we can have

type LocatedA = GenLocated SrcSpanAnnA
type LocatedN = GenLocated SrcSpanAnnName

type LocatedAn an = GenLocated (SrcSpanAnn' (ApiAnn' an))

Putting it all together

type LHsExpr p = XRec p (HsExpr p)
type instance Anno (HsExpr (GhcPass p)) = SrcSpanAnnA

foo :: LocatedA (HsExpr GhcPs)
bar :: LHsExpr GhcPs

Note: in instance declarations, you have to use the foo form, which matches the "after resolution" XRec family.

Usage for printing

  • This part is still under heavy development, but enough has been done to indicate viability
  • based heavily on the ghc-exactprint print phase.

Depth-first traversal of the AST

  • Keeps track of a left margin for current indentation level
  • Processes each print operation using the "top left corner" as the reference point.
    • This is the anchor field from earlier
    • implication: there is a "print head" position. It can only move forward. So all annotated items must come to the right or below the anchor.
data Entry = Entry RealSrcSpan [RealLocated AnnotationComment]
           | NoEntryVal
  • comments are handed to the printer, it inserts them into the appropriate place in the output stream (modulo the anchor offset).
    • Aside: comments are allowed to go left of the anchor column, but clip against the left margin.

ExactPrint

class ExactPrint a where
  getAnnotationEntry :: a -> Entry
  exact :: a -> Annotated ()
  • Note:
    • ExactPrint is analogous to Outputable
    • exact is analogous to ppr
  • Printing uses the anchor in the annotation, so the getAnnotationEntry pulls it out if it exists, together with any comments in the span of the item.
  • This anchor is used for an enterAnn routing

Simplest example

instance (ExactPrint a) => ExactPrint (Located a) where
  getAnnotationEntry (L l _) = Entry (realSrcSpan l) []
  exact (L _ a) = markAnnotated a

markAnnotated manages the process of descending into an enclosed AST item.

markAnnotated :: ExactPrint a => a -> Annotated ()
markAnnotated a = enterAnn (getAnnotationEntry a) a

The trivial version of enterAnn, but showing the basic interleaving flow, is

enterAnn :: (ExactPrint a) => Entry -> a -> Annotated ()
enterAnn NoEntryVal a = do
  exact a

The version where there is an EntryVal is

enterAnn (Entry anchor cs) a = do
  addComments cs
  printComments anchor
  off <- gets epLHS
  priorEndAfterComments <- getPos
  let edp = adjustDeltaForOffset
              off (ss2delta priorEndAfterComments anchor)
  let
    st = annNone { annEntryDelta = edp }
  withOffset st (advance edp >> exact a)
withOffset :: Annotation -> (EPP a -> EPP a)
withOffset a =
  local (\s -> s { epAnn = a })

ExactPrint examples

instance ExactPrint (HsTupArg GhcPs) where
  getAnnotationEntry = const NoEntryVal

  exact (Present _ e) = markAnnotated e
  exact (Missing _) = return ()
instance ExactPrint (HsValBindsLR GhcPs GhcPs) where
  getAnnotationEntry = const NoEntryVal

  exact (ValBinds sortkey binds sigs) = do
    applyListAnnotations
       (prepareListAnnotationA (bagToList binds)
     ++ prepareListAnnotationA sigs
       )
prepareListAnnotationA :: ExactPrint (LocatedAn an a)
  => [LocatedAn an a] -> [(RealSrcSpan,EPP ())]
prepareListAnnotationA ls
 = map (\b -> (realSrcSpan $ getLocA b,markAnnotated b)) ls

applyListAnnotations :: [(RealSrcSpan, EPP ())] -> EPP ()
applyListAnnotations ls = withSortKey ls
withSortKey :: [(RealSrcSpan, EPP ())] -> EPP ()
withSortKey xs = do
  Ann{annSortKey} <- asks epAnn
  let ordered = case annSortKey of
                  NoAnnSortKey    -> sortBy orderByFst xs
                  Annsortkey keys -> orderByKey xs keys
  mapM_ snd ordered
data AnnSortKey
  = NoAnnSortKey
  | AnnSortKey [RealSrcSpan]

Usage for editing

  • Annotations are self-contained, so the SrcSpan is not important in terms of printing AST fragments. So freely able to delete, move, duplicate fragments.
    • Note: uniqueness is important for ordering of binds, declarations, etc
  • Single pass, so no intermediate processing required.
    • To be confirmed. Currently having doubts

Future directions

  • Some sort of printer combinators, derived from the annotations, or as the annotations.
  • harmonisation between exact printing and ppr printing
    • Note: exact printing only feasible for ParsedSource.
  • Update ParsedSource so that AnnSortKey is unnecessary
  • Sort out the RdrName <-> Name <-> Id mapping
    • This currently happens (for API tooling) via the LocatedN RdrName SrcSpan.
  • Get rid of CPP in favour of a tooling-friendly option

Links