all time
Headline | Time | |
---|---|---|
Total time | 2d 11:27 | |
GSoC | 2d 11:27 | |
\__ Time tracking & TODOs | 1d 22:26 | |
\__ Notes | 13:01 |
today
Headline | Time | |
---|---|---|
Total time | 12:11 | |
GSoC | 12:11 | |
\__ Time tracking & TODOs | 12:11 |
Create an interface for using oxcart either standalone or via leiningen to compile Clojure source to bytecode
is woeful at best, go back and add appropriate coverage
- Is one feasable?
- Depends on Called-by-what analysis pass for fn to method reduction correctness
- Depends on Multi-arity to single arity reduction for fn to method reduction correctness
- Depends on Constant inliner pass to get the most out of inlining, as this will enable more functions to be demoted to methods from IFns.
- An issue is that namespaces are not really “modules” due to the
implementation of the
(ns)
macro which simply alters the state of the var table mapping which constitutes a “namespace” through the emission of require and use forms which create var table side-effects. Since the idea with Oxcart is that all this var table garbage is garbage ideally I want to be able to escape the var table altogether for the overwhelming majority of vars to the point that it may make sense for each load() to do its own explicit var table manipulation. The information required for doing this could probably be derived from(ns-map)
but we’ll see how that goes.
- Is one feasable?
[#A] Improve the whole program AST to a mutable object tree via transients for improved update semantics and performance
This is probably going to be a fun exercise in building a Cortext style CES out of TANAL trees, identifying duplicated forms (vars, locals) reducing them, returning the result for rewriting and then providing an interface to render it back to a TANAL tree.
see https://www.refheap.com/86571 for the original cut of the code in question, which could bear some major tightening up.
tbaldridge said Rich has previously implemented a lot of this stuff, and that it has been reverted out for dubious reasons. Tracking down what this was and when it happened would probably be a good start to determining whether or not I can reasonably make the requisite changes as part of GSoC.662b38415e15edcbd720628c0c07a8f8817c96b4 This commit added ^:static annotations all over the core and cleaned up the core compiler support for it.
ca737838aa65970775c58cd3a72fea4221a67bda This commit added ^:static compiler support in the first place. Looks like all this did/does is create invokeStatic methods on static tagged classes, allowing the compiler to escape vars entirely for some subset of a program.
http://stuartsierra.com/2013/03/29/perils-of-dynamic-scope “In Clojure versions 1.2 and earlier, all Vars had dynamic scope by default, but this meant that there was a performance cost to look up the current dynamic binding of a Var on every function call. Leading up to 1.3, Rich Hickey experimented with allowing Vars to be declared ^:static, before settling on static by default with ^:dynamic as an option. You can still find ^:static declarations littered through the Clojure source code. Maybe someday they’ll be useful again.”
http://dev.clojure.org/display/design/%27Lean%27+Runtime http://dev.clojure.org/display/design/Build+Profiles
- Compare to the type perservation system in 48bfe3bba0035f555a045ae4d2a5933ff57eea0a
<Bronsa> arrdem: btw it looks like an approach like invokeStatic (assumed I understood this correctly) should be more robust for removing the var indirection rather than using ^:const because it should work fine with fn with closed-overs too <hiredman> Bronsa: well, :static generated static methods, so closed overs wouldn’t work <Bronsa> hiredman: oh right <Bronsa> arrdem: I derped. ^ <hiredman> Bronsa: you could just get rid of manual :static flagging, do λ lifting and implement any function that doesn’t close over anything and is never used as a value as a static method <hiredman> (you know, for the clojure compiler you are working on)
It would be very useful to be able to compute the reach set of a given expression in terms of Java classes accessed. This would allow me to simply ban clojure.lang.Var.alter and soforth as Java methods rather than having to do the unsupported symbol warning at the Clojure level. I’m not sure if t.a.jvm supports this but it’s worth asking. This is possible for static typing of classes but is not generally possible. Thanks to type hints indicating var type it may be possible to infer the banning of most of the clojure core var manipulation code.- Rewrite ((partial f a b ..) g h) → (f a b .. g h)
- Rewrite (partial f) → f
- General invoke inlining potentially.
<Bronsa> arrdem: btw, RE: automatically adding :inline [19:39] <arrdem> yar <Bronsa> do you realize that :inline points to a function, not a boolean? <arrdem> yeah which is unfortunate. <Bronsa> I mean, I guess you could just walk the macroexpanded fn body wrapping all the args with clojure.core/unquote [19:40] <Bronsa> but I'm not too sure how easy that would be to do <arrdem> well so the "real" fix is that you take the def'd fn, and then partially evaluate it in the invoke context using the provided arguments. <arrdem> so (f x) becomes ((fn [x] (+ 1 x)) x) which you can then do term rewriting on [19:41] <Bronsa> right <arrdem> which is entirely general so long as you know what code is pure and can be rewritten... <arrdem> but we don't have purity tags either... <arrdem> but that's really just a bound on the partial evaluator not your ability to replace {:invoke} with {:do} [19:42] <arrdem> just create a {:op :let} with the appropriate bindings and let it go [19:43] <*arrdem> adds this to the todo list
See the GHC paper on Haskell inlining for some awesome background
- Can apply ever be rewritten? (no)
- Are there non-function values that make sense to try and inline, reduce or fold?
This is probably pretty low priority, but I think it’s something that I’ll get for free since Oxcart doesn’t leverage anything (yet) beyond the static subset of Clojure which I’m trying to compile.
Hierarchies are something else that could likely be precomputed and stored, but maybe not since it’s entirely meaningful to build a hierarchy at runtime. Whatever, they’re a weird and uncommon language feature anyway.
No really. Basic TCO isn’t that hard to do. More interesting multiple function to state machine transforms could also be valuable, but only if supported by bytecode generation. Multiple fns get concatinated with appropriate state transition code rewrites, but the state transition is a jump and the stack configuration should be the same in a jump to a body as a call, so this may be something I can get for free.
method a(args) = apply _gen a-id-const args method b(args) = apply _gen b-id-const args method c(args) = apply _gen c-id-const args method _gen(id-const . args) = case id-const: a-id-const: .label a < a implementation > b-id-const: .label b < b implementation > c-id-const: .label c < c implementation >
Note that since all args are on the stack “jumping” from a to b or whatever is entirely correct since the only consumed argument stack value is the entry point ID. This means the argument stack must be in function call order, which means generating naive bytecode is correct. Similarly replacing tail calls in the block with bytecode gotos is then correct because the stack must be in call order otherwise the call would be invalid.
Creating partially evaluated specializations of map and other clojure core functions that are specialized on a function
- The spec has changed
- The performance has changed
- It may or may not be worthwhile
Statically identify multimethods as defs and cache final their fully computed dispatch tables as static values discarding all other manipulating operations.
This will also require hoisting defmethod forms to defn forms and eliminating the generated method instalation code.
Note that internally multimethods use a table of dispatch values to IFns, so doing this will require computing an inverse mapping from IFns to Vars, and then writing a new root function which holds the dispatch table as a static value.
Given that I’m already proposing to precompute multimethods dispatch tables it’d be cool to try and look through the resulting table and try to precompute the dispatch target of a given multimethod use.
- Indicate preserved vars
- Preserve all vars
- Default of optimize everything
- Default of optimize nothing
Add `clojure.core.typed` annotations to all Oxcart code
Apply pointer analysis to structural sharing and attempt compiler introduction of transients
Interface with `clojure.core.typed` to provide compiler introduction of `core.typed` derived records and runtime typechecking
When type hint permits kick out a specialized invoke and dispatch to it.
https://github.com/halgari/data-all-the-asts/blob/master/src/data_all_the_asts_talk/core.clj#L181 https://www.youtube.com/watch?v=KhRQmT22SSg
So there are several parts to a AST-as-data
:op ;; allows for polymorphic dispatch on the node type :form ;; original un-analized form :children ;; child nodes in “execution order” :env ;; a grab-bag of data about the context of the node
I need to be able to say that this var maps to that ast with that reach set. Then I can say “collect all reach sets” into a single sum reach set and then do var emission on that basis.
{#’clojure.core/conj -> (ast conj) … }
To compute this whole program AST we’re gonna have to do some weirdness with clojure.core/load and clojure.core/eval to interact with tools.analyzer. tools.analyzer.jvm isn’t far behind in all this but I don’t think I need it yet.
The reality of the matter is that “lein uberjar” is how 99.9% of Clojure applications get packaged and consequently whatever I wind up with for an optimizing compier that faces the user is gonna have to have a lein plugin of some sort which will run my/Nicola’s compiler over the input fileset as determined by the same logic that lein uses to determine what gets uberjared.
- clojure.lang.Var.alterRoot()
- clojure.lang.Var.alter()
- clojure.lang.Var.set()
- clojure.core/alter-var-root
- clojure.core/set! (compiler special form)
The form (clojure.lang.Var/set <X>) analyzes down to
(defn clear-env [ast]
(clojure.tools.analyzer.ast/prewalk
ast #(dissoc %1 :env)))
(use 'clojure.pprint)
(->> (clojure.tools.analyzer.jvm.core-test/ast1
(fn [x]
(clojure.lang.Var/set x (Long. 1))))
clear-env
pprint)
;; {:top-level true,
;; :children [:methods],
;; :op :fn,
;; :form (fn* ([x] (clojure.lang.Var/set x (Long. 1)))),
;; :variadic? false,
;; :max-fixed-arity 1,
;; :methods
;; [{:children [:params :body],
;; :loop-id loop_7404,
;; :params
;; [{:form x,
;; :name x,
;; :variadic? false,
;; :op :binding,
;; :arg-id 0,
;; :local :arg}],
;; :fixed-arity 1,
;; :op :fn-method,
;; :variadic? false,
;; :form ([x] (clojure.lang.Var/set x (Long. 1))),
;; :body
;; {:body? true,
;; :op :do,
;; :form (do (clojure.lang.Var/set x (Long. 1))),
;; :statements [],
;; :ret
;; {:children [:target :args],
;; :args
;; [{:children [],
;; :assignable? false,
;; :form x,
;; :name x,
;; :variadic? false,
;; :op :local,
;; :arg-id 0,
;; :local :arg}
;; {:op :new,
;; :form (new Long 1),
;; :class Long,
;; :args
;; [{:op :const, :type :number, :literal? true, :val 1, :form 1}],
;; :children [:args]}],
;; :method set,
;; :op :host-call,
;; :form (. clojure.lang.Var (set x (Long. 1))),
;; :target
;; {:op :const,
;; :type :class,
;; :literal? true,
;; :val clojure.lang.Var,
;; :form clojure.lang.Var}},
;; :children [:statements :ret]}}],
;; :once false}
So to find this host interop expression we’re looking for this structure in the analysis…
{:op :host-call :method set ;; or any of #{set alter alterRoot} :children [:target :args] :args <anything> ;; no really we don't care :target {:op :const :type :class :val clojure.lang.Var}}
Alternatively if we analyze the source of clojure.core/alter-var-root we see…
(defn alter-var-root
"Atomically alters the root binding of var v by applying f to its
current value plus any args"
{:added "1.0"
:static true}
[^clojure.lang.Var v f & args] (.alterRoot v f args))
which analyzes out to this key form… (. v (alterRoot f args)) or the t.a.jvm tree
clojure.tools.emitter.jvm> (->> (clojure.tools.analyzer.jvm.core-test/ast1
(defn alter-var-root
[^clojure.lang.Var v f & args]
(.alterRoot v f args)))
:init :methods first :body :ret :instance :tag)
clojure.lang.Var
Okay, so we have two cases here for ways that you can access this blacklisted value, now we need a way to figure out what blacklisted functions are.
Compiler.Expr.compile seems to do conditional classfile generation, however most of the classfile generation is done by Compiler.compile1 and Compiler.compile. The control flow path seems to be that Compiler.compile invokes Compiler.compile1 which invokes Expr.compile. Expr.compile then emits per-expression classfiles (nested fn types etc.), Compiler.compile1 then emits a top-level class for each form in the ns, and Compiler.compile emits the loader class which links all of the above together.The trick with Compiler.java is that Expr.compile is mutually recursive with Compiler.compile1. What this means is that per-emitted expression classfile generation is done in the Expr.compile/Compiler.compile1 cycle.
The structure is that Compiler.compile
seems to be that
Compiler.compile is the intended entry point, and it will generate
the *__init.class file. The load()
method of __init.class is
populated by invoking Compiler.compile1 on each top level form of
the namespace being built. Compiler.compile1
for side effects
invokes the .emit()
operation on every read sub-expression,
which writes to a shared mutable classloader passed through from
Compiler.compile
. Compiler.compile
invokes Compiler.compile1
after beginning the visit of init()
. This means that all
bytecode written by .emit()
members is appended implicitly to
the bytecode of *__init.class/load
.
For tejvm/-emit-loader it looks like I need to chase in-place writes to the generator adapter parameter of Expr.emit(), with the possibility that Expr.emit() somehow delays computation until it’s evaluated by Expr.eval().
Expr.emit is the key to all of this, since it handles the creation of the loading & instantiation code. The expressions themselves (via ObjExpr.emitLocal and ObjExpr.emitUnboxedLocal)
(defmulti -emit-loader :op)
(defmethod -emit-loader :AssignExpr
;; defers to either
;; - :InstanceFieldExpr
;; - :LocalBindingExpr
;; - :StaticFieldExpr
;; - :VarExpr
)
(defmethod -emit-loader :HostExpr)
(defmethod -emit-loader :FieldExpr)
(defmethod -emit-loader :InstanceFieldExpr)
(defmethod -emit-loader :StaticFieldExpr)
(defmethod -emit-loader :MethodExpr)
(defmethod -emit-loader :IfExpr)
(defmethod -emit-loader :ImportExpr)
(defmethod -emit-loader :InstanceOfExpr)
(defmethod -emit-loader :InvokeExpr)
(defmethod -emit-loader :KeywordInvokeExpr)
(defmethod -emit-loader :LetExpr)
(defmethod -emit-loader :LetFnExpr)
(defmethod -emit-loader :ListExpr)
(defmethod -emit-loader :LiteralExpr)
(defmethod -emit-loader :BooleanExpr)
(defmethod -emit-loader :ConstantExpr)
(defmethod -emit-loader :KeywordExpr)
(defmethod -emit-loader :NilExpr)
(defmethod -emit-loader :NumberExpr)
(defmethod -emit-loader :StringExpr)
(defmethod -emit-loader :LocalBindingExpr)
(defmethod -emit-loader :MapExpr)
(defmethod -emit-loader :MetaExpr)
(defmethod -emit-loader :MethodParamExpr)
(defmethod -emit-loader :NewExpr)
(defmethod -emit-loader :ObjExpr)
(defmethod -emit-loader :RecurExpr)
(defmethod -emit-loader :SetExpr)
(defmethod -emit-loader :StaticInvokeExpr)
(defmethod -emit-loader :TheVarExpr)
(defmethod -emit-loader :UntypedExpr)
(defmethod -emit-loader :MonitorEnterExpr)
(defmethod -emit-loader :MonitorExitExpr)
(defmethod -emit-loader :ThrowExpr)
(defmethod -emit-loader :VarExpr)
(defmethod -emit-loader :VectorExpr)