-
Notifications
You must be signed in to change notification settings - Fork 9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Extract Response Processing #7
base: master
Are you sure you want to change the base?
Conversation
@zverok this is mainly a placeholder PR to get some feedback on the direction of things. This is primarily meant to keep things backward-compatible (except for classes where the namespace was changed, but should not affect users of the DSL). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Most of "just refactoring" changes look OK to me (left some comments), but the situation with Processors
module is a bit unclear.
Depending on your final goal, I can imagine several approaches here. As far as I can understand, you want to achieve is: Use TLAW construction DSL, but:
- not use its response processing at all (receive raw responses);
- or, use its basic response processing (JSON parsing and errors guard), but not "advanced" features (flat hashes, data tables,
post_process
DSL); - or, use parsing & errors guards,
post_process
DSL, but avoid flattening of hashes and data tables.
In either case, I believe that the "right" way is splitting response-processing into Faraday-like independent middleware, or even indeed Faraday middleware: in this case, "parse JSON" and "fail on errors" are probably already existing ones! Then, what will be left is
- Some "flattener" (which just takes parsed response and makes it flat & adds datatables → unless the latter is ONE MORE independent middleware);
- Some "handlers" middleware, allowing to set "entire response" and "per-key" handlers, the trick here that they probably can't rely on flattening, so should be able to set an arbitrary depth of processing; also, this is kind of middleware that updates the DSL too, or, alternatively, the DSL that adds middleware to response processing.
Please tell me what you think about the above reasoning. I probably can jump in and write some code too, to clear the path :)
lib/backports.rb
Outdated
yield self, *args | ||
end | ||
end | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's just add backports
to the dependencies? It is kinda more generic.
lib/tlaw/namespace.rb
Outdated
ArgumentError, | ||
"Unregistered #{expected_class.name.downcase}: #{symbol}" | ||
) | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I preferred the previous logic (fetch → validate → create), it seems more "functional" to me.
And, in fact, could've been simplified a bit:
children[symbol]
.tap { |child_class| ... }
.new(@parent_params.merge(params))
README.md
Outdated
@@ -5,7 +5,7 @@ | |||
[![Coverage Status](https://coveralls.io/repos/molybdenum-99/tlaw/badge.svg?branch=master)](https://coveralls.io/r/molybdenum-99/tlaw?branch=master) | |||
|
|||
**TLAW** (pronounce it like "tea+love"... or whatever) is the last (and | |||
only) API wrapper framework for _get-only APIes_<sup>[*](#get-only-api)</sup> | |||
only) API wrapper framework for you'll ever need APIs in a consistent way |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why?..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A little bit of a bad copy/paste job there, but also have the goal of allowing other request methods. Will add back the GET-only text for now since other methods are probably a ways away (and maybe not needed in the short term anyways).
lib/tlaw/dsl/base_wrapper.rb
Outdated
|
||
module TLAW | ||
module DSL | ||
# @!method base(url) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The docs of all # @!method
-s below are related to DSL itself, not BaseWrapper
, they should be returned to dsl.rb
lib/tlaw/namespace.rb
Outdated
@@ -35,7 +37,7 @@ def base_url=(url) | |||
@base_url = url | |||
|
|||
children.values.each do |c| | |||
c.base_url = base_url + c.path if c.path && !c.base_url | |||
c.base_url ||= base_url + c.path if c.path |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I somehow liked the previous version better ("set, if it has a path and no base url" is one "statement" in my mind, while ||=
and if
in same line makes it hard to follow: "unless it is set, set it... Oh, if it has path, BTW")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@zverok I debated this as well, and actually intended to remove the trailing if ...
if you know what the expected logic or resulting values are intended to be here. Could this just use the base_url
even if the path is nil
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should probably check with specs.
I believe these excessive checks are the result of how DSL constructs it: not by one constructor, but gradually setting parts of an object, and in the process sometimes it becomes hard to a) not reset what is already properly set and b) not set the value too early, to incomplete state.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW this was reverted
lib/tlaw/params.rb
Outdated
@@ -0,0 +1,37 @@ | |||
module TLAW | |||
module Params |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should have some docs for the module, probably.
lib/tlaw/params.rb
Outdated
def self.make(name, **options) | ||
# NB: Sic. :keyword is nil (not provided) should still | ||
# make a keyword argument. | ||
if options[:keyword] != false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ugh. I know I wrote it myself, but lets flip the if
for the sake of readablility (if == false →Argument else → Keyword
)
lib/tlaw/params/param.rb
Outdated
# converting. You'll never instantiate it directly, just see {DSL#param} | ||
# for parameters definition. | ||
# | ||
class Param |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's make it Params::Base
? I've never been a huge fan of Foos::Foo
naming.
module Processors | ||
# @private | ||
# FIXME: everything is awfully dirty here | ||
class BaseProcessor |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Processors::Base
?
end | ||
|
||
def all_processors | ||
[*(parent && parent.all_processors), *@processors] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In fact, in 2018 we probably can limit the lowest version to 2.3 and just go with &.
@zverok your comments have been addressed except for 2 I left comments for. The Regarding your reasoning, I think it all makes sense. In terms of my goals, I would want to keep the current TLAW behavior and DSL in tact. However, the underlying architecture would become more modular and focused on building pipelines (or middleware) to handle processing of HTTP responses. Ideally, the DSL can have the The additions I'm envisioning are not far from what is already there. However, I would want to achieve:
I'm developing an "abstract" ETL pipeline project that can encapsulate the workflows and use existing gems under-the-hood to execute parts of the process (e.g. TLAW can be used to create an API client to stream data into an ETL pipeline) So all that to say that the current TLAW behavior would ideally stay intact or if it makes sense, some default behavior may evolve. |
examples/demo_base.rb
Outdated
@@ -1,6 +1,6 @@ | |||
require 'pp' | |||
|
|||
$:.unshift 'lib' | |||
$:.unshift '../lib' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's make the proper File.expand_path('../lib', __FILE__)
, then :)
Because I run them from root of repo (ruby examples/<something>.rb
), and this way they will not work.
lib/tlaw/namespace.rb
Outdated
@@ -35,7 +37,7 @@ def base_url=(url) | |||
@base_url = url | |||
|
|||
children.values.each do |c| | |||
c.base_url = base_url + c.path if c.path && !c.base_url | |||
c.base_url ||= base_url + c.path if c.path |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should probably check with specs.
I believe these excessive checks are the result of how DSL constructs it: not by one constructor, but gradually setting parts of an object, and in the process sometimes it becomes hard to a) not reset what is already properly set and b) not set the value too early, to incomplete state.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey.
I wanted to ask you a big favor. It is really interesting to me where the story with response processors unwinds, but it becomes to be really hard following 50+ files PR.
If it is possible, can you split this PR in two: one with refactorings/spelling/examples fixes without any API changes, we'll discuss it briefly and merge; and the next one focused just on processors/transformers redesign? I believe it would be waaay easier for both of us to proceed this way.
Yes, that's one of my goals (that I've never had time to fully embrace for this project). So, let's go with it 👍 BTW, as you've mentioned it, I, in fact, developed Hm somewhat with TLAW in mind, so probably (on some further steps) the possibility could be investigated to do things in "Hm-like" style: transform('items', :*, 'date', &Date.method(:parse))
# instead of
transform_items('items') do
transform 'date', &Date.method(:parse))
end (The former now looks "cleaner" for me, though it contradicts open_weather_map example, with But that's for future (also, PS: On a totally unrelated note, yesterday evening younger colleague asked for help with some Magento integration, and we struggled for a few hours with rough SOAP/Savon, and then I googled for ready solution, and suddenly found |
@zverok Ha! What a funny coincidence that you guys found that Magento project. Glad it was of help! Regarding multiple PRs: I tried to largely group updates into commits, so I would initially recommend looking through the commits as a means to understand the changes. If that isn't possible, I'll see about multiple PRs, if that's OK? Regarding transformations: Yes, Hm was the project, and as a matter of fact, it looks like you just started that very recently. Solnic's Transproc is the other one, and both of these have lots of common transformations built-in. My goal is to develop a set of gems that can help with solving various use cases, that can all be used together. Primarily, the goal is to provide DSLs to construct ETL workflows or data migrations, report generation, and other things in an abstract way that easily lets you use the tools you want to use for things like transformations. (For instance, take the TLAW DSL for defining the workflow and transformations. The transformations could be performed by Hm, Transproc, or both.) A bit ambitious, I realize, but that's the goal. We can talk more about this if you're interested... for now I'm using my reporting gem as a workspace for these concepts, and will abstract out various aspects of that project into individual gems for composition for individual use cases (eventually). Thanks for reviewing things! |
@zverok I should note that the initial abstraction in reporting builds on top of Kiba, but Kiba is not a direct dependency. The Pipeline DSL allows for constructing the transformations, and then the job can be run by Kiba (or another framework). Some examples in the project to illustrate the ideas:
I like how you developed a DSL object model that wraps around the domain object model. I will explore that approach a bit more in order to not mix helper methods with the data model. (As you will see in some of the changes I made to the transforms in TLAW, I moved what I would consider helpers to the DSL namespace in this manner.) Thanks! |
Hey, sorry for being silent for a week, life and work took too much of my attention. I generally like the notion of "pipelines", and lately, too, become to think that TLAW should fit in some pipelines (therefore, probably requiring to inherit from it is not the best convention possible). My main use case (which led me to "inventing" the library) is the reality project (it is work-in-progress, here is the latest presentation from RubyConf India). It basically is made of "describers" (high-level homogenous wrappers) for external data services. The describer also can be seen as a pipeline (e.g. fetch Wikipedia page → parse it to wiki-AST → fetch some semantic information → wrap it into high-level value objects like "Temperature" or "Geographical coordinate") TLAW usage there can be seen, for example, here. I have mixed feelings about how it fit in: on the one hand, it was really easy to describe the data source I wanted, on the other, I needed an "internal" class to descend from TLAW, while probably ability to somehow "include" the DSL into the pipeline could be better. |
I was out of town and got a chance to look at your links now. I saw
As I think about data sources, transformations and peristence (more related to ETL), I also think a little bit about making it more event-oriented and potentially some parts async. However, I think that is higher-level pipelines than we are currently talking about. I will play more with TLAW concepts and how some of my goals in the previous comment fit together as I get my head around this again. It sounds like you are open to determining TLAW's place in a system of pipelines. Let me know if you have any specific thoughts about the relationship between TLAW, pipelines and how to integrate multiple libraries together (e.g. |
aea3fc2
to
287114c
Compare
@zverok I've been looking more into ROM.rb and it looks like the HTTP adapter does what we are doing with request and response handling. There is a ROM HTTP adapter and I also found an example of a Faraday ROM HTTP adapter example. ROM.rb is also very functional and pipeline-based. It seems like it could be a good foundation for ETL as I've been thinking about, and TLAW would be great for customizing the HTTP client/adapter. I'm going to give more thought about if/how ROM fits into things, but I wanted to see what your thoughts are. |
@zverok I've recreated commits for the remaining changes on this PR. It's two commits that encompass renaming the post-processor methods and extracts the response processor to a configurable class. |
I have no clear idea, currently, mostly because I haven't experimented enough. The generic concerns are:
class SpecializedWeather
def very_concrete_method
search(city).weather(date) # that's private internal TLAW
.yield_self { some very fancy wrapper }
end
private
# all those methods become availabel to instance yet private.
tlaw do
........
end
end I am not sure it will be good, just an idea. As about ROM, I believe that they approach the same problem here, but from a bit different angle. Their is mostly "apply repository pattern to everything" (including HTTP, implementing something like Repository-based ActiveResource); mine is mostly utilitarian, some small tool to be chainable into bigger ones. Maybe the approaches can be joined somehow, IDK for sure :) |
Thanks for your thoughts @zverok - I think we are pretty aligned in our thinking. I am getting more into the ROM/Dry/Hanami world and will report back about what I see the fit as. The one thing that I have been thinking about a lot lately is the value of DSLs as a high-level interface to describe what you want your application to do, and making the implementation details configurable. I really like the Container/IoC patterns to make this even more flexible. Similar to how you describe specifying a method name for some custom transformation, that could very well refer to a key in a container -- because I definitely agree with you that post processing via nested blocks can become very unwieldy in non-trivial situations. To break it down, parts of it should probably be their own methods or classes that you compose together. Hence, I really am interested in looking at what can be done by developing a DSL to handle these complexities, which will then build the composition of the pipeline with your various implementation details behind the scenes. I'll report back with what I find as I get more into those libraries and communities I mentioned. In the meantime, what are your thoughts about merging this PR? (I had originally aliased methods to make the transformation methods backward-compatible. If you have concerns about the API changes, that is an option. My guess is that this could be a drop-in for you if you add back the aliases.) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, now I can see that (save the minor comments) I definitely like it!
Let's just merge after we discuss/fix comments (the only important one is probably items
vs _each
thing).
I believe that this library (unfortunately) have no active users except me, and it has 0.0.x version number, so it seems some good renamings are totally acceptable.
|
||
# input is entire response, and response is fully replaced with block's | ||
# return value | ||
post_process { |hash| hash['foo'] } # Now only "foo"s value will be response | ||
transform { |hash| hash['foo'] } # Now only "foo"s value will be response |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
transform(replace: true)
should be here, right?..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@zverok i mentioned in another comment of yours that I'd like to remove the replace
option and just always return the value that should be used rather than mutate the argument to the block. Thoughts?
# `{"key.count" => 1, "key.continue" => false}`. | ||
# | ||
# @overload post_process(&block) | ||
# @overload transform(replace: false, &block) | ||
# Sets post-processor for whole response. Note, that in this case | ||
# _return_ value of block is ignored, it is expected that your |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This "Note,..." is not true anymore, considering replace:
option :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With the cherry-picks, I'm confused about what ended up wrong with the moving things around to separate PRs. I'll have to look this over.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, looking at this, it is correct. The new example illustrates the same effect for when replace: false
is passed (which is also the default). Further down you'll see an example of what happens with replace: true
, which is new documentation.
I think the note indicates that transform(replace: false)
is a misnomer and also a bit confusing. My thought is that if you don't want to transform the value, this should be a method name like handle
or something (e.g. maybe process
for this specific behavior) where the return value is not expected to be processed. (I noted in another comment that I would prefer to make the default behavior of transform
to be replace: true
.)
# ``` | ||
# | ||
# @param key [String] | ||
|
||
# @!method post_process_items(key, &block) | ||
# @!method transform_items(key, &block) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's make it transform_each
, WDYT? I don't know where was my head when I called that _items
:( I was younger, though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Haha. My opinion is that the transformations should return the new value, not mutate it. So, I would probably want it to be map
maybe. I meant to note this if I didn't already. This goes for all transformations, not just this one.
def self.batch(key, &block) | ||
batcher = new(key) | ||
batcher.instance_eval(&block) | ||
batcher.processors |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer (though, not insist)
new(key).tap { |batcher| batcher.instance_eval(&block) }.processors
Well, let's try this. I mean, if all the examples and docs would show it is clear enough -- I don't mind getting more functional here :) I reviewed the examples where "mutating" # before
post_process { |e|
e['coord'] = Geo::Coord.new(e['coord.lat'], e['coord.lon']) if e['coord.lat'] && e['coord.lon']
}
post_process('coord.lat') { nil }
post_process('coord.lon') { nil }
# after
post_process { |response|
next response unless response.key?('coord.lon') && response.key?('coord.lat')
response.merge(
'coord' => Geo::Coord.new(response.delete('coord.lat'), response.delete('coord.lon'))
)
} Then, probably the most readable API would be: transform { |whole_response| should return next part }
transform(key) { |key value| should return new key value }
transform_array(key) do
transform { ... }
end About the last name: I'm still feeling a bit unsure about it, but neither I also thought about transform('list', :*, 'sys.sunrise', &Time.method(:at))
transform('list', :*, 'weather.icon') { |i| "http://openweathermap.org/img/w/#{i}.png" }
... ...but on this "model" example (OpenWeatherMap) it doesn't bring any clarity at all (and "join coord.lon & coord.lat into coord" seems even more cumbersome). Probably this part is subject to further investigations. |
Here are some initial thoughts I had ruminating, but have not yet thought through all use cases. If we consider a hierarchy of how the response is processed, maybe we can do something that I saw in a ROM video demo (but can't seem to find an example in their docs). There was an example of two ways to define mappings. First was to use a block with the entity as an argument. The second was a block without an argument where you could call transformation methods that implicitly applied to the entity. The first was for more custom mapping (e.g. mutations or whatever you want) and returning a result. The second was to describe the transformations via the use of the transformation DSL. This is where I see either taking from ROM or implementing some aspect of ROM. Anyways, maybe the DSL can be something like this, going with the implicit version we've been using in TLAW already: # Hook into transformation process, responsible for returning the full result
transform do
# This would map over whole response, assuming the whole response
# is an array or just processes the whole response as a single item
map do # item in the array is implicit
rename old_key: :new_key
# change value
map(key) { |value| ... new value ... }
# could have an option to convert to array if it's a single item
map(key, ensure_array: true) { |value| ... new value ... }
# Many more transformation helpers
end
end
# Other custom approach
transform do |response, transformer|
# do stuff to the response, maybe create a new object to copy resulting data to
result = response # do stuff
# use transformer to take advantage of transform DSL
transformer.for result do
# this is the same context as `transform` without arguments
end
end I realize this example isn't clear about what gets included or excluded in the resulting output (e.g. if keys you don't transform get added or get skipped in output, etc). Thoughts on this approach, which describes the data hierarchy more? |
@zverok I added a second example with more customizability to the previous comment |
Hey. Sorry for the late answer :( Mindful responding requires a lot of my attention that I am currently short of. The thing is, I believe, that good new or updated transformations DSL hardly can be "guessed". I committed some examples (I've had them for long in gitignored folder) to What I currently can say, probably: in fact, "rich DSL for hash transformations" is probably out of the scope of TLAW: it is the road that can be walked very far, and in fact, nothing implies that "hash transformations" should be part of "HTTP API wrapper" library. I started with "opinionated" library which transformed everything, but we already discussed that it was probably a false move (unlike endpoints definition API which is flexible and not assumes anything about client code). So, what I am thinking now in light of this idea, is that probably this should be only one and ultimate interface: endpoint ... do
transform &anything
end The only thing that does part of TLAW API defines is that you can specify some transformations -- just with any block-alike object. Now, this object can be produced by For my own code, I am playing in my head with idea of extending TRANSFORM =
Hm()
.transform_values('weather', &:first)
.transform_values('dt', %w[sys sunrise], %w[sys sunset], &Time.method(:at))
.except('dt_txt') # ... and so on; returns #to_proc-able object
# for singular value:
transform &TRANSFORM
# for values in a list: transform each of list items
transform &Hm().tranform_values(['list', :*], &TRANSFORM)
# I can imagine different ways of data control in this approach, like...
Hm()
.expect('weather', Array) # fails if there is no 'weather' key, or pattern === value does not match) This way next version of TLAW will loose those small nice opinionated "flattening" features, but I am almost sure that nobody have ever used or liked them, except my demo files :) WDYT?.. |
@zverok now we're talkin'! :-) Yes, this is what I had in mind -- a generic way to "hook" into aspects of a pipelilne. Specifically with TLAW, the My thought is that you could specify a thing that responds to Alternatively, you can "register" transformations into some sort of registry and refer to them by name ( There could be a way to register a transformer that can be This is very much what some aspects of ROM do, and ROM or DRY could be some components we use behind the scenes. |
Hey @zverok we wanna revisit getting this merged? |
Hey @joelvh, sorry... I feel bad about this, honestly. At some point I somehow got distracted from this discussion.
? |
The goal is to extract all processing of the response in order to implement ETL-like post processing and streaming of HTTP requests.
Changes:
Response
objectTLAW::Processors::Base
with basic functionalityTLAW::Processors::ResponseProcessor
with response body parsing into JSON or XML withoutDataTable
flatteningTLAW::Processors::DataTableResponseProcessor
) which builds onResponseProcessor
and flattens data intoDataTable
response_processor
DSL::Transforms
namespaceprocess*
methods totransform*
with method aliases for backwards-compatibilityprocess_replace
withtransform(replace: true)
TLAW::Params
namespaceObject#yield_self
for older versions of Ruby (in place ofObject#derp
)