How it works
When you query an entity by name, Reality does the following:
- fetches the page through the Wikipedia API;
- parses the page with infoboxer into a navigable, DOM-like structure;
- also fetches the Wikidata item corresponding to that page, as a set of predicates.
For example, my home city Kharkiv is represented in Wikipedia this way and in Wikidata that way.
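To make the two lookups concrete, here is a minimal sketch that only builds the request URLs for the public MediaWiki and Wikidata APIs. The helper names (`wikipedia_page_url`, `wikidata_item_url`) are hypothetical, and the exact parameters Reality sends are an assumption; this just shows one plausible shape for the requests.

```ruby
require 'uri'

# Hypothetical helpers for illustration; Reality's real request code may differ.
WIKIPEDIA_API = 'https://en.wikipedia.org/w/api.php'
WIKIDATA_API  = 'https://www.wikidata.org/w/api.php'

# URL asking the Wikipedia API for the raw wikitext of a page.
def wikipedia_page_url(title)
  query = URI.encode_www_form(
    action: 'query', prop: 'revisions', rvprop: 'content',
    titles: title, format: 'json'
  )
  "#{WIKIPEDIA_API}?#{query}"
end

# URL asking Wikidata for the item linked to an English Wikipedia page.
def wikidata_item_url(title)
  query = URI.encode_www_form(
    action: 'wbgetentities', sites: 'enwiki', titles: title,
    props: 'claims', format: 'json'
  )
  "#{WIKIDATA_API}?#{query}"
end

puts wikipedia_page_url('Kharkiv')
puts wikidata_item_url('Kharkiv')
```

The first response would go to the page parser, the second would supply the predicates.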
Next, there is a dictionary of Wikidata predicates (properties) and the methods they map to. So, any entity that has the Wikidata predicate P625 ("coordinate location") exposes it as the #coord method, which returns an instance of Reality::Geo::Coord.
Other properties can be parsed into entities, named measures, plain strings, lists of those objects, and so on.
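The predicate dictionary could be sketched roughly as below. `Coord`, `PREDICATES` and `SketchEntity` are hypothetical stand-ins rather than Reality's real classes, and the claim values are simplified, approximate sample data; only the property ids P625 ("coordinate location") and P17 ("country") are real Wikidata identifiers.

```ruby
# Hypothetical stand-in for Reality::Geo::Coord.
Coord = Struct.new(:lat, :lng)

# Wikidata property id => [method name, converter for the raw value].
PREDICATES = {
  'P625' => [:coord,   ->(v) { Coord.new(v[:latitude], v[:longitude]) }],
  'P17'  => [:country, ->(v) { v[:label] }]
}.freeze

class SketchEntity
  # claims: hash of property id => simplified raw value (assumed shape).
  def initialize(claims)
    claims.each do |property, raw|
      name, convert = PREDICATES[property]
      next unless name # predicates missing from the dictionary are skipped

      value = convert.call(raw)
      define_singleton_method(name) { value }
    end
  end
end

kharkiv = SketchEntity.new(
  'P625'  => { latitude: 50.0, longitude: 36.23 }, # approximate sample values
  'P17'   => { label: 'Ukraine' },
  'P9999' => { label: 'ignored' }                  # made-up id, not in the dictionary
)
puts kharkiv.coord.lat
puts kharkiv.country
```

Keeping the mapping in plain data means supporting another predicate is one more dictionary entry rather than a new method definition.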
Then, there is a lot of useful data about objects that does not (yet?) exist in Wikidata's structured form. We take it from the parsed Wikipedia page. For example:
kharkiv = E('Kharkiv')
kharkiv.country # from Wikidata predicate
# => #<Reality::Entity?(Ukraine)>
kharkiv.area # not in Wikidata, from Wikipedia page infobox
# => #<Reality::Measure(350 km²)>
E('Bjork').albums # from list Wikipedia page's "Discography" section
# => #<Reality::List[Björk (album)?, Debut (Björk album)?, Post (Björk album)?, Homogenic?, Vespertine?, Medúlla?, Volta (album)?, Biophilia (album)?, Vulnicura?]>
Unfortunately, Wikipedia infoboxes are not standardized, so we can never write "this infobox field should always become that entity method". For example, country infoboxes typically have an "area_km2" field for the country's area, city infoboxes typically name it "area_total_km2", and for continents it is "area", written in a different manner (using the {{Convert}} template).
So, for Wikipedia parsing, Reality defines a DSL like "from this type of infobox extract that type of data", "if there's a section named so-and-so, it goes to such method" and so on.
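As an illustration of what such rules might look like, here is a minimal sketch of a tiny DSL. `InfoboxRules`, `infobox`, `field` and `extract` are hypothetical names invented for this example, not Reality's actual API, and the infobox template names are assumptions about which templates countries, cities and continents use.

```ruby
# Hypothetical mini-DSL; names are illustrative, not Reality's API.
class InfoboxRules
  # infobox type => { infobox field name => entity method name }
  def self.rules
    @rules ||= Hash.new { |hash, key| hash[key] = {} }
  end

  # Opens a block of rules that apply to one infobox type.
  def self.infobox(type, &block)
    @current = type
    instance_eval(&block)
  ensure
    @current = nil
  end

  # "From this field of the current infobox, fill that method."
  def self.field(name, as:)
    rules[@current][name] = as
  end

  # Applies the rules for one infobox type to its raw fields.
  def self.extract(type, fields)
    rules.fetch(type, {}).each_with_object({}) do |(field, method), result|
      result[method] = fields[field] if fields.key?(field)
    end
  end
end

InfoboxRules.infobox 'Infobox country' do
  field 'area_km2', as: :area
end

InfoboxRules.infobox 'Infobox settlement' do
  field 'area_total_km2', as: :area
end

InfoboxRules.infobox 'Infobox continent' do
  field 'area', as: :area
end

p InfoboxRules.extract('Infobox settlement',
                       'area_total_km2' => '350', 'name' => 'Kharkiv')
```

Each infobox type carries its own field names, but all three rules feed the same `area` method, which is the point of declaring the mapping per infobox type.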
Links to real code: