On Templates
Templates are where most of Wikipedia's machine-processable data lives. They are also tricky.
TL;DR: you can skip to the Templates in Infoboxer section, though that's not recommended!
There are many kinds of templates in Wikipedia.
Look, for example, at the page about the Spirit rover (poor little Spirit!). You can see several different kinds of templates there. To name a few:
Infobox: a large, separate chunk of content
{{Infobox spaceflight
| name = ''Spirit''
| image = NASA Mars Rover.jpg
...
Text with semantic information:
From sols {{age in sols|2004|01|04|2008|11|14}} to
{{age in sols|2004|01|04|2008|11|20}}, November 14, 2008 to November 20,
2008 Spirit averaged {{Convert|169|Wh}} per day.
This renders as "From sols 1728 to 1734, November 14, 2008 to November 20, 2008 Spirit averaged 169 watt-hours (610 kJ) per day."
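To see why these count as semantic information, here is a rough sketch in plain Ruby of the arithmetic behind the two templates. It is not Wikipedia's or Infoboxer's implementation: the helper names `age_in_sols` and `watt_hours_to_kilojoules`, the sol-length constant, and the rounding to the nearest ten are assumptions chosen to reproduce the rendered text above.

```ruby
require 'date'

# Length of a Martian sol in Earth days (about 24 h 39 min 35 s).
SOL_IN_DAYS = 88_775.244 / 86_400

# Hypothetical helper: whole sols elapsed between two Earth dates.
def age_in_sols(from, to)
  ((to - from) / SOL_IN_DAYS).floor
end

# Hypothetical helper: watt-hours to kilojoules (1 Wh = 3.6 kJ),
# rounded to the nearest ten (an assumption, to match the rendered text).
def watt_hours_to_kilojoules(watt_hours)
  (watt_hours * 3.6).round(-1)
end

landing = Date.new(2004, 1, 4)
puts age_in_sols(landing, Date.new(2008, 11, 14)) # 1728
puts age_in_sols(landing, Date.new(2008, 11, 20)) # 1734
puts watt_hours_to_kilojoules(169)                # 610
```

The point is that the wikitext carries dates and quantities, not finished prose, so a parser that understands the template can get at the underlying values.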
Content wrappers:
{{Columns-list|2|
* [[Aeolis quadrangle]]
* [[Autonomous robot]]
* [[Composition of Mars]]
It just says "the list inside me should be laid out in two columns".
Special links:
==Design and construction==
{{main|Mars Exploration Rover}}
It says "the main article for this section's content is the page 'Mars Exploration Rover'".
Other kinds can be found too (though not on this page):
- Simple formatting or text substitutions, like {{,}} (which means "bold middle dot": " · ").
- Formatting creators, like {{Bulleted_list (which allows more complex formatting than the default * list item syntax).
- ...and so on.
Templates in Infoboxer
Infoboxer's parse tree lets you navigate to templates like any other node on the page and fetch their variables, like this:
Infoboxer.wp.get('Argentina').
  lookup(:Template, name: 'Infobox country').
  fetch('leader_name1', 'leader_title1')
# => [#<Var(leader_name1): Cristina Fernández de Kirchner>, #<Var(leader_title1): President>]
See API docs for more on basic Template class functionality.
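To make "fetching variables" concrete without touching the network, here is a toy model in plain Ruby. The `Var` and `Template` structs and the `parse_template` helper are hypothetical stand-ins, not Infoboxer's real classes, and the parser only handles a single flat template with no nesting.

```ruby
# Toy stand-ins for illustration only, NOT Infoboxer's real classes.
Var = Struct.new(:name, :value)

Template = Struct.new(:name, :variables) do
  # Return the variables with the given names, in the order requested.
  def fetch(*names)
    names.flat_map { |n| variables.select { |v| v.name == n } }
  end
end

# Hypothetical helper: parse the body of one flat template (no nesting).
def parse_template(wikitext)
  body = wikitext.strip.delete_prefix('{{').delete_suffix('}}')
  name, *params = body.split('|').map(&:strip)
  position = 0
  vars = params.map do |param|
    key, value = param.split('=', 2).map(&:strip)
    # Unnamed parameters are numbered "1", "2", ... as in wikitext.
    value ? Var.new(key, value) : Var.new((position += 1).to_s, key)
  end
  Template.new(name, vars)
end

tpl = parse_template(
  '{{Infobox country | leader_title1 = President ' \
  '| leader_name1 = Cristina Fernández de Kirchner}}'
)
p tpl.fetch('leader_name1', 'leader_title1').map(&:value)
```

Named parameters keep their names, and positional ones are addressed by number, which is why a generic `fetch` works for any template but knows nothing about what the values mean.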
The thing is, it's not enough for most tasks. Many templates (like Template:Convert) cry out for additional methods and services to be useful for information extraction.
Another issue is text rendering. By default, the most reasonable thing Infoboxer can do with a paragraph like Some text {{template|value|other=value}} another text is to not render the template at all (that is, render it as an empty string), because many templates are NOT supposed to be part of the "normal" content flow.
But there are other cases, such as this wikitext:
Einstein was born {{Birth date|df=yes|1879|3|14}} in Ulm.
...you would not be happy with it rendered as "Einstein was born in Ulm", nor as "Einstein was born df=yes1879314 in Ulm".
The only way to render templates properly, and to add functionality to arbitrary templates, is to provide template definitions. Yes, one for each (popular) template. And no, they can almost never be generated automatically from Wikipedia's own template definitions.
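As a rough illustration of what a template definition buys you, here is a sketch in plain Ruby. It is not Infoboxer's definition mechanism: the `DEFINITIONS` registry, the `render` helper, and the regex-based parsing (which ignores nested templates) are all assumptions made for the example.

```ruby
require 'date'

# Hypothetical registry: template name => lambda turning raw params into text.
DEFINITIONS = {
  'Birth date' => lambda { |params|
    # Positional params are year, month, day; df=yes asks for day-first order.
    year, month, day = params.reject { |p| p.include?('=') }.map(&:to_i)
    month_name = Date::MONTHNAMES[month]
    if params.include?('df=yes')
      "#{day} #{month_name} #{year}"
    else
      "#{month_name} #{day}, #{year}"
    end
  }
}.freeze

# Replace each flat {{...}} with its definition's output, or with an
# empty string when no definition is registered for that template.
def render(wikitext)
  text = wikitext.gsub(/\{\{([^{}]*)\}\}/) do
    name, *params = Regexp.last_match(1).split('|').map(&:strip)
    definition = DEFINITIONS[name]
    definition ? definition.call(params) : ''
  end
  text.squeeze(' ') # collapse the double space an omitted template leaves
end

puts render('Einstein was born {{Birth date|df=yes|1879|3|14}} in Ulm.')
# Einstein was born 14 March 1879 in Ulm.
puts render('Some text {{template|value|other=value}} another text')
# Some text another text
```

Unknown templates still fall back to an empty string, while known ones are rendered by code that understands what their parameters mean.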
This requires a lot of work. The most basic part of it has already been done for the English Wikipedia (you can look at it in the repo).
Help with enhancing those definitions and adding further sets (for other Wikimedia projects, other-language Wikipedias, and so on) would be gladly appreciated.
For now, all of this is just a rough sketch and is subject to further improvement.
Next: Tips and tricks