High throughput I/O (Multiple Ethernet Connections) #177

Closed
wants to merge 8 commits into from

mossblaser
Member

When finished, this PR fixes #142. This is a branch off the fault-tollerant-rig-ps branch and is intended for merging after #176.

  • Facilities will be added to discover (and use) multiple Ethernet connections
  • A set of read_into() methods will need to be added to allow creation of the
    read buffer in advance.
  • The begin_burst function will simply set a flag.
  • For each connection there will be a queue of read/write requests.
  • All read/write requests should be wrapped in a lambda(?) and placed in the
    buffer. For conventional reads, a buffer should be created and then
    read_into called with that buffer.
  • When not in a burst, the buffer will be emptied immediately (or a fast path
    used...).
  • The end_burst function will spawn some threads and in each thread work on
    processing the queues.
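
The plan above can be sketched roughly as follows. This is only an illustration of the queueing idea, not code from this branch: the `BurstController` class, its `submit` method and the stand-in `read_into` helper are hypothetical names, and connections are represented by plain strings.

```python
import threading
from collections import deque


class BurstController(object):
    """Hypothetical sketch: queue requests per connection during a burst."""

    def __init__(self, connections):
        # One FIFO queue of pending requests per connection.
        self.queues = dict((conn, deque()) for conn in connections)
        self.in_burst = False

    def begin_burst(self):
        # Starting a burst only sets a flag; nothing is sent yet.
        self.in_burst = True

    def submit(self, conn, request):
        # `request` is a zero-argument callable wrapping one read or write.
        if self.in_burst:
            self.queues[conn].append(request)
        else:
            # Outside a burst the request runs immediately (the fast path).
            request()

    def end_burst(self):
        # One worker thread per connection drains that connection's queue
        # in order; different connections are processed in parallel.
        self.in_burst = False
        threads = [threading.Thread(target=self._drain, args=(queue,))
                   for queue in self.queues.values()]
        for thread in threads:
            thread.start()
        for thread in threads:
            thread.join()

    @staticmethod
    def _drain(queue):
        while queue:
            queue.popleft()()


def read_into(buf, source):
    # Stand-in for a real read: copy `source` into a pre-allocated buffer.
    buf[:len(source)] = source


# Usage: the read buffer is created up front and filled when the burst ends.
ctrl = BurstController(["eth0", "eth1"])
buf = bytearray(4)
ctrl.begin_burst()
ctrl.submit("eth0", lambda: read_into(buf, b"\x01\x02\x03\x04"))
ctrl.end_burst()
```

Creating the buffer before the request is queued is what makes a read_into() style call necessary: the caller already holds the object that the deferred read will later fill.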

* All SCPError derived exceptions now include a summary of the offending
  packet (most importantly x, y, p and command!). The SCPPacket object is also
  made accessible from the exc.packet attribute.
* RC errors also now contain a human readable explanation of what the error
  codes mean.
* Instead of having many RC error exception types, there is now one
  FatalReturnCodeError exception type. If differentiating between the types is
  important, the RC is included in the exc.return_code attribute. If required
  in the future, subclasses can (of course) be created for individual return
  codes without breaking backward compatibility.

This change (strictly speaking) breaks backward compatibility as it renames
exceptions of the RC-specific types. @mundya are you using these exceptions
anywhere and is breaking compatibility here a problem?
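
As a rough illustration of how calling code might use the single exception type, here is a self-contained sketch. The class bodies, the dict standing in for an SCPPacket and the return code values are all assumptions made for the example; only the names SCPError, FatalReturnCodeError, exc.packet and exc.return_code come from the description above.

```python
# Placeholder codes for illustration only; these are NOT real SCP values.
RC_EXPLANATIONS = {
    0x01: "example failure: bad argument",
    0x02: "example failure: core not responding",
}


class SCPError(Exception):
    """Hypothetical base class which remembers the offending packet."""

    def __init__(self, message="", packet=None):
        self.packet = packet
        if packet is not None:
            # Include a summary of the packet in the error message.
            message = "{} (packet: {!r})".format(message, packet)
        super().__init__(message)


class FatalReturnCodeError(SCPError):
    """Single exception type for every fatal return code."""

    def __init__(self, return_code, packet=None):
        self.return_code = return_code
        explanation = RC_EXPLANATIONS.get(return_code,
                                          "unrecognised return code")
        super().__init__(
            "RC 0x{:02X}: {}".format(return_code, explanation), packet)


# Usage: differentiate on exc.return_code instead of the exception type.
try:
    raise FatalReturnCodeError(
        0x02, packet={"x": 1, "y": 2, "p": 3, "cmd": "example"})
except FatalReturnCodeError as exc:
    failed_rc = exc.return_code
    failed_core = (exc.packet["x"], exc.packet["y"], exc.packet["p"])
```

Code which previously caught one of the RC-specific types would instead catch FatalReturnCodeError and compare exc.return_code.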

Instead of raising an exception, if get_machine() encounters an SCP error while
probing the cores/links of a chip it now simply reports that chip as dead.

This scenario most commonly occurs when a chip dies (or becomes inaccessible)
some time after the P2P routing tables have been initialised. This change means
that get_machine() now returns a valid subset of the machine which is still
accessible and is especially useful for post-mortem diagnostics, e.g. using
rig-ps.

Finally, get_machine() now also has an x and y argument allowing the initial
P2P table reading commands to be sent to non-(0, 0) chips. Again, this is
potentially useful if (0, 0) has become isolated from many other chips and an
alternative ethernet connected chip is used.
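
The "report the chip as dead rather than raise" behaviour can be sketched like this. It is a simplified stand-in, not the actual get_machine() implementation: `probe_chips`, the `probe` callback and the local `SCPError` class are hypothetical and exist only to make the example self-contained.

```python
class SCPError(Exception):
    """Stand-in for the real SCP error type."""


def probe_chips(chip_coords, probe):
    """Probe each chip, treating any SCP error as "this chip is dead".

    `probe` is a hypothetical callable taking (x, y) and returning whatever
    was discovered about that chip, or raising SCPError on failure.
    """
    working = {}
    dead = set()
    for xy in chip_coords:
        try:
            working[xy] = probe(*xy)
        except SCPError:
            # The chip was listed in the P2P tables but no longer responds,
            # so leave it out of the result instead of aborting the scan.
            dead.add(xy)
    return working, dead


def fake_probe(x, y):
    # Pretend chip (1, 0) died after the P2P tables were set up.
    if (x, y) == (1, 0):
        raise SCPError("no response from chip ({}, {})".format(x, y))
    return {"cores": 18}


working, dead = probe_chips([(0, 0), (1, 0), (1, 1)], fake_probe)
```

The caller still gets a usable description of every chip that did respond, which is the property a diagnostic tool relies on.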

If rig-ps encounters a core which returns SCP errors it now prints the error and
continues rather than falling over immediately.

This commit adds support for discovering and using multiple Ethernet
connections (when available) but does not (yet) feature any way to use these
connections in parallel to improve performance. A limited performance
improvement due to reduced average latency within the machine may be plausible,
however.

This commit lifts board geometry functions from commit b666e80 (part of the now
defunct non-blocking-io branch) along with the basic principles for probing the
machine for Ethernet connections.
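
To illustrate where the latency reduction might come from, the sketch below picks the Ethernet-connected chip closest to a target chip. The function names are hypothetical and are not the board geometry functions lifted from b666e80. The distance metric assumes a hexagonal link layout in which the diagonal links run along (+1, +1), and it ignores wrap-around links; the example Ethernet chip coordinates are likewise illustrative.

```python
def hex_distance(a, b):
    """Hop count between chips a and b, assuming no wrap-around links.

    Assumes links along the x axis, the y axis and the (+1, +1) diagonal.
    """
    dx = b[0] - a[0]
    dy = b[1] - a[1]
    return max(abs(dx), abs(dy), abs(dx - dy))


def nearest_connection(chip, connections):
    """Return the (x, y) of the Ethernet-connected chip closest to `chip`.

    `connections` maps the (x, y) of each Ethernet-connected chip to some
    connection object. Ties go to the lowest coordinate.
    """
    return min(sorted(connections), key=lambda eth: hex_distance(eth, chip))


# Usage with three illustrative Ethernet-connected chips.
connections = {(0, 0): "conn-a", (4, 8): "conn-b", (8, 4): "conn-c"}
target = (5, 9)
eth = nearest_connection(target, connections)
conn = connections[eth]
```

Sending a command via the nearest connection shortens the path it takes inside the machine, but the requests are still issued one at a time, so any gain comes from latency rather than parallelism.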
@mossblaser mossblaser self-assigned this Jul 18, 2015
@mossblaser mossblaser added this to the 1.0 milestone Jul 18, 2015
@mossblaser mossblaser changed the title High throughput io High throughput I/O (Multiple Ethernet Connections) Jul 18, 2015
@mossblaser
Member Author

Some preliminary results using threads to support many streams of communication: things fairly quickly become CPU bound...

[Figure: Bandwidth when using more than one connection]

(Code not yet committed/pushed due to lack of formal tests and cleaning...)

@mossblaser
Member Author

...as a point of comparison, running 24 instances of rig-scp in parallel achieves something slightly over 285 MBit/s without maxing out my CPU. Either way we're off by an order of magnitude...

@mossblaser
Member Author

This PR has now been superseded by #224; closing.

@mossblaser mossblaser closed this Feb 23, 2016
Development

Successfully merging this pull request may close these issues.

Multiple connections for boards with multiple ethernet connected chips
2 participants