Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

V2 certificate format #1216

Open
wants to merge 12 commits into
base: master
Choose a base branch
from
Open

V2 certificate format #1216

wants to merge 12 commits into from

Conversation

nbrownus
Copy link
Collaborator

@nbrownus nbrownus commented Sep 13, 2024

This is a near complete implementation of an ipv6 enabled overlay network. There are a handful of pull requests targeting this branch in addition to a number of incomplete issues within this PR.

I have tried to highlight the main trouble spots with comments on the PR and in addition here are my internal notes:

  • nebula-cert is defaulting to mint both v1 and v2 certificates and nebula config is defaulting to transmit v1 certificates if both are present. This will lead to a situation where net new adopters will be stuck using v1 certificates. Should this be the default behavior?

  • Currently nebula wont allow you to remove a v1 certificate on reload but this means a restart is required to complete a migration to v2. A comment on this review highlights this, we should allow it.

  • Currently there are a few spots (control, ssh) where we return the certificate in use but they only return 1 cert, should we leave it returning the default or force folks to consume both? How do we indicate which one is default?

  • Every time we encounter a certificate we need to sort the networks as well as only record the union of applicable addresses. See [cert-v2] punchy-respond on an address in common with the querying host #1261

  • We need to take extra care to ensure we never allow a ipv4 mapped ipv6 address. This is not currently addressed in this PR.

  • There may be an opportunity to improve the lighthouse protocol by directly using the MarshalBinary and UnmarshalBinary functions for netip objects. This is more of a nit than anything but we should agree on this before merging.

  • nebula-cert sign help text needs to talk about primary vpn address (or the union of effective addresses) being the first one after being sorted and what that means for tunnels.

  • The firewall config still uses naming like ca-sha should we alias this to ca-fingerprint to normalize the verbiage?

  • The firewall config still uses naming like ip should we alias this to network to normalize the verbiage

  • Normalize all names for Marshal* and Unmarshal*, we have some From*/To* and others without From/To.

  • Should we use this moment to swap p256 keys to use their compressed form? Saves ~32 bytes.

Once this is merged the upgrade path should likely be:

  1. Give every host dual certs and leave the default pki.default_version to 1. Everything continues to use v1 protocols.
  2. Switch all am_relay: true hosts pki.default_version to 2. Relays can now initiate v2 tunnels, but this is an uncommon flow.
  3. Switch all non lighthouse hosts pki.default_version to 2. Lighthouses will reply to v2 hosts using v2 protocols.
  4. Switch all lighthouses pki.default_version to 2.
  5. Remove v1 certs from all hosts

cert/cert_v2.asn1 Outdated Show resolved Hide resolved
lighthouse.go Outdated Show resolved Hide resolved
@nbrownus nbrownus mentioned this pull request Oct 4, 2024
Base automatically changed from cert-interface to master October 10, 2024 23:00
q,
cache.Get(u.l),
)
r(netip.AddrPortFrom(netip.AddrFrom16(rua.Addr).Unmap(), (rua.Port>>8)|((rua.Port&0xff)<<8)), buffer[:n])
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use binary to avoid endianness issues

return fmt.Errorf("error while signing with PKCS#11: %w", err)

if version == cert.Version1 {
// If we are asked to mint a v1 certificate only then we cant just ignore any v6 addresses
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We arent ignoring them

@nbrownus nbrownus force-pushed the cert-v2 branch 3 times, most recently from 9f75b80 to a09f39f Compare October 24, 2024 03:17
---------

Co-authored-by: Jack Doan <jackdoan@rivian.com>
cert/cert_v1.go Outdated Show resolved Hide resolved
cert/cert_v2.go Outdated Show resolved Hide resolved
cert/cert_v2.go Outdated Show resolved Hide resolved
cert/cert_v2.go Outdated Show resolved Hide resolved
cert/cert_v2.go Outdated Show resolved Hide resolved
cert/cert_v2.go Outdated Show resolved Hide resolved
cert/cert_v2.go Outdated Show resolved Hide resolved
cert/cert_v2.go Outdated Show resolved Hide resolved
cert/sign.go Outdated
if curve != t.Curve {
return nil, fmt.Errorf("curve in cert and private key supplied don't match")
}

//TODO: make sure we have all minimum properties to sign, like a public key
//TODO: we need to verify networks and unsafe networks (no duplicates, max of 1 of each version for v2 certs
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which checks are we missing that we want to include?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in #1266

@@ -91,40 +92,60 @@ func AddRelay(l *logrus.Logger, relayHostInfo *HostInfo, hm *HostMap, vpnIp neti
func (rm *relayManager) EstablishRelay(relayHostInfo *HostInfo, m *NebulaControl) (*Relay, error) {
relay, ok := relayHostInfo.relayState.CompleteRelayByIdx(m.InitiatorRelayIndex, m.ResponderRelayIndex)
if !ok {
rm.l.WithFields(logrus.Fields{"relay": relayHostInfo.vpnIp,
//TODO: we need to handle possibly logging deprecated fields as well
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolve TODO

if msg.OldRelayFromAddr > 0 || msg.OldRelayToAddr > 0 {
v = cert.Version1

//TODO: yeah this is junk but maybe its less junky than the other options
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolve TODO

if v == cert.Version1 {
peer := peerHostInfo.vpnAddrs[0]
if !peer.Is4() {
//TODO: log cant do it
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolve TODO

relay_manager.go Outdated
@@ -310,11 +357,11 @@ func (rm *relayManager) handleCreateRelayRequest(h *HostInfo, f *Interface, m *N
f.SendMessageToHostInfo(header.Control, 0, peer, msg, make([]byte, 12), make([]byte, mtu))
rm.l.WithFields(logrus.Fields{
//TODO: IPV6-WORK another lazy used to use the req object
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolve TODO

relay_manager.go Outdated

if v == cert.Version1 {
if !h.vpnAddrs[0].Is4() {
//TODO: log it
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolve TODO

relay_manager.go Outdated
@@ -360,11 +419,11 @@ func (rm *relayManager) handleCreateRelayRequest(h *HostInfo, f *Interface, m *N
f.SendMessageToHostInfo(header.Control, 0, h, msg, make([]byte, 12), make([]byte, mtu))
rm.l.WithFields(logrus.Fields{
//TODO: IPV6-WORK more lazy, used to use resp object
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolve TODO

pki.go Outdated
return p.cs.Load()
}

func (p *PKI) GetCAPool() *cert.CAPool {
return p.caPool.Load()
// TODO: We should remove this
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolve TODO

pki.go Outdated
return p.cs.Load().GetDefaultCertificate()
}

// TODO: We should remove this
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolve TODO

}

} else if currentState.v1Cert != nil {
//TODO: we should be able to tear this down
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolve TODO

p.cs.Store(cs)
p.cs.Store(newState)

//TODO: newState needs a stringer that does json
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolve TODO

pki.go Outdated Show resolved Hide resolved
return nil, util.NewContextualError("v1 and v2 curve are not the same, ignoring", nil, nil)
}

//TODO: make sure v2 has v1s address
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolve TODO

overlay/tun_windows.go Outdated Show resolved Hide resolved
@nbrownus nbrownus marked this pull request as ready for review October 29, 2024 03:48
if ok {
whereToPunch = newDest
} else {
//TODO this means the destination will have no addresses in common with the punch-ee
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolve

@JackDoanRivian
Copy link
Contributor

JackDoanRivian commented Dec 12, 2024

TODO: nebula-cert verify only checks the first cert -- see #1291

@maggie44
Copy link

maggie44 commented Dec 29, 2024

This PR no longer exports the certificate structs; neither v1 or v2:

type certificateV2 struct {
	details detailsV2

	// RawDetails contains the entire asn.1 DER encoded Details struct
	// This is to benefit forwards compatibility in signature checking.
	// signature(RawDetails + Curve + PublicKey) == Signature
	rawDetails []byte
	curve      Curve
	publicKey  []byte
	signature  []byte
}

type detailsV2 struct {
	name           string
	networks       []netip.Prefix
	unsafeNetworks []netip.Prefix
	groups         []string
	isCA           bool
	notBefore      time.Time
	notAfter       time.Time
	issuer         string
}
type certificateV1 struct {
	details   detailsV1
	signature []byte
}

type detailsV1 struct {
	name           string
	networks       []netip.Prefix
	unsafeNetworks []netip.Prefix
	groups         []string
	notBefore      time.Time
	notAfter       time.Time
	publicKey      []byte
	isCA           bool
	issuer         string

	curve Curve
}

Before:


type NebulaCertificate struct {
	Details   NebulaCertificateDetails
	Signature []byte

	// the cached hex string of the calculated sha256sum
	// for VerifyWithCache
	sha256sum atomic.Pointer[string]

	// the cached public key bytes if they were verified as the signer
	// for VerifyWithCache
	signatureVerified atomic.Pointer[[]byte]
}

type NebulaCertificateDetails struct {
	Name      string
	Ips       []*net.IPNet
	Subnets   []*net.IPNet
	Groups    []string
	NotBefore time.Time
	NotAfter  time.Time
	PublicKey []byte
	IsCA      bool
	Issuer    string

	// Map of groups for faster lookup
	InvertedGroups map[string]struct{}

	Curve Curve
}

It is going to be a big breaking change and inconvenience for anyone who has used and uses the libraries. Would prefer to keep these exported. I see that it is a breaking change PR, but it should at least allow restoring the same functionality as was available before.

@maggie44
Copy link

I see now it’s switched to a factory approach, with TBSCertificate that is exported. 👍

A few thoughts to consider:

A helper function for converting a Certificate back to a TBSCertificate would be useful to complete the factory loop and allow manipulation of existing signed certificates (as unsigned ones).

There are also some Marshal functions like Marshal(), but without a corresponding Unmarshal(). I imagine most people are using pem encoded certs, but if Marshal() is exposed there should probably be a corresponding Unmarshal().

cert/cert_v2.go Outdated
Comment on lines 68 to 69
networks []netip.Prefix
unsafeNetworks []netip.Prefix

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mentioned this as a reply to a resolved comment already, but in case it's folded and not seen: I think the name networks and unsafeNetworks are confusing. The former one is an address + its on-link network prefix, while the latter is just a network prefix. For the former, only a single address is routable to the host, while all addresses in the network is routable to the host for the latter.

I would propose the name to be addresses (I think it's fine even though this expects an CIDR, as the input is same as what ip address add command expects) which clearly indicate that the address part of the network prefix is significant, and routable_networks (or maybe unsafe_routable_networks) which clearly indicates that these are CIDRs for routing only.

Copy link
Collaborator

@johnmaguire johnmaguire Dec 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I share concerns of confusion with the naming here. I'm also comfortable sticking with the old name which at least creates no new confusion (especially with respect to existing documentation and discussions): #1216 (comment)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd also be open to the name addresses for the networks field. I think the name networks came up when we were considering it in the context of a CA certificate? Using that same field name for two very-similar-but-different purposes muddies the waters a little bit, but the intent was that it shows which networks (as V2 certs let you have more than one!) the host participates in, as well as the actual address assignment on non-CA certs. However, the ip address add argument is very persuasive and probably much more obvious to new users.

Naming unsafeNetworks is a little trickier. When the addresses field is named networks, I think it's a pretty logical name, but unsafeAddresses doesn't really work. I'd propose something like unsafeRouterFor: the prefix unsafe links it mentally to unsafe_routes in the config, and I think RouterFor shows that the field expresses a capability, rather than something that might affect how "regular" Nebula traffic is routed.

Copy link

@maggie44 maggie44 Dec 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think addresses places emphasis on the use of the ip; networks on the prefix use. But ultimately both are used and the content passed in is CIDRs? Perhaps better to reflect what it is and what is passed in by the user rather than the things it’s used for (similar to before where it was called Ips but CIDRs would probably be more accurate).

The bigger challenge before was having to manually change the ip when using ParseCIDR because it returned the base address instead of the provided IP, and the confusion of the certificate field ‘subnets’. Both of those are no longer an issue so bounds better.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As Jack said, the issue is that the field is used for dual purposes. For a non-CA cert, addresses and unsafeRouterFor makes more sense, but for CA cert, networks and unsafeNetworks make more sense. I would say it makes more sense to cater for the non-CA cert case since there're more of them, or perhaps they can have different names for different nebula-cert commands?

JackDoanRivian and others added 2 commits January 6, 2025 15:07
* enforce certificate correctness in TBSCertificate.SignWith

* check length, not nil

* Address review comments

* github hates me

---------

Co-authored-by: Nate Brown <nbrown.us@gmail.com>
Co-authored-by: Jack Doan <me@jackdoan.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants