A few weeks ago, I made this post: Don’t Touch My EtherChannels! – CML 2.3 e1000 MAC Address Pool Bug
Well, I am happy to report I found a workaround! It’s quite the simple one; quite frankly, I am surprised I didn’t find it earlier! I will outline the rationale for this workaround.
If the word “rationale” makes you snore louder than a polar bear (do polar bears snore? Who knows!), feel free to skip down to the section titled “Damn you, give me my workaround already!!!” …fine, the section’s actually just called “Workaround Implementation”, but imagine if I had the gall to name it the former title!
Why does this workaround work?
This workaround is predicated on how CML allocates MAC addresses to the virtual nodes in labs. I was casually scrolling through the CML FAQs one late evening, trying to find some interesting details. Don’t you dare tell me you don’t do the same! …it’ll be the truth, and I don’t have time for the blasted truth, darn it!
CML allocates MAC addresses using a series of pre-defined pools; there are 8 different pools. The purpose of having 8 pools (numbered 0-7) seems to be in cases where you have bridged multiple CML nodes together. I would assume this is for an enterprise application, unless everyone has purchased two CML licenses to run two CML VMs simultaneously except for me. I wouldn’t put it past you crazy folks!
If you have a lab that is running across multiple CML nodes, then there is some likelihood that there will be a MAC address conflict. Alternatively, as the FAQ section included in the screenshot below stipulates, you will run into the same MAC address duplication conflict if the nodes in both lab instances are connected to the same Layer 2 segment on the external network through bridge-mode External Connectors.
For context, as I was writing my original post, I wasn’t exactly comfortable with the implication that there was a full-on bug in the e1000 driver stack in its entirety. e1000 is generally what’s known as “settled code”, which means that it is very unlikely that e1000 was broken. Nevertheless, because I had no way to test another network stack driver and nodes that operated using vmxnet3 were completely unaffected by the bug in virtual node MAC address allocation, I had limited insight to base my hypothesis on. The information that I did have seemed to suggest that it was a bug with e1000.
After reading this FAQ document, I figured there might be a chance that there was a bug with the default MAC address pool itself (#0), instead of the entire allocation logic. I tested this by changing the MAC address pool, since I figured that if it didn’t work, all I would lose is my time…so nothing worthwhile! 🙂
And…it worked! Let’s talk about how to do this.
Workaround Implementation
If you read the section above, you have already seen the workaround! If you didn’t read it, you missed it! Shame on you…but I’m generous, so I’ll detail how to do it below!
The affected MAC address pool is the zeroth pool (the default pool). To my knowledge, this is the only pool where the addresses are not properly randomized: the first three bits of the fourth byte are 000 for every node, and the remaining 21 bits appear to be handed out sequentially instead of randomly. The other pools, however, should work. I have only confirmed pool #3 to be functional; while I assume that pools 1-7 will all work, your mileage may vary with the ones I haven’t tested.
Configuring the MAC address pool is simple enough, but there is no easy way to configure this from the GUI or CLI; the official way to configure this is from CML’s REST API.
Issuing a PATCH request to the CML controller at the URI https://<cml url>/api/v0/mac_address_block/<block number> changes the MAC address block configured on the controller. Note that there is no way to apply this configuration on a lab-by-lab basis; because it is a controller-level setting, all labs on the controller will be affected. In addition, it is important to remember that the CML fabric allocates MAC addresses when a node is first started. As such, you must wipe any nodes that you want to pick up addresses from the new MAC address pool.
If you are unfamiliar/uncomfortable with working with REST APIs, there are a number of ways to issue this API call. An easy way is to use the Postman utility, a GUI-centric application that allows you to issue API calls. You can download Postman here. This is a great tool for working with REST APIs, so it is a worthwhile application to have on hand, if you wish to work towards building an automation skillset!
If you are more comfortable with CLI applications, you can either use cURL or the httpie package to issue these calls. Sample commands are included below that you can adapt for your environment.
curl -k -X PATCH https://cml-url/api/v0/mac_address_block/blocknum -H "Authorization: Bearer tokenhere"
http --verify=no PATCH https://cml-url/api/v0/mac_address_block/blocknum 'Authorization: Bearer tokenhere'
To obtain the bearer token for authorizing your API calls, you will have to use the authentication URI (api/v0/authenticate) and pass it your CML username and password (NOT Cockpit/Linux username/password) in the body, as documented in the CML API documentation.
You can verify the change by issuing a GET request for api/v0/mac_address_block (without the blocknum appended); it will return the MAC address block in use.
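If you would rather script the whole sequence than click through Postman, here is a minimal Python sketch of the three calls described above (authenticate, PATCH the block, GET it back), using only the standard library. Treat the details as assumptions on my part: the helper names (build_auth_request, build_block_request, send) are made up for illustration, the JSON body keys "username" and "password" and the token coming back as a JSON string are what I would expect from the API documentation and not something verified here, and the paths simply mirror the sample commands above. TLS verification is disabled by default, like curl -k, because lab controllers often use self-signed certificates.

```python
import json
import ssl
import urllib.request

API_ROOT = "/api/v0"


def build_auth_request(base_url, username, password):
    """Build the POST to the authenticate URI (body keys are an assumption)."""
    body = json.dumps({"username": username, "password": password}).encode()
    return urllib.request.Request(
        f"{base_url.rstrip('/')}{API_ROOT}/authenticate",
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


def build_block_request(base_url, token, block=None):
    """Build a GET (block=None) or a PATCH (block=0..7) for the MAC address block."""
    url = f"{base_url.rstrip('/')}{API_ROOT}/mac_address_block"
    method = "GET"
    if block is not None:
        if block not in range(8):
            raise ValueError("MAC address block must be between 0 and 7")
        url, method = f"{url}/{block}", "PATCH"
    return urllib.request.Request(
        url, method=method, headers={"Authorization": f"Bearer {token}"}
    )


def send(request, verify_tls=False):
    """Send a prepared request and return the decoded response body."""
    context = ssl.create_default_context()
    if not verify_tls:
        # Equivalent of curl -k: accept the controller's self-signed certificate.
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    with urllib.request.urlopen(request, context=context, timeout=30) as response:
        raw = response.read().decode()
    try:
        return json.loads(raw)
    except ValueError:
        return raw  # Fall back to the raw text if the body is not JSON.


# Example usage against a real controller (placeholders, not executed here):
# token = send(build_auth_request("https://cml-url", "cml-user", "cml-password"))
# send(build_block_request("https://cml-url", token, block=3))
# print(send(build_block_request("https://cml-url", token)))
```

Building the requests separately from sending them keeps the URL and header logic easy to check without touching a live controller.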
Testing The Changes
How do we know our changes worked? We test them, of course! You can do so by just deploying two IOSvL2 switches and attempting to form an EtherChannel between them. You can also run the show spanning-tree command to compare the base MAC addresses and determine whether the CML controller is properly allocating/randomizing the MAC addresses for the virtual nodes.
This is what it looks like when the MAC addresses are NOT properly randomized (on block #0). I pulled this from my previous blog post on this issue.
Sw2#show span | i (Root|Bridge) ID|Address
  Root ID    Priority    32769
             Address     5254.0000.0000
  Bridge ID  Priority    32769  (priority 32768 sys-id-ext 1)
             Address     5254.0000.0004
Notice the bolded portions. These are the first four bits of the remaining 21 bits of the MAC address, which, according to Cisco, are supposed to be randomized. Here, they appear to be sequential rather than random.
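To make the bit layout a little less hand-wavy, here is a small Python sketch that splits a MAC address the way this post describes it: the first three bits of the fourth byte as the block number, and the 21 bits after that as the per-node part. The function names (parse_mac, split_cml_mac) are my own, and the layout itself is taken from the description in this post, not from CML's source code.

```python
def parse_mac(mac: str) -> int:
    """Convert a MAC address (dotted, colon, or hyphen notation) to an integer."""
    digits = "".join(ch for ch in mac if ch not in ".:-")
    if len(digits) != 12:
        raise ValueError(f"not a 48-bit MAC address: {mac!r}")
    return int(digits, 16)


def split_cml_mac(mac: str) -> tuple[int, int]:
    """Return (block number, 21-bit per-node value) for a MAC address."""
    low24 = parse_mac(mac) & 0xFFFFFF   # last three bytes of the address
    block = low24 >> 21                 # first three bits of the fourth byte
    node_bits = low24 & 0x1FFFFF        # the remaining 21 bits
    return block, node_bits


# The two addresses from the block #0 output above:
for address in ("5254.0000.0000", "5254.0000.0004"):
    block, node_bits = split_cml_mac(address)
    print(f"{address}: block={block} node_bits={node_bits:#08x}")
```

For the two block #0 addresses, the per-node values come out as 0 and 4, which looks a lot more like a counter than a random draw.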
By contrast, notice what happens to the same output when the MAC address block is changed away from 0. In this example, I am using MAC address block 3.
Sw1#show span | i (Root|Bridge) ID|Address
  Root ID    Priority    32769
             Address     5254.0067.caf4
  Bridge ID  Priority    32769  (priority 32768 sys-id-ext 1)
             Address     5254.006f.18ff
You’ll notice that, while the first three bits of the fourth byte are still the same (011), the remaining 21 bits are properly randomized. This results in the second hex character of the fourth byte (7 and f respectively) being different between the two switches. Because features like LACP/PAgP SysID and SVI MAC addresses use the first four bytes of the MAC address in their entirety, this new configuration works to prevent duplications between devices that hinder the operation of these features.
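As a quick sanity check, the comparison above can be sketched in Python: group a handful of base MAC addresses by their first four bytes and report any group with more than one member. The helper names (first_four_bytes, colliding_prefixes) are hypothetical, and the "first four bytes" rule is simply the behavior described in this post, so treat it as an illustration and not a statement about how IOSvL2 derives these IDs internally.

```python
def first_four_bytes(mac: str) -> str:
    """Return the first four bytes of a MAC address as eight lowercase hex digits."""
    digits = "".join(ch for ch in mac if ch not in ".:-").lower()
    if len(digits) != 12:
        raise ValueError(f"not a 48-bit MAC address: {mac!r}")
    return digits[:8]


def colliding_prefixes(macs):
    """Map each shared four-byte prefix to the addresses that collide on it."""
    groups = {}
    for mac in macs:
        groups.setdefault(first_four_bytes(mac), []).append(mac)
    return {prefix: members for prefix, members in groups.items() if len(members) > 1}


block0 = ["5254.0000.0000", "5254.0000.0004"]  # addresses seen on block #0
block3 = ["5254.0067.caf4", "5254.006f.18ff"]  # addresses seen on block #3

print(colliding_prefixes(block0))  # both addresses share the prefix 52540000
print(colliding_prefixes(block3))  # no shared prefix, so an empty dict
```

On block #0 the two switches collide on 52540000, while on block #3 the prefixes 52540067 and 5254006f differ.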
One More Thing
Life is like a math problem. Even when someone tells you it’s simple…you know they’re f***ing liars.
There’s always one more thing – because nothing is ever that simple! This configuration will work…until you restart your controller, at which point your CML controller will gladly reset its MAC address block to the default (0) and force you through this same ordeal…all…over…again…
Man, CML is cruel! Why…won’t…it…just…let…me…have…my…ETHERCHANNELS IN PEACE?!?! >_<
This patch is merely temporary and only remains in effect in the current system state – it will reset after a controller reboot. While resetting it is as simple as running another call in Postman (or, if you want to be fancy, a Python script), nobody has time for that!
In the next (and hopefully, final) blog post in this (completely unintended) series, I will outline how I plan to solve this unacceptable deficiency. It will involve bashing CML!
And…now the end!
In this post, we outlined a workaround to the problem I demonstrated in my last blog post, concerning a bug in the allocation of MAC addresses in the default e1000 MAC address pool. The workaround entails issuing a simple PATCH request to the CML controller to change the MAC address block it uses, averting the bug entirely by switching to an unaffected pool. After verifying that the workaround actually alleviates the MAC address bug, the mood turned ever so slightly bittersweet with a discussion of the impermanence of our current solution.
I should also note that the CML development team seems to have picked up on this bug and has seen the write-up from my previous post. Based on a reply from what I assume to be someone fairly involved in CML’s development, it seems fairly certain that this will be resolved in an upcoming release of CML.
Despite that, however, this isn’t the last CML has heard of me! I promise you, I will be back with a vengeance! Let’s automate this thing.