Does anyone know how exactly they backdoored the machines?
Yes and no. What we do know is that some of the machines had to be both secure and insecure at the same time, so that they could interoperate with secure machines. A machine that would not interwork with secure machines would have raised "red flags"...
We know from the book "Spycatcher" by Peter Wright, published in 1987, that one method was to supply an "algorithmically secure machine" but with an "acoustic side channel" in it that leaked key information. Basically MI5 had gained audio access via an "infinity device" to the "Crypto Cell" at the Egyptian Embassy in London. They could thus hear the mechanical cipher machine running. Whilst it did not give the "key", what it did give was the "wheel" starting points, the turn over points, and which wheels were rotating at any time. This reduced the "attack space" GCHQ had to deal with from "months to minutes".
As for affecting the "key stream": back in WWII the strategic, not tactical, German high level cipher machine was the Lorenz teletype cipher machine. It used 12 cipher wheels with "movable lugs" on the wheel periphery, which caused a "key stream" to be built by XORing the lug positions. The wheel sizes were essentially "prime to each other", thus whilst they were only 23 to 61 steps each, their combined sequence was the product of their step counts, which was immense. At Bletchley Park the traffic from these machines was codenamed "Fish" and the machine "Tunny". The work of two men broke the machine sight unseen, due to a mistake made by a German operator. There are various pages up on the web that will give you as little or as much information on it as you would like.
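A minimal Python sketch of why wheel sizes that are "prime to each other" give such a long combined sequence. The wheel lengths are the figures usually quoted for the Lorenz SZ40/42 (an assumption, they are not given above), the toy lug patterns are made up, and every wheel is stepped on every character, which the real machine did not do.

```python
from functools import reduce
from itertools import islice
from math import gcd, lcm, prod
from operator import xor

# Assumption: the commonly quoted Lorenz SZ40/42 wheel lengths.
LORENZ_WHEEL_SIZES = [41, 31, 29, 26, 23, 43, 47, 51, 53, 59, 37, 61]

# Made-up toy wheels, small enough to step through by brute force.
TOY_WHEELS = [[1, 0, 0], [1, 1, 0, 1], [0, 1, 1, 0, 1]]
TOY_SIZES = [len(w) for w in TOY_WHEELS]


def pairwise_coprime(sizes):
    """True when no two wheel sizes share a common factor."""
    return all(gcd(a, b) == 1 for i, a in enumerate(sizes) for b in sizes[i + 1:])


def steps_until_realigned(sizes):
    """Step every wheel once per character; count steps until all are back at 0."""
    positions = [0] * len(sizes)
    steps = 0
    while True:
        positions = [(p + 1) % n for p, n in zip(positions, sizes)]
        steps += 1
        if not any(positions):
            return steps


def key_stream(wheels):
    """XOR the lug bit currently read on each wheel, then step all wheels by one."""
    t = 0
    while True:
        yield reduce(xor, (w[t % len(w)] for w in wheels))
        t += 1


if __name__ == "__main__":
    print("toy wheels realign after", steps_until_realigned(TOY_SIZES), "steps")
    print("first toy key stream bits:", list(islice(key_stream(TOY_WHEELS), 20)))
    print("Lorenz sizes pairwise coprime:", pairwise_coprime(LORENZ_WHEEL_SIZES))
    print("combined wheel cycle:", prod(LORENZ_WHEEL_SIZES), "steps")
```

With pairwise coprime sizes the wheels only come back into the same alignment after the product of their sizes. With sizes that share factors, say 4 and 6, they realign after only 12 steps rather than 24.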
But what you need to remember is that,
1, The failings of the Lorenz machine are shared by many other machine ciphers not just mechanical ones.
2, Virtually all machine ciphers pre AES have both strong and weak keys with a range in between.
The US Field Cipher based on the Boris Hagelin coin counting mechanism suffered from the second issue, in fact it had rather more weak keys than strong. This was not a problem for the US military as they "issued key schedules centrally", thus knowing which keys were strong and which were weak they only ever used the strong keys. The knowledge of weak and strong was, as far as we can tell, worked out by William F. Friedman, and it was deliberately implemented as such by him. The reason being that the big weakness of any field cipher machine is that the enemy will capture it, and may well end up using it or copying its design to make their own machines (see the history of Enigma type "rotor" machines to see that in action).
Thus the reasoning was: either the enemy is smart and will know about the strong keys and weak keys, in which case nothing is won or lost, or they do not and assume all keys are the same, in which case your cryptanalysis team has just been given a great big bonus to make their lives easier. What was not known then, and is still not widely recognised, was the British invention of Traffic Analysis in all its forms and the huge card file database they used with it. This enabled them to identify specific traffic circuits and individual operators without the use of cryptanalysis, which gave not just vast amounts of "probable plaintext" but also "probable cillies" and other bad operator habits. All of which made breaking of even strong keys very very much easier. Thus traffic under weak keys becomes a lever to put in the cracks of strong keys...
What is also known is that Crypto AG supplied customers not just with the actual crypto machines but a whole lot of key generation support... This was in the form of manuals and machines, all of which pushed Crypto AG customers into producing either "weak key schedules" or "known key schedules". But the actual encryption machines worked identically to those of customers who used "secure key schedules", thus they were fully compatible, so no red flags were raised.
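A toy Python sketch of the "known key schedule" idea, not a reconstruction of anything Crypto AG actually shipped. The cipher is a stand-in built from SHA-256, and `weak_schedule`, the 16 bit seed size and the crib are all hypothetical. The point is only that the encryption routine is the same for everybody, and the weakness lives entirely in where the keys come from.

```python
import hashlib
import secrets


def encrypt(key: bytes, data: bytes) -> bytes:
    """Stand-in cipher: XOR with a SHA-256 derived key stream (encrypt == decrypt)."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return bytes(d ^ s for d, s in zip(data, stream))


def secure_schedule() -> bytes:
    """A 128 bit key drawn from the operating system's random source."""
    return secrets.token_bytes(16)


def weak_schedule(seed: int) -> bytes:
    """Looks like a 128 bit key, but only 2**16 different keys can ever come out."""
    # Hypothetical key generator: the whole key is derived from a 16 bit seed.
    return hashlib.sha256(b"hypothetical-keygen" + seed.to_bytes(2, "big")).digest()[:16]


def recover_seed(ciphertext: bytes, crib: bytes):
    """Try every seed; keep the one whose decryption starts with the known crib."""
    for seed in range(2 ** 16):
        if encrypt(weak_schedule(seed), ciphertext[:len(crib)]) == crib:
            return seed
    return None


if __name__ == "__main__":
    message = b"FROM EMBASSY: nothing to report"
    ciphertext = encrypt(weak_schedule(0xBEEF), message)
    seed = recover_seed(ciphertext, b"FROM EMB")
    print("recovered seed:", seed)
    print("plaintext:", encrypt(weak_schedule(seed), ciphertext))
```

Anyone holding the same key can decrypt, whichever schedule produced it, so the machines stay fully compatible. Only someone who knows how the weak schedule is built gets to search 65536 candidates instead of 2**128.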
The thing that we forget these days is that designing crypto kit is actually a hard process. Whilst it's easy to come up with complex algorithms, they are almost impossible to implement in a mechanical system that is reliable in use. Likewise for their pencil and paper analogs. Also they are eye wateringly expensive to make. If you are ever lucky enough to get your hands on just a single Enigma rotor you will see it is superbly engineered from many many parts, each one of which requires a great deal of engineering. Thus there are hundreds of hours of work in each Enigma machine, even though the outer wooden box might look crude to modern eyes. So only fairly simple algorithms got implemented, based on minor variations to odometer or coin counting mechanisms.
Until DES came along nearly all "electronic" cipher machines were based on simple circuits like shift registers and SR latches. In most respects many were just simple copies of mechanical cipher algorithms. So the likes of a Lorenz wheel became a "ring counter with reset" and the lugs were replaced by a "plug board", but the algorithm remained the same, along with all its weaknesses... Even when put in software in 4 and 8 bit CPU systems or later microcontrollers, those old defective mechanical algorithms came along as "counters mod N" driving "lookup tables"... In part this happened due to "inventory costs": if you've invested a fortune in mechanical cipher systems you want your new shiny electronic systems to be compatible, likewise those that are CPU based. It's the same old "legacy issue" that almost always works more for your enemy than it does for your security.
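A small Python sketch of that point, with made-up names and a made-up lug pattern: a rotating wheel read at a fixed point and a "counter mod N" driving a "lookup table" give exactly the same bits, so moving to electronics or software changes nothing about the algorithm.

```python
class RingCounter:
    """A "counter mod N" with reset: the electronic stand-in for a wheel position."""

    def __init__(self, modulus: int):
        self.modulus = modulus
        self.count = 0

    def reset(self) -> None:
        """Return the counter to its starting position."""
        self.count = 0

    def clock(self) -> int:
        """Return the current count, then advance by one, wrapping at the modulus."""
        current = self.count
        self.count = (self.count + 1) % self.modulus
        return current


def wheel_bits(lugs, n):
    """Mechanical model: read the lug at the sensing point, rotate one step, repeat."""
    wheel = list(lugs)
    out = []
    for _ in range(n):
        out.append(wheel[0])
        wheel = wheel[1:] + wheel[:1]
    return out


def counter_bits(table, n):
    """Electronic or software model: a ring counter indexes a lookup table."""
    counter = RingCounter(len(table))
    return [table[counter.clock()] for _ in range(n)]


if __name__ == "__main__":
    lugs = [1, 0, 1, 1, 0]  # made-up lug pattern, doubles as the lookup table
    print(wheel_bits(lugs, 12))
    print(counter_bits(lugs, 12))
```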
But acoustic side channels are known to be not the only ones. Even theoretically secure One Time Pad/Tape systems are practically insecure when implemented in machine form. The UK high level super encipherment machine known as Rockex, used by the Diplomatic Wireless Service (DWS) and designed by Canadian engineer "Pat" Bayly, suffered from this, as I mentioned years ago on this blog. In essence the Pad/Tape "additive" was done in a circuit using Post Office Type 600 relays. Even though the open to close times could be adjusted, there was always a slight time asymmetry that got out onto the telephone pair used to connect to the telex network. This time asymmetry could be used to determine the "additive" and thus strip it off, leaving the plaintext...
One solution to this is to use a "shift register" or secondary relay that "reclocked" the data signal, so that the time asymmetry seen on the line was not that of the relay doing the encipherment but that of the reclocking relay. In essence the contacts of the reclocking relay were "open" during the critical time period in which the encipherment relay changed state.
Which in theory should have made it secure... But open relay contacts, like open switch contacts, can be "jumped" because in reality they are small value capacitors. This is what the "infinity device" was all about. It enabled you to put a high frequency signal on the telephone pair that would see the encryption relay change state through the open contacts of the reclocking relay... So you needed to add extra circuitry to prevent the time based side channel from the encryption relay being seen on the line. Thus leaving out that extra circuitry made a very secure system nearly totally insecure to anyone with the appropriate device in line, yet it retained total data level compatibility with its secure counterparts, so again no "red flag" waved.
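A toy Python model of the relay timing leak and the reclocking fix described above. All the timings are hypothetical, and it assumes the simplest possible behaviour: the enciphering relay closes for an additive bit of 1 and opens for 0, and the edge delay seen on the line is the relay's own. It does not model the capacitive coupling through open contacts, which is what makes reclocking on its own insufficient.

```python
OPERATE_MS = 2.0  # hypothetical time for the enciphering relay to close
RELEASE_MS = 2.3  # hypothetical time for it to open again
RECLOCK_MS = 5.0  # hypothetical fixed delay of a reclocking stage


def encipher(plain_bits, additive_bits, reclocked=False):
    """Return (cipher_bit, edge_delay_ms) pairs as an observer on the line sees them."""
    line = []
    for p, k in zip(plain_bits, additive_bits):
        # Assumption: the delay depends only on which way the relay moved.
        delay = OPERATE_MS if k else RELEASE_MS
        if reclocked:
            delay = RECLOCK_MS  # the line now only shows the reclocking stage
        line.append((p ^ k, delay))
    return line


def eavesdrop(line):
    """Guess the additive from the edge timing alone and strip it off."""
    threshold = (OPERATE_MS + RELEASE_MS) / 2
    guessed = [1 if delay < threshold else 0 for _, delay in line]
    return [c ^ k for (c, _), k in zip(line, guessed)]


if __name__ == "__main__":
    plain = [1, 0, 1, 1, 0, 0, 1, 0]
    additive = [0, 1, 1, 0, 1, 0, 0, 1]  # stands in for the one time tape
    print("leaky line    :", eavesdrop(encipher(plain, additive)))
    print("reclocked line:", eavesdrop(encipher(plain, additive, reclocked=True)))
```

In this model the cipher bits are identical with or without reclocking, which is the data level compatibility. Only the timing differs, and on the leaky line the timing alone gives the additive away.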