M17 frame description

Hi, my name is Mark. I’ve been coding in C for 30 some odd years and I STILL get ‘=’ vs ‘==’ messed up. sigh

I’m playing around with my code from above and will have a fixed and tested version soon. Stand by.

UPDATED 2019-12-08: I swapped the order of ‘/’ and ‘.’ in the alphabet. I definitely want to keep ‘/’ in the alphabet, but I’m less sure about ‘.’

Ok, I’ve got a working system:

```c
#include <inttypes.h>
#include <stdio.h>
#include <stdint.h>
#include <string.h>

// Takes an ASCII callsign in a null terminated char* and encodes it using base 40.
// Returns -1 (all Fs) if the provided callsign is longer than 9 characters, which
// would overflow the 48 bits we have for the callsign.  log2(40^9) = 47.9
uint64_t encode_callsign_base40(const char *callsign) {
   size_t len = strlen(callsign);
   if (len > 9)
      return UINT64_MAX;   // -1, i.e. all Fs

   uint64_t encoded = 0;
   // Walk from the last character back to the first, so the first character
   // ends up in the least significant base-40 digit.
   for (size_t i = len; i-- > 0; ) {
      char c = callsign[i];
      encoded *= 40;
      // If speed is more important than code space, you can replace this with a lookup into a 256 byte array.
      if (c >= 'A' && c <= 'Z')  // 1-26
         encoded += c - 'A' + 1;
      else if (c >= '0' && c <= '9')  // 27-36
         encoded += c - '0' + 27;
      else if (c == '-')  // 37
         encoded += 37;
      // These are just place holders. If other characters make more sense, change these.
      // Be sure to change them in the decode array below too.
      else if (c == '/')  // 38
         encoded += 38;
      else if (c == '.')  // 39
         encoded += 39;
      else {
         // Invalid character, represented by 0.
         // Interesting artifact of using zero to flag an invalid character: invalid characters
         // at the end of the callsign won't show up as flagged at the recipient.  Because zero
         // also flags the end of the callsign.  (This started as a bug, losing As at the end of
         // the callsign, which is why A is no longer the beginning of the mapping array.)

         //printf("Invalid character: %c\n", c);
         //encoded += 0;
      }
   }
   return encoded;
}

// Decodes a base40 callsign into a null terminated char*.
// Caller MUST provide a char[10] or larger for callsign.
// Returns an empty string if the encoded callsign is 40^9 or larger,
// which would result in more than 9 characters.
char *decode_callsign_base40(uint64_t encoded, char *callsign) {
   if (encoded >= 262144000000000ULL) {   // 40^9
      *callsign = 0;
      return callsign;
   }

   char *p = callsign;
   for (; encoded > 0; p++) {
      *p = "xABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-/."[encoded % 40];
      encoded /= 40;
   }
   *p = 0;
   return callsign;
}

void test_callsign(const char *callsign) {
   printf("Call: '%9s'  ", callsign);

   uint64_t encoded = encode_callsign_base40(callsign);
   // 12 hex characters is 48 bits. If we've overflowed 48 bits, the string will be longer and easy to spot.
   printf("Encoded: '0x%012" PRIx64 "'  ", encoded);

   char decoded[10];
   char *ptr = decode_callsign_base40(encoded, decoded);
   printf("Decoded: '%9s'  ", decoded);
   printf("By return value: '%9s'\n", ptr);
}

int main(void) {
   printf("Error cases:\n");
   // The rest of the original test list was cut off in this copy of the post;
   // the calls below are only illustrative.
   test_callsign("ABCDEFGHIJ");   // 10 characters: too long, encodes as all Fs
   test_callsign("AB*CD");        // '*' is not in the alphabet, decodes as 'x'

   printf("Normal cases:\n");
   test_callsign("SP5WWP");
   test_callsign("KE0WVA");
   return 0;
}
```
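The comment inside encode_callsign_base40() suggests swapping the if/else chain for a 256-byte lookup array. Here is a minimal sketch of that idea; base40_value, init_base40_table() and encode_callsign_base40_lut() are hypothetical names, and the table is filled from the same alphabet string the decoder uses, so the two can't drift apart.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

// Hypothetical table-driven variant of encode_callsign_base40().
// base40_value[] maps each byte to its base-40 digit; entries left at 0 are
// invalid characters, matching the original's convention.
static uint8_t base40_value[256];

// Fill the table from the decoder's alphabet (digit 0 is the 'x' filler, skipped).
static void init_base40_table(void) {
   const char *alphabet = "xABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-/.";
   for (int i = 1; alphabet[i] != '\0'; i++)
      base40_value[(unsigned char)alphabet[i]] = (uint8_t)i;   // digits 1-39
}

uint64_t encode_callsign_base40_lut(const char *callsign) {
   assert(base40_value['A'] == 1 && "call init_base40_table() first");
   size_t len = strlen(callsign);
   if (len > 9)
      return UINT64_MAX;   // same "all Fs" error value as the original

   uint64_t encoded = 0;
   for (size_t i = len; i-- > 0; )   // last character first, as in the original
      encoded = encoded * 40 + base40_value[(unsigned char)callsign[i]];
   return encoded;
}
```

Call init_base40_table() once at startup. For every input it should give the same value as the if/else version, trading 256 bytes of RAM for one table lookup per character.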

I second the notion of replacing a DMR-like ID number with an alphanumeric field for a call sign. Smitty’s method seems reasonable here.

The big issue I see with this frame format is that 128 bits of payload in an 808 bit frame is incredibly inefficient. If you were to run your audio codec at 3.2 kbits/sec as planned, you would need a total throughput of 20.2 kbits/sec. For comparison, DMR runs at a throughput of 9.6 kbits/sec and it’s delivering two channels of audio. I fear that attempting to run at such a high bit rate will degrade our weak signal performance considerably compared to alternative digital modes.
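The efficiency argument above is simple arithmetic; this small sketch (function names are illustrative) reproduces the 20.2 kbits/sec figure from the 128-bit payload in an 808-bit frame.

```c
#include <assert.h>

// On-air bit rate needed to carry payload_bps of codec data when each frame of
// frame_bits bits only carries payload_bits of payload.
double required_air_rate_bps(double payload_bps, unsigned payload_bits, unsigned frame_bits) {
   assert(payload_bits > 0 && payload_bits <= frame_bits);
   return payload_bps * frame_bits / payload_bits;
}

// Fraction of the on-air bits that are payload.
double frame_efficiency(unsigned payload_bits, unsigned frame_bits) {
   assert(frame_bits > 0);
   return (double)payload_bits / frame_bits;
}
```

With a 3200 bps codec, required_air_rate_bps(3200, 128, 808) gives 20200 bps, and frame_efficiency(128, 808) is about 0.158, i.e. roughly 84% of the on-air bits are overhead.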

I don’t think we have to transmit full metadata every single frame. We can transmit a “metadata frame” at the beginning of the transmission, and simply assume that it applies to all subsequent frames within that transmission as well.

I also don’t think we need to run Codec2 all the way up at 3.2 kbits/sec. FreeDV’s own VHF mode uses Codec2 at 1.3 kbits/sec, and the audio samples I’ve listened to of that bitrate are just fine.

Here’s what I’m thinking:

  • Run Codec2 at 1.3 kbits/sec
  • M17 uses 4FSK at a symbol rate of 1.6 kbaud / 3.2 kbits/sec, and a bandwidth of 6.25 kHz.
  • Frames are 128 bits / 64 symbols / 40ms long. 25 frames/sec.
  • Example frame types:
    AUDIO: 52 bits audio payload, 52 bits rate 1/2 error correcting code (Turbo or LDPC)
    WHOAMI: 42 bits of callsign, 24 bits latitude, 24 bits longitude, + CRC
    SMS: Send a text message to a callsign
    DATA: Send arbitrary data. Whoever receives it should have a way to interpret it
    ROAM: Information from repeater to help HTs roam between repeaters
    TICKTOCK: Periodic message from repeater announcing its callsign and useful information like the time, what talk groups are active, etc.
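As a sanity check on the numbers in this list, here is the frame budget written as compile-time constants. The constant names are made up for illustration; the values are the ones proposed above (4FSK = 2 bits/symbol, Codec2 at 1300 bps, rate 1/2 FEC).

```c
#include <assert.h>

// Frame budget for the 4FSK proposal above. Constant names are illustrative.
enum {
   BITS_PER_SYMBOL  = 2,                                         // 4FSK: 4 tones
   SYMBOL_RATE_BAUD = 1600,
   FRAME_SYMBOLS    = 64,
   FRAME_BITS       = FRAME_SYMBOLS * BITS_PER_SYMBOL,           // 128
   FRAME_MS         = FRAME_SYMBOLS * 1000 / SYMBOL_RATE_BAUD,   // 40
   FRAMES_PER_SEC   = 1000 / FRAME_MS,                           // 25
   CODEC2_BPS       = 1300,
   AUDIO_BITS       = CODEC2_BPS * FRAME_MS / 1000,              // 52
   AUDIO_FEC_BITS   = AUDIO_BITS,                                // rate 1/2 code doubles it
   AUDIO_SPARE_BITS = FRAME_BITS - AUDIO_BITS - AUDIO_FEC_BITS,  // 24
   WHOAMI_BITS      = 42 + 24 + 24                               // callsign + lat + lon, before CRC
};

static_assert(FRAME_BITS * FRAMES_PER_SEC == SYMBOL_RATE_BAUD * BITS_PER_SYMBOL,
              "frames must fill the 3.2 kbit/s channel exactly");
static_assert(WHOAMI_BITS < FRAME_BITS, "WHOAMI fits in one frame with room for a CRC");
```

So an AUDIO frame leaves 24 bits for frame type and any other header, and a WHOAMI frame leaves 38 bits for type and CRC.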

Frame type field of the header format should have at least, say, 4 bits.
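A 4-bit type field gives 16 codes, enough for the six example types plus room to grow. This sketch assumes, purely for illustration, that the type sits in the top nibble of the first header byte; the code values themselves are placeholders, not an agreed assignment.

```c
#include <assert.h>
#include <stdint.h>

// Placeholder type codes for the example frame types listed above.
typedef enum {
   FRAME_AUDIO    = 0x0,
   FRAME_WHOAMI   = 0x1,
   FRAME_SMS      = 0x2,
   FRAME_DATA     = 0x3,
   FRAME_ROAM     = 0x4,
   FRAME_TICKTOCK = 0x5,
   // 0x6-0xF left free for future types
} frame_type;

// Store the 4-bit type in the top nibble, keeping the low nibble untouched.
static inline uint8_t set_frame_type(uint8_t header_byte, frame_type t) {
   assert((unsigned)t <= 0xF);
   return (uint8_t)((header_byte & 0x0F) | ((unsigned)t << 4));
}

static inline frame_type get_frame_type(uint8_t header_byte) {
   return (frame_type)(header_byte >> 4);
}
```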

If we do pull the metadata out into a separate frame type, we should send it periodically, interspersed with payload frames. That way a receiver can pick up a transmission in the middle instead of having to wait for a long-winded ham to finish blathering on before getting any metadata.

You said 42 bits for callsigns, as opposed to my proposed 48. Was that on purpose? Why?

I’d like to keep the idea of both sender and receiver in the metadata. We can define non-callsign compliant IDs for things like talk groups, broadcast, etc. and still allow person-to-person conversations.

Remember that AES is a block cipher and works on 128-bit blocks. I’m really busy today and will elaborate on this topic later.

Oh, I just pulled 42 from the bit of your post where you first calculated the number of bits in a 6-character callsign, I didn’t scroll down to see 48. It doesn’t really matter.

I think a better alternative to periodically interrupting audio frames to transmit a metadata frame is to set aside an octet or two out of each frame as a “metadata channel”. The full metadata then arrives over multiple frames.
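A minimal sketch of the metadata-channel idea: every frame carries a couple of octets of a larger metadata block, and the receiver reassembles the block as frames arrive. The sizes (2 octets per frame, a 16-byte block) and the assumption that each frame carries a frame counter the receiver can see are illustrative, not decided.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define META_BYTES_PER_FRAME 2    // assumed: "an octet or two" per frame
#define META_BLOCK_BYTES     16   // assumed size of the full metadata block
#define META_CHUNKS (META_BLOCK_BYTES / META_BYTES_PER_FRAME)   // 8 frames per cycle

static_assert(META_BLOCK_BYTES % META_BYTES_PER_FRAME == 0, "block splits evenly");
static_assert(META_CHUNKS < 32, "chunk mask must fit in 32 bits");

// Transmit side: copy the chunk belonging to this frame number into the frame.
void meta_chunk_for_frame(const uint8_t block[META_BLOCK_BYTES], uint32_t frame_number,
                          uint8_t out[META_BYTES_PER_FRAME]) {
   size_t chunk = frame_number % META_CHUNKS;
   memcpy(out, block + chunk * META_BYTES_PER_FRAME, META_BYTES_PER_FRAME);
}

// Receive side: accumulate chunks, starting from whichever frame we tune in on.
typedef struct {
   uint8_t  block[META_BLOCK_BYTES];
   uint32_t seen_mask;   // bit i set once chunk i has arrived
} meta_rx;

// Returns 1 once every chunk of the block has been received.
int meta_rx_push(meta_rx *rx, uint32_t frame_number, const uint8_t in[META_BYTES_PER_FRAME]) {
   size_t chunk = frame_number % META_CHUNKS;
   memcpy(rx->block + chunk * META_BYTES_PER_FRAME, in, META_BYTES_PER_FRAME);
   rx->seen_mask |= 1u << chunk;
   return rx->seen_mask == (1u << META_CHUNKS) - 1;
}
```

A receiver joining mid-transmission has the full block after META_CHUNKS consecutive frames, no matter where in the cycle it started.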

I don’t know about Poland, but in the US it is illegal to encrypt amateur radio communications, so just be aware that I will not be able to test the protocol with encryption enabled. 🙂

Okay, with a 128 bit payload, we can do two 52-bit Codec2 frames, plus another 24 bits of header and metadata channel. Overall frame rate ends up at 12.5 frames/sec.

I’m not a cryptography expert, but does key information need to be included in every frame, or can there just be an “encryption setup” type frame at the beginning of transmission?

Check out my last blog entry 🙂 https://teletra.pl/M17/blog/encryption

The FCC in the US has its own rules; cryptography in amateur radio is definitely illegal here.

AES will probably be used in CTR mode, so we would need a frame number too. We need it so that the AES counter value stays synchronized across all recipients. I have reserved 12 bits (if I remember correctly) for that purpose. 4096 × 40 ms is the longest TX period you can have at the moment, about 163 seconds.
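To make the frame-number idea concrete, here is a sketch of building the 16-byte AES-CTR input block for one frame. The layout (a 14-byte nonce from a setup frame, then the 12-bit frame number in the last two bytes) and NONCE_BYTES are assumptions for illustration; the AES call itself is deliberately left out. The wrap point matters because in CTR mode a key/nonce/counter combination must never be reused, so a transmission longer than 4096 frames would need a fresh nonce.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define FRAME_NUMBER_BITS 12
#define FRAME_NUMBER_MAX  ((1u << FRAME_NUMBER_BITS) - 1)   // 4095
#define NONCE_BYTES       14   // hypothetical: 16-byte AES block minus 2 counter bytes

// Build the CTR input block nonce || frame_number. The result would be
// encrypted with AES (not shown) and XORed with that frame's payload.
void ctr_block_for_frame(const uint8_t nonce[NONCE_BYTES], uint16_t frame_number,
                         uint8_t out[16]) {
   assert(frame_number <= FRAME_NUMBER_MAX);
   memcpy(out, nonce, NONCE_BYTES);
   out[14] = (uint8_t)(frame_number >> 8);     // top 4 bits of the 12-bit counter
   out[15] = (uint8_t)(frame_number & 0xFF);   // low 8 bits
}

// Longest transmission, in ms, before the 12-bit frame number would repeat.
uint32_t max_tx_ms(uint32_t frame_ms) {
   return (FRAME_NUMBER_MAX + 1u) * frame_ms;
}
```

With 40 ms frames, max_tx_ms(40) is 163840 ms, the "about 163 seconds" above.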

Yep, obfuscation of any kind is illegal in the US, but that doesn’t mean we shouldn’t support it in the protocol, so long as we also support a null cipher. We CAN include authentication and message verification and stay within the US laws, though. It just requires that the payload be in clear text.

I agree with KE0WVA that we should look into different frame types for call setup and payloads. You don’t need to send the nonce, key_id, or cipher type with every packet, for example. Just set all that up once, maybe establish a session_id to identify all those parameters (hash of all cipher parameters, maybe with sender and recipient IDs too?), then only include the block counter in every message.
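One way the session_id idea could look: hash every setup parameter once at call setup, then send only the short ID plus the block counter in later frames. The call_setup fields are hypothetical, and 32-bit FNV-1a is used here only as a simple, non-cryptographic stand-in; a real design would likely want a cryptographic hash.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical call-setup parameters; field names and sizes are illustrative only.
typedef struct {
   uint8_t  cipher_type;
   uint8_t  key_id;
   uint8_t  nonce[14];
   uint64_t sender;      // e.g. a base-40 encoded callsign
   uint64_t recipient;
} call_setup;

// One step of 32-bit FNV-1a over a single byte.
static uint32_t fnv1a_byte(uint32_t h, uint8_t b) {
   return (h ^ b) * 16777619u;   // FNV-1a 32-bit prime
}

// Hash a 64-bit value in a fixed little-endian byte order, so the ID
// does not depend on the host CPU's endianness.
static uint32_t fnv1a_u64(uint32_t h, uint64_t v) {
   for (int i = 0; i < 8; i++)
      h = fnv1a_byte(h, (uint8_t)(v >> (8 * i)));
   return h;
}

// Derive a 32-bit session_id from every setup parameter.
uint32_t session_id(const call_setup *s) {
   assert(s != NULL);
   uint32_t h = 2166136261u;   // FNV-1a 32-bit offset basis
   h = fnv1a_byte(h, s->cipher_type);
   h = fnv1a_byte(h, s->key_id);
   for (size_t i = 0; i < sizeof s->nonce; i++)
      h = fnv1a_byte(h, s->nonce[i]);
   h = fnv1a_u64(h, s->sender);
   h = fnv1a_u64(h, s->recipient);
   return h;
}
```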

I do like CTR mode, though, as opposed to a chained mode like CBC. Means that if you drop a packet, you can pick up the later packets and don’t lose the entire session.

OK, I finally got some spare time.

Well, that’s a good idea. The encoded callsign could be stored in the codeplug (on an SD card).

Don’t worry about it, we are going to rearrange the frame contents anyway.

Yep, we definitely have to change that.

> Frames are 128 bits / 64 symbols / 40ms long. 25 frames/sec
> AUDIO: 52 bits audio payload, 52 bits rate 1/2 error correcting code

How would you send sync info like the frame number? And the nonce value? You have to transmit that info regularly throughout the transmission. AFAIK, it’s needed for AES in CTR mode. Would you just add that to the frame? Post #5 explains it well: the setup has to be sent periodically (but not necessarily in every frame, as happens at the moment).

Yeah, that leaves some unused bits for future use.

We should draw a flowchart for that. Maybe UML or SDL, even? Well… we should describe everything using some kind of description language anyway.

For my own experiments (linking repeaters, including those using freedv) I created a 48 bit encoding for callsigns that is also ethernet compatible. See http://dmlinking.net/eth_ar.html.
It is already in use for some repeaters around the Eindhoven area.

Another thing I would like to suggest is to use a 16-bit frame type identifier and use it to identify different protocols.
The current 3k2 Codec2 mode sounds OK-ish, but if David comes up with a better mode based on his low-bitrate work, we might want to switch to a different 3k2 mode without having to redo the protocol, and have receivers switch automatically.

And finally, I would suggest having a look at the data channel code I wrote for FreeDV. It allows interleaving data between voice packets (e.g. when no audio is detected) and, at a high level, provides Ethernet frames, which lets you run basically any existing protocol on top of it.

I’d like to sum up all the suggestions for the frame types. Encoding callsigns in 48 bits sounds great, and the code provided by Mark works nicely.

I see one problem. Imagine that 2 users are making an encrypted QSO. OK, VOICE frames are being transmitted every 40 msec. What should happen if someone else (who knows the key) starts listening? Where should we put the ENCRYPTION_SYNC frame? There’s no space between VOICE frames. Also, we can’t remove 1 voice subframe (20ms of encoded voice) and just place the encryption sync there; the other half would have to be transmitted in plaintext. Am I correct?

There will be no voice subframes in CODEC2_MODE_1300 (1300bps). The vocoder takes 320 samples as the input, not 160.

IIRC, in the US, you’re allowed to encrypt data as long as the key(s) are public. So it would be okay to test out the encryption as long as the key is made public somewhere.

You’re also allowed to cryptographically sign data, since that isn’t obscuring anything.

-Dan Fay

I have edited my first post in this thread. Take a look at the PDF linked there.

16 bits is a lot. Why 16?

I suppose the trick, then, is to increase symbol rate just enough to open up gaps that you can use to transmit a header frame.

40ms of audio per frame = 25 frames per second. Increase symbol rate by 20% to make it 30 frames per second, now for every 5 frames of audio you can slip in one frame of header, metadata, encryption setup, etc. Send your encryption setup frame once per second, and somebody just tuning in will miss at most one second of audio.
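The slot pattern described above can be sketched as a tiny scheduler: in every group of 6 slots, 5 carry audio and 1 carries a header. Which header slot carries the encryption setup (here, the first one in each second) is an assumption for illustration, as are the names.

```c
#include <assert.h>
#include <stdint.h>

#define SLOTS_PER_SECOND 30   // 20% faster than 25 audio frames/s
#define SLOTS_PER_GROUP   6   // 5 audio slots + 1 header slot

static_assert(SLOTS_PER_SECOND % SLOTS_PER_GROUP == 0, "whole groups per second");

typedef enum { SLOT_AUDIO, SLOT_HEADER, SLOT_ENCRYPTION_SETUP } slot_kind;

// What to transmit in a given slot, counting from the start of the transmission.
slot_kind slot_kind_for(uint32_t slot) {
   if (slot % SLOTS_PER_GROUP != SLOTS_PER_GROUP - 1)
      return SLOT_AUDIO;                  // 5 of every 6 slots
   if (slot % SLOTS_PER_SECOND == SLOTS_PER_GROUP - 1)
      return SLOT_ENCRYPTION_SETUP;       // first header slot of each second
   return SLOT_HEADER;                    // header / metadata the rest of the time
}
```

Over any one second this gives 25 audio slots, 4 header slots and 1 encryption setup slot, so a late listener waits at most about a second for the setup.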

Sender and receiver will both need to maintain, at minimum, a 40ms buffer of audio to cross those gaps. Total codec delay in audio transmission will be a minimum of 160 ms. I think this is acceptable.

So let’s say a symbol rate of 4.8 kbaud, for 9.6 kbits/sec throughput. That’s 160 symbols/320 bits per frame. That gives us 128 bits of payload, 128 bits ECC, 64 bits sync and header. That’s 3.84 kbits/sec of goodput, 3.2 kbits/sec of which is audio, leaving 640 bits/sec for metadata and encryption setup. I think that’s pretty reasonable.
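Checking that budget with compile-time constants (names are illustrative; the values are the ones in the paragraph above, with 4FSK at 2 bits/symbol and 25 of the 30 frames per second carrying audio):

```c
#include <assert.h>

// Budget for the 4.8 kbaud proposal above; constant names are illustrative.
enum {
   BAUD                 = 4800,
   BITS_PER_SYMBOL      = 2,                                   // 4FSK
   AIR_BPS              = BAUD * BITS_PER_SYMBOL,              // 9600
   FRAME_SYMBOLS        = 160,
   FRAME_BITS           = FRAME_SYMBOLS * BITS_PER_SYMBOL,     // 320
   PAYLOAD_BITS         = 128,
   ECC_BITS             = 128,
   SYNC_HEADER_BITS     = 64,
   FRAMES_PER_SEC       = AIR_BPS / FRAME_BITS,                // 30
   AUDIO_FRAMES_PER_SEC = 25,                                  // 5 of every 6 frames
   GOODPUT_BPS          = PAYLOAD_BITS * FRAMES_PER_SEC,       // 3840
   AUDIO_BPS            = PAYLOAD_BITS * AUDIO_FRAMES_PER_SEC, // 3200
   META_BPS             = GOODPUT_BPS - AUDIO_BPS              // 640
};

static_assert(PAYLOAD_BITS + ECC_BITS + SYNC_HEADER_BITS == FRAME_BITS, "fields fill the 320-bit frame");
static_assert(AIR_BPS % FRAME_BITS == 0, "a whole number of frames per second");
```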

This conversation seems to be taking the protocol in two different directions. We (read: SP5WWP since this is their baby) need to decide whether M17 is:

  1. A minimalist streaming protocol for digital voice, like FreeDV which has a 95% efficiency (1375bps CODEC data in 1450bps over-the-air bitrate: https://freedv.org/freedv-specification/), but with the addition of some features like FEC.

  2. A full, layered, packet switched network protocol that can carry nearly any arbitrary data payload. This is closer to Ethernet/IP, with all the overhead that goes along with that. Great with large payloads, horrible with lots of small packets.

It feels like SP5WWP’s original frame description was an attempt at #1, but with a lot of overhead from #2. Every packet was the same format, and everything needed to understand a given packet is encoded in that packet. This makes it very tolerant of lost data, but very wasteful of on-the-air bits.

The conversation here seems like we’re headed toward #2. We can design a very functional protocol for #2, but it will likely be a lot less efficient than #1. If #1 was SP5WWP’s original goal, then I want to make sure it’s a conscious decision to work on #2, not a design-by-committee scope-creep decision.

A little light reading: https://www.etsi.org/deliver/etsi_ts/102300_102399/10236101/01.04.05_60/ts_10236101v010405p.pdf DMR Specification. I’m not trying to recreate DMR, but I’ll bet they’ve done a lot of thinking about these particular problems too. Let’s see what we can learn from their hard work, then adjust things to optimize M17 for our particular use case.

tl;dr of the DMR protocol, at least as it pertains to us:

  • A time slot is either sending individual packets (they call them bursts, because TDMA), OR that time slot has been established as a voice channel and is streaming voice data.
  • “Individual packets” are called “general data bursts” and are used for metadata transmission, link establishment, parameter negotiation, etc.
  • To start a voice channel, a general data burst called Link Control, or LC, is sent that contains sender, recipient, etc. (I’m unclear whether this is a REQ/ACK conversation with the other station, which matches my use of DMR, or just a preamble to the voice stream, which is what the documentation looks like.)
  • During a voice call, each voice burst carries 2x 108-bit payloads for CODEC data, plus 48 bits of “sync or embedded signaling” data. I don’t believe there’s any FEC in the voice data, or if there is, it’s specified elsewhere in the document that I haven’t read yet.
  • The voice payload is just a fixed frame rate of voice data, 2x 108 bits per burst/packet, each burst being 27.5ms long, sent every 60ms: 2 × 108 bits / 60 ms = 3600 bps of voice payload throughput.
  • However, for the sync/embedded signaling data, voice frames are grouped together 6 bursts at a time into a super frame. These 6x 48-bit frames are used to retransmit the Link Control message that was used to establish the voice link in the first place, re-transmitting things like sender and recipient ID and some other metadata. This allows a recipient to pick up a transmission in progress.
  • Encryption is its own link control packet; they call it Privacy, or the Privacy Indicator (PI) packet. It’s analogous to the LC packet, sent while establishing the voice channel. It does NOT look like it is resent during the stream like the LC packet is, so if you miss the PI at the beginning, you are not receiving the rest of the stream.

Obviously, DMR is a whole lot more complex than that. I’m ignoring all the time slot management, the inter-timeslot signaling, full-duplex, reverse channel, etc, etc. But assuming we want to distill our protocol down to: “what’s the simplest protocol to carry on a half-duplex voice conversation, and to be able to send some small-ish data frames for APRS-like functions, and possibly to support a low bitrate raw data stream” I think we can use the model described above, or one similar to it:

  • Define a packet switched protocol. It may be inefficient, but it will be very functional.
  • Use that packet switched protocol to establish a circuit switched stream that carries voice, or other stream-based data, as efficiently as possible.

IMHO, that’s what we should build here. I’ll get thinking about the details, how to make this work without too much bloat.
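The two-layer model above boils down to a very small receiver state machine: handle packets one at a time until a Link Control packet opens a stream, then stay in stream mode until the stream ends. The event names, especially EV_END_OF_STREAM and EV_TIMEOUT, are hypothetical placeholders for whatever end-of-stream signaling we settle on.

```c
#include <assert.h>

typedef enum { RX_PACKET_MODE, RX_STREAM_MODE } rx_mode;
typedef enum {
   EV_LINK_CONTROL,    // LC packet announcing a stream
   EV_DATA_PACKET,     // standalone packet (SMS, APRS-like data, ...)
   EV_STREAM_FRAME,    // one frame of an established stream
   EV_END_OF_STREAM,   // hypothetical explicit end marker
   EV_TIMEOUT          // hypothetical: no frames received for a while
} rx_event;

rx_mode rx_next(rx_mode mode, rx_event ev) {
   switch (mode) {
   case RX_PACKET_MODE:
      // Only a Link Control packet can open a stream.
      return ev == EV_LINK_CONTROL ? RX_STREAM_MODE : RX_PACKET_MODE;
   case RX_STREAM_MODE:
      // An explicit end, or losing the signal, drops back to packet mode.
      return (ev == EV_END_OF_STREAM || ev == EV_TIMEOUT) ? RX_PACKET_MODE : RX_STREAM_MODE;
   }
   assert(0 && "unknown mode");
   return RX_PACKET_MODE;
}
```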

Do you have a target over-the-air modulation bit rate? Can we safely get 6400 baud? 9600 baud? How much RF spectrum does that take?

Looks like CODEC2 supports bit rates from 700 to 3200. I think M17 should support the best audio quality available, so we should support at least 3200bps payload.

Let’s look at the voice stream protocol. Assume the Link has been established by some packet, I’ll borrow the name from DMR and call it Link Control. We send a Link Control packet that says “Hey, a CODEC2 3200 bps stream is about to start.” We’ll figure out those details later. For now, I want to look at just the voice stream after it’s been established.

I’m having a hard time finding anything that specifically says what the frame size is for CODEC2 3200, but assuming it uses the same 20ms frame time the other ones use, that would be a 64-bit frame. I like powers of two, so let’s go with that. So we need to send a 64-bit payload 50 times a second, or once every 20ms.

We don’t want to sync with every CODEC frame, that’s way too much overhead and I just don’t think it’s necessary. DMR sends about 256 bits per 60ms, and the underlying data-link handles the sync with the TDMA timing. So let’s say we group four of our 64 bit CODEC2 frames into a single 256bit payload, and send them in a single packet every 80ms.

We don’t need to keep sending the 32-bit preamble; that’s only needed when first setting up an RF transmission. If we continuously stream data, we don’t need to let the amps power up, T/R switches to switch, squelch to open, etc. That’ll be part of the original Link Control packet that set up the voice stream in the first place.

We DO need the SYNC header, however. In DMR, that’s handled by the underlying framing and timing. I’ll include the 32 bits here.

I do like the idea of interleaving the link control message as additional data in the stream rather than inserting an extra packet into the stream, which creates jitter, so let’s add another 32 bits for a chunk of the Link Control packet.

32 bits for CRC, and 160 bits for FEC make 512 bits total. Can we arbitrarily pick how many FEC bits we use? Do we really need both CRC and FEC? Can we instead combine these into a 192 bit FEC?

So, once the voice channel has been setup, I’m seeing a 512 bit frame that looks like: SYNC: 32, Link Control: 32, 4x Voice Frames (64 each): 256, CRC: 32, FEC: 160

So long as we can send one of these 512 bit frames every 80ms, we can keep a voice channel up. That’s a 6400bps modulation rate.
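Here is that 512-bit frame laid out as byte offsets, which also confirms the 6400 bps figure. The constant and function names are illustrative, the field order follows the list above, and CRC/FEC are left zeroed because the codes haven’t been chosen yet.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum {
   SYNC_BYTES   = 4,    // 32 bits
   LC_BYTES     = 4,    // 32-bit chunk of the Link Control message
   VOICE_BYTES  = 8,    // one 64-bit Codec2 3200 frame (20 ms)
   VOICE_FRAMES = 4,    // 4 x 20 ms = 80 ms of audio
   CRC_BYTES    = 4,    // 32 bits
   FEC_BYTES    = 20,   // 160 bits
   FRAME_BYTES  = SYNC_BYTES + LC_BYTES + VOICE_FRAMES * VOICE_BYTES + CRC_BYTES + FEC_BYTES,
   FRAME_MS     = 80
};
static_assert(FRAME_BYTES * 8 == 512, "512-bit frame");

// Byte offset of each field within the frame.
enum {
   OFF_SYNC  = 0,
   OFF_LC    = OFF_SYNC + SYNC_BYTES,
   OFF_VOICE = OFF_LC + LC_BYTES,
   OFF_CRC   = OFF_VOICE + VOICE_FRAMES * VOICE_BYTES,
   OFF_FEC   = OFF_CRC + CRC_BYTES
};

// Fill sync, LC chunk and voice; CRC and FEC stay zero until their codes are picked.
void build_stream_frame(uint8_t frame[FRAME_BYTES], const uint8_t sync[SYNC_BYTES],
                        const uint8_t lc_chunk[LC_BYTES],
                        const uint8_t voice[VOICE_FRAMES][VOICE_BYTES]) {
   memset(frame, 0, FRAME_BYTES);
   memcpy(frame + OFF_SYNC, sync, SYNC_BYTES);
   memcpy(frame + OFF_LC, lc_chunk, LC_BYTES);
   for (int i = 0; i < VOICE_FRAMES; i++)
      memcpy(frame + OFF_VOICE + i * VOICE_BYTES, voice[i], VOICE_BYTES);
}

// Modulation rate needed to send one frame every FRAME_MS milliseconds.
unsigned stream_bps(void) { return FRAME_BYTES * 8 * 1000 / FRAME_MS; }
```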

Assuming a Link Control message is no more than 256 bits, it can be completely resent every 8 frames (32 bits per frame); we’ll borrow the super-frame term from DMR. So a receiver is guaranteed to be able to pick up an in-progress stream within 640ms (8 × 80ms).
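The super-frame timing generalizes to any Link Control size. This sketch (hypothetical function names) computes how many frames a super-frame needs, which LC chunk a given frame carries, and the acquisition time, assuming reception starts on a frame boundary and no frames are lost; each lost frame can add up to one more super-frame of waiting for its chunk.

```c
#include <assert.h>

// Frames needed to carry one full Link Control message (rounded up).
unsigned lc_frames_per_superframe(unsigned lc_bits, unsigned lc_bits_per_frame) {
   assert(lc_bits_per_frame > 0);
   return (lc_bits + lc_bits_per_frame - 1) / lc_bits_per_frame;
}

// Time for a receiver joining mid-stream to collect every LC chunk once.
unsigned lc_acquisition_ms(unsigned lc_bits, unsigned lc_bits_per_frame, unsigned frame_ms) {
   return lc_frames_per_superframe(lc_bits, lc_bits_per_frame) * frame_ms;
}

// Which chunk of the Link Control message a given stream frame carries.
unsigned lc_chunk_index(unsigned frame_number, unsigned lc_bits, unsigned lc_bits_per_frame) {
   return frame_number % lc_frames_per_superframe(lc_bits, lc_bits_per_frame);
}
```

For the numbers above, lc_acquisition_ms(256, 32, 80) is 640 ms; a shorter 200-bit LC would need only 7 frames (560 ms).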

Any thoughts on that?