M17 frame description

Still subject to change!

Framing description and ECC scheme (ODF) https://m17project.org/files/download.php?id=32&token=eK3ZQVBofgkiEkMy0StSa1vxN5xvoFTA

Extract (stream mode):

Frame generator for SDR (e.g. HackRF), sends null data (TODO: fill it with Codec2 data):

GNU Radio Companion .grc file for M17 4FSK modulator with RRC filtering:

Moving from email:

I request that you encode the call-sign directly into the headers instead of using an arbitrary 24-bit ID. One of the ham-unfriendly things in DMR is the need to register your call-sign to an arbitrary number with a third party. It’s literally a 1:1 mapping, so it doesn’t buy us anything except a gatekeeper for getting onto the network, which seems a bit antithetical to M17.

Instead, I propose you encode the call-sign directly into the headers. I propose using 48 bits each for the sender and the receiver. With 48 bits, you can encode 9 characters from a 40-character alphabet. [EDIT: I had sample code here, but it was broken, so I removed it. See two posts later for much better code.]

The longest call-sign in the US is 6 characters, but I’m not sure how universal that is. If we assume 8 characters from a 36-character alphabet (A-Z plus 0-9), that takes log2(36^8) = 41.4 bits. Adding two more characters to the alphabet takes log2(38^8) = 41.98 bits, so basically free.

If you like byte boundaries in the header, 48 bits can encode 9 characters in base 38 (log2(38^9) = 47.23). You could even add more special characters for an alphabet of 40 characters: 9 characters in the call sign still fit in 48 bits, since log2(40^9) = 47.9. Hence the code above.
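A quick way to check these numbers is to round log2(N^n) up to whole bits using integer arithmetic only, so the 48-bit boundary is tested exactly rather than through floating point. This is a sketch; `callsign_bits_needed` is a name made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Whole bits needed to store any `length`-character string over an
 * alphabet of `alphabet_size` symbols, packed as one base-N number:
 * ceil(log2(alphabet_size^length)), computed with integers only.
 * Returns -1 if alphabet_size^length would exceed 2^63. */
int callsign_bits_needed(uint64_t alphabet_size, unsigned length) {
    assert(alphabet_size >= 2);
    const uint64_t limit = (uint64_t)1 << 63;
    uint64_t combos = 1;                        /* alphabet_size^length */
    for (unsigned i = 0; i < length; i++) {
        if (combos > limit / alphabet_size)     /* next multiply would pass 2^63 */
            return -1;
        combos *= alphabet_size;
    }
    int bits = 0;
    while (((uint64_t)1 << bits) < combos)      /* smallest bits with 2^bits >= combos */
        bits++;
    return bits;
}
```

For example, `callsign_bits_needed(36, 8)` and `callsign_bits_needed(38, 8)` both return 42, `callsign_bits_needed(40, 9)` returns 48, and `callsign_bits_needed(40, 10)` returns 54, so a tenth character no longer fits in 48 bits.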

This does mean doubling the header space used by the source and destination IDs, but you’ve got 80 bits of “RESERVED” space in the current header. Not sure if you already had plans for that or not.

I literally wrote that code in the browser text box for the forum. I haven’t actually run it or even compiled it, so it’s probably fraught with bugs. But it should be enough to give you an idea of what I mean. Like, thinking about it now, I’m pretty sure my code above will reverse the callsign between encode and decode. But, like I said, you get the idea.

Hi, my name is Mark. I’ve been coding in C for 30 some odd years and I STILL get ‘=’ vs ‘==’ messed up. sigh

I’m playing around with my code from above and will have a fixed and tested version soon. Stand by.

UPDATED 2019-12-08: I swapped the order of ‘/’ and ‘.’ in the alphabet. I definitely want to keep ‘/’ in the alphabet, but I’m less sure about ‘.’

Ok, I’ve got a working system:

#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>
#include <string.h>

// Takes an ASCII callsign in a null terminated char* and encodes it using base 40.
// Returns -1 (all Fs) if the provided callsign is longer than 9 characters, which
// would over-flow the 48 bits we have for the callsign.  log2(40^9) = 47.9
uint64_t encode_callsign_base40(const char *callsign) {
   size_t len = strlen(callsign);
   if (len > 9)
      return -1;

   uint64_t encoded = 0;
   // Walk from the last character to the first, so the first character lands in the
   // least significant base-40 digit and comes out first when decoding.
   // (An unsigned index avoids forming a pointer before the start of the string.)
   for (size_t i = len; i-- > 0; ) {
      const char c = callsign[i];
      encoded *= 40;
      // If speed is more important than code space, you can replace this with a lookup into a 256 byte array.
      if (c >= 'A' && c <= 'Z')  // 1-26
         encoded += c - 'A' + 1;
      else if (c >= '0' && c <= '9')  // 27-36
         encoded += c - '0' + 27;
      else if (c == '-')  // 37
         encoded += 37;
      // These are just place holders. If other characters make more sense, change these.
      // Be sure to change them in the decode array below too.
      else if (c == '/')  // 38
         encoded += 38;
      else if (c == '.')  // 39
         encoded += 39;
      else {
         // Invalid character, represented by 0.
         // Interesting artifact of using zero to flag an invalid character: invalid characters
         // at the end of the callsign won't show up as flagged at the recipient.  Because zero
         // also flags the end of the callsign.  (This started as a bug, losing As at the end of
         // the callsign, which is why A is no longer the beginning of the mapping array.)

         //printf("Invalid character: %c\n", c);
         //encoded += 0;
      }
   }
   return encoded;
}

// Decodes a base40 callsign into a null terminated char*.
// Caller MUST provide a char[10] or larger for callsign.
// Returns an empty string if the encoded callsign is 40^9 or larger,
// which would result in more than 9 characters.
char *decode_callsign_base40(uint64_t encoded, char *callsign) {
   if (encoded >= 262144000000000ULL) {   // 40^9
      *callsign = 0;
      return callsign;
   }

   char *p = callsign;
   for (; encoded > 0; p++) {
      *p = "xABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-/."[encoded % 40];
      encoded /= 40;
   }
   *p = 0;
   return callsign;
}

void test_callsign(const char *callsign) {
   printf("Call: '%9s'  ", callsign);

   uint64_t encoded = encode_callsign_base40(callsign);
   // 12 hex characters is 48 bits. If we've over-flowed 48 bits, the string will be longer and easy to spot.
   printf("Encoded: '0x%012" PRIx64 "'  ", encoded);

   char decoded[10];
   char *ptr = decode_callsign_base40(encoded, decoded);
   printf("Decoded: '%9s'  ", decoded);
   printf("By return value: '%9s'\n", ptr);
}

int main(void) {
   printf("Error cases:\n");
   return 0;
}

I second the notion of replacing a DMR-like ID number with an alphanumeric field for a call sign. Smitty’s method seems reasonable here.

The big issue I see with this frame format is that 128 bits of payload in an 808 bit frame is incredibly inefficient. If you were to run your audio codec at 3.2 kbits/sec as planned, you would need a total throughput of 20.2 kbits/sec. For comparison, DMR runs at a throughput of 9.6 kbits/sec and it’s delivering two channels of audio. I fear that attempting to run at such a high bit rate will degrade our weak signal performance considerably compared to alternative digital modes.
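The overhead argument is simple arithmetic: the on-air rate scales with frame_bits / payload_bits. A minimal sketch (the function names are illustrative, not from any M17 code):

```c
#include <assert.h>

/* Over-the-air bit rate needed to deliver `payload_bps` of codec data
 * when each `frame_bits`-bit frame carries only `payload_bits` of it. */
double required_throughput_bps(double payload_bps,
                               unsigned payload_bits, unsigned frame_bits) {
    assert(payload_bits > 0 && payload_bits <= frame_bits);
    return payload_bps * frame_bits / payload_bits;
}

/* Fraction of the transmitted bits that is payload. */
double frame_efficiency(unsigned payload_bits, unsigned frame_bits) {
    assert(frame_bits > 0 && payload_bits <= frame_bits);
    return (double)payload_bits / frame_bits;
}
```

With the figures above, `required_throughput_bps(3200, 128, 808)` gives 20200 bit/s, and `frame_efficiency(128, 808)` is about 0.158, i.e. under 16% of each frame is payload.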

I don’t think we have to transmit full metadata every single frame. We can transmit a “metadata frame” at the beginning of the transmission, and simply assume that it applies to all subsequent frames within that transmission as well.

I also don’t think we need to run Codec2 all the way up at 3.2 kbits/sec. FreeDV’s own VHF mode uses Codec2 at 1.3 kbits/sec, and the audio samples I’ve listened to of that bitrate are just fine.

Here’s what I’m thinking:

  • Run Codec2 at 1.3 kbits/sec
  • M17 uses 4FSK at a symbol rate of 1.6 kbaud / 3.2 kbits/sec, and a bandwidth of 6.25 kHz.
  • Frames are 128 bits / 64 symbols / 40ms long. 25 frames/sec.
  • Example frame types:
    AUDIO: 52 bits audio payload, 52 bits rate 1/2 error correcting code (Turbo or LDPC)
    WHOAMI: 42 bits of callsign, 24 bits latitude, 24 bits longitude, + CRC
    SMS: Send a text message to a callsign
    DATA: Send arbitrary data. Whoever receives it should have a way to interpret it
    ROAM: Information from repeater to help HTs roam between repeaters
    TICKTOCK: Periodic message from repeater announcing its callsign and useful information like the time, what talk groups are active, etc.
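The numbers in this list can be checked with a few integer helpers, assuming 4FSK carries 2 bits per symbol. The helper names are hypothetical. An AUDIO frame uses 2 × 52 = 104 of the 128 bits; the list does not say what the remaining 24 bits hold, so treating them as room for a frame type and other header fields is an inference.

```c
#include <assert.h>

#define BITS_PER_SYMBOL_4FSK 2u   /* 4 tones carry log2(4) = 2 bits per symbol */

/* Duration in ms of a frame of `symbols` sent at `baud` symbols/s. */
unsigned frame_ms(unsigned symbols, unsigned baud) {
    assert(baud > 0);
    return symbols * 1000u / baud;
}

/* Codec bits produced during `ms` milliseconds at `codec_bps`. */
unsigned codec_bits_per_frame(unsigned codec_bps, unsigned ms) {
    return codec_bps * ms / 1000u;
}

/* Bits left in an AUDIO frame after `audio_bits` of payload plus the same
 * number of parity bits from a rate-1/2 code. */
unsigned audio_frame_spare_bits(unsigned frame_bits, unsigned audio_bits) {
    assert(2u * audio_bits <= frame_bits);
    return frame_bits - 2u * audio_bits;
}
```

`frame_ms(64, 1600)` returns 40, `codec_bits_per_frame(1300, 40)` returns 52, and `audio_frame_spare_bits(128, 52)` returns 24.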

The frame type field in the header should have at least, say, 4 bits.

If we do pull the metadata out into a separate frame type, we should send it periodically, interspersed with payload frames. That way a receiver can pick up a transmission in the middle instead of having to wait for a long-winded ham to finish blathering on before getting metadata.

You said 42 bits for callsigns, as opposed to my proposed 48. Was that on purpose? Why?

I’d like to keep the idea of both sender and receiver in the metadata. We can define non-callsign compliant IDs for things like talk groups, broadcast, etc. and still allow person-to-person conversations.
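One way to make room for such IDs without a separate field: the base-40 callsign encoder never produces a value of 40^9 or more, so the 2^48 - 40^9 ≈ 1.9 × 10^13 values above it are free for special addresses. A sketch of that idea; the broadcast value and the talk-group mapping are hypothetical, not part of any M17 proposal.

```c
#include <assert.h>
#include <stdint.h>

#define ADDR_MAX_48     0xFFFFFFFFFFFFULL    /* 2^48 - 1, largest 48-bit value */
#define CALLSIGN_LIMIT  262144000000000ULL   /* 40^9: base-40 callsigns stay below this */

/* Hypothetical assignment, not from any M17 document. */
#define ADDR_BROADCAST  ADDR_MAX_48          /* all ones */

/* 1 if a 48-bit address decodes as a base-40 callsign, 0 if it falls in
 * the 40^9 .. 2^48-1 range that the callsign encoder never produces. */
int is_callsign_address(uint64_t addr) {
    assert(addr <= ADDR_MAX_48);
    return addr < CALLSIGN_LIMIT;
}

/* Hypothetical: place a 32-bit talk-group number just above the callsign
 * range. The highest result, 40^9 + 2^32 - 1, is still below ADDR_BROADCAST. */
uint64_t talkgroup_address(uint32_t talkgroup) {
    return CALLSIGN_LIMIT + talkgroup;
}
```

A decoder that returns an empty string for values of 40^9 and above, as in the callsign code earlier in the thread, would then hand such addresses to a separate lookup instead.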

Remember that AES is a block cipher and works on 128-bit blocks. I’m really busy today and will elaborate on this topic later.

Oh, I just pulled 42 from the bit of your post where you first calculated the number of bits in a 6-character callsign, I didn’t scroll down to see 48. It doesn’t really matter.

I think a better alternative to periodically interrupting audio frames to transmit a metadata frame is to set aside an octet or two out of each frame as a “metadata channel”. The full metadata then arrives over multiple frames.
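A sketch of how such a metadata channel could work, assuming 2 octets per frame, a 12-octet metadata block (for example a 48-bit source plus a 48-bit destination), and a frame number in every frame that selects which chunk is carried. All names and sizes are hypothetical. A receiver that joins mid-transmission only needs one chunk of each index to rebuild the whole block.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sizes for this sketch: 2 metadata octets per frame and a
 * 12-octet metadata block (e.g. 48-bit source + 48-bit destination). */
#define META_BYTES_PER_FRAME 2u
#define META_BLOCK_BYTES     12u
#define META_CHUNKS          (META_BLOCK_BYTES / META_BYTES_PER_FRAME)

static_assert(META_BLOCK_BYTES % META_BYTES_PER_FRAME == 0, "block must split evenly");
static_assert(META_CHUNKS < 32, "chunk bitmask must fit in an unsigned int");

/* Sender: copy the metadata chunk carried by frame number `fn`. The chunk
 * index is fn modulo the chunk count, so the block repeats continuously. */
void meta_chunk_for_frame(const uint8_t block[META_BLOCK_BYTES], uint32_t fn,
                          uint8_t out[META_BYTES_PER_FRAME]) {
    unsigned idx = fn % META_CHUNKS;
    memcpy(out, block + idx * META_BYTES_PER_FRAME, META_BYTES_PER_FRAME);
}

/* Receiver: reassembly buffer plus a bitmask of chunk indices seen so far. */
typedef struct {
    uint8_t block[META_BLOCK_BYTES];
    unsigned seen_mask;
} meta_rx;

/* Store one received chunk; returns 1 once every chunk index has arrived. */
int meta_rx_push(meta_rx *rx, uint32_t fn, const uint8_t in[META_BYTES_PER_FRAME]) {
    unsigned idx = fn % META_CHUNKS;
    memcpy(rx->block + idx * META_BYTES_PER_FRAME, in, META_BYTES_PER_FRAME);
    rx->seen_mask |= 1u << idx;
    return rx->seen_mask == (1u << META_CHUNKS) - 1u;
}
```

Under these assumptions, 6 frames carry the whole block, so at 25 frames/s a late listener has complete metadata within 240 ms.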

I don’t know about Poland but in the US, it is illegal to encrypt amateur radio communications, so just be aware that I will not be able to test the protocol with encryption enabled. :slight_smile:

Okay, with a 128 bit payload, we can do two 52-bit Codec2 frames, plus another 24 bits of header and metadata channel. Overall frame rate ends up at 12.5 frames/sec.

I’m not a cryptography expert, but does key information need to be included in every frame, or can there just be an “encryption setup” type frame at the beginning of the transmission?

Check out my last blog entry :slight_smile: https://teletra.pl/M17/blog/encryption

The FCC in the US has its own laws; cryptography in amateur radio is definitely illegal here.

AES will probably be used in CTR mode, so we would need a frame number too. We need it so that the AES counter value stays synchronized across all recipients. I have reserved 12 bits (if I remember correctly) for that purpose. 4096 × 40 ms = 163.84 s is the longest TX period you can have at the moment, about 163 seconds.
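In CTR mode the keystream for each frame is the AES encryption of a counter block, so the sender and receivers only need to agree on a nonce and the frame number. A sketch of building that 16-byte block, assuming a hypothetical 14-byte nonce sent during setup followed by the 12-bit frame number in the last two bytes; the AES call itself is left to a real crypto library. Because each block depends only on the nonce and the frame number, a receiver that knows the key can start decrypting at any frame. As with any CTR scheme, a nonce must never be reused with the same key.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define FN_BITS   12u                      /* frame-number width from the post */
#define FN_MAX    ((1u << FN_BITS) - 1u)   /* 4095 */
#define NONCE_LEN 14u                      /* hypothetical: 112-bit nonce */

static_assert(NONCE_LEN + 2u == 16u, "counter block must be one AES block");

/* Build the 16-byte AES-CTR input block for one frame: the nonce followed by
 * the frame number, big-endian, in the last two bytes. Frames get distinct
 * blocks as long as frame numbers do not repeat under one nonce. */
void ctr_block(const uint8_t nonce[NONCE_LEN], uint16_t frame_number,
               uint8_t block[16]) {
    assert(frame_number <= FN_MAX);
    memcpy(block, nonce, NONCE_LEN);
    block[14] = (uint8_t)(frame_number >> 8);
    block[15] = (uint8_t)(frame_number & 0xFF);
}

/* Longest transmission, in ms, before the 12-bit frame counter would wrap. */
unsigned max_tx_ms(unsigned frame_ms) {
    return (FN_MAX + 1u) * frame_ms;
}
```

`max_tx_ms(40)` returns 163840, matching the roughly 163 s limit.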

Yep, obfuscation of any kind is illegal in the US, but that doesn’t mean we shouldn’t support it in the protocol, so long as we also support a null cipher. We CAN include authentication and message verification and stay within the US laws, though. It just requires that the payload be in clear text.

I agree with KE0WVA that we should look into different frame types for call setup and payloads. You don’t need to send the nonce, key_id, or cipher type with every packet, for example. Just set all that up once, maybe establish a session_id to identify all those parameters (hash of all cipher parameters, maybe with sender and recipient IDs too?), then only include the block counter in every message.

I do like CTR mode, though, as opposed to a chained mode like CBC. Means that if you drop a packet, you can pick up the later packets and don’t lose the entire session.

OK, I finally got some spare time.

Well, that’s a good idea. The encoded callsign could be stored in the codeplug (on an SD card).

Don’t worry about it, we are going to rearrange the frame contents anyway.

Yep, we definitely have to change that.

[quote]Frames are 128 bits / 64 symbols / 40ms long. 25 frames/sec
AUDIO: 52 bits audio payload, 52 bits rate 1/2 error correcting code[/quote]

How would you send sync info like the frame number? And the nonce value? You have to transmit that info regularly throughout the transmission. AFAIK, it’s needed for AES in CTR mode. Would you just add that to the frame? Post #5 explains it well: the setup has to be sent periodically (but not necessarily in every frame, as happens at the moment).

Yeah, that leaves some unused bits for future use.

We should draw a flowchart for that. Maybe UML or SDL, even? Well… we should describe everything using some kind of description language anyway.

For my own experiments (linking repeaters, including those using freedv) I created a 48 bit encoding for callsigns that is also ethernet compatible. See http://dmlinking.net/eth_ar.html.
It is already in use for some repeaters around the Eindhoven area.

Another thing I would like to suggest is to use a 16-bit frame type identifier and use it to identify different protocols.
The current 3k2 Codec2 mode sounds OK-ish, but if David comes up with a better mode based on his low-bitrate work, we might want to be able to switch to a different 3k2 mode automatically, without having to redo the protocol.

And finally, I would like to suggest having a look at the data channel code I wrote for FreeDV. It allows interleaving data between voice packets (e.g. when no audio is detected) and, at a high level, provides Ethernet frames, which allows you to use basically any existing protocol on top of it.

I’d like to sum up all suggestions for the frame types. Encoding callsigns in 48 bits sounds great, and the code provided by Mark works nicely.

I see one problem. Imagine that 2 users are making an encrypted QSO. OK, VOICE frames are being transmitted every 40 msec. What should happen if someone else (knowing the key) starts to listen? Where should we put the ENCRYPTION_SYNC frame? There’s no space between VOICE frames. Also, we can’t remove one voice subframe (20 ms of encoded voice) and just place the encryption sync there, because the other half would then have to be transmitted in plaintext. Am I correct?

There will be no voice subframes in CODEC2_MODE_1300 (1300bps). The vocoder takes 320 samples as the input, not 160.

IIRC, in the US, you’re allowed to encrypt data as long as the key(s) are public. So it would be okay to test out the encryption as long the key is made public somewhere.

You’re also allowed to cryptographically sign data, since that isn’t obscuring anything.

-Dan Fay

I have edited my first post in this thread. Take a look at the PDF linked there.

16 bits is a lot. Why 16?

I suppose the trick, then, is to increase symbol rate just enough to open up gaps that you can use to transmit a header frame.

40ms of audio per frame = 25 frames per second. Increase symbol rate by 20% to make it 30 frames per second, now for every 5 frames of audio you can slip in one frame of header, metadata, encryption setup, etc. Send your encryption setup frame once per second, and somebody just tuning in will miss at most one second of audio.

Sender and receiver will both need to maintain, at minimum, a 40ms buffer of audio to cross those gaps. Total codec delay in audio transmission will be a minimum of 160 ms. I think this is acceptable.
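The 5-audio-plus-1-header cycle at 30 frames/s can be sketched as a simple slot schedule. Putting the header slot first in each cycle, and the encryption setup in the first header slot of each second, are assumptions for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* From the post: 30 frames/s in repeating cycles of 5 audio frames plus
 * 1 non-audio frame. Making frame 0 of each cycle the header slot is an
 * arbitrary choice for this sketch. */
#define FRAMES_PER_SECOND 30u
#define FRAMES_PER_CYCLE   6u   /* 5 audio + 1 header */

/* The once-per-second encryption setup must land on a header slot. */
static_assert(FRAMES_PER_SECOND % FRAMES_PER_CYCLE == 0,
              "one second must contain a whole number of cycles");

/* True if frame `fn` is the non-audio slot of its cycle. */
bool is_header_slot(unsigned fn) {
    return fn % FRAMES_PER_CYCLE == 0;
}

/* True if frame `fn` is the header slot reserved for encryption setup,
 * i.e. the first header slot of every second. */
bool is_crypto_setup_slot(unsigned fn) {
    return fn % FRAMES_PER_SECOND == 0;
}

/* Number of audio frames among frames 0 .. n-1. */
unsigned audio_frames_in(unsigned n) {
    unsigned count = 0;
    for (unsigned fn = 0; fn < n; fn++)
        if (!is_header_slot(fn))
            count++;
    return count;
}
```

In any 30-frame second, 25 frames carry audio (25 × 40 ms = 1 s) and 5 are header slots, one of which carries the encryption setup.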

So let’s say a symbol rate of 4.8 kbaud, for 9.6 kbits/sec throughput. That’s 160 symbols/320 bits per frame. That gives us 128 bits of payload, 128 bits of ECC, and 64 bits of sync and header. That’s 3.84 kbits/sec of goodput, 3.2 kbits/sec of which is audio, leaving 640 bits/sec for metadata and encryption setup. I think that’s pretty reasonable.
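That frame budget can be checked at compile time. A sketch using only the numbers from the post, assuming 4FSK at 2 bits per symbol:

```c
#include <assert.h>

/* Numbers from the post: 4FSK (2 bits/symbol) at 4.8 kbaud, 30 frames/s,
 * and 5 of every 6 frames carrying audio. */
#define BAUD               4800u
#define BITS_PER_SYMBOL       2u
#define FRAMES_PER_SECOND    30u
#define PAYLOAD_BITS        128u
#define ECC_BITS            128u
#define SYNC_HEADER_BITS     64u

/* Each frame must exactly fill its share of the air time (320 bits). */
static_assert(BAUD * BITS_PER_SYMBOL / FRAMES_PER_SECOND ==
              PAYLOAD_BITS + ECC_BITS + SYNC_HEADER_BITS,
              "frame layout must add up to the bits available per frame");

/* Payload ("goodput") bit rate across all frames. */
unsigned goodput_bps(void) { return PAYLOAD_BITS * FRAMES_PER_SECOND; }

/* Payload bit rate of the audio frames (5 out of every 6). */
unsigned audio_bps(void) { return PAYLOAD_BITS * FRAMES_PER_SECOND * 5u / 6u; }

/* What is left for metadata and encryption setup. */
unsigned spare_bps(void) { return goodput_bps() - audio_bps(); }
```

The three helpers return 3840, 3200 and 640 bit/s respectively, matching the figures above.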

This conversation seems to be taking the protocol in two different directions. We (read: SP5WWP since this is their baby) need to decide whether M17 is:

  1. A minimalist streaming protocol for digital voice, like FreeDV which has a 95% efficiency (1375bps CODEC data in 1450bps over-the-air bitrate: https://freedv.org/freedv-specification/), but with the addition of some features like FEC.

  2. A full, layered, packet switched network protocol that can carry nearly any arbitrary data payload. This is closer to Ethernet/IP, with all the over-head that goes along with that. Great with large payloads, horrible with lots of small packets.

It feels like SP5WWP’s original frame description was an attempt at #1, but with a lot of overhead from #2. Every packet was the same format, and everything needed to understand a given packet was encoded in that packet. This makes it very tolerant of lost data, but very inefficient in its use of on-the-air bits.

The conversation here seems like we’re headed toward #2. We can design a very functional protocol for #2, but it will likely be a lot less efficient than #1. If #1 was SP5WWP’s original goal, then I want to make sure it’s a conscious decision to work on #2, not a design-by-committee scope-creep decision.

A little light reading: https://www.etsi.org/deliver/etsi_ts/102300_102399/10236101/01.04.05_60/ts_10236101v010405p.pdf DMR Specification. I’m not trying to recreate DMR, but I’ll bet they’ve done a lot of thinking about these particular problems too. Let’s see what we can learn from their hard work, then adjust things to optimize M17 for our particular use case.