I’ve just finished implementing M17 on the MMDVM, although it is almost completely untested in any meaningful way. I have included a network client for the reflector protocol, but it’s currently hardcoded in the ini file. I will probably add a simple-minded gateway with parrot at some point. I’ll also add a gateway into APRS-IS once there is a standard for APRS/GPS data for M17.
One area that bothers me (apart from being able to distinguish between the link setup and normal data) is the handling of fragment LICH data.
You need to receive five frames to be able to recover the full LICH from the fragments. As an aside, I assume that an FN=0 is the first fragment, FN=1 is the second fragment, and then continuing in a round-robin fashion? This needs documenting.
My issue is with the integrity of the received fragment LICH data. In cases where it is needed for late entry, the overall checksum can be used to determine if you have a complete uncorrupted LICH, which is fine as far as it goes. The problem comes when receiving the fragments: some may have bit errors and you don’t know which ones, so you try to rebuild the LICH with faulty data, and it fails. This could keep happening, as you don’t know which fragments are good. Would it not be better to dispense with the overall checksum on the LICH (saving 16 bits) and instead have a short checksum on each LICH fragment? That way we can be sure that each fragment is not corrupted, and can be used for the overall LICH without problems.
If that is not good enough, then maybe keep the overall checksum in the LICH and have the LICH spread over six frames, or more, instead of five to allow for the extra space for a local fragment checksum as above.
CRC isn’t my thing, but each frame has its own CRC, right? So I thought the overall frame checksum not validating would indicate which LICH chunks were corrupt - so we’d get chunks 2,3,4,5,1; but 3 has a bad CRC check, so we have to wait until we get that chunk again to reconstruct.
It’s not clear about which fields are covered by the CRC. If it covers the encoded LICH fragment, that is great. That addresses my issue, and it means that I need to update my implementation also.
Just had another look at the specification, “Fig 6. An overview of the forward dataflow” to be precise, and it shows that the CRC only covers the FN and payload, not the fragment LICH. So my point still stands.
The entire LICH is covered with a CRC. You need to decode all 5 LICH fragments and assemble them to check the CRC.
There are currently two ways to handle assembling the LICH.
1. Grab 5 frames in sequence and check the CRC. If it does not match, rotate the frames up to 5 times, computing the CRC each time.
2. Keep the last 5 LICH frames received and compute the CRC. If you don’t have a valid CRC, discard the oldest frame and append the next one. Repeat until you get a valid CRC.
Or you could do some combination of 1 & 2.
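To make the two approaches concrete, here is a minimal sketch in Python. It assumes the fragments have already been Golay-decoded to 6 bytes (48 bits) each, and that the CRC sits big-endian in the last two bytes of the 30-byte LICH. The `crc16` polynomial and initial value are placeholders, not the values from the M17 specification, and all the names are made up for illustration.

```python
from collections import deque

FRAGMENTS = 5        # fragments per full LICH, as discussed in this thread
FRAGMENT_BYTES = 6   # 48 bits per decoded fragment


def crc16(data: bytes, poly: int = 0x1021, init: int = 0xFFFF) -> int:
    """Bitwise MSB-first CRC-16. poly and init are stand-ins, not M17's."""
    crc = init
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ poly if crc & 0x8000 else crc << 1) & 0xFFFF
    return crc


def lich_is_valid(lich: bytes) -> bool:
    """Assumption: the last two bytes hold a big-endian CRC over the rest."""
    return crc16(lich[:-2]) == int.from_bytes(lich[-2:], "big")


def try_rotations(fragments):
    """Method 1: rotate five consecutive fragments until the CRC matches."""
    fragments = list(fragments)
    for shift in range(FRAGMENTS):
        candidate = b"".join(fragments[shift:] + fragments[:shift])
        if lich_is_valid(candidate):
            return candidate
    return None  # at least one fragment is probably corrupt


class SlidingLich:
    """Method 2: keep the last five fragments and test the window as it is."""

    def __init__(self):
        self.window = deque(maxlen=FRAGMENTS)  # oldest fragment drops off

    def push(self, fragment: bytes):
        self.window.append(fragment)
        if len(self.window) < FRAGMENTS:
            return None
        candidate = b"".join(self.window)
        return candidate if lich_is_valid(candidate) else None
```

Method 1 can give an answer after exactly five frames, while method 2 may have to wait up to four more frames for the window to line up with the start of the LICH. Combining them means calling `try_rotations(list(window))` on every push. Note that with a 16-bit CRC a wrong rotation can pass by chance roughly once in 65536 tries, so neither method is a hard guarantee.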
The LICH has only incidental correlation with FN. The only requirement is that the LICH frames are sent in sequence and the FN increments sequentially from 0. LICH frames are not numbered. They only have logical ordering.
A somewhat parallel discussion is occurring on GitHub.
Two potential issues have been identified.
1. The CRC can’t be checked until 5 LICH fragments have been received.
2. The FN is under the payload CRC and really isn’t useful in reassembling the LICH.
Item 1 may result in a delay in joining, but that is not necessarily a major flaw. If instead of a full LICH CRC we had a smaller CRC on each chunk, then it would improve the likelihood of decoding in adverse situations, but I don’t think the reduction in delay would be substantial. The Golay coding allows for correcting 3 bit errors and detecting up to 7 bit errors (per Wikipedia), thus already providing quite good protection and the ability to reject a chunk.
As pointed out above by G4KLX and WX9O, and by mobilinkd on GitHub (maybe the same people), brute force seems the only way to reassemble at the moment, since the FN can’t really be used without decoding the full payload. One option I thought of was changing the Golay to [23,12,7] and freeing 4 bits for each chunk that can be used to indicate the start of the LICH. We’d want this to be coded, but with 4 bits the only real option is repetition coding.
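For anyone checking the numbers, the bit budget behind that suggestion can be sketched as below. It assumes each 48-bit fragment is split into four 12-bit Golay blocks, which is what the 96-bit encoded size implies. The figures follow from the standard minimum-distance bounds (correct up to (d-1)/2 errors, or detect up to d-1 if no correction is attempted), and the function name is just for illustration.

```python
FRAGMENT_BITS = 48   # raw LICH bits carried per frame (5 x 48 = 240)
INFO_BITS = 12       # information bits per Golay codeword


def budget(codeword_bits: int, min_distance: int) -> dict:
    """Bit budget for one LICH fragment under a given Golay variant."""
    codewords = FRAGMENT_BITS // INFO_BITS
    return {
        "encoded_bits": codewords * codeword_bits,
        # Guaranteed correction capability per codeword.
        "correctable_per_codeword": (min_distance - 1) // 2,
        # Guaranteed detection per codeword when used for detection only.
        "detectable_per_codeword": min_distance - 1,
        # Best case: errors spread evenly across the codewords.
        "detectable_per_fragment": codewords * (min_distance - 1),
    }


extended = budget(24, 8)   # Golay(24,12,8), the current coding
perfect = budget(23, 7)    # Golay(23,12,7), the suggested alternative
freed_bits = extended["encoded_bits"] - perfect["encoded_bits"]

print(extended)
print(perfect)
print("bits freed per fragment:", freed_bits)
```

The shorter code keeps the same 3-bit correction per codeword but gives up one bit of guaranteed detection in exchange for the 4 freed bits.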
Can I ask why there isn’t any relationship between the FN and the LICH fragments? It seems logical to me to have the LICH fragment number connected to the FN to make for simpler processing. It doesn’t cost anything in terms of adding extra bits to the protocol. Even if the FN is corrupt, it is still possible to add one to the last FN value and hope that the fragment is not too corrupt. Without such a relationship between the FN and LICH fragment, you are adding extra processing to the M17 implementation which is entirely avoidable. Therefore I would strongly suggest having a fixed relationship between FN value and the LICH fragment number.
If going to a Golay(23,12,7), I would suggest using the freed bits for a local checksum rather than a LICH fragment number.
Because the LICH is intended to be decoded without the need to decode the payload, and the FN exists in the payload.
Assuming a relationship between LICH and FN will likely result in problems joining a transmission after the FN rolls over.
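A small sketch of the rollover problem, under two assumptions that are not stated in this thread: that the frame counter is 15 bits wide (so it wraps at 0x8000), and that the fixed mapping being proposed is fragment = FN mod 5. The function names are hypothetical.

```python
FN_MODULUS = 0x8000   # assumption: 15-bit frame counter
FRAGMENTS = 5
FRAME_SECONDS = 0.040  # one frame every 40 ms


def fragment_from_fn(fn: int) -> int:
    """Hypothetical fixed mapping: fragment index derived from the FN."""
    return fn % FRAGMENTS


def fragment_in_sequence(frames_sent: int) -> int:
    """What a strictly round-robin transmitter would send next."""
    return frames_sent % FRAGMENTS


frames_sent = FN_MODULUS            # first frame after the counter wraps
fn = frames_sent % FN_MODULUS       # the FN is back to 0

expected_by_mapping = fragment_from_fn(fn)
expected_by_sequence = fragment_in_sequence(frames_sent)
seconds_until_wrap = FN_MODULUS * FRAME_SECONDS

print(expected_by_mapping, expected_by_sequence, seconds_until_wrap)
```

Because 0x8000 is not a multiple of 5, the mapping and the round-robin order disagree after the wrap, so either the LICH sequence has to restart at the wrap or the mapping stops holding. Under these assumptions that only happens about 22 minutes into a transmission.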
I opened a separate issue on GitHub about the current need to decode the LICH to compute the payload CRC. That has (or will) result in a revision to the spec. The converse of the above is also true: it is not necessary to decode the LICH if one has a valid LSF; one need only decode the payload.
I agree with the latter. Indeed, the MMDVM rebuilds the fragment from the already decoded data, so it doesn’t care how badly corrupted the fragment is once it has the information. It just adds to the BER value.
I disagree strongly about the FN value and LICH fragment. Digital Voice is not AX.25, in that the data comes in at a regular beat every 40ms, so as long as you have at least one valid FN, you can guess what the next FN values are reliably, until the modem declares the signal lost due to too much corruption of the sync vector. Therefore it makes a lot of sense to have a fixed relationship between FN and LICH fragment.
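The “guess the next FN” idea can be sketched roughly as follows. This is not taken from the MMDVM code. The class name, the 15-bit counter width, and the FN mod 5 mapping are all assumptions for illustration.

```python
FN_MODULUS = 0x8000   # assumption: 15-bit frame counter
FRAGMENTS = 5


class FnTracker:
    """Coast through corrupt frames by incrementing the last known FN."""

    def __init__(self):
        self.fn = None  # nothing known until one valid FN has been decoded

    def update(self, decoded_fn=None):
        """Call once per 40 ms frame; pass None when the FN failed its check."""
        if decoded_fn is not None:
            self.fn = decoded_fn
        elif self.fn is not None:
            self.fn = (self.fn + 1) % FN_MODULUS
        return self.fn

    def fragment_index(self):
        """Fragment slot under the proposed fixed mapping (FN mod 5)."""
        return None if self.fn is None else self.fn % FRAGMENTS


tracker = FnTracker()
tracker.update(7)      # one good frame
tracker.update(None)   # corrupt frame, coast to 8
tracker.update(None)   # corrupt frame, coast to 9
print(tracker.fn, tracker.fragment_index())
```

The catch raised elsewhere in the thread still applies: the first valid FN only becomes available after the payload has been decoded.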
I see no good reason not to link the LICH fragment to the FN. It saves processing power and data copying and, to be honest, is more professional in design, as is adding a checksum to the fragment LICH.
Fundamentally we want to avoid imposing the cost of convolutional decoding the payload when the only need is to decode the LICH.
To that end, I am proposing changing the LICH so that a superframe is made up of 6 40-bit segments instead of 5 48-bit segments, with 8 bits added to each LICH fragment, comprising a 3-bit counter [0…5] and 5 extra bits for signalling.
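As a rough illustration of how the proposed counter removes the need for brute-force reassembly, here is a sketch with six 40-bit segments (240 bits in total). The bit layout is an assumption: the proposal only says 8 bits are added, so placing the counter in the top 3 bits of a trailing byte is a guess, and every name here is hypothetical.

```python
SEGMENTS = 6
SEGMENT_BYTES = 5   # 40 bits of LICH data per fragment


def pack_chunk(segment: bytes, counter: int, extra: int = 0) -> bytes:
    """Append one byte: 3-bit counter in the top bits, 5 signalling bits below."""
    if len(segment) != SEGMENT_BYTES:
        raise ValueError("segment must be 40 bits")
    if not 0 <= counter < SEGMENTS or not 0 <= extra < 32:
        raise ValueError("counter or extra bits out of range")
    return segment + bytes([(counter << 5) | extra])


def unpack_chunk(chunk: bytes):
    """Return (segment, counter, extra) from a 48-bit chunk."""
    tail = chunk[-1]
    return chunk[:-1], tail >> 5, tail & 0x1F


class CounterAssembler:
    """Place each segment directly into its slot; no rotation search needed."""

    def __init__(self):
        self.slots = [None] * SEGMENTS

    def push(self, chunk: bytes):
        segment, counter, _extra = unpack_chunk(chunk)
        if counter >= SEGMENTS:
            return None  # counter values 6 and 7 are unused, so reject
        self.slots[counter] = segment
        if any(slot is None for slot in self.slots):
            return None
        return b"".join(self.slots)


lich = bytes(range(SEGMENTS * SEGMENT_BYTES))
chunks = [pack_chunk(lich[i * 5:(i + 1) * 5], i) for i in range(SEGMENTS)]
assembler = CounterAssembler()
results = [assembler.push(c) for c in chunks[4:] + chunks[:4]]  # late entry
```

The assembled result would still need the overall CRC check before being trusted, since a slot may hold a segment that was decoded with undetected errors.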
The Golay code will detect up to 7 bit errors out of 24 (up to 28 bit errors out of the 96-bit encoded LICH fragment). Why is this not sufficient? We have a Golay code for each LICH fragment and a full CRC on the superframe. Adding yet another checksum seems a bit redundant.
If the BER is so high already, adding another checksum will not help.