Break the NSA Encryption
Challenge on Gist from this Facebook Post provide an interesting question on cracking the NSA encryption. Doesn’t this hook you enough? Let’s roll. :D
Note, you can also test the validity of logic in this file with the command:
$ python3 -m doctest this-file.md --verbose
Preamble
This section is just a math function/constant defined for easy to use later.
Even though, you should not proceed until you know how it works? Why it necessary?
>>> from collections import Counter
>>> from itertools import product as cartesian_product
>>> def singleton(f):
... return f()
>>> @singleton
... def primes(bound=10**6):
... sieve = [True] * bound
... sieve[0] = sieve[1] = False
... cur = 0
... while cur < bound:
... if sieve[cur]:
... for idx in range(2*cur, bound, cur):
... sieve[idx] = False
... cur += 1
... return [idx for idx, is_prime in enumerate(sieve) if is_prime]
>>> def factorized(number):
... if number == 1:
... yield 1
... for prime in primes:
... if prime**2 > number:
... break
... while number % prime == 0:
... yield prime
... number //= prime
... if number > 1:
... if prime**2 > number:
... yield number
... else:
... raise ValueError('Number too big to be factorized.')
>>> def product(numbers, initial=1):
... for number in numbers:
... initial *= number
... return initial
>>> def divisors(factors):
... factor_set = ({p**i for i in range(k+1)} for p, k in Counter(factors).items())
... return sorted(product(c) for c in cartesian_product(*factor_set))
Introduction
Let There Be Let
- Message
is a cipher text that we (sneaky) see it (naughty you :P). - Message
is a plain text that we want to find. is a encrypt function that maps , in other word . is a decrypt function that works in reversed way of . i.e., . is a key to make function and stronger. It can be use in addition to those function as .
What We Already Known
Message
>>> C = 434384569820012709749978085023147407174684824178941182826833799495931410023794845922767533429746537016995520506439457550763575993604402054742042654701475990703513534158579743446096171193503041071008550601683001839024513922875537448251544812606790879783442587227277601837740236112564627038091170167413493638174350319534049389495771259593223970367601200671312435588244337256711215093040169407379877379400242759675821842258110296156050808143302683071999772732555638568114446076107646435540182476596386155629693774180289540430199704239923354089531930534807431109632462125230336414045829798557815327441670222304200
We precisely know how
Where
>>> def E(P, K):
... C = 0
... for p in P:
... C += ord(p)
... C *= K
... return C
From here, we can easily see that the decrypt algorithm is:
>>> def D(C, K):
... P = ''
... while C:
... C //= K
... P = chr(C%K) + P
... C -= C % K
... return P
Speculation
Key range
Assume original message
Since
Approximate result from this roughly calculation is:
Note: this is not the final answer, just compute it to see how large the number can be.
Assume original message 0-9
, a-f
, and A-F
. These translated with a-f
, each
Assume every char in 0
, its
>>> Ku = 9280 * 10**15
Assume every char in f
, its
>>> Kl = 9060 * 10**15
So basically, if you select a key
Possible Keys
Take a look at the encrypt function, we see that:
This may be the weakest link of the function that allow us to attack. It tell us that
We need prime number ups to the largest possible
>>> it = factorized(C)
>>> factors = []
>>> try:
... while True:
... factors += [next(it)]
... except ValueError:
... pass
>>> factors
[2, 2, 2, 3, 5, 5, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 41, 106903]
The remaining number may be a large final prime, or may be it can be factorized further. We can ignore this number, as long as every factors from it is larger than
1198162896844627574951254304167077865137441451269873420210823707907632078858692758046877232069029631575824857410083756805246859463940310819141386606731750942905572044682630074951510603667141665966096621398351556391207139334515227387293097732317039812158636295421558919175442502050687175932172864893273289106253699350521598092326361993640396143041026264695563620680179489564427544445897926837177366506825782997292755798068057150593688470132473887043210442755344189289139842977367860902098509084703400379089529999185583555055563981243234518434795150565878268478914709009755243001770283983741
Next is to find all of the divisors of
is a divisor of , since is one of its factors. is a divisor of , since occur three times in the factors. is also a divisor of , same idea as above.
Then we will filter only divisors that falls in the key range.
>>> Ks = [k for k in divisors(factors) if Kl < k < Ku]
>>> Ks
[9063554107792192905]
Luckily enough, only one possible key shows up here! So now we’ll take that key as the key to the answer. (If this key failed, then more brain power to be put in this challenge :p)
Result and Discussion
Unfold to Plain Text
Now we have a key
>>> K = Ks[0]
>>> D(C, K)
'e0d00b9f337d357c6faa2f8ceae4a60d'
Looks like MD5 hash.
The Real Answer
Try hashing every words in your English dictionary to see which word maps to that hash, if you want to have some more fun (and make you feel like you just conquer the hacker’s world). Otherwise just ask rainbow table anywhere on the internet.
Spoiler alert!
>>> from hashlib import md5
>>> md5('cryptography'.encode()).hexdigest() == D(C, K)
True
Summary
By best practice, the key must be large prime in order to be uncrackable. However, we’re very lucky that 1.) the key is a composite number, 2.) we know the length of the original plain text 3.) the character space used in plain text is very narrow. These flaws allow us to break the code in no time, though some aspect of the methodology may looks weird and unscalable. Thanks for a good puzzle. ;)
Originally published on: Gist

author