I'm writing a program that decrypts text encrypted using a Vigenere cipher. It's been working very well so far, but my current issue is that I need an effective way of checking if a string is decrypted. So far, I've been checking how many times the string contains a set of the most common English words, but that doesn't work if the string has no spaces.

For example, the following "plaintext":

XPAWWALLTPJZYYZWBGSHGARECPVHDAAPLJLBGAPVTFVCWAS

contains "all", "are", "was", and "as", but it is clearly not decrypted.

Do you know about the index of coincidence? For a given string, the index can distinguish between a random string and a string in a certain language.
– corpsfini, Mar 25 at 18:51
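As a rough illustration of the comment above, here is a minimal sketch of the index of coincidence, assuming the text uses only the 26 letters A–Z (everything else is ignored). The function name `index_of_coincidence` is made up for this example. English text typically scores around 0.067, while uniformly random letters score around 1/26 ≈ 0.038.

```python
from collections import Counter


def index_of_coincidence(text: str) -> float:
    """Probability that two distinct positions, picked at random, hold the same letter."""
    # Assumption: only A-Z count; spaces, digits and punctuation are dropped.
    letters = [c for c in text.upper() if "A" <= c <= "Z"]
    n = len(letters)
    if n < 2:
        return 0.0  # not enough letters to form a pair
    counts = Counter(letters)
    # Ordered pairs of equal letters divided by all ordered pairs of positions.
    return sum(k * (k - 1) for k in counts.values()) / (n * (n - 1))
```

A candidate decryption whose score is close to the English value is worth a closer look; one close to 1/26 is probably still ciphertext. Short strings give noisy scores, so any threshold should be treated as a heuristic.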

It is not correct that Stack Overflow sent this question to this site. Whoever decided that probably did not understand the question and looked only at the word "decrypted", whereas this question actually has nothing to do with cryptography. This is a pure programming and algorithmic question. We should not hesitate to move the question to Stack Overflow.
– mentallurg, Mar 25 at 23:02

@corpsfini Thanks a lot, using the index of coincidence has actually worked really well for me. This is kind of a separate question, but do you have any idea of how I could check if a string is English when it has only been rearranged (e.g. columnar transposition)? The index of coincidence for rearranged strings is the same as it is in the plaintext, unfortunately.
– paper man, Mar 27 at 17:20

My main issue is that this strategy only works when the string has spaces in it. The string I put in my main question has several words in it, but it is clearly not decrypted, so I was hoping you would have a method that doesn't involve searching for the most common words.
– paper man, Mar 25 at 22:05

Of course, you could do this with a list of maybe the 1000 most common words, but then the algorithm would be very slow.
– paper man, Mar 25 at 22:06

This is purely an algorithmic question --- given some string $s := s_1s_2\dots s_n$ (where $s_i\in\Sigma$, some alphabet) and some collection of words $W_1,\dots, W_j \in \Sigma^*$, one wants to parse $s$ into the concatenation of some number of contiguous words.

This has a natural dynamic programming solution, which would likely be quite efficient.
The main issue would be the efficiency of checking if some string is in your collection of words, but storing the words in a hash set or a trie should make each lookup fast.

Concretely, the dynamic programming solution builds an array $T$, where $T[i]$ will somehow (to be described later) encode all valid parses of $s_i\dots s_n$ into English words.
Set $T[n+1]$ to contain only the empty parse. If $T[j]$ is filled out for all $j > i$, it is easy to compute $T[i]$ itself: for each $j$ with $i < j \le n+1$, check whether $s_i\dots s_{j-1}$ is a valid word.
Whenever it is, add $s_i\dots s_{j-1} + T[j]$ to $T[i]$.
By this, I mean $T[i] = \bigcup_{j} \{\, s_i \dots s_{j-1} + w : w \in T[j] \,\}$, where the union runs over those $j$ for which $s_i\dots s_{j-1}$ is a valid word, $w$ is a parse of the remaining suffix $s_j\dots s_n$, and $s_i\dots s_{j-1} + w$ denotes concatenation (separated by a space).
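The recurrence above can be sketched in Python as follows. This is only an illustration under some assumptions: indices are 0-based (so `T[n]` plays the role of the "past the end" entry), the word list is a plain `set` of strings, and the name `all_parses` is made up for the example.

```python
def all_parses(s: str, words: set) -> list:
    """Return every way to split s into words from `words`, as space-separated strings."""
    n = len(s)
    # T[i] collects all parses of the suffix s[i:].
    T = [[] for _ in range(n + 1)]
    T[n] = [""]  # the empty suffix has exactly one parse: the empty one
    # Fill from the end so that T[j] is complete for every j > i.
    for i in range(n - 1, -1, -1):
        for j in range(i + 1, n + 1):
            head = s[i:j]
            if head in words:
                for tail in T[j]:
                    T[i].append(head if tail == "" else head + " " + tail)
    return T[0]


print(all_parses("catsand", {"cat", "cats", "and", "sand"}))
# ['cat sand', 'cats and']
```

Note that the number of parses can grow exponentially with the length of `s`, so this version is only practical when you really need the parses themselves.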

This will lead to $T[1]$ containing all possible parses of your sentence into English words, which would be useful if you want to check if they are grammatically correct (for example).
It may be that this is too much information though --- maybe you want to ignore grammar entirely, and only care if some such parse exists.
One can save on memory then by instead only storing within $T[i]$ the indices $j$ such that $s_i\dots s_{j-1}$ is a valid word.
One can then check if a valid parse exists by seeing if there is a path from $1$ to $n+1$ in the graph with vertices $\{1,\dots, n+1\}$, and edges from $i\to j$ whenever $j\in T[i]$ (vertex $n+1$ stands for the position just past the end of $s$).
Such a path $1\to i_1\to i_2\to\dots \to i_k \to n+1$ can be interpreted as breaking $s$ into words $s_1\dots s_{i_1-1}, s_{i_1}\dots s_{i_2-1},\dots, s_{i_k}\dots s_n$, which are all valid words by the construction of $T$.
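A minimal sketch of this reduced-memory version, again with 0-based indices (so the search runs from vertex `0` to vertex `n`). The helper names `build_edges` and `has_parse` are hypothetical, and the word collection is assumed to be a `set` of strings.

```python
from collections import deque


def build_edges(s: str, words: set) -> list:
    """T[i] lists every j > i such that s[i:j] is a word."""
    n = len(s)
    return [[j for j in range(i + 1, n + 1) if s[i:j] in words] for i in range(n)]


def has_parse(s: str, words: set) -> bool:
    """True if s can be written as a concatenation of words (BFS from 0 to len(s))."""
    n = len(s)
    T = build_edges(s, words)
    seen = [False] * (n + 1)
    seen[0] = True
    queue = deque([0])
    while queue:
        i = queue.popleft()
        if i == n:  # reached the position just past the end of s
            return True
        for j in T[i]:
            if not seen[j]:
                seen[j] = True
                queue.append(j)
    return False
```

On the string from the question, with only the four words it happens to contain, `has_parse("XPAWWALLTPJZYYZWBGSHGARECPVHDAAPLJLBGAPVTFVCWAS", {"ALL", "ARE", "WAS", "AS"})` returns `False`, because no word in that set starts at the first letter. This is the behaviour the plain "count common words" check was missing.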

In this reduced memory version of the problem, filling out each $T[i]$ takes $O(nA)$ time, where $A$ is the amount of time it takes to check if a string is in your collection of words.
Filling out the whole table then takes $O(n^2A)$ time.
Checking whether a path exists from vertex $1$ to vertex $n+1$ (the position just past the end of $s$) can then be done in a variety of ways. The path does not even need to be a shortest path, which makes the problem much easier: one can just use BFS/DFS, adding at most $O(n^2)$ to the running time.
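If only a yes/no answer is needed, the table and the path search can be folded into a single backward pass. The sketch below is a variant, not the construction described above: it also caps the inner loop at the length of the longest word (an extra assumption that the word set is fixed and known in advance), which brings the number of lookups down from $O(n^2)$ to $O(nL)$, where $L$ is that maximum length. The name `can_segment` is made up for the example.

```python
def can_segment(s: str, words: set) -> bool:
    """True if s splits into words; ok[i] means the suffix s[i:] can be parsed."""
    n = len(s)
    max_len = max((len(w) for w in words), default=0)
    ok = [False] * (n + 1)
    ok[n] = True  # the empty suffix is trivially parsed
    for i in range(n - 1, -1, -1):
        # No word is longer than max_len, so longer substrings need not be tried.
        for j in range(i + 1, min(n, i + max_len) + 1):
            if ok[j] and s[i:j] in words:
                ok[i] = True
                break
    return ok[0]
```

With a list of the 1000 most common words held in a `set`, each lookup is a hash of a short substring, so the size of the word list has little effect on the running time.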