The other day my daughter and I were writing sentences with fridge magnet letters. While we were able to make some (I love cat), we didn't have enough letters to make others (I love you too) because we were short of the letter O (it needs 4).

Problem

Given a text file where each line contains a "sample sentence" one would want to write on the fridge, propose a set of letters that is as small as possible but still sufficient to write each sentence individually.

Note: ignore case; all magnet letters are capitals anyway.

Input

The file contains newline-separated sentences:

hello
i love cat
i love dog
i love mommy
mommy loves daddy

Output

Print a sorted list of letters, in which each letter appears only as many times as needed to write any one of the sentences.
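For the sample input above, the expected result can be worked out with a short, non-golfed Python sketch. The function name `minimal_letter_set` is made up for illustration; it simply keeps, for every letter, the largest count seen in any single sentence.

```python
from collections import Counter

def minimal_letter_set(sentences):
    """Return the sorted letters needed to write any one sentence on its own."""
    needed = {}
    for sentence in sentences:
        # Count letters only; case is ignored and spaces need no magnet.
        counts = Counter(ch for ch in sentence.lower() if ch.isalpha())
        for letter, count in counts.items():
            needed[letter] = max(needed.get(letter, 0), count)
    return "".join(letter * needed[letter] for letter in sorted(needed))

sample = ["hello", "i love cat", "i love dog", "i love mommy", "mommy loves daddy"]
print(minimal_letter_set(sample))  # prints acdddeghillmmmoostvyy
```

The three d's and m's come from "daddy" and "mommy", and the two y's from "mommy loves daddy".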

There should be a letter v in the output ;)
– Antonio Ragagnin, Apr 10 '14 at 6:35

Are we allowed / required to substitute an upside-down M for a W, or a sideways N for a Z? ;-)
– Ilmari Karonen, Apr 10 '14 at 11:42

Basically you can construct any letter using Is.
– swish, Apr 10 '14 at 11:53

More seriously, when you say "ignore cases", do you mean that we can assume that the input is already all in the same case, or that we must convert it all into the same case? Also, is it OK for the output to include some leading spaces?
– Ilmari Karonen, Apr 10 '14 at 12:01

41 Answers

GolfScript, 28 / 34 chars

n/:a{|}*{a{.[2$]--}%*$-1=}%$

The 28-character program above assumes that all the input letters are in the same case. If this is not necessarily so, we can force them into upper case by prepending {95&}% to the code, for a total of 34 chars:

{95&}%n/:a{|}*{a{.[2$]--}%*$-1=}%$

Notes:

For correct operation, the input must include at least one newline. This will be true for normal text files with newlines at the end of each line, but might not be true if the input consists of just one line with no trailing newline. This could be fixed at the cost of two extra chars, by prepending n+ to the code.

The uppercasing used in the 34-character version is really crude — it maps lowercase ASCII letters to their uppercase equivalents (and spaces to NULs), but makes a complete mess of numbers and most punctuation. I'm assuming that the input will not include any such characters.

The 28-character version treats all input characters (except newlines and NULs) equally. In particular, if the input contains any spaces, some will also appear in the output; conveniently, they will sort before any other printable ASCII characters. The 34-character version, however, does ignore spaces (because it turns out I can do that without it costing me any extra chars).

Explanation:

The optional {95&}% prefix uppercases the input by zeroing out the sixth bit of the ASCII code of each input byte (95 = 64 + 31 = 1011111 in binary). This maps lowercase ASCII letters to uppercase, spaces to null bytes, and leaves newlines unchanged.
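The effect of the mask is easy to check outside GolfScript. Here is a small Python sketch of the same bit trick; the helper name `crude_upper` is an assumption made for this example and is not part of the answer.

```python
def crude_upper(text):
    """Clear the bit with value 32 in every character code, as `95 &` does."""
    return "".join(chr(ord(ch) & 95) for ch in text)

# Letters become upper case, spaces become NUL, the newline survives.
print(repr(crude_upper("i love cat\n")))  # prints 'I\x00LOVE\x00CAT\n'

# Digits are mangled into control codes, which is why the trick is crude.
print(ord("1") & 95)                      # prints 17
```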

n/ splits the input at newlines, and :a assigns the resulting array into the variable a. Then {|}* computes the set union of the strings in the array, which (assuming that the array has at least two elements) yields a string containing all the unique (non-newline) characters in the input.

The following { }% loop then iterates over each of these unique characters. Inside the loop body, the inner loop a{.[2$]--}% iterates over the strings in the array a, removing from each string all characters not equal to the one the outer loop is iterating over.

The inner loop leaves the ASCII code of the current character on the stack, below the filtered array. We make use of this by repeating the filtered array as many times as indicated by the ASCII code (*) before sorting it ($) and taking the last element (-1=). In effect, this yields the longest string in the filtered array (as they all consist of repeats of the same character, lexicographic sorting just sorts them by length), except if the character has ASCII code zero, in which case it yields nothing.
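For readers who don't speak GolfScript, the steps above can be sketched roughly in Python. This is an approximation of the 28-character version, not a translation: the name `golfscript_style` is invented here, and the repeat-by-ASCII-code trick that discards NULs is left out.

```python
def golfscript_style(text):
    lines = text.split("\n")        # n/ :a  (split the input at newlines)
    unique = set().union(*lines)    # {|}*   (union of all characters)
    result = []
    for ch in unique:
        # Keep only `ch` in every line, e.g. "mommy" -> "mmm" for ch == "m".
        filtered = [ch * line.count(ch) for line in lines]
        # All candidates repeat one character, so the lexicographic
        # maximum is also the longest string.
        result.append(max(filtered))
    # Sort the output; spaces, if any, end up in front.
    return "".join(sorted("".join(result)))

print(repr(golfscript_style("hello\ni love cat\n")))  # prints '  acehillotv'
```

As in the 28-character program, spaces are treated like any other character, so two of them show up at the start of the result.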

J - 37 char

Reads from stdin, outputs to console.

dlb#&a.>./+/"2=/&a.tolower;._2[1!:1]3

1!:1]3 is the call to stdin. tolower;._2 performs double duty by splitting up the lines and making them lowercase simultaneously. Then we count how many times a character occurs in each row with +/"2=/&a., and take the pointwise maximum over all lines with >./.

Finally, we pull that many of each character out of the alphabet with #&a.. This includes spaces—all found at the front due to their low ASCII value—so we just delete leading blanks with dlb.
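The same table-based idea can be sketched in Python. Two things here are assumptions for the sake of a short example: the function name `count_table_max`, and restricting the alphabet to a space plus a-z where the J code uses the full character set a.

```python
import string

def count_table_max(lines):
    alphabet = " " + string.ascii_lowercase
    # One row per line, one column per character of `alphabet`.
    table = [[line.lower().count(ch) for ch in alphabet] for line in lines]
    # Pointwise maximum down each column (the role of >./ in the J code).
    maxima = [max(column) for column in zip(*table)]
    # Copy each character as often as its maximum, then delete leading blanks.
    return "".join(ch * n for ch, n in zip(alphabet, maxima)).lstrip(" ")
```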

We can ignore the case of the input (as specified by the question - i.e. it is all in either upper or lower case);

The output is an array of characters (which is about as close as JavaScript can get to the OP's requirement of a list of characters); and

The output is to be displayed on the console.

With comments:

var l = s.split('\n')               // split the input up into sentences
  .map(x => x.split(/ */)           // split each sentence up into letters,
                                    // ignoring any whitespace
    .sort()                         // sort the letters in each sentence alphabetically
    .map((x,i,a) => x + (a[i-1]==x ? ++j : j=0)));
                                    // append to each letter the number of identical
                                    // letters that precede it in the same sentence,
                                    // i.e. "HELLO WORLD" =>
                                    // ["D0","E0","H0","L0","L1","L2","O0","O1","R0","W0"]
[].concat(...l)                     // flatten the array of arrays of letters+frequencies
                                    // into a single array
  .sort()                           // sort all the letters and appended frequencies
                                    // alphabetically
  .filter((x,i,a) => a[i-1]!=x)     // remove adjacent duplicates from the sorted array
  .map(x => x[0])                   // keep the first letter of each entry (removing
                                    // the frequencies) and return the array
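The tagging trick is independent of JavaScript. A rough Python sketch of it follows; the name `tag_and_dedupe` is made up here, and a set of (letter, index) pairs stands in for the sort-then-filter step.

```python
def tag_and_dedupe(text):
    tagged = set()
    for line in text.split("\n"):
        seen = {}
        for ch in line.replace(" ", ""):
            # Tag each letter with how many identical letters came before it
            # in this line, so the n-th "L" of every line gets the same tag.
            tagged.add((ch, seen.get(ch, 0)))
            seen[ch] = seen.get(ch, 0) + 1
    # The set has already removed duplicates; sort the pairs and strip the tags.
    return [ch for ch, _ in sorted(tagged)]

print(tag_and_dedupe("HELLO WORLD"))
# prints ['D', 'E', 'H', 'L', 'L', 'L', 'O', 'O', 'R', 'W']
```

Because identical tags from different lines collapse into one entry, each letter survives exactly as many times as its largest per-line count.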

By assuming f for the input filename and using uppercase (all magnet letters are uppercase anyway), you can get it down to 91: print(''.join([chr(i)*max(l.upper().count(chr(i))for l in open(f))for i in range(65,91)]))
– Gabe, Apr 10 '14 at 16:30

@njzk2 well, if we run this in the console, in theory it would just print the result by itself...
– Tal, Apr 11 '14 at 15:54

Perl 6: 53 characters; 55 bytes

say |sort
([∪] lines.map:{bag comb /\S/,.lc}).pick(*)

For each line, this combs through it for the non-space characters of the lower-cased string (comb /\S/,.lc), and makes a Bag, or a collection of each character and how many times it occurs. [∪] takes the union of the Bags over all the lines, which gets the max number of times the character occurred. .pick(*) is hack-y here, but it's the shortest way to get all the characters from the Bag replicated by the number of times it occurred.
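Python's `collections.Counter` behaves much like a Bag here, so the idea can be sketched as below. The function name `bag_union` is an assumption for this example; the `|` operator on Counters keeps the maximum count per element.

```python
from collections import Counter
from functools import reduce
from operator import or_

def bag_union(lines):
    # One multiset per line, built from the non-whitespace characters in lower case.
    bags = [Counter("".join(line.lower().split())) for line in lines]
    # Counter's | operator keeps the larger count for each element.
    union = reduce(or_, bags, Counter())
    # elements() repeats every character by its count.
    return "".join(sorted(union.elements()))
```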

EDIT: To see if it would be shorter, I tried translating histocrat's Ruby answer. It is 63 characters, but I still very much like the approach:

Basically I'm appending the whole alphabet to each line, so that when grouping and sorting, I'm sure I'll end up with a list that contains 27 elements. Next, I transpose the "frequency table", so that each row in this array consists of the frequencies of a single letter in each line, e.g. ["a","","aaa","aa","aaaa"]. I then choose the maximum of each array (which works just like I want because of how the Ord instance of Strings works), drop the letter that I appended at the start, get rid of the spaces, and output the result.
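A rough Python rendering of this pad, group, transpose and maximise idea is sketched below. The name `group_transpose_max` is invented for the example, and it assumes the input contains only letters and spaces; any other character would add a 28th group and misalign the table.

```python
from itertools import groupby
import string

def group_transpose_max(lines):
    alphabet = " " + string.ascii_lowercase  # 27 symbols, so every line yields 27 groups
    table = []
    for line in lines:
        padded = sorted(line.lower() + alphabet)
        table.append(["".join(run) for _, run in groupby(padded)])
    # Transpose: each column now holds one symbol's runs across all lines.
    columns = zip(*table)
    # Runs of the same symbol compare by length, so max() picks the longest;
    # [1:] drops the padding copy and lstrip() removes the leftover spaces.
    return "".join(max(column)[1:] for column in columns).lstrip()
```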

Sample output

Explanation

Create files that we will be reading from later on so that bash doesn't complain that they don't exist. If you remove this line you will save 13 chars but get a lot of junk output.

split _ -1

Split the input file into sections, each storing 1 line. The files this command creates are named xaa, xab, xac and so on, I have no idea why.

for l in {a..z}
do for s in {a..z}

For each letter $l read through all lines stored in files xa$s.

do grep -so $l xa$s>b$l

Remove the -s switch to save 1 char and get a lot of junk output. It prevents grep from complaining about nonexistent files (will occur unless you have 26 lines of input).
This processes the file xa$s, removing anything but occurrences of $l, and sends the output to the file b$l. So "i love mommy" becomes "mmm", with a newline after each letter, when $l is m.

if [ `wc -l<b$l` -ge `wc -l<$l` ]

If the number of lines in the file we just created is greater than or equal to (i.e. more letters since there is one letter per line) the number of lines in our highest result so far (stored in $l)...

then mv b$l $l

...store our new record in the file $l. At the end of this loop, when we have gone through all the lines, the file $l will store x lines each containing the letter $l, where x is the highest number of occurrences of that letter in a single line.

fi
done
tr -d '\n'<$l

Output the contents of our file for that particular letter, removing new lines. If you don't want to remove the new lines, change the line with tr to echo $l, saving 6 chars.

Tried with GNU bash, version 3.2.51 (apple), but got a file '-l1aa' in the current folder containing the input data.
– romaninsh, Apr 21 '14 at 23:33

@romaninsh It might be that you have a different version of split (from coreutils). I am currently running GNU bash 4.3.8 and GNU coreutils 8.21 on Ubuntu 14.04 and it works fine (it also worked on Ubuntu 13.10 before I upgraded). However, I did have to place the program and the input file in a separate directory for it to work properly - I suspect this was only because of the millions of junk files in my home folder.
– professorfish, Apr 22 '14 at 7:03

@romaninsh in fact, if you look at the exact command in the script: split _ -l1 and you notice that your input is being saved to -l1aa, I think that your version of split isn't recognising -l1 as an option and is instead taking it to be a prefix for output. Try putting a space between -l and 1, or putting --lines=1, or just -1 (this appears to be an obsolete and more golfy syntax which I will now update the post with).
– professorfish, Apr 22 '14 at 7:06

kdb (q/k): 59 characters:

d:.Q.a! 26#0
.z.pi:{d|:.Q.a##:'=_y}.z.exit:{-1@,/.:[d]#'!:d}

generate pre-sorted seed dictionary from alphabet .Q.a

process each line of input, convert to lowercase, group into dictionary, count each element, take alphabetic characters from result (I.e. prune spaces, newlines, etc at this stage) and use max-assign to global d to keep a running total.

define exit handler, which gets passed in to .z.pi to save a delimiter but otherwise unused there. Take from each key-value to generate list of characters, flatten and finally print to stdout.

-1 adds a newline, using 1 would save a character but does not generate the output specified. Wish I could get rid of the .z.pi / .z.exit boilerplate, which would remove 14 characters.

Python 2 - 129

a,r=[0]*26,range(26)
for l in open('f'):a=[max(a[i],l.lower().count(chr(i+97)))for i in r]
print''.join(chr(i+97)*a[i]for i in r)

A couple more ways to do the same thing in the same number of characters:

a=[0]*26
b='(chr(i+97)))for i in range(26)'
exec'for l in open("f"):a=[max(a[i],l.lower().count'+b+']\nprint"".join(a[i]*('+b+')'

a=[0]*26
b='(chr(i+97)))for i in range(26))'
exec'for l in open("f"):a=list(max(a[i],l.lower().count'+b+'\nprint"".join(a[i]*('+b

This assumes the file is saved as f in an accessible directory. This program is directly runnable, with no extra input necessary.

Scala, 125 characters

First I read the input, converting it into lower case and adding one empty line.

Then for each letter from a to z I repeat that letter the maximum number of times it appears in any of the lines (that's why I need the empty line: max cannot be called on an empty input). Then I just join the results and print to the output.
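The empty-input problem is not specific to Scala. In Python, for instance, max() raises ValueError on an empty sequence, and its `default` argument plays the role of the extra empty line. A minimal sketch, with the function name `per_letter_max` made up for illustration:

```python
import string

def per_letter_max(lines):
    # default=0 stands in for the extra empty line: without it,
    # max() of an empty sequence raises ValueError.
    return "".join(
        letter * max((line.lower().count(letter) for line in lines), default=0)
        for letter in string.ascii_lowercase
    )
```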

To read from a file, replace stdin with fromFile("FILENAME"), increasing the size of the code to 132 characters + file name length.

Remove the eval(...) and execute to get the real code; this is (somewhat) compressed.

s multi-functions as the array of lines and as the outputted string, h contains the histogram of the letters per line and H contains the histogram with the maximum values up until now. It's case-insensitive, and just ignores anything but a-z and A-Z (I think... JS arrays are sometimes weird).

This just totals the characters, not quite what the question asked. The letters should be totalled to be the bare minimum set to form any single sentence in the input, not all of them. I quite like your approach to prevent the need to sort the output though.
– Matt, Apr 10 '14 at 12:11

does not answer the question, as you need the minimum amount of letters to write each sentence individually. In your code, you output the number of letters needed to write all sentences at the same time.
– njzk2, Apr 11 '14 at 14:35

PHP - 143

Explanation

For each possible letter, I map the array of lines through a user-defined function which replaces each line with the number of times that letter is used in it. For the letter 'd', the line "Mommy loves daddy" will be mapped to 3.

Afterwards I find the maximum value inside the array and output the letter exactly that many times. Here is the multi-line version: