QuickBASIC is an out-of-place choice compared to most of the other languages I've covered in this series. Why jump so far backwards to QuickBASIC?

There are a number of reasons. The first is that QuickBASIC imposes real limitations. Aside from the language itself being far more limited than, say, C#, any solution also needs to contend with issues such as memory usage and open file handles on MS-DOS. At the same time, much of the development task is simpler: there are no designers, property pages, or configuration tabs to fiddle with. You open a text file and start writing the program.

The first task is to determine an algorithm. Of course, we already know the algorithm; it has been described previously. In this instance, however, we don't have hashmaps available, and even if we wanted to implement one ourselves, we couldn't keep all of the information in memory anyway. One compromise is to keep only an array of index information in memory; that array contains the sorted word as well as a record index into a random-access file. So, to start, we have these two TYPE structures:

TYPE SORTINDEX
    SORTED AS STRING * 28
    OFFSET AS LONG
END TYPE

TYPE SORTRECORD
    WORDCOUNT AS INTEGER
    SORTWORDS(1 TO 20) AS STRING * 28
END TYPE

By reading and writing directly against a scratch file whenever we need to add a new word to the "hash", we avoid keeping any of the SORTRECORD structures in memory except the one we are currently working with. This drastically reduces our memory usage, as does determining that the longest word is 28 characters/bytes (hence the fixed-length strings above). The algorithm is otherwise much the same as before: given a word, we sort its letters, then consult the array of SORTINDEX entries. If we find one with the same sorted word, we take its OFFSET, read in the SORTRECORD at that offset, increment WORDCOUNT, add the word to the SORTWORDS array, and PUT the record back into the scratch file. If it isn't found in the SORTINDEX, we create a new entry: we save a new record containing the word to the scratch file, and record that record's offset and sorted text in the index.
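The scratch-file scheme may be easier to follow outside of BASIC, so here is a rough Python sketch of the same idea. The record layout mirrors the TYPEs above (a 2-byte count plus twenty 28-byte word slots, 562 bytes per record); the function and variable names are my own, not from the original program:

```python
import struct

WORD_LEN = 28    # longest dictionary word, as measured in the article
MAX_WORDS = 20   # word slots per record, like SORTWORDS(1 TO 20)
REC_SIZE = 2 + MAX_WORDS * WORD_LEN   # 562 bytes, matching LEN=562 in the BASIC code

def read_record(f, offset):
    """Read one fixed-size record and return its list of words."""
    f.seek(offset)
    raw = f.read(REC_SIZE)
    count = struct.unpack_from("<h", raw)[0]
    return [raw[2 + i * WORD_LEN : 2 + (i + 1) * WORD_LEN].decode().rstrip()
            for i in range(count)]

def write_record(f, offset, words):
    """Write a record: word count, then space-padded fixed-width word slots."""
    raw = struct.pack("<h", len(words))
    for w in words:
        raw += w.encode().ljust(WORD_LEN)
    f.seek(offset)
    f.write(raw.ljust(REC_SIZE, b" "))

def add_word(word, index, scratch):
    """index maps sorted-letters key -> record offset in the scratch file."""
    key = "".join(sorted(word))
    if key in index:
        # Matching sorted word found: read, append, and put the record back.
        words = read_record(scratch, index[key])
        words.append(word)
        write_record(scratch, index[key], words)
    else:
        # Not found: append a fresh record and remember its offset.
        scratch.seek(0, 2)
        index[key] = scratch.tell()
        write_record(scratch, index[key], [word])
```

Only one record is ever held in memory at a time; everything else lives in the scratch file, just as in the BASIC version.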

Of course, this does have several inefficiencies that I won't address. The first is that the search for the matching sorted word is effectively a sequential search. Ideally, the in-memory index would be kept sorted so that lookups could use a binary search. I suppose if somebody is interested, I've "left it as an exercise for the reader".
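For the curious, the "exercise" amounts to keeping the index keys sorted and bisecting on lookup; a minimal Python sketch (my own, not part of the original program):

```python
import bisect

def find_key(index_keys, key):
    """index_keys is kept sorted; return the position of key, or -1 if absent."""
    pos = bisect.bisect_left(index_keys, key)
    if pos < len(index_keys) and index_keys[pos] == key:
        return pos
    return -1

def insert_key(index_keys, key):
    """Insert key while preserving sort order, so lookups stay O(log n)."""
    bisect.insort(index_keys, key)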

Otherwise all seems well. But not so fast: the dict.txt file has 45,402 words. Our index type is 32 bytes (28 + 4), so storing every word in the index would need 1,452,864 bytes, which is far beyond the conventional memory limits we are under. We need to drastically reduce the memory usage of our algorithm. And we had something so promising! Seems like it's back to the drawing board.

Or is it? Instead of trying to reduce how much memory the algorithm uses, we can reduce how much data it works with at a time. We can split the original dictionary file into chunks, and since words of different lengths cannot be anagrams of each other, we can simply split the file into separate files organized by word length. Then we perform the earlier algorithm on each of those files and append the resulting anagram list of each to one file. That gives us one file listing all the anagrams without ever exceeding the memory limitations!
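The splitting step itself is simple. A sketch of the idea in Python, bucketing in memory for illustration (the real program writes each bucket to its own DICT file on disk instead):

```python
from collections import defaultdict

def split_by_length(words):
    """Bucket words by length; anagrams always land in the same bucket,
    so each bucket can be processed independently within the memory limit."""
    buckets = defaultdict(list)
    for w in words:
        buckets[len(w)].append(w)
    return buckets
```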

Before we get too excited, let's make sure that the largest "chunk" would be small enough. Using another QuickBASIC program (because, what the hell, right?), I counted the words of each length. The biggest chunk is the seven-letter words, of which there are 7,371 in the test dictionary file. That would require 235,872 bytes of index storage, which is well within our 640K conventional memory limit.

Of course, there is a minor caveat: we need to start QuickBASIC with the /Ah command line parameter, because by default the maximum size of a dynamic array is 64K. Otherwise, we find that it hits "Subscript out of range" errors once we get beyond around the 2,000-element mark for our 32-byte index record type.

Another consideration I encountered was open files. I originally had the program open all of the per-length output files at once, but it maxed out at 16 open files, so I had to refactor it into a much slower approach: read a line, open the appropriate file, write the line, and then close the file again. There may be a better technique here to increase performance. For reference, I wasn't able to find a way to raise the limit, either (adjusting CONFIG.SYS didn't help).
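The one-handle-at-a-time workaround boils down to this pattern (again sketched in Python, with an illustrative path):

```python
def append_line(path, line):
    """Open, append one line, and close immediately, so only a single
    file handle is ever in use - slow, but it respects the DOS open-file limit."""
    with open(path, "a") as f:
        f.write(line + "\n")
```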

After that, it worked a treat: the primary algorithm runs on each length subset and writes the results to an output file.

Without further ado, the full source of this "solution":

DECLARE SUB SPLITDICT ()
DECLARE SUB ADDWORD (WORDADD AS STRING)
DECLARE FUNCTION GETOFFSET% (SORTEDWORD AS STRING)
DECLARE FUNCTION GETBYTEOFFSET% (SORTEDWORD AS STRING)
DECLARE FUNCTION SORTCHARS$ (ST AS STRING)

TYPE SORTINDEX
    SORTED AS STRING * 28
    OFFSET AS LONG
END TYPE

TYPE SORTRECORD
    WORDCOUNT AS INTEGER
    SORTWORDS(1 TO 20) AS STRING * 28
END TYPE

DIM x AS INTEGER
COMMON SHARED INDEXDATA() AS SORTINDEX
COMMON SHARED INDEXFILE AS STRING
COMMON SHARED INDEXCOUNT AS INTEGER
COMMON SHARED INDEXHANDLE AS INTEGER
DIM USEFILE AS INTEGER
DIM READWORD AS STRING
DIM READINDEX AS LONG
DIM ANAGRAMOUT AS INTEGER

SPLITDICT
ANAGRAMOUT = FREEFILE
OPEN "C:\ANAGRAMS.TXT" FOR OUTPUT AS ANAGRAMOUT
DIM LETTERCOUNT AS INTEGER
FOR LETTERCOUNT = 2 TO 32
    INDEXFILE = "C:\DICT.IDX"
    INDEXHANDLE = FREEFILE
    INDEXCOUNT = 0
    KILL INDEXFILE
    OPEN INDEXFILE FOR RANDOM AS INDEXHANDLE LEN = 562
    ERASE INDEXDATA
    DIM DICTFILE AS STRING
    DICTFILE = "C:\DICT\DICT" + RTRIM$(LTRIM$(STR$(LETTERCOUNT))) + ".TXT"
    USEFILE = FREEFILE
    IF DIR$(DICTFILE) <> "" THEN
        OPEN DICTFILE FOR INPUT AS USEFILE
        PRINT USEFILE
        PRINT INDEXHANDLE

'each line is a word. However, as we have limited memory
'compared to C# and other languages we need to come up with a clever workaround.
'that workaround? we use the file system. Instead of using a hashmap, we instead create a
'C:\DICT\DICT7.TXT would contain the words that have 7 letters, for example.
DIM FILENAMES(2 TO 28) AS STRING
DIM TOTALCOUNT(2 TO 28) AS LONG
DIM LCOUNT AS INTEGER
DIM USEOUTFILE AS STRING
DIM DICTREADER AS INTEGER
DIM LINEREAD AS STRING
DIM CHECKLEN AS INTEGER
DIM FSORT AS INTEGER
FOR LCOUNT = 2 TO 28
    USEOUTFILE = "C:\DICT\DICT" + RTRIM$(LTRIM$(STR$(LCOUNT))) + ".TXT"
    FILENAMES(LCOUNT) = USEOUTFILE
    FSORT = FREEFILE
    OPEN USEOUTFILE FOR OUTPUT AS #FSORT
    CLOSE #FSORT
NEXT LCOUNT
DICTREADER = FREEFILE
OPEN "C:\DICT\DICT.TXT" FOR INPUT AS #DICTREADER
PRINT "Reading Dictionary File Data..."
DO WHILE NOT EOF(DICTREADER)
    LINE INPUT #DICTREADER, LINEREAD
    CHECKLEN = LEN(LINEREAD)
    TOTALCOUNT(CHECKLEN) = TOTALCOUNT(CHECKLEN) + 1
    PRINT LINEREAD
    FSORT = FREEFILE
    OPEN FILENAMES(CHECKLEN) FOR APPEND AS FSORT
    PRINT #FSORT, LINEREAD
    CLOSE #FSORT
LOOP
END SUB

And there you have it: an anagram search program written in QuickBASIC. Of course, it is rather basic and a bit picky about its preconditions (it's hard-coded for specific file paths, for example), since it was largely written against my test VM.