1. Create the source data

We’ll use a few Canadian and US province/state population stats for the input data set.

Java

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

21

22

23

24

25

26

27

28

29

30

31

32

33

34

35

36

37

38

39

40

41

42

43

44

45

46

47

48

49

50

51

52

53

54

55

56

57

58

59

privatestaticRecordList getCanadianProvinces(){

RecordList list=newRecordList();

list.add(newRecord()

.setField("Province","Ontario")

.setField("Population",12851821)

.setField("Population Growth",5.7));

list.add(newRecord()

.setField("Province","Quebec")

.setField("Population",7903001)

.setField("Population Growth",4.7));

list.add(newRecord()

.setField("Province","British Columbia")

.setField("Population",4400057)

.setField("Population Growth",13.1));

list.add(newRecord()

.setField("Province","Alberta")

.setField("Population",3645257)

.setField("Population Growth",10.9));

list.add(newRecord()

.setField("Province","Manitoba")

.setField("Population",1208268)

.setField("Population Growth",3.6));

returnlist;

}

privatestaticRecordList getUsStates(){

RecordList list=newRecordList();

list.add(newRecord()

.setField("State","California")

.setField("Population",38802500)

.setField("Population Growth",9.99));

list.add(newRecord()

.setField("State","Texas")

.setField("Population",26956958)

.setField("Population Growth",20.59));

list.add(newRecord()

.setField("State","Florida")

.setField("Population",19893297)

.setField("Population Growth",17.64));

list.add(newRecord()

.setField("State","New York")

.setField("Population",19746227)

.setField("Population Growth",2.12));

list.add(newRecord()

.setField("State","Illinois")

.setField("Population",12880580)

.setField("Population Growth",3.31));

returnlist;

}

2. Generate the Excel file

When working with Excel files, we use an in-memory ExcelDocument object to temporarily hold the data before saving to disk. If you can write to a CSV file instead of Excel, you’ll be able to significantly reduce your memory usage down to the size of a single record. You’ll also be able to improve performance and stream your data in real-time without waiting for it to be collected (or batched) in memory.

This example uses MemoryReader to retrieve the province/state data and transfer it to specific worksheets in the excel document. It then saves the Excel workbook with all the sheets to disk.

All Data Pipeline readers, writers, transformers, filters, and converters extend either DataReader or DataWriter. This lets you replace any endpoint with another — like swapping Excel for JDBC. It also lets you add operators on top of endpoints and on top of other operators.