Practical Finance and Data Analysis Lab Summary

This lab started out loosely defined, with the simple goals of gathering general knowledge about finance, learning where to find data, and using data analysis methods to glean information from that data. We came up with several grand ideas but eventually narrowed our goals; the entire process can be seen in the lab notes.

Approach

In the end, we generated fixed-length time windows with random start dates over the period of price data acquired for the Dow Jones Industrial Average (DJIA), then tried to find an average growth rate using both "snapshots" and least-squares linear fits. We defined a snapshot as the slope between the start and end points of a time window, reflecting what a person would receive if they invested over exactly that window. A least-squares fit takes into account fluctuations in the price between the start and end points, so someone investing over a shorter interval inside the window would probably see results closer to the least-squares fit for that window. We compared the two methods over several different window lengths, then analyzed the least-squares data to see if it fit any sort of distribution.
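A minimal sketch of the two slope estimates described above, assuming the weekly prices are held in a plain Python list. The function names and the synthetic price series are illustrative assumptions and are not the code or data used in the lab.

```python
import random

def snapshot_slope(prices):
    """Slope between the first and last point of the window ($/week)."""
    return (prices[-1] - prices[0]) / (len(prices) - 1)

def least_squares_slope(prices):
    """Slope of the ordinary least-squares line through (week index, price)."""
    n = len(prices)
    x_mean = (n - 1) / 2
    y_mean = sum(prices) / n
    sxy = sum((i - x_mean) * (p - y_mean) for i, p in enumerate(prices))
    sxx = sum((i - x_mean) ** 2 for i in range(n))
    return sxy / sxx

def average_slopes(prices, window_weeks, iterations, rng):
    """Average both slope estimates over windows with random start dates."""
    snap_total = 0.0
    fit_total = 0.0
    for _ in range(iterations):
        # Pick a start so that the whole window lies inside the data.
        start = rng.randrange(len(prices) - window_weeks + 1)
        window = prices[start:start + window_weeks]
        snap_total += snapshot_slope(window)
        fit_total += least_squares_slope(window)
    return snap_total / iterations, fit_total / iterations

# Hypothetical stand-in for weekly DJIA closes: a steady 2 $/week climb plus noise.
rng = random.Random(0)
fake_prices = [100 + 2.0 * week + rng.gauss(0, 5) for week in range(2000)]

# 520 weeks is roughly a 10 year window; 100 iterations matches one "trial".
snap_avg, fit_avg = average_slopes(fake_prices, window_weeks=520, iterations=100, rng=rng)
print(f"snapshot: {snap_avg:.3f} $/week, least-squares: {fit_avg:.3f} $/week")
```

On a series like this both averages should land near the built-in 2 $/week trend, with the snapshot value scattering more from window to window because it depends only on two noisy end points.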

Final Results

Least-squares:

Least-squares Slope Data [$/week] ± σravg

Sample Window

10 Year Window

 1) 2.4689 ± 0.0007    11) 2.6043 ± 0.0007
 2) 3.1063 ± 0.0011    12) 3.2287 ± 0.0011
 3) 2.5977 ± 0.0008    13) 2.1363 ± 0.0009
 4) 2.4893 ± 0.0007    14) 2.8116 ± 0.0008
 5) 2.8451 ± 0.0007    15) 2.3922 ± 0.0005
 6) 2.1571 ± 0.0007    16) 2.7420 ± 0.0009
 7) 3.0676 ± 0.0009    17) 3.1118 ± 0.0008
 8) 3.4700 ± 0.0009    18) 2.4796 ± 0.0009
 9) 2.1050 ± 0.0006    19) 2.3554 ± 0.0009
10) 2.2250 ± 0.0008    20) 2.0223 ± 0.0009

20 Year Window

 1) 2.9810 ± 0.0003    11) 2.5314 ± 0.0003
 2) 2.8751 ± 0.0004    12) 2.5233 ± 0.0003
 3) 2.1529 ± 0.0003    13) 2.0260 ± 0.0003
 4) 2.5026 ± 0.0002    14) 2.8666 ± 0.0003
 5) 2.3646 ± 0.0002    15) 2.8161 ± 0.0003
 6) 3.5018 ± 0.0004    16) 2.0481 ± 0.0003
 7) 2.3696 ± 0.0002    17) 2.8723 ± 0.0003
 8) 2.7557 ± 0.0003    18) 2.6072 ± 0.0003
 9) 2.0702 ± 0.0003    19) 3.0362 ± 0.0004
10) 2.6607 ± 0.0003    20) 1.8809 ± 0.0003

30 Year Window

 1) 1.6386 ± 0.0002    11) 1.7714 ± 0.0002
 2) 1.6168 ± 0.0002    12) 1.9832 ± 0.0002
 3) 1.9087 ± 0.0002    13) 1.7668 ± 0.0002
 4) 1.4801 ± 0.0002    14) 1.2027 ± 0.0002
 5) 1.8070 ± 0.0002    15) 2.0399 ± 0.0002
 6) 2.3613 ± 0.0002    16) 1.9900 ± 0.0002
 7) 1.9316 ± 0.0002    17) 2.1310 ± 0.0002
 8) 2.3433 ± 0.0002    18) 2.3271 ± 0.0002
 9) 2.1834 ± 0.0002    19) 2.1681 ± 0.0002
10) 1.6845 ± 0.0002    20) 1.9814 ± 0.0002

40 Year Window

 1) 1.2491 ± 0.0002    11) 1.5066 ± 0.0002
 2) 1.6391 ± 0.0002    12) 1.6332 ± 0.0002
 3) 1.4006 ± 0.0002    13) 1.7675 ± 0.0002
 4) 1.6526 ± 0.0001    14) 1.7078 ± 0.0002
 5) 1.8456 ± 0.0002    15) 1.4713 ± 0.0002
 6) 1.4773 ± 0.0002    16) 1.6195 ± 0.0002
 7) 1.5594 ± 0.0001    17) 1.6892 ± 0.0002
 8) 1.5667 ± 0.0002    18) 1.5924 ± 0.0002
 9) 1.6973 ± 0.0001    19) 1.7595 ± 0.0002
10) 1.3066 ± 0.0002    20) 1.3776 ± 0.0002

50 Year Window

 1) 1.5937 ± 0.0001    11) 1.4359 ± 0.0002
 2) 1.3238 ± 0.0002    12) 1.4196 ± 0.0002
 3) 1.5460 ± 0.0001    13) 1.5010 ± 0.0001
 4) 1.4301 ± 0.0002    14) 1.6835 ± 0.0002
 5) 1.4891 ± 0.0002    15) 1.4429 ± 0.0001
 6) 1.3474 ± 0.0001    16) 1.4743 ± 0.0001
 7) 1.6312 ± 0.0002    17) 1.4178 ± 0.0001
 8) 1.4696 ± 0.0002    18) 1.5034 ± 0.0002
 9) 1.5794 ± 0.0001    19) 1.4420 ± 0.0001
10) 1.3559 ± 0.0001    20) 1.3633 ± 0.0001

Window     Average Least-squares Slope after 100 Iterations and 20 Trials ($/week)
10 Year    2.782 ± 0.0002
20 Year    2.442 ± 0.0001
30 Year    1.916, error < 0.0001
40 Year    1.576, error < 0.0001
50 Year    1.472, error < 0.0001

Snapshot:

Snapshot Window    Average Slope after 100 Iterations and 20 Trials ($/week)
10 Year            2.997
20 Year            2.287
30 Year            2.002
40 Year            1.892
50 Year            1.999

The snapshot and least-squares methods give roughly comparable averages, although they do not agree within the error bars generated by the least-squares fitting. I believe more iterations of the snapshot method would be necessary to make up for its lack of sensitivity to trends inside each window. In a sense, this would be a "least-squares" analysis that takes a much longer route.
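The size of the disagreement can be checked directly from the two summary tables. The short sketch below only redoes that arithmetic; treating "error < .0001" as an uncertainty of 0.0001 is an assumption made so that a ratio can be formed.

```python
# Averages copied from the two summary tables ($/week), keyed by window length in years.
least_squares = {10: 2.782, 20: 2.442, 30: 1.916, 40: 1.576, 50: 1.472}
snapshot = {10: 2.997, 20: 2.287, 30: 2.002, 40: 1.892, 50: 1.999}

# Quoted least-squares uncertainties; "error < .0001" is taken as 0.0001 (assumption).
ls_error = {10: 0.0002, 20: 0.0001, 30: 0.0001, 40: 0.0001, 50: 0.0001}

differences = {years: snapshot[years] - least_squares[years] for years in least_squares}

for years in sorted(differences):
    diff = differences[years]
    ratio = abs(diff) / ls_error[years]
    print(f"{years} Year: snapshot - least-squares = {diff:+.3f} $/week "
          f"(about {ratio:.0f} quoted error bars)")
```

The differences range from roughly 0.09 to 0.53 $/week in magnitude, which is hundreds to thousands of times larger than the quoted least-squares uncertainties.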

Distribution

The least-squares data returned histograms with leftward-leaning "shoulders," which makes sense given the preponderance of smaller slopes over the total time period of the DJIA price data.

With this in mind, fitting the data with known distributions seemed frivolous, but it did provide elementary insight into the distribution-fitting process in general. The log-likelihood rating given to each fit intrigues me in particular, but the online resources I found are cryptic at best. Open science has not conquered this one just yet!
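For what it is worth, the log-likelihood of a fit is the sum, over all data points, of the logarithm of the fitted probability density evaluated at each point; a larger (less negative) value means the data are more probable under that fit. The sketch below illustrates this for a normal distribution only. The choice of a normal distribution, the function names, and the reuse of the 20 ten-year trial slopes as sample data are all assumptions for illustration, not the fitting tool or histogram data used in the lab.

```python
import math

def normal_log_likelihood(data, mu, sigma):
    """Sum of log N(x | mu, sigma) over all data points."""
    return sum(
        -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)
        for x in data
    )

def fit_normal(data):
    """Maximum-likelihood normal fit: sample mean and (1/n) standard deviation."""
    n = len(data)
    mu = sum(data) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in data) / n)
    return mu, sigma

# The 20 ten-year trial slopes from the table above ($/week), used only as handy sample data.
slopes = [
    2.4689, 3.1063, 2.5977, 2.4893, 2.8451, 2.1571, 3.0676, 3.4700, 2.1050, 2.2250,
    2.6043, 3.2287, 2.1363, 2.8116, 2.3922, 2.7420, 3.1118, 2.4796, 2.3554, 2.0223,
]

mu, sigma = fit_normal(slopes)
best = normal_log_likelihood(slopes, mu, sigma)
# Deliberately shift the mean to show that a worse fit scores lower.
shifted = normal_log_likelihood(slopes, mu + 0.5, sigma)
print(f"log-likelihood at best fit: {best:.2f}, with shifted mean: {shifted:.2f}")
```

Comparing these numbers between candidate distributions fitted to the same data is the usual way the rating is used: the absolute value means little on its own, but the fit with the higher log-likelihood describes the data better.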

Thoughts

This lab essentially turned into a large exercise in data manipulation and information-hunting. I was also able to combine concepts from lecture on linear least-squares fitting with more general error propagation.

The distribution analysis is gratifying in that I see a reflection in the processed data corresponding to trends in the raw data. Something to try regarding the distribution fit would be to generate a fake DJIA from the best-fit distribution function and compare with the real data. I imagine this would involve something like executing our lab process in reverse, generating huge numbers of linear windows with random start dates and distribution-derived slopes. One could then "smooth" all these disjointed, overlapping linear sections into a fake DJIA.
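One way this reverse process might look is sketched below. It is only one possible reading of the idea: "smoothing" is interpreted here as averaging the slopes of all windows that overlap a given week and then accumulating those weekly steps into a price series. The function name, the normal slope distribution, and its parameters are hypothetical placeholders for whatever best-fit distribution is actually chosen.

```python
import random

def fake_index(n_weeks, window_weeks, n_windows, draw_slope, start_price, rng):
    """Build a synthetic price series from randomly placed linear windows."""
    slope_sum = [0.0] * n_weeks   # total slope contributed to each week
    coverage = [0] * n_weeks      # number of windows overlapping each week

    for _ in range(n_windows):
        start = rng.randrange(n_weeks - window_weeks + 1)
        slope = draw_slope(rng)   # slope drawn from the chosen distribution
        for week in range(start, start + window_weeks):
            slope_sum[week] += slope
            coverage[week] += 1

    prices = [start_price]
    for week in range(1, n_weeks):
        # Average the overlapping slopes; uncovered weeks contribute no growth.
        step = slope_sum[week] / coverage[week] if coverage[week] else 0.0
        prices.append(prices[-1] + step)
    return prices

# Placeholder distribution: normal with mean 2.6 and width 0.4 $/week (assumption).
rng = random.Random(1)
fake = fake_index(
    n_weeks=2000,
    window_weeks=520,
    n_windows=500,
    draw_slope=lambda r: r.gauss(2.6, 0.4),
    start_price=100.0,
    rng=rng,
)
print(f"fake index runs from {fake[0]:.1f} to {fake[-1]:.1f} over {len(fake)} weeks")
```

Running the original window analysis on the output and comparing the resulting slope histogram with the real one would then be a direct test of how well the fitted distribution describes the data.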