Due to firewall constraints on my work laptop, I'm looking for a personal laptop to sandbox whatever programming language or framework piques my interest. I'll probably be working with Hadoop and Python (data analysis), so I'm thinking I need a minimum of maybe 16 GB RAM and a 1 TB hard drive (SSD?). Is this overkill for a hobby computer?

If price is not an issue, I'd go with a MacBook Pro. It's Unix-based, so that's nice. I've never had a laptop that holds up nearly as well. The only problem is the dust in the keyboard and maybe the function keys. But, overall, I like it a lot.

I'd go with either a MacBook Pro or maybe a Linux build by System76 or similar.

Hadoop? You should read the requirements: it writes temporary files, so you may want more than a 1TB SSD.

As for memory, I’d want 32GB minimum myself, and I regret going cheap on 16GB. Well, Apple had limited options until recently ...

Thanks. I've worked with it before. I would use a commercial distribution (Cloudera/Hortonworks/MapR) in conjunction with AWS or Azure, so the storage space on the laptop might not be needed. You never know though, as I want to remain in the 'free' zone for cloud services.

What specs did you go with on the MacBook Pro? 8 or 16 GB RAM? 256 GB SSD? That seems like the middle ground.

Laptops not only tend to be less robust and harder to repair but they are also a lot more expensive than a desktop.

Unless you have some specific need for a laptop, I would suggest a desktop with a large monitor and plenty of expansion capability, since you might find you need more memory, graphics cards, or drives.

You will get a lot more bang for the buck that way and have less chance of it not being able to do something that you need in the future.

Pay attention to the case size and specifications. Some desktops now are in small cases that have little expansion capability.

Dell and Lenovo sell refurbished laptops on their websites and you can often get good deals there.

Thanks. I'm not opposed to a desktop as I already have the monitors and space where I work from home.

While I can certainly learn how to expand the hardware, do you have a good starting point on specs so that expansion doesn't need to be considered initially?

You can watch a YouTube video to see how to do things like add memory or a hard drive, so that is not hard to do.

I am retired and always worked on large systems, so my knowledge is out of date. As a general rule, I would look at the software you are using to see how much memory it says is needed and then at least double that. If you have open memory slots, you can just add more memory if you need it.

The prices of solid state drives have dropped a lot, but some manufacturers still charge a lot for systems with them. I would probably just get a system with a 500GB+ conventional drive, try it out, and then add a 1TB SSD if you need it.

What size data sets do you intend to work with? A couple years ago, I got a ~$500 laptop with a 128 GB SSD for a computer science master's program. In one of my first classes, we were working with a 50 GB data set in Spark and I was simply running out of space. The problem was that in addition to storing the original data set, I also had to write intermediate results of the same size. So I'd look for a disk about 5x the size of your largest data set. And SSD does make an enormous difference over HDD - like an order of magnitude or two faster, which might turn an overnight job into a 20-minute job.
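To put a number on that rule of thumb, here is a minimal Python sketch. The function name `recommended_disk_gb` and the default multiplier are just illustrative; the 5x figure is the rough guideline from the post above, not a Spark or Hadoop requirement.

```python
def recommended_disk_gb(largest_dataset_gb: float, multiplier: float = 5.0) -> float:
    """Rule-of-thumb disk size: roughly `multiplier` times the largest data set.

    The 5x default is the rough guideline from the post (original data plus
    intermediate results of similar size, with headroom), not a hard rule.
    """
    if largest_dataset_gb < 0:
        raise ValueError("data set size must be non-negative")
    return largest_dataset_gb * multiplier


# The 50 GB Spark data set from the post, checked against the 128 GB SSD.
needed_gb = recommended_disk_gb(50)
fits_on_128 = needed_gb <= 128

print(f"Suggested disk: about {needed_gb:.0f} GB")
print(f"Fits on a 128 GB SSD: {fits_on_128}")
```

By this estimate the 50 GB data set calls for roughly 250 GB of disk, which is why the 128 GB SSD ran out of room.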

Another option is to use a hosted environment for your large datasets: something metered like AWS if you're going to spin up services and shut them down, or a dedicated system like linode.com. Then just use your laptop for testing small datasets.

Assuming you're looking for horsepower and value over superb portability, I would suggest (specifically for a developer workstation) considering the Lenovo P-series ThinkPads (P52 for new and shiny, or a refurbished P51 for cheaper). They are DIY-upgradeable to multiple SSDs and 64GB+ RAM, and they have nice NVIDIA Quadro cards if you want to do ML or anything else that can take advantage of CUDA.

The ability to use an external GPU for number crunching might be interesting. In particular, you could tinker with the processor and internal GPU, and if and when you have a need to scale up, you could get an external one. In this scenario I think Apple is at a disadvantage because even the external GPU options are generally not NVIDIA and therefore not CUDA compatible. Even better, you could sell yourself on the ability to upgrade, and then in true Boglehead fashion, find that you don't really need to upgrade for your hobby coding, and save some money by not actually buying the external GPU.

What type of laptop are you using and what are the specs?

Thank you kindly.

Same issues with work laptop so I use a ThinkPad T470 as my personal laptop. Bought it refurbished for < $500. It's a powerhouse and built like a tank. Highly recommend it.

Your hardware will go much further if you learn to run vanilla Hadoop, etc. Plus, doing so will teach you more about internals worth knowing about. Most of the vendor addons from the commercial distros are what eat your CPU/RAM even when you aren’t running any jobs.

No, it is not overkill. 16 GB is the minimum for a developer machine. More RAM might be good.

If you need more GPU power than your internal system can provide, you are probably better off running your code in an AWS instance. Renting some crazy powerful thing for the couple of hours you need is cheaper than buying it and having it sit around for the other 23. For hobby projects I have found it to work well. YMMV.
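One way to sanity-check the rent-versus-buy argument is a break-even calculation. This is only a sketch: the function name `break_even_hours` and both dollar figures are made-up placeholders, not real AWS or hardware prices, so plug in current quotes before drawing any conclusion.

```python
def break_even_hours(purchase_price: float, hourly_rate: float) -> float:
    """Hours of rented compute that would cost as much as buying the hardware."""
    if hourly_rate <= 0:
        raise ValueError("hourly rate must be positive")
    return purchase_price / hourly_rate


# Hypothetical numbers for illustration only (not actual prices):
# a $1200 GPU versus a cloud GPU instance at $3 per hour.
hours = break_even_hours(purchase_price=1200.0, hourly_rate=3.0)
print(f"Renting is cheaper until about {hours:.0f} hours of use")
```

If your hobby usage stays well under the break-even hours, renting wins; this ignores electricity, resale value, and data transfer charges.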

In theory yes, in practice no. Hadoop doesn’t have a steep learning curve, and most of the more modern tech is inspired by original components in the Hadoop ecosystem to some degree. It’s not a bad introduction to distributed systems even if you aren’t going to use the classic MR execution engine, YARN, etc. The more things change, the more they stay the same.

I will most likely use AWS/Azure for larger datasets. Anything else I'll hold in a local database.

If you are trying to boost your marketable skills, Snowflake and Google Cloud Platform are the future.

And yes, a MacBook Pro is a solid option. A lot of my colleagues and I who do this sort of work are on MacBook Pros. Anything with at least 16GB of RAM and a solid state drive should be fine, though.

Thank you for the info. We essentially have too much data (5TB with a growth rate of 100 GB a month), and the old way of users accessing data through a conventional ASP.NET UI with a SQL Server DB doesn't cut it anymore. None of this is cloud-based, so I'm trying to bring in cloud and a distributed file system. The issue is, someone got sales-pitched by Cloudera and I could be forced to use their commercial distribution. I'm excited by it either way but will certainly look at Snowflake/Google Cloud.

I issue my developers the Dell Precision 7530 Mobile Workstation (and the 7520 before it). Our configuration is a Xeon E3 processor (forget the clock), 32GB of ECC (fills 2/4 slots, so we can go to 64GB easily), a 4GB NVIDIA card, and a 512GB SSD. None of our devs work with massive datasets locally, but if they did there's room for 2 additional drives if memory serves.

They're heavy, but the devs seem to like them, and they've been very reliable (I have about two dozen at this point). I tried to pitch the much sleeker 55x0, but they almost unanimously chose the larger 75x0 primarily due to the number pad.

Last edited by lazydavid on Sat Sep 15, 2018 6:04 pm, edited 1 time in total.

Agree with the sentiment that you can get so much more for the money in a desktop versus a laptop. I shop at the following Dell refurb website which is different from this other one. Fantastic deals abound, and I suggest signing up for eCoupons. Just got a coupon for 40% off. Use SAVE40NOW as coupon (ends Monday).

As a developer/architect, I have done this twice.
What I have experienced is that there come phases when I need to learn or PoC a few things, but I have never needed a top-of-the-line, high-RAM, multi-core machine.
I have found it easier and better for me to use 1 year of see in cloud. Followed by Google cloud platform.

I make sure that I use my resources diligently and stop the machines when I am done.
Never spent more than 35 bucks in total yet.
Also gives advantage of learning and trying out other things as well.

Why don't you give it a try by getting something moderate and trying the cloud for a few months?

That's even more reason to use Snowflake or Google Cloud Platform. With Azure or AWS, you can’t scale your storage space in Redshift or SQL independently of compute, so you need to pay for additional compute power as your data set grows.

With Snowflake and GCP, you can scale storage completely independently of your compute power, and you don’t need to configure anything or bring down the system. Both Snowflake and GCP scale automatically with no manual intervention. Redshift and Azure SQL require intervention because they are just traditional on-prem databases ported to the cloud.

If you only need a few days to a week of cloud resources, I'd go in that direction. If you need more than this, things can get pricey. If I'm reading Google's pricing right, at about $.20/GB per month for SSD provisioned space, your 5TB/5000GB is going to run you about $1000/month. I've seen used 1.92TB Cloudspeed Eco drives for +/-$200 or so. 1 month of SSD storage on Google pays for <8TB of SSD storage that'll likely last much longer than a month. You can also pick up large (1.6TB) write-intensive enterprise SSDs for not too much ($300+), and I've read DDR3 is getting back down to earth. If possible, you can get way more bang for your buck from used enterprise server inventory (SAS > SATA, more max memory, more cores, etc...) and use whatever laptop to RDP into your server.
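The arithmetic above can be laid out in a few lines of Python. This is a rough sketch using only the figures quoted in the post ($.20/GB per month, 5000 GB, used 1.92TB drives at about $200 each); the function names are illustrative, and it counts raw capacity only, with no redundancy, server, or power costs.

```python
import math


def monthly_cloud_cost(size_gb: float, price_per_gb_month: float) -> float:
    """Monthly bill for provisioned cloud storage at a flat per-GB rate."""
    return size_gb * price_per_gb_month


def drives_needed(size_gb: float, drive_capacity_gb: float) -> int:
    """Whole drives required to hold the data (raw capacity, no redundancy)."""
    return math.ceil(size_gb / drive_capacity_gb)


# Figures quoted in the post; treat them as approximate.
cloud_per_month = monthly_cloud_cost(5000, 0.20)   # 5TB at $.20/GB per month
drive_count = drives_needed(5000, 1920)            # used 1.92TB drives
outlay = drive_count * 200.0                       # about $200 per used drive
months_to_break_even = outlay / cloud_per_month

print(f"Cloud SSD storage: ${cloud_per_month:.0f} per month")
print(f"Used drives: {drive_count} drives, ${outlay:.0f} one-time")
print(f"Break-even after {months_to_break_even:.1f} months")
```

On these numbers the drives alone pay for themselves in under a month, though the comparison leaves out the machine to put them in and any backup copies.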