I would like to create my personal website with contact info (i.e. e-mail, maybe a phone number), preferably using OpenBSD. I would like to protect it from being harvested by crawling bots. What can I do besides putting content="noindex" into the HTML and creating a robots.txt? I don't think those measures would stop bots written by spammers who use such data illegally. I was thinking about some sort of captcha, e.g. Google's reCAPTCHA. Can I integrate reCAPTCHA with a site running on the built-in base OpenBSD system, or should I use some packages/ports?

__________________
Signature: Furthermore, I consider that systemd must be destroyed.
Based on Latin oratorical phrase

========
Another key factor is what software, if any, is used: site builders, forum software, etc.
One of the simplest methods is to use some forum software and set up a board that only registered users/members can view. If your personal information is viewable by the public, then anyone can view it, and likewise anyone can copy it in some way. So you really need to think twice before you put your business or personal e-mail, etc. on a public website.
The key is to limit what the public can see: only allow visitors to see the private info if they log in, require them to register, and require the registration to be accepted before they can view the private content.
I recently disabled a "contact form" on a phpBB forum I run because it was getting too much spam, but that is another topic.

Quote:

creating a robots.txt? I don't think those measures would stop bots written by spammers

You are right about that: robots.txt does nothing to stop spam bots.
For keeping out most bots, including spam bots, scrapers, etc., I would suggest registering here: https://zb-block.net/zbf/index.php
And learning how to use it. Again, it does not matter what OS you are using; the script integrates with your website's HTML or PHP and effectively blocks most known bad bots from even viewing your site.
In relation to spam bots, https://www.stopforumspam.com/forum/ is a good place to start.
We actually use Google reCAPTCHA there as well.
We have databases, and you can get lists of most known spam bots, or use the API and check the credentials visitors supply when registering. This keeps most of them from being able to register, and thus from being able to view topics, personal info, etc.
SFS is also a good place to ask about keeping spam bots off of your website, and again, the OS being used is not really relevant. I guess for now that is about it.
On OpenBSD there is also pf(4), the built-in packet filter.

I would like to use httpd. I know reCAPTCHA is not integrated into base. I would like to use as many built-in OpenBSD base components as possible and then add as little third-party software as possible. I am fine with adding a few third-party components from ports or GitHub, though.
If setting up a robust anti-crawling system is difficult on httpd, I will use some other HTTP server.

I am thinking about a static website: semantically correct HTML5 (without JavaScript) plus CSS. The only "dynamic" thing I foresee on the site is something to keep crawling bots away.

Quote:

If your personal information is viewable by the public, then anyone can view it, and likewise anyone can copy it in some way. So you really need to think twice before you put your business or personal e-mail, etc. on a public website.
The key is to limit what the public can see: only allow visitors to see the private info if they log in, require them to register, and require the registration to be accepted before they can view the private content.

I would like a website without registration, just a captcha on some pages. I am not going to post things as sensitive as a social security number (or rather its local equivalent, since I don't live in the USA). On the other hand, I am thinking about posting one of the aliases to my personal e-mail address.


As I see it, there are two "built-in" options for making some or all of a website private. Both are included in httpd(8):

HTTP Authentication

HTTP user-ID/password authentication can be set for any location{} or server{} in your httpd.conf(5) file. Over plain HTTP the credentials travel in cleartext, so obviously HTTPS is recommended.
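A minimal sketch of what that could look like, assuming a password file created with htpasswd(1) at /var/www/htpasswd inside the chroot (the server name, realm, location, and file paths here are placeholders):

```
# /etc/httpd.conf -- protect one location with Basic authentication
server "example.com" {
	listen on * tls port 443
	tls {
		certificate "/etc/ssl/example.com.fullchain.pem"
		key "/etc/ssl/private/example.com.key"
	}
	location "/contact*" {
		# path is relative to the chroot (/var/www by default)
		authenticate "Contact" with "/htpasswd"
	}
}
```

The password file can be created in base with something like `htpasswd /var/www/htpasswd someuser` (it prompts for the password); then verify the config with `httpd -n` and reload with `rcctl reload httpd`.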

While httpd(8) can sustain brute-force attacks with little resource consumption, I would not recommend using this as your only authentication method. Either use it as one factor in a two-factor website authentication scheme, or use PF's stateful tracking to limit brute-force attempts.
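A rough pf.conf(5) sketch of that stateful-tracking idea, assuming the site listens on ports 80/443; the table name and the limits are arbitrary placeholders to tune for your own traffic:

```
# /etc/pf.conf -- rate-limit new connections and ban offenders
table <bruteforce> persist
block in quick from <bruteforce>
pass in on egress proto tcp to port { 80 443 } \
	keep state (max-src-conn 50, max-src-conn-rate 15/5, \
		overload <bruteforce> flush global)
```

Addresses that open more than 50 simultaneous connections, or more than 15 new connections within 5 seconds, are added to the <bruteforce> table and blocked; `pfctl -t bruteforce -T show` lists them, and `-T flush` clears the table.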

X.509 Client Certificates

httpd(8) supports client certificates. If a server{} requires them, only browsers that present a valid client certificate can establish a session with the server.
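A hedged sketch of the httpd.conf(5) side, assuming you have already created a private CA and issued client certificates to yourself (all file paths here are placeholders):

```
# /etc/httpd.conf -- require an X.509 client certificate for the whole server
server "example.com" {
	listen on * tls port 443
	tls {
		certificate "/etc/ssl/example.com.fullchain.pem"
		key "/etc/ssl/private/example.com.key"
		# clients must present a certificate signed by this CA
		# (append the "optional" keyword to merely request one)
		client ca "/etc/ssl/clientca.pem"
	}
}
```

Visitors then import their client certificate into the browser; anyone without one never gets past the TLS handshake.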

But I don't want to hide these details completely. I am okay if, say, a friend I haven't seen for years sends me an e-mail, or a stranger from human resources copies my e-mail address into Outlook and sends me an invitation to a job interview. I just want to hide these contact details from bots. I understand there may be some spammer manually harvesting e-mail addresses, but I accept that risk. I am going to post not the original address but an alias to my e-mail account, so if something goes wrong I can simply remove the alias.
It doesn't even need to be reCAPTCHA, just something I can use to differentiate between bots and people.
