Two points that might help an answer: 1. How evenly distributed are your ids? Are they sequential? 2. How random does it need to be? Good-enough random, or real random?
– Michael May 2 '10 at 2:42

They are sequential ids auto-generated by ActiveRecord, and it just has to be good enough.
– jyunderwood May 2 '10 at 3:04

Then your proposed solution is close to ideal :) I'd use "SELECT MAX(id) FROM table_name" instead of the COUNT(*), as it deals with deleted rows a bit better; otherwise, the rest is fine. In short, if "good enough" is OK, then you just need a method that assumes a distribution close to what you actually have. If it's uniform and even, as you've said, simple rand works great.
– Michael May 2 '10 at 3:39
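A minimal sketch of what Michael describes, assuming auto-incremented ids (Thing is a hypothetical model; the >= lookup tolerates gaps left by deleted rows):

max_id = Thing.maximum(:id)  # SELECT MAX(id) FROM things
random_thing = Thing.where("id >= ?", rand(1..max_id)).order(:id).first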

This won't work when you have deleted rows.
– Venkat D. May 21 '12 at 21:45

Yeah, I have been working with Rails 3 and keep getting confused about the query formats between versions.
– Toby Hede May 3 '10 at 0:39

Note that using offset is very slow with a large dataset, since it actually needs an index scan (or a table scan, in case a clustered index is used, as in InnoDB). In other words, it's an O(N) operation, whereas "WHERE id >= #{rand_id} ORDER BY id ASC LIMIT 1" is O(log N), which is much faster.
– kenn May 6 '11 at 1:02

Be aware that the offset approach only yields a single randomly chosen record (everything after the first is still sorted by id). If you need multiple randomly selected records, you must use this approach multiple times or use the random-order method provided by your database, i.e. Thing.order("RANDOM()").limit(100) for 100 randomly selected entries. (Be aware that it's RANDOM() in PostgreSQL and RAND() in MySQL ... not as portable as you may want it to be.)
– Florian Pilz Jun 16 '11 at 13:17

Doesn't work for me on Rails 4. Use Model.offset(offset).first.
– mahemoff Jun 6 '14 at 2:03

Your example code will start to behave inaccurately once records are deleted (it will unfairly favor items with lower ids).

You're probably better off using the random methods within your database. These vary depending on which DB you're using, but :order => "RAND()" works for MySQL and :order => "RANDOM()" works for Postgres.
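In the old finder syntax that might look like the following (Model is a placeholder; on modern Rails the equivalent would be Model.order("RAND()").first or Model.order("RANDOM()").first):

Model.first(:order => "RAND()")    # MySQL
Model.first(:order => "RANDOM()")  # PostgreSQL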

ORDER BY RAND() on MySQL ends up with horrific runtime as data increases. It becomes unusable (depending on time requirements) even starting at just thousands of rows.
– Michael May 2 '10 at 2:45

Michael brings up a great point (that is true for other DBs as well). Generally selecting random rows from large tables isn't something you want to do in a dynamic action. Caching is your friend. Rethinking what you're trying to accomplish might not be a bad idea either.
– semanticart May 2 '10 at 2:52

Ordering RAND() in mysql on a table with about a million rows is slooooooooooooooooooooow.
– Subimage Sep 8 '11 at 19:50

No, it does not support conditions and does not have an equal probability of selection for tables with deleted records.
– Niels B. Jun 22 '14 at 8:14

Come to think of it, if you apply the constraints both when counting and when selecting with an offset, the technique should work. I was imagining only applying them to the count.
– Niels B. Jun 22 '14 at 9:24

In the documentation of this gem they explain that "randumb simply tacks an additional ORDER BY RANDOM() (or RAND() for mysql) to your query" – therefore, the performance concerns raised in the comments on @semanticart's answer also apply when using this gem. But at least it's DB-independent.
– Nicolas Jan 16 '14 at 13:20

This solution is not advised, but if for some reason you really want to randomly select a record while making only one database query, you could use the sample method from Ruby's Array class, which selects a random item from an array.

Model.all.sample

This approach requires only one database query, but it is significantly slower than alternatives like Model.offset(rand(Model.count)).first, which require two database queries; the latter is still preferred.

If you have 100k rows in your database, all of these would have to be loaded into memory.
– Venkat D. Nov 13 '12 at 22:10

Of course it is not recommended for production realtime code, but I like this solution; it is very handy for special situations like seeding the database with fake values.
– fguillen Dec 9 '12 at 10:55

Please - never say never. This is a great solution for development-time debugging if the table is small. (And if you're taking samples, debugging is quite possibly the use case).
– mahemoff Feb 17 '13 at 0:19

Reading all of these did not give me a lot of confidence about which would work best in my particular situation with Rails 5 and MySQL/MariaDB 5.5. So I tested some of the answers on ~65,000 records, and have two takeaways:

You can use the Array method sample, which returns a random object from an array. To use it, just run a simple ActiveRecord query that returns a collection, for example (the model and condition below are made up for illustration):
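# Hypothetical model and scope; sample picks one loaded record at random.
Category.where(visible: true).sample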

I wouldn't recommend working with Array methods while using AR. This approach takes almost 8 times as long as order('rand()').limit(1) to do "the same" job (with ~10K records).
– Sebastian Palma Apr 23 '18 at 2:16

If you're using PostgreSQL 9.5+, you can take advantage of TABLESAMPLE to select a random record.

The two default sampling methods (SYSTEM and BERNOULLI) require that you specify the number of rows to return as a percentage of the total number of rows in the table.

-- Fetch 10% of the rows in the customers table.
SELECT * FROM customers TABLESAMPLE BERNOULLI(10);

This requires knowing the number of records in the table in order to select the appropriate percentage, which may not be easy to find quickly. Fortunately, there is the tsm_system_rows module, which allows you to specify the number of rows to return directly.
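With that module enabled, a sketch of fetching one random row through ActiveRecord, reusing the customers table from the example above:

# Requires: CREATE EXTENSION tsm_system_rows; (PostgreSQL 9.5+)
# SYSTEM_ROWS(1) returns one row without needing the table's row count
# (it samples at the block level, so it is not perfectly uniform).
Customer.from("customers TABLESAMPLE SYSTEM_ROWS(1)").first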

After seeing so many answers I decided to benchmark them all on my PostgreSQL (9.6.3) database. I used a smaller 100,000-row table and got rid of Model.order("RANDOM()").first since it was already two orders of magnitude slower.

Using a table with 2,500,000 entries and 10 columns, the hands-down winner was the pluck method, being almost 8 times faster than the runner-up (offset). I only ran this on a local server, so that number might be inflated, but it's big enough that the pluck method is what I'll end up using. It's also worth noting that this might cause issues if you pluck more than one result at a time, since each of those will be unique, i.e. less random.

Pluck wins running 100 times on my 25,000,000-row table.
Edit: actually this time includes the pluck in the loop; if I take it out, it runs about as fast as simple iteration on the id. However, it does take up a fair amount of RAM.
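For reference, the pluck technique as it's commonly written (Model is a placeholder), which also explains the RAM cost noted above:

# One query pulls every id into a Ruby array (hence the RAM usage),
# then Array#sample picks one and a second query fetches that record.
random_record = Model.find(Model.pluck(:id).sample)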

The bad thing about it is that it's going to load all cards from the database. It's more efficient to do it inside the database.
– Anton Kuzmin Nov 9 '13 at 10:28

You can also shuffle arrays with Array#shuffle. Anyway, beware: Card.all will load all card records into memory, which gets more inefficient the more objects we are talking about.
– Thomas Klemm Nov 11 '13 at 13:20

I tried Sam's Benchmark example on my app using Rails 4.2.8 (I used 1..Category.count for the random range, because if rand returns 0 it produces an error: ActiveRecord::RecordNotFound: Couldn't find Category with 'id'=0), and mine was:

.order('RANDOM()').limit(limit) looks neat but is slow for large tables, because the database needs to fetch and sort all rows even if the limit is 1 (internally in the database, not in Rails). I'm not sure about MySQL, but this happens in Postgres. More explanation here and here.

One solution for large tables is .from("products TABLESAMPLE SYSTEM(0.5)"), where 0.5 means 0.5%. However, I find this solution is still slow if you have WHERE conditions that filter out a lot of rows. I guess that's because TABLESAMPLE SYSTEM(0.5) fetches all rows before the WHERE conditions apply.

Another solution for large tables (but not very random) is:

products_scope.limit(sample_size).sample(limit)

where sample_size can be 100 (but not too large, otherwise it's slow and consumes a lot of memory), and limit can be 1. Note that although this is fast, it's not really random; it's random within the first sample_size records only.

PS: The benchmark results in the answers above are not reliable (at least in Postgres), because a DB query run a second time can be significantly faster than its first run, thanks to the DB cache. And unfortunately there is no easy way to disable the cache in Postgres to make these benchmarks reliable.