select * from question order by rand() limit 10;

问题:没有走索引,数据量大的时候非常慢


select * from question where id >= ((select max(id) from question) - (select min(id) from question)) * rand() + (select min(id) from question) limit 10;

问题:随机行,如果数据分布不均匀会导致其筛出的数据集中在一侧


SELECT q.* FROM question q JOIN (SELECT question_id FROM question WHERE :your_where_condition AND rand() < ( SELECT (( $limit / count(*)) * 10 ) FROM question ) ORDER BY rand() LIMIT $limit) AS z ON z.question_id = q.question_id
  1. 最终方案:在40w+数据的表中随机筛选10条数据0.3s左右,且数据分布较均匀,小ID也会被抽选到
  2. 在有主键索引的情况下,数据量300w+筛选10条数据花费的时间为2s
  3. 删除rand() < ( SELECT (( $limit / count(*)) * 10 ) FROM question )条件后时间减少到了0.7s


Ref

  1. https://blog.rxliuli.com/p/1e9c346748e14f35a390b49c5d520754/
  2. https://stackoverflow.com/questions/1823306/mysql-alternatives-to-order-by-rand/36013954#36013954