Table of contents
- 1. Use Hash
- 2. Use Bitset
- 3. Use a probabilistic algorithm
1. Use Hash
Hash is one of Redis's basic data structures. Under the hood, Redis maintains it as an open hash: different keys are mapped to slots in a hash table, and if keys collide, the colliding entries are chained in a linked list.
When a user visits the page, we identify them by their user id if they are logged in. If they are not logged in, we can randomly generate a key to identify them instead. On each visit we call the HSET command: the key can be the URI concatenated with the date, the field can be the user's id or the random identifier, and the value can simply be set to 1.
To get the number of distinct visitors to a page on a given day, call HLEN on that day's key directly.
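The HSET/HLEN flow described above can be sketched as follows. This is a minimal illustration, not a production implementation: the key layout `uv:<uri>:<day>`, the helper names, and the `InMemoryHashes` class are all assumptions made for this example. `InMemoryHashes` is a hypothetical stand-in that mimics only the two hash commands used here so the sketch runs without a server; a redis-py client exposes `hset` and `hlen` methods that could be passed in its place.

```python
import uuid
from collections import defaultdict


class InMemoryHashes:
    """Hypothetical stand-in for the HSET and HLEN commands (not a Redis client)."""

    def __init__(self):
        self._hashes = defaultdict(dict)

    def hset(self, name, key, value):
        # Like HSET, report 1 when the field is new and 0 when it is overwritten.
        is_new = key not in self._hashes[name]
        self._hashes[name][key] = value
        return int(is_new)

    def hlen(self, name):
        # Number of fields stored under the key; 0 if the key does not exist.
        return len(self._hashes.get(name, {}))


def visitor_id(user_id=None):
    """Use the user id when logged in, otherwise a random identifier."""
    if user_id is not None:
        return f"user:{user_id}"
    return f"anon:{uuid.uuid4().hex}"


def day_key(uri, day):
    """Assumed key layout: the URI concatenated with the date."""
    return f"uv:{uri}:{day}"


def record_visit(client, uri, day, visitor):
    # field = visitor identifier, value = 1; repeat visits overwrite the same field.
    client.hset(day_key(uri, day), visitor, 1)


def unique_visitors(client, uri, day):
    # The number of fields equals the number of distinct visitors that day.
    return client.hlen(day_key(uri, day))
```

Because each visitor occupies exactly one field, repeat visits by the same user do not change the HLEN result, which is why the count is exact.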
- Advantages: simple and easy to implement, easy to query, and the count is exact.
- Disadvantages: uses too much memory. As the number of keys grows, performance degrades, so it cannot support massive traffic.
2. Use Bitset
If an int is used to store an id directly, it can record only one id. If instead each bit of the int stands for one id, the same int can represent 32 ids, so space utilization improves 32-fold. For massive data sets, storing visits this way saves a lot of memory. For users who are not logged in, a hash algorithm can map the user's identifier to a numeric id. For 100 million ids, we need only 100000000/8/1024/1024, or about 12 MB of space.
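The memory estimate can be checked with a few lines of arithmetic. The function names are made up for this example, and the comparison assumes a 4-byte (32-bit) int per id, which matches the "32 times" figure in the text.

```python
def bitmap_size_mb(id_count):
    """Memory for one bit per id, in megabytes (bits -> bytes -> KB -> MB)."""
    return id_count / 8 / 1024 / 1024


def int_list_size_mb(id_count, bytes_per_int=4):
    """Memory for storing each id as its own int, assuming 4-byte ints."""
    return id_count * bytes_per_int / 1024 / 1024


ids = 100_000_000  # 100 million ids
print(f"bitmap:   {bitmap_size_mb(ids):.2f} MB")
print(f"int list: {int_list_size_mb(ids):.2f} MB")
```

For 100 million ids the bitmap comes to roughly 11.92 MB, against roughly 381.47 MB for one 4-byte int per id.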
Redis provides the SETBIT command, which is very convenient here: each time a user opens the item page, we call SETBIT to mark that the user has visited it, and we can use GETBIT to check whether a particular user has visited. Finally, BITCOUNT gives the number of distinct visitors to the page for each day.
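To make the bit operations concrete, here is a small pure-Python model of the three commands. It is a sketch for illustration only: the `Bitmap` class and `anonymous_offset` helper are hypothetical names, the model follows Redis's convention that offset 0 is the most significant bit of the first byte, and CRC32 is just one assumed choice of hash for non-logged-in users. With redis-py, the corresponding client methods are `setbit`, `getbit`, and `bitcount`.

```python
import zlib


class Bitmap:
    """Pure-Python model of SETBIT / GETBIT / BITCOUNT on a single key."""

    def __init__(self):
        self._bytes = bytearray()

    def setbit(self, offset, value):
        byte_index, bit = divmod(offset, 8)
        # Grow the buffer with zero bytes so that the offset fits.
        if byte_index >= len(self._bytes):
            self._bytes.extend(b"\x00" * (byte_index + 1 - len(self._bytes)))
        mask = 0x80 >> bit  # offset 0 is the most significant bit of byte 0
        old = 1 if self._bytes[byte_index] & mask else 0
        if value:
            self._bytes[byte_index] |= mask
        else:
            self._bytes[byte_index] &= ~mask & 0xFF
        return old  # the previous bit value

    def getbit(self, offset):
        byte_index, bit = divmod(offset, 8)
        if byte_index >= len(self._bytes):
            return 0  # bits beyond the end read as 0
        return 1 if self._bytes[byte_index] & (0x80 >> bit) else 0

    def bitcount(self):
        # Number of set bits, i.e. the number of distinct ids marked.
        return sum(bin(b).count("1") for b in self._bytes)

    def size_bytes(self):
        return len(self._bytes)


def anonymous_offset(token, slots=100_000_000):
    """Map a non-logged-in visitor's token to a bit offset (collisions possible)."""
    return zlib.crc32(token.encode("utf-8")) % slots
```

Note that marking a single large offset makes the buffer grow to cover every bit before it, so a handful of visitors with very large ids can cost as much memory as a fully populated bitmap.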
- Advantages: small memory footprint and easy to query, including whether a specific user has visited. For users who are not logged in, different keys may hash to the same id. Avoiding that requires maintaining a mapping for non-logged-in users, which is an extra cost.
- Disadvantages: if the user ids are very sparse, it may take up more memory than the first method.
3. Use a probabilistic algorithm
If a page has very heavy traffic and the count does not need to be very precise, consider a probabilistic algorithm. Redis already encapsulates the HyperLogLog algorithm, a cardinality estimation algorithm: it does not store the actual values, only some data used to compute the estimate.
When a user visits the site, call PFADD to add the user's identifier to the corresponding key. At the end, PFCOUNT returns the estimated count. Because this is a probabilistic algorithm, the result may contain some error.
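To show why no individual values need to be stored, here is a simplified HyperLogLog in pure Python. It is a teaching sketch, not Redis's implementation: the class name, the use of SHA-1 as the hash, and the one-byte-per-register layout are assumptions for this example, and Redis uses its own hash function, more compact encodings, and a refined estimator. With redis-py, the corresponding client methods are `pfadd` and `pfcount`.

```python
import hashlib
import math


class TinyHyperLogLog:
    """Simplified HyperLogLog: 2**p registers, each holding a small rank."""

    def __init__(self, p=14):
        self.p = p
        self.m = 1 << p                    # 16384 registers when p = 14
        self.registers = bytearray(self.m)

    def add(self, item):
        # Take a 64-bit hash of the item; the item itself is never stored.
        digest = hashlib.sha1(item.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        index = h >> (64 - self.p)                 # first p bits pick a register
        rest = h & ((1 << (64 - self.p)) - 1)      # remaining 64 - p bits
        # Rank = 1-based position of the leftmost 1-bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        if rank > self.registers[index]:
            self.registers[index] = rank

    def count(self):
        m = self.m
        alpha = 0.7213 / (1 + 1.079 / m)           # bias constant for m >= 128
        estimate = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if estimate <= 2.5 * m and zeros:
            # Small-range correction: fall back to linear counting.
            estimate = m * math.log(m / zeros)
        return round(estimate)
```

Adding the same visitor twice leaves the registers unchanged, so duplicates never inflate the estimate. The memory used is fixed by the number of registers and does not depend on how many visitors are added.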
- Advantages: minimal memory footprint, needing only about 12 KB per key. Very efficient for sites with very large traffic.
- Disadvantages: it cannot reliably answer whether a specific user has visited, and the total count is an estimate rather than an exact figure.