Function guaranteed to never return the same value twice [closed]

This is a question I was asked at a job interview, and I can’t figure out the answer they were looking for, so I’m hoping someone here might have some ideas. The goal is to write a function that is guaranteed to never return the same value twice. Assume that this function will be accessed by multiple machines concurrently.

My idea was to assign each machine a unique id and pass that value into the unique value generator function:

var i = 0;
function uniq(process_id, machine_id) {
   // Separators keep the fields from running together ("1" + "12" vs. "11" + "2").
   return (i += 1).toString() + "-" + machine_id + "-" + process_id;
}

This would avoid the fallout from race conditions since even if two or more processes read the same value for i, each return value is tagged with a unique combination of process id and machine id. However, my interviewer didn’t like this answer because bringing another machine online involves assigning it an id.

So can anybody think of another way to solve this that doesn’t involve configuring each machine to have a unique id? I’d like to have an answer in case this question comes up again. Thanks.

Don’t get fancy, just toss a simple (threadsafe) counter behind some communication endpoint (WCF, web service, whatever):

   long x = long.MinValue;
   public long ID(){
       return Interlocked.Increment(ref x);
   }

Yes, it will eventually overflow. Yes, it doesn’t handle reboots. Yes, it’s not random. Yes, someone could run this on multiple servers.

This is the simplest thing that satisfies practical requirements. Then let them be the ones that follow up with those problems (to make sure they understand the limitations, do they really think you need more than 2^64 ids), so you can then ask about what trade-offs are okay. Does it need to survive reboots? What about hard drive failure? What about nuclear war? Does it need to be random? How random?

If I were asked that question, and they made it clear that it has to be unique across reboots and across different machines, I’d give them a function that calls into the standard mechanism for creating a new GUID, whatever that happens to be in the language being used.
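As a minimal sketch of what that could look like in Java (the class name GuidIds and the method nextId are illustrative, not from any answer here), the standard library's java.util.UUID can generate a random version 4 identifier. Uniqueness is probabilistic, not guaranteed, and no machine id has to be configured.

```java
import java.util.UUID;

// Hypothetical wrapper around the standard GUID/UUID mechanism.
final class GuidIds {
    // Returns a random (version 4) UUID in its canonical 36-character text form.
    // No coordination between machines is needed; a collision is possible in
    // principle but extremely unlikely.
    static String nextId() {
        return UUID.randomUUID().toString();
    }

    public static void main(String[] args) {
        System.out.println(nextId());
        System.out.println(nextId());
    }
}
```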

The interviewer said the method will be called concurrently, not in parallel; just return the date/time down to as many decimal places as you can.

Why is everyone over-thinking this? You’ll be dead a long time before any finiteness is expended and you don’t have a chance of a collision.

If you’re worried about it returning the same time, add a delay for the smallest amount of measurable time.

If you’re worried about a clock being set back for daylight saving time (experiencing one time twice), add a constant to the time the second time you experience it.
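A rough sketch of the timestamp idea in Java, under a few assumptions: the class name MonotonicTimestamp is made up, the clock is injected so the behavior can be checked, and instead of sleeping it bumps the value by one unit whenever the clock repeats or goes backwards (a swapped-in variant of the "add a delay" and "add a constant" suggestions). It only covers callers inside one process; separate machines would still need something to tell them apart.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

// Hypothetical helper: hands out strictly increasing timestamps within one process.
final class MonotonicTimestamp {
    private final AtomicLong last = new AtomicLong(Long.MIN_VALUE);
    private final LongSupplier clock;

    // The clock is injectable so that stalls and rollbacks can be simulated.
    MonotonicTimestamp(LongSupplier clock) {
        this.clock = clock;
    }

    MonotonicTimestamp() {
        this(System::currentTimeMillis);
    }

    // Returns the current time, or the previous value plus one if the clock
    // has not advanced (same tick) or has been set back.
    long next() {
        long now = clock.getAsLong();
        return last.updateAndGet(prev -> Math.max(prev + 1, now));
    }

    public static void main(String[] args) {
        MonotonicTimestamp ts = new MonotonicTimestamp();
        System.out.println(ts.next());
        System.out.println(ts.next());
    }
}
```

The atomic update keeps concurrent threads from receiving the same value, which is the part a bare clock read cannot promise.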

Firstly, you will want to ask the interviewer two questions.

Question 1.

Whether the interviewer expects one or more “central machines” to be used to assign some unique numbers, or blocks of unique numbers.

Question 2.

Whether the interviewer expects a mechanism for collision detection, or instead accepts the calculated risk of a minuscule chance of collision without explicitly detecting collisions.

There is also the defense-in-depth approach, in which one incorporates some part of user-ID into the randomness (thus, not entirely random). The chance that the same user encounters a collision within the content created by that same user is therefore lowered.

There is an implicit question 3, …

But it is one you’ll have to gauge yourself without asking, because it is extremely impolite to ask your interviewer.

Whether the interviewer assumes the knowledge of probability, risk, and some simple techniques employed in cryptographic and information-security systems.

The first kind of knowledge ensures you’re not trying to convince a non-scientific person into accepting a scientific concept they won’t accept.

The second kind of knowledge ensures that you address concerns that go beyond mere probability. In other words, how to defend against “assailants” who want to intentionally break your randomization scheme, by manipulating the machine(s) or their virtual hosts to force two machines to generate the same value.

Why ask.

The reason is that if the interviewer expects it one way or another, trying to answer with the opposite approach will never make the interviewer happy.

The deeper reason is that some people do not like the idea of, say, a 1.0e-20 chance of failing. (I’ll try not to stir up philosophical or religious arguments here.)

First of all, the “namespace” of random numbers is organized as a hierarchy, with a certain number of bits allocated to one source of randomization, other bits allocated to another source, and so on.

The centralized approach relies on some central authority to uniquely assign the first level of bits. Then, the other machines can fill the rest of the bits.
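To make the bit hierarchy concrete, here is a minimal Java sketch. The 16/48 split, the class name PrefixedIdGenerator, and the way the prefix is passed in are all assumptions for illustration; the answer does not specify a layout. The central authority hands each machine a distinct prefix once, and the machine fills the remaining bits from a local counter.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical 64-bit layout: [16-bit prefix from the central authority][48-bit local counter].
final class PrefixedIdGenerator {
    static final int PREFIX_BITS = 16;
    static final int LOCAL_BITS = 48;
    static final long MAX_LOCAL = (1L << LOCAL_BITS) - 1;

    private final long prefix;
    private final AtomicLong counter = new AtomicLong(0);

    // The prefix is assumed to have been assigned uniquely by the central authority.
    PrefixedIdGenerator(int prefix) {
        if (prefix < 0 || prefix >= (1 << PREFIX_BITS)) {
            throw new IllegalArgumentException("prefix must fit in " + PREFIX_BITS + " bits");
        }
        this.prefix = prefix;
    }

    // Combines the fixed prefix with the next local counter value.
    // Ids are meant to be compared as bit patterns; large prefixes give negative longs.
    long next() {
        long local = counter.incrementAndGet();
        if (local > MAX_LOCAL) {
            throw new IllegalStateException("local id space exhausted");
        }
        return (prefix << LOCAL_BITS) | local;
    }

    public static void main(String[] args) {
        PrefixedIdGenerator machine7 = new PrefixedIdGenerator(7);
        System.out.println(Long.toHexString(machine7.next()));
        System.out.println(Long.toHexString(machine7.next()));
    }
}
```

Two machines with different prefixes cannot collide, whatever their counters do. The cost is one round trip to the authority when a machine comes online, and the counter in this sketch does not survive a restart.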

There are several decentralized approaches:

Just generate random numbers as well as one can, and accept the practically zero chance of failure, justified by calculation.
Use cryptographic means to generate random-looking values from a deterministic source, say, an incrementing counter.
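The "justified by calculations" part of the first approach usually means the birthday bound: drawing n values uniformly from a space of N = 2^bits possibilities gives a collision probability of roughly 1 - exp(-n(n-1)/(2N)). A small Java sketch of that estimate follows; the class and method names are illustrative, and the formula is the standard approximation, not an exact value.

```java
// Hypothetical helper that evaluates the birthday-bound approximation.
final class CollisionOdds {
    // Approximate probability that at least two of n uniformly random values
    // drawn from a space of 2^bits possibilities are equal.
    static double collisionProbability(double n, int bits) {
        double space = Math.pow(2.0, bits);
        double exponent = -(n * (n - 1.0)) / (2.0 * space);
        // expm1 keeps precision when the probability is tiny.
        return -Math.expm1(exponent);
    }

    public static void main(String[] args) {
        // One billion ids with 122 random bits each (the random part of a version 4 UUID).
        System.out.println(collisionProbability(1e9, 122));
    }
}
```

With those inputs the estimate is on the order of 1e-19, which is the kind of figure one would put in front of the interviewer when asking whether a calculated risk is acceptable.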

So, keeping in mind that this is an interview question and not an actual real-life scenario, I believe the correct approach (and probably what the interviewer is looking for) is to ask a clarifying question, or to write “It can’t be done” and move on. Here’s why.

What the Interviewer Asks:

Write a function that is guaranteed to never return the same value twice. Assume that this function will be accessed by multiple machines concurrently.

What the Interviewer Needs:

Does this candidate effectively evaluate requirements and seek additional input when required?

Never Assume.

When an engineer is handed a requirement (via a SOW or Specification or some other requirements document), some are self-evident, and others are totally unclear. This is a perfect example of the latter. As the previous answers have shown, there is no way to respond to this requirement without making several major assumptions either (a) as to the nature of the question or (b) as to the nature of the system, because the requirement cannot be met as-written (it is impossible).

Most of the answers make one attempt or another at solving the problem via a series of assumptions. One specifically recommends to just get it done quickly and let the customer worry about it if it’s wrong.

This is really a bad approach. As a customer, if I give an unclear requirement, and the engineer goes off and builds me a solution that doesn’t work, I am going to be upset that they went to work and spent my money without bothering to ask me first. That sort of cavalier decision-making demonstrates a lack of teamwork, inability to think critically, and poor judgement. It can lead to any manner of negative consequences, including loss-of-life in a safety critical system.

Why Ask the Question?

The point of this exercise is that it is expensive and time-consuming to build to ambiguous requirements. In the OP’s case, you have been given an impossible task. Your first action should be to ask for clarification – what is it that is required? What degree of uniqueness is needed? What happens if a value is non-unique? The answers to these questions could be the difference between several weeks of time and a few minutes. In the real world, one of the biggest drivers of cost in complex systems (including many software systems) is unclear and poorly understood requirements. This leads to expensive and time-consuming bugs, re-designs, customer and team frustration, and embarrassing media coverage if the project is large enough.

What Happens When You Assume?

Given my background in the aerospace industry, and due to the highly visible nature of aerospace failures, I like to bring up examples from this domain to illustrate important points. Let’s examine a pair of failed Mars missions – the Mars Climate Orbiter and Mars Polar Lander. Both missions failed due to software problems — because engineers made invalid assumptions due, in part, to unclear and poorly-communicated requirements.

Mars Climate Orbiter – this case is typically cited as what happens when NASA tries to convert English to Metric units. However, that is an overly simplistic and poor representation of what really transpired. True, there was a conversion problem, but it was due to poorly-communicated requirements in the design phase and an improper verification/validation scheme. Furthermore, when two different engineers noticed the problem because it was obvious from flight trajectory data, they didn’t raise the issue to the proper level because they assumed it was a transmission error. Had the mission ops team been made aware of the issue, there would have been adequate time to correct it and save the mission. In this case, there was an impossible logical condition that was not recognized for what it was, leading to costly mission failure.

Mars Polar Lander – this case is a little less well-known, but possibly more embarrassing due to its temporal proximity to the Mars Climate Orbiter failure. In this mission, the software controlled the thruster-assisted descent of the lander to the Martian surface. At a point 40 meters above the surface, the legs of the lander deployed in preparation for landing. There was also a sensor on the legs that detected motion (to signal when they had impacted) to tell the software to shut off the engine. NASA’s best guess as to what happened (because there are multiple possible failures and incomplete data) is that random vibrations in the legs due to their deployment simultaneously and improperly triggered the shutdown mechanism 40m above the surface, resulting in the crash and destruction of the $110M spacecraft. This possibility was raised in development, but was never addressed. Ultimately, the software team made invalid assumptions about how this code needed to run (one such assumption is that a spurious signal would be too short-lived to be picked up, despite tests showing the contrary), and those assumptions were never questioned until after the fact.

Additional Considerations

Interviewing and evaluating people is a tricky business. There are several dimensions of a candidate that an interviewer may wish to explore, but one of the most important is an individual’s ability to think critically. For a variety of reasons, not the least of which is that critical thinking is poorly-defined, we have a very difficult time evaluating critical thinking skills.

As an engineering instructor, one of my favorite ways to evaluate a student’s ability to think critically was to ask a somewhat ambiguous question. The sharper students would pick up on the question’s faulty premise, note it, and either answer given the premise or decline to answer altogether. Typically, I would ask a question similar to the following:

You pick up a drawing from your stack of work. The drawing contains a variety of different callouts, but the most important points to a horizontal surface and says “Perfectly flat”. The surface is 5″ wide by 16″ long, and the part is made of aluminum. How will you machine the part to create this feature?

(By the way, you would be shocked at how often such a poor specification appears in the workplace.)

I expect that students will recognize that it is not possible to create a perfect feature, and that they will state this in their answer. I typically would award a bonus point if they say they will go back to the designer and ask for clarification before making the part. If a student proceeds to tell me how they are going to achieve .001 planarity or some other made up value, I award zero points. This helps me make a point to my students that they need to be thinking of the bigger picture.

Bottom Line

If I am interviewing an engineer (or similar profession), I am looking for someone who can think critically and question what has been placed in front of him. I want someone who asks the question “Does this make sense?”

It does not make sense to ask for a perfectly flat part, because there is no such thing as perfect. It does not make sense to ask for a function that never returns a duplicate value, because it is impossible to make such a guarantee. In programming, we often hear the phrase “garbage in, garbage out.” If you are handed garbage for requirements, it is your ethical responsibility to stop and ask whatever question helps you elicit the true intent. If I’m interviewing a candidate, and I give them an unclear requirement, I am going to expect clarification questions.

Guaranteeing uniqueness is difficult because computers do not have infinitely-large variables. No real-world Turing machine can.

The way I see it there are two problems here, and both have well-established solutions.

Concurrency. Multiple machines may need a value at the same time. Thankfully, modern CPUs have concurrency built-in and some languages provide developer-friendly facilities to take advantage of this.
Uniqueness. While it is impossible to guarantee uniqueness, we can use arbitrarily large variables whose range is so large that a real-world system would have a very difficult time exhausting all of the unique values.

Here is my solution in Java:

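import java.math.BigInteger;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

public class Foo {
  private static BigInteger value = BigInteger.ZERO;
  private static final Lock lock = new ReentrantLock();

  public static BigInteger nextValue() {
    // Acquire the lock before entering try, so that finally never unlocks a lock we do not hold.
    lock.lock();
    try {
      value = value.add(BigInteger.ONE);
      return value;
    }
    finally {
      lock.unlock();
    }
  }
}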

BigInteger is an arbitrary-size integer type. It can grow to hold values that are quite large, even if not infinite. The lock ensures mutual exclusion, so the same value cannot be returned twice by two simultaneous requests serviced by separate threads.

I would expose the function via a port on the server; to call the function, the requesting machine requests a connection and is granted one, while at the same time being allocated an identifying code (sequential number for simplicity). Whenever a message is sent to the port requesting the unique value, the value is generated by concatenating the MD5 hash of the current date and time with the MD5 hash of the identifying code.
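A minimal Java sketch of just the value-generation step, assuming the identifying code is a long and the timestamp is an ISO-8601 string (the class name HashedIds and both method names are made up, and the port and connection handling are left out):

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.time.Instant;

// Hypothetical helper for the "MD5 of time + MD5 of client code" scheme.
final class HashedIds {
    // Lower-case hex MD5 digest of the UTF-8 bytes of the input.
    static String md5Hex(String input) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(input.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available", e);
        }
    }

    // Concatenates the hash of the timestamp with the hash of the client's code.
    static String uniqueValue(Instant timestamp, long clientCode) {
        return md5Hex(timestamp.toString()) + md5Hex(Long.toString(clientCode));
    }

    public static void main(String[] args) {
        System.out.println(uniqueValue(Instant.now(), 1L));
    }
}
```

The hashes only fix the length and hide the inputs; they add no uniqueness of their own. Two requests from the same client within the same clock reading produce the same value, so the scheme is only as good as the clock resolution and the uniqueness of the identifying codes.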

If they want a more bulletproof solution they would have to specify their actual requirements rather than being all vague about things.

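// Requires: using System; using System.Globalization; using System.Threading;
// C# has no static locals, so the counter is a static field of the enclosing class.
static long u = long.MinValue;

string uniq(string machine_id)
{
   // Use the value returned by Increment; re-reading u could observe another thread's update.
   long n = Interlocked.Increment(ref u);

   //Time stamp with millisecond precision
   string timestamp = DateTime.UtcNow.ToString("yyyy-MM-dd HH:mm:ss.fff",
                                            CultureInfo.InvariantCulture);

   return machine_id + "-" + timestamp + "-" + n;
}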

This way, the return value differs even across restarts and even when the function is called simultaneously from different machines.


Filed under: softwareengineering - @ 21:13

Tags: algorithms, concurrency
