Thursday, October 28, 2010

Preventive Maintenance and Troubleshooting


Preventive maintenance is a regular and systematic inspection, cleaning, and replacement of worn parts, materials, and systems. Preventive maintenance helps to prevent failure of parts, materials, and systems by ensuring that they are in good working order.

Troubleshooting is a systematic approach to locating the cause of a fault in a computer system. A good preventive maintenance program helps minimize failures. With fewer failures, there is less troubleshooting to do, which saves an organization time and money. Preventive maintenance can also include upgrading or replacing hardware and software, such as replacing a hard drive that is making noise, adding memory when the installed amount is insufficient, or installing software updates for security or reliability. Troubleshooting is a learned skill. Not all troubleshooting processes are the same, and technicians tend to refine their troubleshooting skills based on knowledge and personal experience. Use the guidelines in this chapter as a starting point to help develop your troubleshooting skills. Although each situation is different, the process described in this chapter will help you determine your course of action when you are trying to solve a technical problem for a customer.

The Purpose of Preventive Maintenance

Preventive maintenance reduces the probability of hardware or software problems by systematically and periodically checking hardware and software to ensure proper operation.

Hardware

Check the condition of cables, components, and peripherals. Clean components to reduce the likelihood of overheating. Repair or replace any components that show signs of damage or excessive wear.

Use the following tasks as a guide to create a hardware maintenance program:

■      Remove dust from fan intakes.
■      Remove dust from the power supply.
■      Remove dust from components inside the computer.
■      Clean the mouse and keyboard.
■      Check and secure loose cables.

Software

Verify that installed software is current. Follow the policies of the organization when installing security updates, operating system updates, and program updates. Many organizations do not allow updates until extensive testing has been completed. This testing is done to confirm that the update will not cause problems with the operating system and software.

Use the tasks listed as a guide to create a software maintenance schedule that fits the needs of your computer equipment:

■    Review security updates.
■    Review software updates.
■    Review driver updates.
■    Update virus definition files.
■    Scan for viruses and spyware.
■    Remove unwanted programs.
■    Scan hard drives for errors.
■    Defragment hard drives.

Benefits

Be proactive in computer equipment maintenance and data protection. By performing regular maintenance routines, you can reduce potential hardware and software problems. Regular maintenance routines reduce computer downtime and repair costs. A preventive maintenance plan is developed based on the needs of the equipment. A computer exposed to a dusty environment, such as a construction site, needs more attention than equipment in an office environment. High-traffic networks, such as a school network, might require additional scanning and removal of malicious software or unwanted files. Document the routine maintenance tasks that must be performed on the computer equipment and the frequency of each task. This list of tasks can then be used to create a maintenance program.
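
The documented list of tasks and frequencies could be kept in a small script. The following is a minimal sketch in Python: the task names come from the lists above, but the intervals, the `MaintenanceTask` class, and the `tasks_due` helper are illustrative assumptions, not recommended values.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class MaintenanceTask:
    """One routine task and how often it is performed (hypothetical structure)."""
    name: str
    interval_days: int   # assumed frequency; adjust to the equipment's environment
    last_done: date

    def next_due(self) -> date:
        return self.last_done + timedelta(days=self.interval_days)


def tasks_due(tasks, today):
    """Return the tasks whose next due date is on or before `today`, oldest first."""
    due = [t for t in tasks if t.next_due() <= today]
    return sorted(due, key=lambda t: t.next_due())


# Example intervals are placeholders, not recommendations.
schedule = [
    MaintenanceTask("Remove dust from fan intakes", 90, date(2010, 7, 1)),
    MaintenanceTask("Update virus definition files", 1, date(2010, 10, 27)),
    MaintenanceTask("Defragment hard drives", 30, date(2010, 10, 15)),
]

for task in tasks_due(schedule, today=date(2010, 10, 28)):
    print(f"{task.name}: due {task.next_due()}")
```

A computer in a dusty environment would simply get a shorter `interval_days` for the dust-removal tasks.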

The following are the benefits of preventive maintenance:
■      Increases data protection
■      Extends the life of the components
■      Increases equipment stability
■      Reduces repair costs
■      Reduces the number of equipment failures

Identify the Steps of the Troubleshooting Process

Troubleshooting requires an organized and logical approach to problems with computers and other components. A logical approach to troubleshooting allows you to eliminate variables in a systematic order. Asking the right questions, testing the right hardware, and examining the right data helps you understand the problem. This helps you form a proposed solution to try.

Troubleshooting is a skill that you will refine over time. Each time you solve another problem, you will increase your troubleshooting skills by gaining more experience. You will learn how and when to combine, as well as skip, steps to reach a solution quickly. The following troubleshooting process is a guideline that you can modify to fit your needs.

  • Explain the purpose of data protection (a precaution taken before the six steps that follow).
  • Identify the problem.
  • Establish a theory of probable causes.
  • Test the theory to determine an exact cause.
  • Establish a plan of action to resolve the problem and implement the solution.
  • Verify full system functionality, and if applicable, implement preventive measures.
  • Document findings, actions, and outcomes.

In this section, you will learn an approach to problem solving that can be applied to both hardware and software. You also can apply many of the steps to problem solving in other work-related areas.

Note

The term customer, as used in this book, is any user who requires technical computer assistance.


Explain the Purpose of Data Protection

Before you begin troubleshooting problems, always follow the necessary precautions to protect data on a computer. Some repairs, such as replacing a hard drive or reinstalling an operating system, might put the data on the computer at risk. Make sure that you do everything possible to prevent data loss while attempting repairs.

Caution

Although data protection is not one of the six troubleshooting steps, you must protect data before beginning any work on a customer’s computer. If your work results in data loss for the customer, you or your company could be held liable.


Data Backup

A data backup is a copy of the data on a computer hard drive that is saved to media such as a CD, DVD, or tape drive. In an organization, backups are routinely done on a daily, weekly, and monthly basis.

If you are unsure that a backup has been done, do not attempt any troubleshooting activities until you check with the customer. Here is a list of items to verify with the customer about data backups:
  • Date of the last backup
  • Contents of the backup
  • Data integrity of the backup
  • Availability of all backup media for a data restore

    If the customer does not have a current backup and you are not able to create one, you should ask the customer to sign a liability release form. A liability release form should contain at least the following information:

    • Permission to work on the computer without a current backup available
    • Release from liability if data is lost or corrupted
    • Description of the work to be performed
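
One item on the backup checklist, the data integrity of the backup, can be spot-checked by comparing checksums of the original files with their backup copies. The sketch below assumes the backup is a plain file-by-file copy in a directory that mirrors the source; the function names and the example paths are hypothetical, and a backup stored in an archive or proprietary format would need the backup tool's own verification instead.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_backup(source_dir, backup_dir):
    """Compare every file under source_dir with its copy under backup_dir.

    Returns the relative paths that are missing from the backup or differ.
    """
    source_dir, backup_dir = Path(source_dir), Path(backup_dir)
    problems = []
    for src in sorted(source_dir.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(source_dir)
        copy = backup_dir / rel
        if not copy.is_file() or sha256_of(src) != sha256_of(copy):
            problems.append(str(rel))
    return problems
```

A call such as `verify_backup("C:/Users/customer/Documents", "E:/backup/Documents")` (made-up paths) returns an empty list when every source file has an identical copy in the backup.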

    Identify the Problem

    During the troubleshooting process, gather as much information from the customer as possible. The customer should provide you with the basic facts about the problem. Here is a list of some of the important information to gather from the customer:
    • Customer information
    — Company name
    — Contact name
    — Address
    — Phone number
    • Computer configuration
    — Manufacturer and model
    — Operating system information
    — Network environment
    — Connection type
    • Description of problem
    — Open-ended questions
    — Closed-ended questions 

    Conversation Etiquette

    When you are talking to the customer, you should follow these guidelines:
    • Ask direct questions to gather information.
    • Do not use industry jargon when talking to customers.
    • Do not talk down to the customer.
    • Do not insult the customer.
    • Do not accuse the customer of causing the problem.
    By communicating effectively, you will be able to elicit the most relevant information about the problem from the customer.

    Open-Ended Questions

    When gathering information from customers, use both open-ended and closed-ended questions. Start with open-ended questions to obtain general information. Open-ended questions allow customers to explain the details of the problem in their own words. Some examples of open-ended questions are:
    • What problems are you experiencing with your computer or network?
    • What software has been installed on your computer recently?
    • What were you doing when the problem was identified?
    • What hardware changes have recently been made to your computer?

    Closed-Ended Questions

    Based on the information from the customer, you can proceed with closed-ended questions. Closed-ended questions generally require a yes or no answer. These questions are intended to get the most relevant information in the shortest time possible. Some examples of closed-ended questions are :
    • Has anyone else used your computer recently? 
    • Can you reproduce the problem?
    • Have you changed your password recently?
    • Have you received any error messages on your computer?
    • Are you currently logged in to the network?

    Documenting Responses

    Document the information obtained from the customer in the work order and in the repair journal. Write down anything that you think might be important for you or another technician. Often, the small details can lead to the solution of a difficult or complicated problem. It is now time to verify the customer’s description of the problem by gathering data from the computer.


    Event Viewer

    When system, user, or software errors occur on a computer, Event Viewer is updated with
    information about the errors. The Event Viewer application shown in Figure 4-1 records the
    following information about the problem:
    • What problem occurred
    • Date and time of the problem
    • Severity of the problem
    • Source of the problem
    • Event ID number
    • Which user was logged in when the problem occurred

    Figure 4-1: Event Viewer

    Although Event Viewer lists details about the error, you might need to further research the solution.
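
The fields that Event Viewer records can be thought of as one record per event. The sketch below is only a model of that idea: `EventRecord`, the severity labels, and the sample entries are assumptions made up for illustration, and the code does not read the real Windows event log.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class EventRecord:
    """Hypothetical record mirroring the fields Event Viewer displays."""
    description: str
    timestamp: datetime
    severity: str      # assumed labels: "Information", "Warning", "Error"
    source: str
    event_id: int
    user: str


def errors_since(records, start):
    """Return Error-level records at or after `start`, newest first."""
    hits = [r for r in records if r.severity == "Error" and r.timestamp >= start]
    return sorted(hits, key=lambda r: r.timestamp, reverse=True)


# Made-up sample entries for illustration only.
log = [
    EventRecord("Service started", datetime(2010, 10, 28, 8, 0),
                "Information", "ExampleService", 100, "SYSTEM"),
    EventRecord("Disk read failed", datetime(2010, 10, 28, 9, 15),
                "Error", "ExampleDisk", 200, "alice"),
    EventRecord("Driver failed to load", datetime(2010, 10, 28, 9, 45),
                "Error", "ExampleDriver", 300, "alice"),
]

for record in errors_since(log, datetime(2010, 10, 28, 9, 0)):
    print(record.timestamp, record.source, record.event_id, record.description)
```

Filtering by severity and time in this way narrows the log to the entries closest to when the customer noticed the problem.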

    Device Manager

    Device Manager, shown in Figure 4-2, displays all of the devices that are configured on a
    computer. Any device that the operating system determines to be acting incorrectly is
    flagged with an error icon. This type of error has a yellow circle with an exclamation point
    (!). If a device is disabled, it is flagged with a red circle and an X. A yellow question mark
    (?) indicates that the hardware is not functioning properly because the system does not
    know which driver to install for the hardware.


    Figure 4-2: Device Manager

    Beep Codes

    Each BIOS manufacturer has a unique beep sequence for hardware failures. When troubleshooting,
    power on the computer and listen. As the system proceeds through the power-on
    self-test (POST), most computers emit one beep to indicate that the system is booting
    properly. If there is an error, you might hear multiple beeps. Document the beep code
    sequence, and research the code to determine the specific hardware failure.

    BIOS Information

    If the computer boots and stops after the POST, investigate the BIOS settings to determine
    where to find the problem. A device might not be detected or configured properly. Refer to
    the motherboard manual to make sure that the BIOS settings are accurate.

    Diagnostic Tools

    Conduct research to determine which software is available to help diagnose and solve problems.
    There are many programs available that can help you troubleshoot hardware. Often,
    manufacturers of system hardware provide diagnostic tools of their own. For instance, a
    hard drive manufacturer might provide a tool that you can use to boot the computer and
    diagnose why the hard drive does not boot Windows.

    Establish a Theory of Probable Causes

    First, create a list of the most common reasons why the error would occur. Even though the
    customer may think that there is a major problem, start with the obvious issues before moving
    to more complex diagnoses. List the easiest or most obvious causes at the top and the
    more complex causes at the bottom. You will test each of these causes in the next steps of
    the troubleshooting process.

    Test the Theory to Determine an Exact Cause

    The next step in the troubleshooting process is to determine an exact cause. You determine
    an exact cause by testing your theories of probable causes one at a time, starting with the
    quickest and easiest. After identifying an exact cause of the problem, determine the steps to
    resolve the problem. As you become more experienced at troubleshooting computers, you
    will work through the steps in the process faster. For now, practice each step to better
    understand the troubleshooting process.
    If the exact cause of the problem has not been determined after you have tested all your theories,
    establish a new theory of probable causes and test it. If necessary, escalate the problem
    to a technician with more experience. Before you escalate, document each test that you
    try. Information about the tests is vital if the problem needs to be escalated to another technician.
    Many third-party tools are free to download.
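
The procedure of testing theories one at a time, quickest and easiest first, while documenting each test, can be sketched as a simple loop. The function name `find_exact_cause`, the example theories, and the stand-in checks are assumptions for illustration; in practice each check is a manual or scripted test performed by the technician.

```python
def find_exact_cause(theories):
    """Test theories one at a time, in the order given (quickest and easiest first).

    `theories` is a list of (description, check) pairs, where `check` is a
    callable returning True when the test confirms that cause.
    Returns (confirmed_cause_or_None, test_log).
    """
    test_log = []
    for description, check in theories:
        confirmed = bool(check())
        test_log.append((description, "confirmed" if confirmed else "ruled out"))
        if confirmed:
            return description, test_log
    # Nothing confirmed: form a new list of theories or escalate, passing test_log along.
    return None, test_log


# Stand-in checks; a real check would be an actual test on the computer.
theories = [
    ("Power cable unplugged", lambda: False),
    ("Monitor cable loose", lambda: True),
    ("Video card failed", lambda: False),
]

cause, log = find_exact_cause(theories)
print(cause)
for entry in log:
    print(entry)
```

The returned log is the record of every test tried, which is the information another technician needs if the problem is escalated.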

    Implement the Solution

    After you have determined the exact cause of the problem, establish a plan of action to
    resolve the problem and implement the solution. Sometimes quick procedures can determine
    the exact cause of the problem or even correct the problem. If a quick procedure does
    correct the problem, you can go to step 5 to verify the solution and full system functionality.
    If a quick procedure does not correct the problem, you might need to research the problem
    further to establish the exact cause. When researching possible solutions for a problem,
    use the following sources of information:

    • Your own problem-solving experience
    • Other technicians
    • Internet search
    • Newsgroups
    • Manufacturer FAQs
    • Computer manuals
    • Device manuals
    • Online forums
    • Technical websites

    Evaluate the problem and research possible solutions. Divide larger problems into smaller
    problems that can be analyzed and solved individually. Prioritize solutions starting with the
    easiest and fastest to implement. Create a list of possible solutions and implement them one
    at a time. If you implement a possible solution and it does not work, reverse the solution
    and try another.
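
The rule of implementing one possible solution at a time and reversing it if it does not work can be sketched as follows. This is a toy model under stated assumptions: `try_solutions`, the `state` dictionary standing in for a real computer, and the two example solutions are all hypothetical.

```python
def try_solutions(solutions, problem_fixed):
    """Apply candidate solutions one at a time, easiest first.

    Each solution is a (name, apply, revert) triple: a label and two callables.
    `problem_fixed` is a callable that returns True once the fault is gone.
    A solution that does not help is reverted before the next one is tried.
    """
    for name, apply, revert in solutions:
        apply()
        if problem_fixed():
            return name
        revert()
    return None


# Toy system state standing in for a real computer.
state = {"driver": "old", "cable": "loose"}

solutions = [
    ("Update driver",
     lambda: state.update(driver="new"),
     lambda: state.update(driver="old")),
    ("Reseat cable",
     lambda: state.update(cable="seated"),
     lambda: state.update(cable="loose")),
]

fixed_by = try_solutions(solutions, lambda: state["cable"] == "seated")
print(fixed_by, state)
```

Reverting each failed attempt keeps the system in a known state, so that only one change at a time is ever under test.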

    Verify Solution, Full System Functionality, and If Applicable, Implement Preventive Measures

    After the repairs to the computer have been completed, continue the troubleshooting
    process by verifying full system functionality and implementing any preventive measures if
    needed. Verifying full system functionality confirms that you have solved the original problem
    and ensures that you have not created another problem while repairing the computer.
    Whenever possible, have the customer verify the solution and system functionality.


    Document Findings, Actions, and Outcomes

    Finish the troubleshooting process by closing with the customer. Communicate the problem
    and the solution to the customer verbally and in writing. If possible, demonstrate how your
    solution has solved the problem. Be sure to complete the documentation, which should
    include the following information:

    • Description of the problem
    • Steps to resolve the problem
    • Components used in the repair

    Preventive Maintenance

    Preventive maintenance is a schedule of planned maintenance actions aimed at the prevention of breakdowns and failures. The primary goal of preventive maintenance is to prevent the failure of equipment before it actually occurs. It is designed to preserve and enhance equipment reliability by replacing worn components before they actually fail. Preventive maintenance activities include equipment checks, partial or complete overhauls at specified periods, oil changes, lubrication and so on. In addition, workers can record equipment deterioration so they know to replace or repair worn parts before they cause system failure. Recent technological advances in tools for inspection and diagnosis have enabled even more accurate and effective equipment maintenance. The ideal preventive maintenance program would prevent all equipment failure before it occurs.

    Value of Preventive Maintenance

    There are multiple misconceptions about preventive maintenance (PM). One such misconception is that PM is unduly costly. By this logic, regularly scheduled downtime and maintenance would cost more than operating equipment until repair is absolutely necessary. This may be true for some components; however, one should compare not only the costs but also the long-term benefits and savings associated with preventive maintenance. Without preventive maintenance, for example, costs for lost production time from unscheduled equipment breakdown will be incurred. Also, preventive maintenance will result in savings due to an increase in effective system service life.
    Long-term benefits of preventive maintenance include:

    ·         Improved system reliability.
    ·         Decreased cost of replacement.
    ·         Decreased system downtime.
    ·         Better spares inventory management.
    Long-term effects and cost comparisons usually favor preventive maintenance over performing maintenance actions only when the system fails.

    When Does Preventive Maintenance Make Sense

    Preventive maintenance is a logical choice if, and only if, the following two conditions are met :
    ·         Condition #1: The component in question has an increasing failure rate. In other words, the failure rate of the component increases with time, thus implying wear-out. Preventive maintenance of a component that is assumed to have an exponential distribution (which implies a constant failure rate) does not make sense!
    ·         Condition #2: The overall cost of the preventive maintenance action must be less than the overall cost of a corrective action. (Note: In the overall cost for a corrective action, one should include ancillary tangible and/or intangible costs, such as downtime costs, loss of production costs, lawsuits over the failure of a safety-critical item, loss of goodwill, etc.)
    If both of these conditions are met, then preventive maintenance makes sense. Additionally, based on the cost ratios, an optimum time for such action can be computed for a single component. This is detailed in later sections.

    The Fallacy of "Constant Failure Rate" and "Preventive Replacement"

    Even though we alluded to the fact in the last section of this on-line reference, Availability, it is important to make it explicitly clear that if a component has a constant failure rate (i.e. defined by an exponential distribution), then preventive maintenance of the component will have no effect on the component's failure occurrences. To illustrate this, consider a component with an MTTF = 100 hours, or λ = 0.01, and with preventive replacement every 50 hours. The reliability vs. time graph for this case is illustrated in Figure 7.3. In Figure 7.3, the component is replaced every 50 hours, thus the component's reliability is reset to one. At first glance, it may seem that the preventive maintenance action is actually maintaining the component at a higher reliability.



    Figure 7.3: Reliability vs. time for a single component with an MTTF = 100 hours, or λ = 0.01, and with preventive replacement every 50 hours.


     However, consider the following cases for a single component:

    Case 1: The component's reliability from 0 to 60 hours:

    • With preventive maintenance, the component was replaced with a new one at 50 hours, so the overall reliability is the reliability of the new component for 10 hours, R(t = 10) = 90.48%, times the reliability of the previous component for 50 hours, R(t = 50) = 60.65%. The result is R(t = 60) = 54.88%.
    • Without preventive maintenance, the reliability would be the reliability of the same component operating to 60 hours, or R(t = 60) = 54.88%.

    Case 2: The component's reliability from 50 to 60 hours:

    • With preventive maintenance, the component was replaced at 50 hours, so this is solely based on the reliability of the new component for a mission of 10 hours, or R(t = 10) = 90.48%.
    • Without preventive maintenance, the reliability would be the conditional reliability of the same component operating to 60 hours, having already survived to 50 hours, or R(t = 60) / R(t = 50) = 54.88% / 60.65% = 90.48%.

    As can be seen, both cases yield the same results with and without preventive maintenance.
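
The figures in both cases can be reproduced with a few lines of Python. This is a minimal check that assumes the exponential model R(t) = exp(-λt) with λ = 0.01 from the example above; the variable names are arbitrary.

```python
import math

MTTF = 100.0          # hours, from the example above
lam = 1.0 / MTTF      # constant failure rate, 0.01 per hour


def reliability(t):
    """Exponential reliability R(t) = exp(-lambda * t)."""
    return math.exp(-lam * t)


# Case 1: reliability from 0 to 60 hours
with_pm_0_60 = reliability(50) * reliability(10)   # old part survives 50 h, new part 10 h
without_pm_0_60 = reliability(60)

# Case 2: reliability from 50 to 60 hours
with_pm_50_60 = reliability(10)                       # new part, 10-hour mission
without_pm_50_60 = reliability(60) / reliability(50)  # conditional on surviving 50 h

print(f"Case 1: {with_pm_0_60:.2%} vs {without_pm_0_60:.2%}")   # 54.88% vs 54.88%
print(f"Case 2: {with_pm_50_60:.2%} vs {without_pm_50_60:.2%}")  # 90.48% vs 90.48%
```

The equality is a consequence of the memoryless property of the exponential distribution: a component that has survived 50 hours is statistically as good as a new one.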

    Determining Preventive Replacement Time

    As mentioned earlier, if the component has an increasing failure rate, then a carefully designed preventive maintenance program is beneficial to system availability. Otherwise, the costs of preventive maintenance might actually outweigh the benefits. The objective of a good preventive maintenance program is to either minimize the overall costs (or downtime, etc.) or meet a reliability objective. In order to achieve this, an appropriate interval (time) for scheduled maintenance must be determined. One way to do that is to use the optimum age replacement model, as presented next. The model adheres to the conditions discussed previously, namely:
    •          The component is exhibiting behavior associated with a wear-out mode. That is, the failure rate of the component is increasing with time.
    •          The cost for planned replacements is significantly less than the cost for unplanned replacements.


    Figure 7.4: Cost curve for preventive and corrective replacement.

    Figure 7.4 shows the Cost Per Unit Time vs. Time plot. In this figure, it can be seen that the corrective replacement costs increase as the replacement interval increases. In other words, the less often you perform a PM action, the higher your corrective costs will be. The longer a component with an increasing failure rate operates, the more likely it is to fail, and the more corrective actions it requires. The opposite is true for the preventive replacement costs: the longer you wait to perform a PM action, the lower these costs, and the more often you perform PM, the higher they become. If we combine both costs, we can see that there is an optimum point that minimizes the total cost. In other words, one must strike a balance between the risk (costs) associated with a failure and the desire to maximize the time between PM actions.
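
The trade-off can be explored numerically. The sketch below uses a commonly cited form of the age replacement cost rate, cost per unit time = [Cp·R(t) + Cu·(1 − R(t))] / ∫₀ᵗ R(s) ds, where Cp is the planned and Cu the unplanned replacement cost. The text above does not give this formula, so treat it as an assumption, along with the choice of a Weibull distribution, the parameter values, and the function names.

```python
import math


def weibull_reliability(t, beta, eta):
    """R(t) for a Weibull distribution with shape beta and scale eta."""
    return math.exp(-((t / eta) ** beta))


def cost_per_unit_time(t, beta, eta, cost_planned, cost_unplanned, steps=1000):
    """Expected cost rate when replacing at age t or at failure, whichever comes first."""
    r_t = weibull_reliability(t, beta, eta)
    # Trapezoidal approximation of the integral of R(s) from 0 to t (expected cycle length).
    h = t / steps
    area = 0.5 * (1.0 + r_t)
    for i in range(1, steps):
        area += weibull_reliability(i * h, beta, eta)
    area *= h
    return (cost_planned * r_t + cost_unplanned * (1.0 - r_t)) / area


def optimum_interval(beta, eta, cost_planned, cost_unplanned, t_max, grid=400):
    """Grid search for the replacement age in (0, t_max] with the lowest cost rate."""
    candidates = [t_max * k / grid for k in range(1, grid + 1)]
    return min(candidates,
               key=lambda t: cost_per_unit_time(t, beta, eta, cost_planned, cost_unplanned))


# Illustrative inputs only: wear-out (beta > 1) and planned cost well below unplanned cost.
best = optimum_interval(beta=2.5, eta=100.0, cost_planned=1.0, cost_unplanned=5.0, t_max=300.0)
print(f"Lowest cost per unit time at about {best:.0f} hours")
```

With `beta=1.0` (a constant failure rate) the cost rate in this sketch keeps falling as the interval grows, so no finite optimum appears, which matches the earlier point that preventive replacement does not help in that case.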

